Big Tech’s AI Models Struggle to Meet New EU AI Rules
Since the EU AI Act entered into force in August 2024, some of the most prominent AI models from Big Tech have struggled to meet its required standards, according to a recent evaluation.
LatticeFlow AI, ETH Zurich, and INSAIT developed a framework to test whether generative AI models comply with the new regulations. Based on data reviewed by Reuters, several leading models, including those from companies like OpenAI, Meta, Anthropic, and Alibaba, fell short in critical areas of the new rules.
The Swiss startup's tool, called the Large Language Model (LLM) Checker, evaluated these AI systems across various categories, assigning them scores between 0 and 1. This early evaluation sheds light on the gaps AI models need to close to fully comply with the EU AI Act, whose provisions will be phased in over the next two years. Companies that fail to follow these rules could face fines of up to €35 million or 7% of their annual global revenue, whichever is higher.
Reuters reported that in LatticeFlow's results, each tested model received an average score of 0.75 or higher. The category-level breakdown, however, still exposed clear weaknesses, most notably in preventing discriminatory output and in cybersecurity.
For instance, OpenAI's GPT-3.5 Turbo scored 0.46 and Alibaba's Qwen1.5 72B Chat scored 0.37 on tests for discriminatory output. On the cybersecurity front, in tests for "prompt hijacking", Meta's Llama 2 13B Chat scored 0.42 and Mistral's Mixtral 8x7B Instruct scored 0.38, exposing vulnerabilities in both models.
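To make the scoring concrete, here is a minimal, purely illustrative Python sketch of how per-category scores between 0 and 1 might be averaged into an overall result and flagged against a pass mark. The category names, the placeholder values, and the 0.75 threshold are assumptions drawn loosely from the figures above; LatticeFlow has not published its actual aggregation method in this detail.

```python
from statistics import mean

# Hypothetical per-category scores (0.0-1.0). Two values echo figures
# reported by Reuters; the rest are illustrative placeholders.
scores = {
    "discrimination": 0.46,    # e.g. GPT-3.5 Turbo's reported result
    "prompt_hijacking": 0.42,  # e.g. Llama 2 13B Chat's reported result
    "copyright": 0.90,         # placeholder
    "privacy": 0.85,           # placeholder
}

# Assumed pass mark, mirroring the 0.75 average cited in the article.
THRESHOLD = 0.75

overall = mean(scores.values())
weak_spots = {name: s for name, s in scores.items() if s < THRESHOLD}

print(f"Overall score: {overall:.2f}")
for name, score in sorted(weak_spots.items(), key=lambda kv: kv[1]):
    print(f"  needs work: {name} ({score:.2f})")
```

Under this toy scheme, a model can post a respectable overall average while still failing badly in individual categories, which is exactly the pattern the LatticeFlow results describe.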
The EU sees this evaluation tool as a first step toward ensuring that AI models comply with the upcoming regulations. That means tech giants must move quickly to close these gaps if they want to avoid hefty fines and build AI that meets the EU's standards for safety and fairness.