Google unveils Gemini 3.1 Flash-Lite, its fastest Gemini 3 model so far
Just two weeks after introducing Gemini 3.1 Pro — its most powerful AI model yet — Google is back with something new. On Tuesday, the company announced Gemini 3.1 Flash-Lite, now the quickest model in the Gemini 3 lineup. And while speed is clearly the headline here, the price is turning heads too. At $0.25 per million input tokens and $1.50 per million output tokens, it’s the most budget-friendly Gemini 3 model Google has released so far.
This model isn’t built to “run the show” or coordinate complex AI agents. Instead, it’s designed for heavy-duty, high-volume tasks — the kind developers deal with every day, like processing massive datasets or handling large-scale translation work. It’s currently available in preview through the Gemini API in Google AI Studio and Vertex AI. Regular users won’t find it inside the Gemini consumer app, which may disappoint some curious fans eager to try it out.
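For developers who want to kick the tires, calling the preview should look much like calling any other Gemini model through the API. Here is a minimal sketch using the google-genai Python SDK; the model ID string is a guess at the preview name, so check Google AI Studio or Vertex AI for the exact identifier.

```python
# pip install google-genai
from google import genai

# The client picks up GEMINI_API_KEY (or GOOGLE_API_KEY) from the environment.
client = genai.Client()

response = client.models.generate_content(
    # Placeholder model ID: the real preview name may differ, check AI Studio.
    model="gemini-3.1-flash-lite-preview",
    contents="Translate to German: 'Your order ships tomorrow.'",
)
print(response.text)
```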
Benchmarks
Compared to its predecessor, Gemini 2.5 Flash-Lite, the new version costs more (the older model started at $0.10/$0.40 per million input/output tokens). But that extra cost buys a noticeable jump in performance. Google's pitch, in effect: yes, it's pricier, but look what it can do.
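To make the price gap concrete, a quick back-of-the-envelope calculation helps. The per-million-token rates below come from the announcement; the monthly token volumes are made-up numbers purely for illustration.

```python
# Hypothetical monthly workload: 1B input tokens, 200M output tokens.
# Only the per-million-token prices are from Google; the volumes are assumptions.
prices = {
    "gemini-2.5-flash-lite": (0.10, 0.40),  # (input, output) USD per 1M tokens
    "gemini-3.1-flash-lite": (0.25, 1.50),
}

input_mtok, output_mtok = 1_000, 200  # millions of tokens per month

for model, (p_in, p_out) in prices.items():
    cost = input_mtok * p_in + output_mtok * p_out
    print(f"{model}: ${cost:,.2f} per month")

# gemini-2.5-flash-lite: $180.00 per month
# gemini-3.1-flash-lite: $550.00 per month
```

Roughly a threefold jump at that volume, which is why the benchmark gains below matter.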
In benchmark tests, Gemini 3.1 Flash-Lite generally performs better than Gemini 2.5 Flash — a model many developers still rely on — and it does so at a lower price point than some of its direct competitors.
When stacked up against models like GPT-5 mini and Claude 4.5 Haiku, the story is similar: Google’s new model holds its ground and often comes out ahead. Grok 4.1 Fast may be cheaper, but it lags behind in speed. Google says Gemini 3.1 Flash-Lite can generate up to 363 tokens per second. That’s technically three tokens per second slower than 2.5 Flash-Lite — a tiny dip that most people won’t even notice — yet it’s still two to five times faster than competing models. For developers working at scale, that kind of speed can genuinely feel like a relief.
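To see why that throughput matters at scale, consider a rough batch-job calculation. The 363 tokens-per-second figure is Google's claim; the document count, output length, and the slower comparison rate are assumptions for illustration only.

```python
# What 363 tokens/second means for a large batch job (single stream).
docs = 10_000
tokens_per_doc = 500              # assumed output length per document
total_tokens = docs * tokens_per_doc

rates = {
    "Gemini 3.1 Flash-Lite (claimed)": 363,
    "hypothetical model at 1/5 the speed": 363 / 5,
}

for label, tok_per_sec in rates.items():
    hours = total_tokens / tok_per_sec / 3600
    print(f"{label}: ~{hours:.1f} hours")

# Gemini 3.1 Flash-Lite (claimed): ~3.8 hours
# hypothetical model at 1/5 the speed: ~19.1 hours
```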
Like other Gemini models, Flash-Lite also shines in multimodal tasks, meaning it handles combinations of text, images, and other data types especially well. On the Arena.ai Leaderboard, it scored 1432 Elo points — putting it alongside many respected open-weight models and older commercial systems. For Google, that’s a quiet but confident flex.
Interestingly, Google didn’t share any agent-specific benchmark results this time. That seems intentional. Flash-Lite isn’t meant to manage teams of AI agents or orchestrate complex decision-making systems. It’s built for volume — repetitive, large-scale tasks that need to be done quickly and efficiently.
Developers also get control over how much “thinking time” the model uses. Through the API, they can adjust the reasoning level for each task. This matters because more reasoning usually means more tokens — and more tokens mean higher costs. For companies running huge workloads, that flexibility could make a real financial difference. It’s the kind of feature that feels practical rather than flashy.
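In practice, that control is exposed as a per-request thinking setting. The sketch below uses the thinking_budget field that exists in the current google-genai SDK for Gemini 2.5 models; whether 3.1 Flash-Lite keeps this field or exposes a named reasoning level instead is an assumption, as is the model ID.

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",  # placeholder preview model ID
    contents="Extract the invoice number from: 'Invoice #48213, due March 3.'",
    config=types.GenerateContentConfig(
        # Minimal reasoning for a simple extraction task; raise the budget
        # (and pay for more output tokens) only when the task needs it.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```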
What’s especially surprising is that Google launched the Flash-Lite version first this time. In the past, the company typically introduced a more powerful (and more expensive) Flash model before releasing the lighter version — or skipped Flash-Lite altogether. This shift suggests Google may be paying closer attention to developers who care deeply about speed, scale, and cost.
All in all, Gemini 3.1 Flash-Lite doesn’t try to be the most dramatic or headline-grabbing AI model. Instead, it feels focused and purposeful — built for people who just need things to work fast, reliably, and without breaking the bank. And in today’s AI race, that kind of practicality might be exactly what many developers have been hoping for.

