Developers Rejoice: Gemini 1.5 Pro Expands to a 2M Token Context Window

In education and workspace news this week, Google has unveiled several updates for developers, including a significant upgrade for Gemini 1.5 Pro, which now supports a 2 million token context window.

At Google I/O 2024, Google announced a 2 million token context window for Gemini 1.5 Pro, initially available in private preview. A window of this size lets the model process up to 2 hours of video, 22 hours of audio, more than 60,000 lines of code, or more than 1.4 million words in a single prompt. Gemini Advanced with 1.5 Pro currently offers half of this capacity. Following the private preview, the 2 million token window is now available to all developers.

Given that just six minutes of video requires over 100,000 tokens and large codebases can exceed 1 million tokens, the larger window is aimed at long-input tasks: finding bugs across a massive codebase, locating information in vast research libraries, or analyzing hours of audio and video content.
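To make these figures concrete, the short Python sketch below estimates whether a video of a given length would fit in the 2 million token window. It relies only on the numbers quoted above (roughly 100,000 tokens per six minutes of video, which the article gives as a lower bound). The function names are hypothetical, and actual token counts depend on the content and on how the model tokenizes it, so treat the result as a rough planning estimate rather than a guarantee.

```python
import math

# Figures quoted in the article. Real token usage varies with the content,
# so these are planning estimates only.
CONTEXT_WINDOW_TOKENS = 2_000_000
TOKENS_PER_SIX_MINUTES = 100_000  # "six minutes of video requires over 100,000 tokens"


def estimate_video_tokens(minutes: float) -> int:
    """Rough lower-bound token estimate for a video of the given length."""
    return math.ceil(minutes * TOKENS_PER_SIX_MINUTES / 6)


def fits_in_context(minutes: float, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count fits within the window."""
    return estimate_video_tokens(minutes) <= window


if __name__ == "__main__":
    for length in (6, 60, 120, 180):
        tokens = estimate_video_tokens(length)
        print(f"{length:>3} min -> ~{tokens:,} tokens, fits: {fits_in_context(length)}")
```

Under this estimate, a 120-minute video lands at about 2 million tokens, which agrees with the "up to 2 hours of video" figure Google cites for the window.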

Gemini 1.5 Pro is already in use at a variety of organizations, including a fast food chain, a financial institution, an insurance company, and a sports company that uses it to analyze player swings.

In addition, Gemini 1.5 Flash is now generally available. It offers a 1 million token context window, low latency, and competitive pricing, which makes it a fit for retail chat agents, document processing, and research agents that need to synthesize large repositories of information.

Google compares Gemini 1.5 Flash to GPT-3.5 Turbo, highlighting its advantages:

- A 1 million token context window, about 60 times larger than GPT-3.5 Turbo's

- On average, 40% faster when given a 10,000-character input

- Up to 4 times lower input cost, with context caching enabled for inputs larger than 32,000 characters
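The "about 60 times" figure in the first point can be sanity-checked with simple arithmetic. The sketch below assumes the comparison is against the 16,385-token variant of GPT-3.5 Turbo. That number is not stated in the article and is labeled as an assumption in the code.

```python
GEMINI_15_FLASH_CONTEXT = 1_000_000  # from the article
GPT_35_TURBO_CONTEXT = 16_385        # assumption: the 16K-token GPT-3.5 Turbo variant


def context_ratio(larger: int, smaller: int) -> float:
    """How many times larger one context window is than another."""
    return larger / smaller


ratio = context_ratio(GEMINI_15_FLASH_CONTEXT, GPT_35_TURBO_CONTEXT)
print(f"Gemini 1.5 Flash window is about {ratio:.0f}x larger")
```

Under that assumption the ratio comes out at roughly 61, which is consistent with Google's rounded claim.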

Furthermore, Google announced the global availability of Gemma 2 in 9 billion and 27 billion parameter sizes.

Simultaneously, Imagen 3 is launching in preview for Vertex AI customers with early access. Compared to Imagen 2, it offers:

- Over 40% faster generation for rapid prototyping and iteration

- Improved prompt understanding and instruction-following

- Photorealistic generation of groups of people

- Enhanced control over text rendering within images

For instance, Imagen 3 can generate a photorealistic image from the prompt: “a photorealistic image of a woman’s hand reaching up to touch a dandelion seed head, a field of dandelions stretching to the horizon, with the phrase ‘Sometimes letting go is the bravest act’ written in delicate cursive above the hand.”