ChatGPT Brings Real-Time Video Capabilities to Life: A Game-Changer for AI Conversations
After months of anticipation, OpenAI has finally brought real-time video to ChatGPT, a feature it first teased nearly seven months ago. The update adds a whole new level of interactivity, letting users engage with the AI in more dynamic ways.
A New Era for Advanced Voice Mode
On Thursday, during a live demonstration, OpenAI revealed that its Advanced Voice Mode, a feature already known for its human-like conversational abilities, now comes equipped with vision. With this upgrade, users subscribed to ChatGPT Plus, Team, or Pro can simply point their phones at objects, and ChatGPT will respond almost instantly, interpreting what it sees.
Imagine pointing your phone at a plant and having ChatGPT identify it or explain its care requirements. This feature feels like having a knowledgeable friend in your pocket, ready to assist in real time.
Enhanced Screen Understanding
Advanced Voice Mode with vision doesn’t stop at real-world objects; it can also “see” what’s on your device’s screen. Through screen sharing, ChatGPT can help navigate complex settings or even provide step-by-step solutions to tricky math problems. It’s like having a tech-savvy companion who never gets impatient.
Accessing these features is straightforward:
- Tap the voice icon next to the chat bar.
- Tap the video icon at the bottom left to start video mode.
- For screen sharing, tap the three-dot menu and select “Share Screen.”
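ChatGPT’s video and screen-sharing modes live entirely inside the mobile app; OpenAI has not exposed them as an API. For a rough sense of the underlying image-plus-prompt pattern, though, you can send a single camera frame to OpenAI’s publicly documented vision-capable chat API. The sketch below is illustrative only: the `describe_frame` helper and the `plant.jpg` file are made up for this example, and the app’s real-time pipeline is certainly more involved than one request per frame.

```python
# Minimal sketch of per-frame vision analysis via OpenAI's public chat API.
# This is NOT the in-app video feature, which has no public API; it only
# illustrates the general "image plus question" interaction pattern.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_frame(image_path: str, question: str) -> str:
    """Send one camera frame and a question to a vision-capable model."""
    with open(image_path, "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable model would do
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content


# Hypothetical usage, mirroring the plant example above:
print(describe_frame("plant.jpg", "What plant is this, and how should I care for it?"))
```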
A Gradual Rollout
The rollout begins Thursday and is expected to conclude within a week. However, not everyone will have immediate access. OpenAI announced that subscribers to ChatGPT Enterprise and Edu will need to wait until January, while users in the EU, Switzerland, Iceland, Norway, and Liechtenstein currently have no timeline for availability.
A Peek into the Future
In a recent demo on CBS’s “60 Minutes,” OpenAI President Greg Brockman showcased the capabilities of Advanced Voice Mode with vision. Anderson Cooper tested the feature by drawing body parts on a blackboard. Impressively, ChatGPT recognized his sketches.
“The location is spot on,” ChatGPT commented when Cooper drew a brain. “The brain is right there in the head. As for the shape, it’s a good start. The brain is more of an oval.”
However, the AI isn’t flawless. During the same demonstration, it made an error on a geometry problem, showing it still stumbles occasionally.
A Bumpy Journey to Launch
The journey to this milestone hasn’t been smooth. OpenAI initially promised Advanced Voice Mode would roll out in April but faced delays, reportedly because the feature was announced before it was ready. Even when it debuted earlier this fall, the visual analysis component was missing. OpenAI has since worked to refine the experience and expand its availability.
Competition Heats Up
OpenAI isn’t the only player in the game. Rivals like Google and Meta are developing similar capabilities. This week, Google rolled out Project Astra, its real-time video-analyzing AI, to select testers on Android.
A Touch of Festivity
Alongside this major update, OpenAI also launched a playful feature called Santa Mode. This festive addition lets users chat with ChatGPT in Santa’s voice, accessible through a snowflake icon in the app.
The Human Touch in AI
With these advancements, ChatGPT is becoming more than just a chatbot; it’s a tool that feels increasingly human. From helping with daily tasks to adding a bit of seasonal cheer, it’s clear OpenAI is striving to make AI not only smarter but also more relatable.
As AI continues to evolve, features like these remind us of the emotional connection technology can foster, bridging the gap between machines and the people they’re designed to help.