models Major

OpenAI Launches GPT-4o

Summary

OpenAI released GPT-4o ("omni"), a natively multimodal model that could process and generate text, audio, and images, with response latency low enough to make real-time voice conversation with an AI feel natural. The model was made available free to all ChatGPT users, marking a significant shift in access strategy.

What Happened

On May 13, 2024, OpenAI launched GPT-4o at a Spring Update event. The "o" stood for "omni," reflecting the model's native ability to process and reason across text, audio, and vision simultaneously — rather than piping different modalities through separate models.

The most dramatic improvement was in voice interaction: GPT-4o could respond to audio input in as little as 232 milliseconds, with an average of about 320 milliseconds, which OpenAI described as similar to human response time in conversation. The model could detect and respond to emotional tone, handle interruptions, and produce expressive speech output including laughter, singing, and dramatic delivery.

OpenAI made GPT-4o available to all ChatGPT users, including those on the free tier (subject to usage limits), a significant departure from the previous approach of reserving the most capable models for paying subscribers. The API was priced at half the cost of GPT-4 Turbo, making it both the most capable and the cheapest flagship model OpenAI had offered.
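To make the "half the cost" claim concrete, the following is a minimal sketch of the per-request arithmetic. The per-million-token prices are assumptions based on the list prices generally reported at launch (10 USD input / 30 USD output for GPT-4 Turbo, 5 USD / 15 USD for GPT-4o) and should be checked against OpenAI's pricing page; the `Pricing` class, the `request_cost` helper, and the example workload are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens (assumed figures, not authoritative)."""
    input_per_million: float
    output_per_million: float


# Assumed launch-era list prices; verify before relying on them.
GPT_4_TURBO = Pricing(input_per_million=10.0, output_per_million=30.0)
GPT_4O = Pricing(input_per_million=5.0, output_per_million=15.0)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the given per-million rates."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Hypothetical workload: 2,000 prompt tokens and 500 completion tokens.
turbo_cost = request_cost(GPT_4_TURBO, 2_000, 500)
omni_cost = request_cost(GPT_4O, 2_000, 500)

print(f"GPT-4 Turbo: ${turbo_cost:.4f}")
print(f"GPT-4o:      ${omni_cost:.4f}")
print(f"ratio:       {omni_cost / turbo_cost:.2f}")
```

Because both the input and output rates are halved under these assumed prices, the ratio stays at one half regardless of the mix of prompt and completion tokens.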

The live demo showcased the model tutoring a student in math via camera, translating conversations in real-time, and narrating scenes from a phone's camera feed. The voice capabilities were demonstrated by OpenAI researchers having a natural, flowing conversation with the model.

Why It Matters

GPT-4o represented a meaningful shift in what AI interaction could feel like. Previous AI assistants had perceptible latency, mechanical voices, and limited ability to handle the nuance of real conversation. GPT-4o's near-instant responses and expressive voice brought AI interaction closer to the science fiction ideal — and closer to replacing human interactions in customer service, tutoring, and companionship applications.

The decision to make GPT-4o free for all users was a competitive move that changed the market dynamic: previously, OpenAI's best models were behind a paywall, giving free-tier alternatives (like Google's Gemini in Search) a chance to capture users. By making GPT-4o free, OpenAI aimed to maintain its user base against growing competition.

The voice capabilities generated controversy just days later: Scarlett Johansson, who voiced the AI character in the film "Her," said that one of the ChatGPT voices, "Sky," sounded strikingly similar to her own and retained legal counsel, and OpenAI paused that voice.

Tags

#multimodal #real-time-voice #frontier-model #omni-model