OpenAI introduces GPT-4o, the “omni” model that now powers ChatGPT

OpenAI on Monday announced a new flagship generative AI model it calls GPT-4o — the “o” stands for “omni” and refers to the model’s ability to process text, speech and video. GPT-4o will be rolled out “iteratively” across the company’s developer and consumer products over the next few weeks.

Mira Murati, CTO of OpenAI, said GPT-4o offers “GPT-4 level intelligence” but improves on GPT-4’s capabilities in multiple modalities and media.

“GPT-4o reasons across voice, text and vision,” Murati said during a streamed presentation at OpenAI’s offices in San Francisco on Monday. “And that’s incredibly important because we’re looking at the future of interaction between us and machines.”

GPT-4 Turbo, OpenAI’s “leading and most advanced” model to date, was trained on a combination of images and text, and could analyze images and text to perform tasks like extracting text from images or even describing the contents of those images. But GPT-4o adds speech to the mix.

What does this enable? A variety of things.

GPT-4o significantly improves the experience of using OpenAI’s AI-powered chatbot ChatGPT. The platform has long offered a voice mode that speaks the chatbot’s responses aloud using a text-to-speech model, but GPT-4o extends this mode even further, allowing users to interact with ChatGPT more like an assistant.

For example, users can ask ChatGPT, powered by GPT-4o, a question and interrupt ChatGPT as it responds. The model delivers “real-time” responsiveness, says OpenAI, and can even perceive nuances in a user’s voice and generate voices in “a range of different emotional styles” (including singing) in response.

GPT-4o also improves ChatGPT’s image processing capabilities. Given a photo – or a desktop screen – ChatGPT can now quickly answer related questions, from “What’s happening in this software code?” to “What brand of shirt is this person wearing?”

The ChatGPT desktop app in use for a coding task.

These features will evolve in the future, Murati says. While today GPT-4o can display and translate an image of a menu in another language, in the future the model could allow ChatGPT to, for example, “watch” a live sports game and explain the rules to you.

“We know that these models are getting more and more complex, but we want the interaction experience to actually be more natural and simple and not have you focus on the UI at all, just focus on collaborating with ChatGPT,” Murati said. “Over the last few years, we’ve been very focused on improving the intelligence of these models… But this is the first time we’re really taking a big step forward in terms of usability.”

According to OpenAI, GPT-4o is also more multilingual, offering improved performance in around 50 languages. And in OpenAI’s API and Microsoft’s Azure OpenAI Service, GPT-4o is twice as fast, half as expensive, and has higher rate limits than GPT-4 Turbo, the company said.
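For developers, the switch is largely a matter of pointing existing code at the new model. As a rough illustration (not taken from the announcement), here is what a GPT-4o request might look like through OpenAI’s Chat Completions API; the model identifier `gpt-4o` is the only detail drawn from the launch, and the helper functions are hypothetical:

```python
import json


def build_request(prompt: str) -> dict:
    """Assemble a chat-completion payload targeting GPT-4o.

    "gpt-4o" is the model identifier from the announcement; the rest
    follows the standard Chat Completions request shape.
    """
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }


def send(prompt: str) -> str:
    """Send the request via the official SDK.

    Assumes `pip install openai` and an OPENAI_API_KEY environment
    variable; not executed here, since it requires live credentials.
    """
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(**build_request(prompt))
    return resp.choices[0].message.content


# Show the payload shape without making a network call.
print(json.dumps(build_request("Summarize this article."), indent=2))
```

Because GPT-4o uses the same request format as GPT-4 Turbo, the claimed speed and cost gains come without code changes beyond the model name.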

The voice capabilities aren’t part of the GPT-4o API for all customers yet. Citing the risk of misuse, OpenAI plans to first offer support for GPT-4o’s new audio features to “a small group of trusted partners” in the coming weeks.

GPT-4o is available starting today in ChatGPT’s free tier and for subscribers to OpenAI’s premium ChatGPT Plus and Team plans, with “5x higher” message limits. (OpenAI notes that ChatGPT automatically switches to GPT-3.5, an older and less powerful model, when users reach the rate limit.) The improved ChatGPT voice experience, based on GPT-4o, will be available in alpha for Plus users in about a month, along with enterprise-focused options.

In related news, OpenAI announced it is releasing an updated ChatGPT UI on the web with a new, more “conversational” home screen and message layout, as well as a desktop version of ChatGPT for macOS that allows users to ask questions via a keyboard shortcut or take and discuss screenshots. ChatGPT Plus users will get access to the app first starting today, and a Windows version will arrive later in the year.

In addition, the GPT Store, OpenAI’s library and creation tools for third-party chatbots built on its AI models, is now available to users of ChatGPT’s free tier. And free users can now access ChatGPT features that were previously paywalled, such as a memory feature that lets ChatGPT “remember” preferences for future interactions, as well as the ability to upload files and photos and search the web for answers to current questions.


Read more about OpenAI’s Spring Event on TechCrunch