OpenAI delays enhanced voice feature for ChatGPT, which some people have compared to the movie Her

The artificial intelligence (AI) startup introduced the voice option at a product launch event in May for GPT-4o, an updated version of its GPT-4 model that can better process text, audio and images in real time.

In a statement, OpenAI said it had originally planned to make the voice feature available to a small group of paying ChatGPT Plus subscribers in late June, but concluded that another month was needed to “achieve the minimum requirement for launch.”

“We are improving the model’s ability to detect and reject certain content,” the company said Tuesday. “We are also working to improve the user experience and prepare our infrastructure to scale to millions of users while maintaining real-time response.”

The delay represents a potential setback for OpenAI, which is trying to stay ahead in an increasingly crowded field of AI competitors. The company had introduced a more limited option for ChatGPT to talk to users last year, but the new feature promised to be faster and to work with powerful image recognition capabilities, making the chatbot a far more useful and dynamic conversationalist.

At the launch event, OpenAI staff took to the stage to demonstrate how ChatGPT responds almost instantly to requests, such as solving a math problem on a piece of paper held in front of a researcher’s smartphone camera.

Some viewers compared the tool to the virtual AI assistant in the 2013 film Her, voiced by Scarlett Johansson. The actress later requested the removal of one of the ChatGPT voices for sounding too much like her.

On Tuesday, OpenAI said it plans to roll out the voice feature to all paying subscribers in the fall. OpenAI said it is “also working” on releasing video and screen sharing features, which the company demoed during its May event. The company said it will update users in the future on when those features will roll out.

As a result, when the voice option becomes available to select paying users next month, its capabilities will likely be more limited than they were at the event. For example, the chatbot will not have access to a computer vision feature that would allow it to provide spoken feedback on a user’s dance moves simply by using the smartphone’s camera.