Generative AI voice interfaces promise new technologies despite implementation difficulties

New Delhi: In November last year, little-known Silicon Valley startup Humane unveiled a wearable device that had no display but could use voice to perform all the tasks a traditional smartphone is used for. Two months later, fellow newcomer Rabbit, in partnership with Swedish tech company Teenage Engineering, unveiled the r1, a device similar in concept and operation to Humane’s ‘AI Pin’ wearable. What the two had in common: both used large language models (LLMs) as their underlying technology platform, in partnership with ChatGPT maker OpenAI, to offer users an entirely new user interface.

This new interface relied primarily on voice for interaction, promising users a seamless world where they could simply talk to their devices, rather than having to tap multiple screens and use a complex set of applications to perform basic tasks like ordering lunch, hailing a cab, or sending an email.

Industry experts believe that while voice as an interface has yet to become mainstream, the introduction of generative AI and natively running multimodal models could change that. In that sense, products like those from Humane and Rabbit are early examples of likely next-generation consumer hardware.

A native multimodal AI model runs locally on a device and does not require an internet connection to reach data hosted on cloud platforms. This makes the AI model more accessible, since it can run on the device itself, and multimodality allows it to process text, images, videos and voice and return results. Following recent announcements from Apple and Google, such models are now coming to smartphones. Soon, laptops certified as “AI PCs” under Microsoft’s Copilot Plus range will also bring generative voice interfaces to personal computers.

Apple, for example, has redesigned its digital assistant Siri to better understand personal context and also remember conversations – making voice usage more pleasant overall. Google Assistant, which is based on its Gemini LLM, can perform similar functions on Android smartphones that natively support its AI models.

New gadget ecosystems

Industry representatives believe this move can lead to new gadget ecosystems and product styles. Kashyap Kompella, AI industry analyst and founder of consultancy RPA2AI Research, said the rise of generative AI voice interfaces could play a role in commercially available robots. “The rise of commercial robots that you can speak to in natural language is an area that is likely to develop in the next decade. Enterprise robots are likely to develop first, followed by robots for accessibility in the home that could equip generative AI models with voice,” Kompella said.

Others believe that while voice interfaces may grow thanks to multimodal AI running locally on devices, they will be part of a broader, more complex user interface. Tuong Nguyen, principal analyst at Gartner, said that while voice interfaces “will grow in usefulness and popularity, the bigger challenge is multimodal and contextual interfaces – that is, voice alongside natural language understanding combined with image analysis.”

For many companies, language is a way to stitch interfaces together into a seamless ecosystem. At Apple’s Worldwide Developers Conference on June 10, the company showcased, as one of the key features of its AI, the ability to integrate features and solutions across different applications. A senior executive familiar with the iPhone maker’s latest AI suite told Mint on condition of anonymity that voice interaction via Siri works seamlessly across all three of Apple’s main product categories: iPhones, iPads and the Mac line of desktop and laptop PCs.

“In fact, Apple’s AI capabilities are designed to create a seamless user experience, especially with voice, for the most important products users buy from the brand. Having underlying AI models with on-device processing can establish this as the new norm across more brands,” the executive said.

Tarun Pathak, director at market research firm Counterpoint India, added that developing product ecosystems could be a key aspect of voice-based generative AI interfaces. “As voice interfaces work seamlessly across devices, more brands may consider developing their own product ecosystems. This could also lead to innovations in form factors. Early examples of this include Samsung’s efforts to design wearables to control every user function,” he said.

An email to a Samsung spokesperson about its plans for the ecosystem and voice AI remained unanswered at press time. In January, the company unveiled its Galaxy S24 line of flagship smartphones with native AI features – including the Bixby voice assistant. Samsung is expected to unveil more new hardware with native AI applications next month.

On June 10, Muralikrishnan B, President of Xiaomi India, told Mint in an interview that the company’s key product strategy for the next year in India is to build a broader ecosystem of products beyond smartphones, including smart home appliances, audio products, wearables and more. One of the key aspects of Xiaomi’s ecosystem push is interoperability, a factor that can be improved by integrating AI across product categories.

Further innovations

Counterpoint India’s Pathak said more forms of devices could come to market in the next four years, such as wearable headsets, smarter wrist devices and more. “Voice with generative AI has the chance to actually replace multiple taps on a display, which is its biggest strength and the reason for its adoption,” he added.

Gartner’s Nguyen said: “Voice is not a panacea for a device. Future devices such as head-mounted displays will extend multimodal interfaces to include other aspects such as gesture recognition, motion tracking, eye tracking, sentiment analysis and more.”

But many others have also sounded a note of caution. Kompella said a key concern is that voice interfaces have not yet taken off. “Voice as a technology has shown great promise over the past two decades. However, adoption has remained limited, even though companies like Amazon once sold over 200 million smart speakers worldwide with its digital assistant Alexa. The challenge is understanding whether voice is a product or a feature – and how brands can monetize it,” he said.

“If speech is still not monetizable with generative AI, product innovation will not advance at the same rate. There are specific use cases, such as medical transcription, that could see the emergence of dedicated applications of speech-based generative AI. However, whether consumer hardware will finally see a disruption is still an unanswered question,” Kompella added.