Microsoft develops incredibly realistic AI voice generator, but keeps it secret

July 17, 2024

In a move that highlights growing ethical concerns surrounding advanced AI, Microsoft has developed a remarkably realistic text-to-speech system called VALL-E 2, but has chosen not to release it publicly because of the risk of misuse.

Advances in AI are often associated with splashy releases and wide availability, but increasingly capable systems are forcing tech giants to tread carefully. Microsoft’s latest text-to-speech (TTS) system, VALL-E 2, is a prime example. It can mimic a person’s voice with striking accuracy from just a few seconds of sample audio, a significant leap for TTS technology.

“VALL-E 2 is the first speech AI that reaches human levels in terms of speech robustness, naturalness and speaker similarity,” the Microsoft researchers write. This “human parity” means that, in the researchers’ evaluations, the AI-generated speech was on par with recordings of real people.

So what makes VALL-E 2 so convincing?

Two key techniques contribute to its realism. The first, Repetition Aware Sampling, makes decoding more stable: when the audio token the model is about to emit already appears too often in the recently generated output, the system switches to a different sampling strategy, which avoids the looping, stuck-on-a-sound output that can affect TTS systems. The second, Grouped Code Modeling, bundles the audio codes into groups so that the model works with a shorter sequence, which speeds up generation and makes long utterances easier to handle.
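
For readers who want a feel for how these two ideas work, here is a minimal Python sketch. It is not Microsoft’s implementation: the function names, the toy probability list, and the default values for `top_p`, `window` and `threshold` are all assumptions made for illustration. The sampler first picks a token with nucleus (top-p) sampling and, if that token is over-represented in the recent history, resamples from the full distribution instead. The grouping helper simply splits a flat code sequence into fixed-size chunks.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Draw a token id from the smallest high-probability set whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(probs, history, rng, top_p=0.8, window=10, threshold=0.1):
    """Nucleus-sample a token, but resample from the full distribution if it repeats too much.

    top_p, window and threshold are illustrative defaults, not published settings.
    """
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent and recent.count(token) / len(recent) > threshold:
        # The candidate dominates the recent history: fall back to plain random
        # sampling over all tokens so decoding can escape a potential loop.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token


def group_codes(codes, group_size):
    """Split a flat code sequence into consecutive groups (the last may be shorter)."""
    return [codes[i:i + group_size] for i in range(0, len(codes), group_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.9, 0.05, 0.05]  # toy distribution over three audio tokens
    print(repetition_aware_sample(probs, history=[], rng=rng))        # no history: nucleus pick
    print(repetition_aware_sample(probs, history=[0] * 10, rng=rng))  # token 0 is over-represented
    print(group_codes([1, 2, 3, 4, 5, 6], 2))  # [[1, 2], [3, 4], [5, 6]]
```

In this toy setup, grouping six codes in pairs means a model that predicts one group per step would need three steps instead of six, which is the intuition behind the speed-up described above.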

Fears of abuse overshadow the potential

Despite its potential in education, entertainment, accessibility and more, Microsoft has chosen to keep tight controls on VALL-E 2. The company has expressed concerns about misuse, particularly spoofing voice identification systems and impersonating specific speakers.

“VALL-E 2 is purely a research project. We currently have no plans to integrate VALL-E 2 into a product or expand access to the public,” the researchers explain. Other AI companies have taken a similarly cautious approach; OpenAI, for example, has restricted access to its own voice technology.

Despite these concerns, Microsoft remains optimistic about the future of AI speech technology. The researchers envision safe and ethical applications in which a speaker’s voice is reproduced only with that speaker’s consent and in which tools for detecting synthetic speech are available.

This groundbreaking research, detailed in a preprint, offers a glimpse into the future of AI while raising critical questions about its responsible development and deployment.

About The Author

vergie
