
Microsoft AI can now clone voices to make them sound perfectly “human” in seconds – but it’s too dangerous to make it public


MICROSOFT has developed an artificial intelligence tool that can replicate human voices with uncanny precision.

It is so compelling that the tech giant refuses to make it available to the public, citing “potential risks” of misuse.

Microsoft’s research subsidiary has developed an AI text-to-speech generator that can reproduce human voices with uncanny accuracy. Photo credit: Getty

The tool, called VALL-E 2, is a text-to-speech generator that can imitate a voice based on just a few seconds of audio material.

Using an approach called zero-shot learning, the model can imitate speakers it has never heard before, without first being given examples of those voices.

The tech giant says VALL-E 2 is the first of its kind to achieve “human parity,” meaning it meets or exceeds standards for human similarity.

It is the successor to the original VALL-E system, announced in January 2023.

According to developers at Microsoft Research, VALL-E 2 can “reproduce precise, natural speech in the exact voice of the original speaker, comparable to human performance.”

It can synthesize short phrases as well as complex sentences.

The tool uses two techniques called Repetition Aware Sampling and Grouped Code Modeling.

Repetition Aware Sampling addresses the pitfalls of repeating tokens, or the smallest units of data a language model can process – represented here by words or parts of words.

It prevents the repetition of sounds or phrases during the decoding process, helping to make the system’s speech sound more varied and natural.

Grouped code modeling limits the number of tokens the model processes at once to produce faster results.
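The two techniques described above can be illustrated with a loose sketch. This is not Microsoft’s implementation (the VALL-E 2 code is not public); the function names, the repetition window, and the threshold are all illustrative assumptions, but they capture the gist: resample when decoding starts looping on one token, and bundle adjacent tokens so the model predicts several per step.

```python
import random

def repetition_aware_sample(logits, recent_tokens, window=10, threshold=0.5):
    """Illustrative sketch of repetition-aware sampling: if the top-scoring
    token already dominates the recent decoding window, fall back to random
    sampling from the full distribution so the output doesn't get stuck
    repeating the same sound. `logits` is a token -> weight dict here."""
    top_token = max(logits, key=logits.get)
    recent = recent_tokens[-window:]
    repeat_ratio = recent.count(top_token) / max(len(recent), 1)
    if repeat_ratio > threshold:
        # Too repetitive: resample to break the loop.
        tokens, weights = zip(*logits.items())
        return random.choices(tokens, weights=weights, k=1)[0]
    return top_token

def group_tokens(tokens, group_size=2):
    """Illustrative sketch of grouped code modeling: bundle adjacent codec
    tokens so the model predicts a group per step, shortening the sequence
    it has to process and speeding up generation."""
    return [tuple(tokens[i:i + group_size]) for i in range(0, len(tokens), group_size)]
```

For example, grouping the token sequence `[1, 2, 3, 4]` with `group_size=2` halves the number of decoding steps, which is where the speed-up the researchers describe comes from.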


The researchers compared VALL-E 2 with audio samples from LibriSpeech and VCTK, two English-language databases.

They also used ELLA-V, an evaluation framework for zero-shot text-to-speech synthesis, to determine how well VALL-E performed more complex tasks.

According to a document published on June 17 summarizing the results, the system ultimately outperformed its competitors “in terms of language robustness, naturalness and speaker similarity.”

The system, called VALL-E 2, will not be released to the public because of “potential risks of misuse of the model,” including voice spoofing and impersonation of a specific speaker. Photo credit: Getty

Microsoft claims that VALL-E 2 will not be made available to the public anytime soon because it is a “pure research project.”

“We currently have no plans to incorporate VALL-E 2 into a product or expand access to the public,” the company wrote on its website.

“Misuse of the model may pose potential risks, such as spoofing voice identification or impersonating a specific speaker.”

The tech giant points out that suspected misuse of the tool can be reported via an online portal.

And Microsoft’s concerns are well founded. This year alone, cybersecurity experts have seen an explosion in malicious actors’ use of AI tools, including those that mimic speech.

Microsoft has come under fire for its rollout of artificial intelligence tools and its relationship with OpenAI, which has attracted the attention of antitrust authorities. Photo credit: Getty

“Vishing,” a portmanteau of “voice” and “phishing,” is a type of attack in which scammers pose as friends, family, or other trusted parties on the phone.

Voice spoofing could even pose a national security risk. In January, a robocall using President Joe Biden’s voice urged Democrats not to vote in the New Hampshire primary.

The man behind the scheme was later charged with voter suppression and impersonating a candidate.

Microsoft has faced mounting criticism over its rollout of artificial intelligence, on both antitrust and data-protection grounds.

Regulators have expressed concerns about the tech giant’s $13 billion partnership with OpenAI and the resulting control over the startup.

What are the arguments against AI?

Artificial intelligence is a hotly debated topic and everyone seems to have an opinion on it. Here are some common arguments against it:

Job losses – Some industry experts argue that AI will create new niches in the job market and that while some jobs will disappear, new ones will emerge. However, many artists and writers counter that the issue is an ethical one, because generative AI tools are trained on their work and would not function otherwise.

Ethics – When AI is trained on a dataset, much of the content is taken from the internet. This is almost always, if not exclusively, done without notifying the people whose work is being taken.

Privacy – Content from personal social media accounts can be fed into language models for training. Concerns arose when Meta introduced its AI assistants on platforms such as Facebook and Instagram. This has come with legal challenges: a law protecting personal data was passed in the EU in 2016, and similar legislation is in the works in the US.

Incorrect information – When AI tools pull information from the internet, they can take things out of context or suffer from hallucinations that produce nonsensical answers. Tools like Copilot on Bing and Google’s generative AI in search always run the risk of getting things wrong. Some critics argue that this could have deadly consequences – such as AI dispensing incorrect health advice.

The company also faced harsh criticism from users.

The release of Recall, an “AI assistant” that takes a screenshot of a device every few seconds, was postponed indefinitely last month.

Microsoft has faced a barrage of criticism from consumers and privacy experts such as the Information Commissioner’s Office in the UK.

In a statement to The US Sun, a company spokesperson said Recall will “move from a preview generally available for Copilot+ PCs … to a preview available first in the Windows Insider program.”

Only after receiving feedback from this community will Recall be “available for all Copilot+ PCs,” the spokesperson said.

The company declined to comment on whether the tool posed a security risk.