The Ethics and Future of AI Voice Technology

August 2, 2021


“Roadrunner: A Film About Anthony Bourdain” was released to theaters a couple of weeks ago and chronicles Bourdain’s life through thousands of hours of video footage and audio. Beyond that archival material, the film’s creators also asked a software company to generate an AI version of Bourdain’s voice.

Reaction to using AI to generate new audio of someone who has passed away has been mixed, and it has fueled yet another discussion about ethics and AI technology. From deepfakes to misinformation, AI’s evolution hasn’t escaped controversy in the past year, even if the potential positives of the technology’s growth outweigh its negatives.

How Does AI Voice Technology Work?

Bourdain’s AI voice was generated to read aloud a letter he had written to a friend. This was done with synthetic voice AI, which analyzes recordings of a person’s voice and learns to reproduce it. In Bourdain’s case, the television star left behind hours and hours of content that AI can use to generate a synthetic voice matching his speaking mannerisms.

This technology is similar to the deepfake videos of Tom Cruise that went viral this past spring. Cruise’s face was easier to replicate with AI because of the plethora of video and photo content he has accumulated throughout his career.

Another example of how this technology works can be found in a recent episode of “The Simpsons.” The character Mrs. Krabappel was voiced by the late Marcia Wallace, and the show resurrected the character’s voice by splicing and assembling phonemes from past episodes and recording sessions.
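To make the splice-and-assemble idea a little more concrete, here is a minimal Python sketch of concatenative synthesis: stored clips for individual phonemes are looked up and joined end to end, with a short crossfade so the seams are less abrupt. This is only an illustration under simple assumptions, not the process the show’s producers used. The function names, the phoneme labels, and the tiny sample values are all hypothetical.

```python
# Toy sketch of concatenative synthesis: join pre-recorded phoneme clips
# with a short linear crossfade. All names and sample data are hypothetical.

def crossfade_concat(clips, overlap):
    """Join lists of audio samples, blending up to `overlap` samples at each seam."""
    if not clips:
        return []
    out = list(clips[0])
    for clip in clips[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(clip))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the incoming clip
            j = len(out) - n + i   # matching position in the outgoing tail
            out[j] = out[j] * (1 - w) + clip[i] * w
        out.extend(clip[n:])
    return out


def synthesize(phonemes, bank, overlap=2):
    """Look up each phoneme's clip in `bank` and splice the clips in order."""
    missing = [p for p in phonemes if p not in bank]
    if missing:
        raise KeyError(f"no recorded clip for phonemes: {missing}")
    return crossfade_concat([bank[p] for p in phonemes], overlap)


# Hypothetical clip bank: phoneme label -> audio samples
bank = {
    "HH": [0.0, 0.1, 0.2, 0.1],
    "AY": [0.3, 0.5, 0.4, 0.2, 0.0],
}
audio = synthesize(["HH", "AY"], bank)
```

A real system would work with thousands of samples per clip and would also need to match pitch and timing between clips, which is part of why spliced speech can sound clunky compared with newer AI-generated voices.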

Karen Hao, an MIT Technology Review editor, recently told The New Yorker she believed this use of synthetic media in a fictional show passes her ethical litmus test: “You know that the person’s voice is not representing them, so there’s less attachment to the fact that the voice might be fake.”

Hao’s concerns about the Bourdain use case run a little deeper, though: “It’s not clearly faked, nor is it clearly real, and the fact that it was his actual words just muddles that even more.” Bourdain’s synthetic voice went undetected until the film’s director mentioned it, a notion the public might find terrifying.

The reality is that this technology is quickly breaking new ground, crossing lines in a field that previously had no regulations or boundaries.

What Is the Future of AI Voice Technology?

A few companies are exploring how to deliver synthetic media and audio to other businesses and audiences. Sonantic, a company that started in 2018, touts AI voices as the CGI of audio. Their CEO mentioned they’ve been working on creating the first AI that can cry and shout. Their idea is that they’ll be able to provide lifelike performances for mediums like video games or films, and Sonantic hopes to reduce production timelines by quickly turning scripts into audio with the help of artificial intelligence.

VocaliD is focusing on pitching its synthetic-voice technology to corporate clients. When voice talent is unavailable or difficult to schedule, VocaliD can step in and quickly generate text-to-speech. Computer-generated audio can sound clunky and very unrealistic, but synthetic voice can bridge that gap in realism and provide companies with another viable way to create audio content.

Lastly, Resemble AI offers voice cloning and can synthesize any real speaker’s voice based on 50 or so spoken sentences. Like VocaliD, Resemble AI promises clients they’ll save money and time on creating audio content by relying on AI while still receiving high-quality audio in return.

However, there are some limitations when using this type of service. A company can’t just ask to use a celebrity’s voice for its content; it would still need permission from that celebrity. Likewise, bringing in a voice actor and then creating more content with an AI-generated version of their voice, without their permission, is also off limits. Their voice is intellectual property, after all.

Companies might also need to be wary of generating AI voices of their own employees or freelance actors. One option is a profit-sharing system that compensates actors whose voices might be used synthetically in the future. In terms of regulations, AI voice technology is currently the Wild West.

AI Voice Technology Continues to Grow

AI voice technology and synthetic voice are probably not yet able to trick a listener over a long period of time; however, recent strides have pushed the technology into the spotlight and into public conversation.

Think about CGI in movies or television. For a while, CGI was very noticeable to the point where people became impressed with sets or scenes that didn’t require CGI. Now it’s almost impossible to identify every detail of CGI in any scene from a film.

At some point, synthetic voice is going to reach that stage of becoming a cost-effective tool that is hard to distinguish from reality. And the ethics of this technology will follow with each innovation.
