Using AI voice generator text-to-speech technology for better eLearning.
Efficiency is paramount in producing the best eLearning outcomes. So how can we streamline our processes and stop wasting time, energy, and money?
Voice-over narration is a good place to start.
Narrating eLearning is time-consuming.
Software that streamlines voice-over narration has been an eLearning staple. Here, I’m thinking specifically of the text-to-speech plugins for Articulate Storyline and Captivate. Unfortunately, these plugins produce a robotic sound that is less than engaging.
But what else can you do?
Recording your own voice has its drawbacks. If you aren't using professional equipment, you may end up with a noisy recording full of popping p's that diminishes the effectiveness of your course. And even with professional equipment, the read can fall flat if you aren't an experienced narrator.
The result? You’ve sacrificed a chunk of time only to compromise the quality of your course.
You could hire professional voice talent—this has always been my preference. But this option, too, has its downsides.
eLearning courses often call for many pages of narration. Professional talent is expensive, hard to coordinate, prone to inconsistent reads, and often unavailable when you need it most.
So … what's a hardworking eLearning professional to do?
Enter AI (artificial intelligence) voice generator technology.
What was once science fiction is now reality: AI voice technology takes eLearning narration to a whole new level.
AI voice technology isn't new, but it has become better and more accessible. Synthesized speech that closely imitates the real thing is now at our disposal.
What is AI-generated synthesized speech?
Here's how it works. A technician feeds random samples of a narrator's recorded speech into the generator. Through machine learning, the system analyzes those samples and models the characteristics of the narrator's voice.
In doing so, the system “learns” how to apply the narrator’s voice to any text you give it.
But don’t worry about how it works. Just know that it does, and it can work for you. All you need is a subscription to a voice generator website and a credit card.
After that, it’s just a matter of choosing the best voice for your project. You’ll have access to unique avatars (or characters) that represent different genders, nationalities, styles, and sounds. Most sites let you preview these voice samples.
Once you’ve decided, it’s easy. Just copy and paste your text into a window, and click a button. That’s it. In mere moments, you’ll have an audio file ready to download.
The AI voice market is growing. Easy-to-use alternatives to live voice talent are catching on.
One notable resource for this technology is WellSaid Labs. I was already familiar with a number of text-to-speech apps. Many of them touted themselves as “better than the rest,” but, inevitably … they weren’t. So, when a colleague asked me to record a script through WellSaid Labs, I was skeptical.
Imagine my surprise to discover that at least one app was better than the rest.
The recording of my script sounded … real. I mean, it wasn’t perfect, exactly … but it was good. It was really good. Most people would never suspect the voice was computer-generated.
Even the one minor quirk I encountered (the AI mispronounced a company name) was simple to fix. A quick phonetic respelling of the name in the original text was all it took, and the AI nailed the pronunciation.
More impressive still were the pleasant inflections and intonations peppered throughout the recording. That kind of attention to detail made the read all the more convincing.
“Wow,” I remember thinking. “This is incredible.”
This made me curious. Were other AI voice providers this advanced? I started comparing.
In the end, the results were mixed. The number of available voice avatars varied widely, as did production quality, cost, and accessibility. Choosing among them is largely a matter of personal preference.
Some services, however, did come within striking distance of WellSaid Labs’ quality. Synthesys, Murf, Listnr, and a few others stood out. Most of the rest just sounded robotic.
But one truth is inescapable. The number of AI voice generators is growing, and many are quite capable of replicating human timbre. It won't be long before we can no longer tell real voices from engineered ones.
Ethically speaking, should we worry about AI voice technology?
AI voice technology raises questions and concerns about machines that sound like us. Take, for instance, this recent news story:
A corporate employee received a call. It was his boss, urgently demanding a large transfer of money. The dutiful employee snapped to and executed the command immediately … only to discover later that his boss hadn’t called him at all—but an AI voice impersonation had.
With good reason, cybersecurity professionals are increasingly concerned about the threat that AI voice technology may pose.
In 2021, film director Morgan Neville released Roadrunner, a documentary about the late Anthony Bourdain. One scene included a brief clip of Bourdain talking.
Except … it wasn’t Bourdain. Neville had tapped into the magic of AI voice technology. And the social media backlash was fierce.
Although Neville bore no malicious intent, critics insisted the director had crossed an ethical boundary. And Neville isn’t alone. Artists, engineers, and all manner of industries are grappling with this dilemma: what happens when you separate the speech from the speaker?
WellSaid Labs is also assessing the implications of AI voice technology for both the voice-over profession and society as a whole.
When I interviewed Sara Weisweaver, PhD, Director of Ethics and Community at WellSaid Labs, she made the company's position clear. Its commitment is twofold: ensuring 1) that human talent remains a key component of AI, and 2) that those humans are protected.
“Those avatars on our site represent real humans, but their identity is secured,” says Weisweaver. “We want to keep their avatar anonymous and make sure nobody on our platform can ever connect the dots between the human voice actor and the synthetic voice.”
WellSaid Labs' website includes a zero-tolerance policy on unlawful or hateful content, nonconsensual use of voices, and deepfake misrepresentations. Publishing that policy so openly signals the company's commitment to ethical standards.
Voice talent is losing work to automation.
Computer-generated voices are becoming so lifelike that voice actors are beginning to feel the pressure. The competition is real.
Voices123.com, a leading online marketplace for voice talent, acknowledges the conflict.
“From a business perspective, AI voices are a rapidly developing niche that has a positive impact on evolving business trends,” their blog page reads. “Love ’em or hate ’em, this technology creates a volcanic topic for debate. Some commentators, however, genuinely believe that AI is good for the voice-over industry.”
Weisweaver of WellSaid Labs weighed in on this topic as well. She recognizes these concerns but notes the silver lining: many voice actors are excited about having their voices represented synthetically.
Weisweaver went on to say, “They also earn a revenue stream that is simply passive income with a quarterly check based on every recording made by their avatar.
"It's a very collaborative process. Our voice talent decides the name for their avatar and the look of their image and descriptors for their voices. We really put a lot of faith in our voice actors to deliver the kind of emotion that feels most meaningful to them, and I think that's a large part of the power behind the voices sounding so natural. We're allowing them to lean on their expertise to give us something that feels very natural to them."
The future of AI and text-to-speech.
We knew that, eventually, it would be hard to distinguish between voice recordings that are real and those that are computer-generated. I think we’re getting there.
But we’re still a long way from removing the human element entirely.
Weisweaver contends that traditional voice recordings will always be necessary. Only voice actors can deliver a full range of performances. Only voice actors can effect the nuanced modulations that make recordings … human. For now, anyway.
For some applications, there's simply no substitute for live dialogue. This is especially evident in video games and animation, where characters clash in violent battle.
But when it comes to churning out reams of narration, there's no better solution than AI voice technology.
Of course, the market is still littered with inferior products: low-quality audio that sounds unnatural, monotone, and robotic. But nowadays, most voice generators have improved significantly. Advances in machine learning and deep learning have helped them produce more authentic results.
AI-generated voices will likely be a godsend for many, but not for everyone. For the foreseeable future, projects that require a certain intensity—perhaps to convey anger, disappointment, or sadness—won’t be good candidates for AI.
On the whole, though … if you're looking to achieve maximum efficiency, AI-generated voice technology is the way to go.
Chase Roberts provides complete corporate instructional design services: www.chaseroberts.net
Chase is also the owner and director of VISIONSOUND FILMS, a Los Angeles video production company specializing in documentaries, marketing films, and corporate films: www.visionsoundfilms.com