
That Synching Feeling

It feels like you can’t set foot online these days without someone flogging their latest AI tool to put human creators out of action. One that threatens to wipe out the e-learning voice-over business is Text-To-Speech Avatars. For a small fee, you input your script, then select from a library of voices and heads to receive a ready-to-go MP4, expertly spoken and gestured along to by your chosen avatar. If you need to change anything, just tweak the script and re-render - no expensive VO artists or presenters to pay again to re-record an edit. But are they any good? Mark Gash puts 6 of them to the test.

Our testing criteria were fairly simple - take whatever avatar and voice was being offered as part of the free trial and task it with presenting the same script. We then compared the results from each service, scoring each on the voice, the avatar and the synching between the two. It may well be that paid-for tiers offer better versions of the voice and avatar but we would argue that you should try and reel in your punters with your best shot!

D-ID for Canva

The voice on this is actually pretty good, with some good audio expression and emphasis given to words. The actual avatar however is pretty abysmal - it’s a pixelated close-up with weird dead eyes that bore into your soul.

Vidnoz

The avatars seem a bit too uncanny valley and although I appreciate the effort to try and animate their arms and facial expressions, they’re overly exaggerated and don’t seem to coincide with any particular phrase or intonation, so they come across as random, creepy spasms. The voice is also noticeably more AI than some of the others I tested.

Deepbrain-IO

This one almost had me until the last line where she seems to turn evil and ruins the experience. Other than that, the voice is lively with good intonation and the avatar moves well without coming across like a puppet. She did pronounce “I am” as “A.M.” which was odd.

Veed.io

The avatar looks great - no facial expressions were too large or over-exaggerated and the lip-synching was spot-on. If anything, she was almost too reserved, with her facial movements being quite subtle, so she looked a bit unenthusiastic about my script. Her eyes were a bit dead and moved slightly side-to-side as though she was reading from an autocue, which isn’t really what you want. If you’re looking for realism, this might be the one for you but don’t expect any excitement.

Elai

Aside from the medieval serving wench attire, this was passable. The arm movements were too soft and didn’t look convincing but it was a good effort. The mouth was slightly too large and the lip synch wasn’t the best but saying all that, it came together well. Not sure about the gappy teeth on her left side though - the right side was perfect. Needs to see an orthodontist.

Synthesia

This is the winner for me - natural expression and body movement, coupled with a friendly enthusiastic voice and great lip-synching. If I’m being picky, she also had weird / missing teeth on her left side and sounded a bit sinister on the question at the end but she was still “head and shoulders” above the rest.

For what it’s worth, I don’t think that AI avatars and text-to-speech software are anywhere near the quality you get from commissioning an actual human voiceover artist or presenter. However, in a world of ever-shrinking budgets and deadlines of yesterday, the costs for a professional, coupled with the lead times and turnaround for re-edits, mean that AI has already taken a foothold in e-learning.

At the moment, giving direction to an AI in terms of emphasis on certain words or adding hints of sarcasm can be quite difficult and leaves you feeling frustrated when the finished product doesn’t sound exactly as you would have liked. Especially when you know that these audible niggles would be easy to solve if you were dealing with a real person. Yet AI is improving every day and soon, these issues won’t be issues at all. Hollywood has no qualms about de-aging actors with artificial intelligence and using archival recordings to recreate speech, and soon this movie-quality AI will be available to everyone for a low monthly subscription price. Once you can license a young Harrison Ford to present your Health and Safety training, I have a synching feeling that the era of the voiceover artist will be at an end.
