Fact-checking multimedia content: How to spot fake audio

Silas Jonathan

Misinformation and disinformation are as old as humanity. As far as I can recall, Thomas, one of Jesus’s disciples, was among the earliest fact-checkers, demanding to see the nail-pierced hands of the resurrected Jesus. While seeing is crucial in science as proof of existence, in fact-checking ‘seeing’ is merely one step toward verification.

Manipulated videos, audio and images have been around for years, but the rise of artificial intelligence and advanced editing software has made them much, much harder to spot. It’s a complicated new world, where our very sense of reality can be thrown into doubt. This is one reason why verifying multimedia content is a crucial part of fact-checking, says Theresa Giarrusso, a media literacy educator and expert.

Several videos and images circulated during the 2020 Ghana presidential campaign and at the height of the COVID-19 pandemic. Most of this content turned out to be doctored. Yet the problem goes deeper than that: manipulated content can emerge, and has emerged, in unexpected forms and contexts. While researchers are still trying to get a grip on verifying videos and photos, spotting fake audio seems even more daunting.

Identifying fake audio

Vijay Balasubramaniyan is the CEO and co-founder of Pindrop, a company that creates security solutions to protect against the damage fake audio can do. Explaining the concept of fake audio, he says: “if you have a smartphone or have ever chatted with a virtual assistant on a call, you’ve probably already interacted with manipulated audio voices. But like fake video, fake audio has gotten very sophisticated via artificial intelligence – and it can be just as damaging.”

Mr. Vijay added that manipulated audio is the basis for a lot of scams that can ruin people’s lives and even compromise large companies. “Every year, we see about $470 million in fraud losses, including from wire transfer and phone scams. It’s a massive scale,” he says.

Some of these scams rely on basic tricks similar to those used in cheap fake videos, such as manipulating pitch to sound like a different gender or inserting suggestive background noises. But Mr. Vijay says running a few hours of someone’s voice through AI software can give you enough data to manipulate the voice into saying anything you want. And the audio can be so realistic that it is difficult for the human ear to tell the difference.

What to look out for

However, it’s not impossible. When you’re listening for manipulated audio, here’s what to take note of:

Listen for a whine

A whine is a long, high-pitched vocal sound that holds the same pitch and tonality. It is like a human voice, but it keeps the same tone throughout. “If you don’t have enough audio to fill out all of the different sounds of someone’s voice, the result tends to sound more whiny than humans are,” Mr. Vijay says. The reason, he explains, is that AI programs find it hard to differentiate between general noise and speech in a recording. “The machine doesn’t know any different, so all of that noise is packaged in as part of the voice.”
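To make the idea of a voice that "keeps the same tone throughout" concrete, here is a minimal sketch in Python. It is a toy heuristic and not Pindrop's method: it assumes a clean, tone-like signal and estimates pitch from zero crossings, which would not be reliable on real speech. The function names and the synthetic test tones are hypothetical, introduced only for illustration.

```python
import math
import statistics

def frame_pitches(samples, sample_rate, frame_ms=40):
    """Estimate a rough pitch in Hz for each frame from its zero crossings."""
    frame_len = sample_rate * frame_ms // 1000
    pitches = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # A pure tone changes sign twice per cycle.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        pitches.append(crossings * sample_rate / (2 * frame_len))
    return pitches

def pitch_variation(samples, sample_rate):
    """Spread of per-frame pitch relative to its mean (near 0 means monotone)."""
    pitches = [p for p in frame_pitches(samples, sample_rate) if p > 0]
    if len(pitches) < 2:
        return 0.0
    return statistics.pstdev(pitches) / statistics.mean(pitches)

def tone(freq_fn, seconds, sample_rate):
    """Synthesize a test tone whose frequency at time t is freq_fn(t)."""
    phase, out = 0.0, []
    for n in range(int(seconds * sample_rate)):
        phase += 2 * math.pi * freq_fn(n / sample_rate) / sample_rate
        out.append(math.sin(phase))
    return out

RATE = 8000  # samples per second (assumed for this sketch)
flat = tone(lambda t: 300.0, 1, RATE)                # stays at 300 Hz
lively = tone(lambda t: 150.0 + 150.0 * t, 1, RATE)  # glides from 150 to 300 Hz

print("flat variation:  ", round(pitch_variation(flat, RATE), 3))
print("lively variation:", round(pitch_variation(lively, RATE), 3))
```

The flat tone should score much lower than the gliding one. On real recordings, a proper pitch tracker would be needed, and a low score alone would not prove that a clip is fake.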

Take note of the timing

The gap between spoken words is often shorter than in natural human speech.

“When you record audio, every second of audio you analyze gives between 8,000 to 40,000 data points for your voice,” Mr. Vijay says. “But what some algorithms are going to do is just make a created voice sound similar, not necessarily follow the human model of speech production. So if the voice says ‘Hello Paul,’ you may notice the speed at which it went from ‘Hello’ to ‘Paul’ was too quick.”
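One rough way to look at the pause between two words is to measure how long the signal stays quiet between two loud stretches. The sketch below does this with a simple loudness threshold. It assumes that the "data points" mentioned above correspond to audio samples (8,000 per second here), and the frame size, threshold and synthetic "words" are illustrative assumptions, not values from Pindrop.

```python
import math

def silent_gaps(samples, sample_rate, frame_ms=10, threshold=0.02):
    """Return the length in seconds of each quiet stretch between loud stretches."""
    frame_len = sample_rate * frame_ms // 1000
    gaps, quiet_frames, heard_sound = [], 0, False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square loudness of this frame.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if heard_sound and quiet_frames:
                gaps.append(quiet_frames * frame_ms / 1000)
            heard_sound, quiet_frames = True, 0
        elif heard_sound:
            quiet_frames += 1
    return gaps

def burst(ms, sample_rate, freq=200.0):
    """A stand-in for a spoken word: a 200 Hz tone lasting `ms` milliseconds."""
    count = ms * sample_rate // 1000
    return [0.5 * math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(count)]

def silence(ms, sample_rate):
    return [0.0] * (ms * sample_rate // 1000)

RATE = 8000  # samples per second (assumed for this sketch)
natural = burst(300, RATE) + silence(200, RATE) + burst(300, RATE)
rushed = burst(300, RATE) + silence(30, RATE) + burst(300, RATE)

print("natural gaps:", silent_gaps(natural, RATE))
print("rushed gaps: ", silent_gaps(rushed, RATE))
```

What counts as "too quick" depends on the speaker and the language, so any cutoff would have to be calibrated against genuine recordings of the same person.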

Listen for missing unvoiced consonants

Unvoiced consonants are consonant sounds made without vibrating the vocal cords: the cords produce no sound, and only air passes through them. Examples include “t”, “s” and “f”. Mr. Vijay explains that unvoiced consonants “have unique characteristics than other parts of vocal speech, and machines aren’t very good at replicating them.”
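A classic textbook way to separate voiced from unvoiced sounds is the zero-crossing rate: hissy, breathy sounds flip sign far more often than the steady hum of the vocal cords. The sketch below labels short frames using that idea. It is an illustration under stated assumptions, not the method Pindrop uses: the cutoffs are hypothetical, and the "vowel" and "hiss" are synthetic stand-ins (a 150 Hz tone and seeded random noise).

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def label_frames(samples, sample_rate, frame_ms=20,
                 zcr_cutoff=0.25, silence_rms=0.01):
    """Label each frame as 'voiced', 'unvoiced' or 'silence'."""
    frame_len = sample_rate * frame_ms // 1000
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms < silence_rms:
            labels.append("silence")
        elif zero_crossing_rate(frame) > zcr_cutoff:
            labels.append("unvoiced")  # noisy, hiss-like frame
        else:
            labels.append("voiced")    # low-frequency, hum-like frame
    return labels

RATE = 8000  # samples per second (assumed for this sketch)
rng = random.Random(0)  # seeded so the synthetic hiss is repeatable

vowel = [0.5 * math.sin(2 * math.pi * 150 * n / RATE) for n in range(800)]  # 100 ms
hiss = [rng.uniform(-0.3, 0.3) for _ in range(480)]                         # 60 ms
pause = [0.0] * 320                                                         # 40 ms

labels = label_frames(vowel + hiss + pause, RATE)
print(labels)
```

A clip in which the expected unvoiced frames never show up (for instance, no hiss where an "s" should be) would be one hint worth a closer listen, though real speech is messier than these synthetic signals.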

Listen to what it sounds like

The speed of machine-manipulated speech differs from that of the natural human voice. Listen attentively and the tone, speed and vocal sounds are strikingly different. When you listen to a manipulated clip, note the speed of the words and the placement of the consonants, and how they sound different from natural speech.

Conclusion

The digital world has no doubt revolutionized human existence. While it has helped greatly, it has also been used in many negative ways, for example to produce manipulated audio. Nonetheless, paying attention to the nature of the content you believe and share might just be the solution to this unfolding menace.