Jennifer DeStefano, a mother from Scottsdale, Arizona, received a hoax call in which the caller falsely claimed to have kidnapped her 15-year-old daughter. The caller, posing as a kidnapper, even put what sounded like her daughter on the phone to make the demand more convincing.
The police later discovered the call was a “deepfake”, the product of an artificial intelligence (AI)-based technology that creates realistic-looking videos, images, and audio of people saying or doing things they never did. The technology has been used for various purposes, including creating fake news stories and altering existing audio and video to spread false information.
In Nigeria, deepfakes have been used extensively to create videos and audio of non-existent people or to alter existing recordings so they appear to show something different. However, it is audio-based deepfakes that are now generating the most controversy. During the 2023 elections, fact-checkers were overwhelmed with claims of audio deepfakes but had few tools with which to verify them. As a result, audio deepfakes made waves across Nigerian social media, fuelling unhealthy debates.
Audio-based deepfakes: The risks and challenges of AI-generated synthetic voices
Audio-based deepfakes are synthetic voices generated by AI. They are forged using machine learning and natural language processing techniques trained on recordings of a person’s voice to produce realistic-sounding audio clips. The average listener often struggles to spot the gaps and red flags, which is why audio deepfakes are so often used to imitate famous personalities, whose voice samples are publicly available.
Audio-based deepfakes have also been used to create convincing recordings of people speaking languages they do not speak and to fabricate recorded conversations that influence public discourse. It is therefore essential to be aware of the technology’s potential risks and to take steps to protect society from its misuse.
How to Detect Audio Deepfakes: Tips and Techniques
- Listen for unnatural pauses: When listening to an audio recording, note any unnatural pauses or stutters that may indicate an audio deepfake. For example, if the speaker’s speech pattern suddenly changes or there are long pauses between words that are not typical of their natural speech, it could be a sign of a deepfake.
- Check the audio quality: Audio deepfakes are often of lower quality than natural speech. Listen for any distortion or artefacts that may indicate an audio deepfake. For example, if the recording sounds muffled or there are sudden changes in volume or clarity, it could be a sign of a deepfake.
- Analyse the audio spectrogram: An audio spectrogram is a visual representation of an audio signal’s frequency content over time. By analysing the spectrogram, you can identify unnatural patterns or frequencies that may indicate an audio deepfake. For example, sudden changes in frequency or amplitude that are not typical of natural speech could be a sign of a deepfake (a short spectrogram sketch appears after this list).
- Use deep learning algorithms: Deep learning algorithms can also detect audio deepfakes. These algorithms analyse the audio signal and identify unnatural patterns or features that may indicate a deepfake. For example, some algorithms look for inconsistencies in the waveform or analyse the frequency distribution of the signal to identify anomalies. Although this technique is mainly for experts, it is worth being aware of it (a simple detector sketch appears after this list).
- Compare with verified recordings and consider context: If you are familiar with the voice in question, compare the recording with verified samples of it, and weigh whether the conversation could plausibly be real. For example, if you know the speaker well, you may notice subtle differences in tone or accent that the deepfake fails to reproduce. If the topic or style seems unusual or out of character for the speaker, that too could be a sign of a deepfake.
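For readers comfortable with a little code, the spectrogram check above can be done with free tools. The sketch below is a minimal illustration in Python, assuming the open-source librosa and matplotlib libraries are installed; the file name suspect_clip.wav is a placeholder for the recording under review.

```python
# Minimal spectrogram inspection sketch (assumes librosa and matplotlib).
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load the clip; sr=None keeps the file's original sampling rate.
audio, sr = librosa.load("suspect_clip.wav", sr=None)

# Short-time Fourier transform -> magnitude spectrogram in decibels.
stft = librosa.stft(audio)
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# Plot the spectrogram; abrupt frequency cut-offs, missing high-frequency
# energy, or oddly uniform harmonics can be hints of synthetic speech,
# although none of them is conclusive on its own.
fig, ax = plt.subplots(figsize=(10, 4))
img = librosa.display.specshow(spec_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
fig.colorbar(img, ax=ax, format="%+2.0f dB")
ax.set_title("Spectrogram of the clip under review")
plt.tight_layout()
plt.show()
```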
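The deep-learning approach is normally the preserve of specialists, but a toy version can be sketched. The example below, assuming PyTorch and librosa are installed, trains a small neural network on MFCC summaries of clips that have already been labelled as genuine or synthetic; the file names are placeholders, and a real detector would need a large, carefully curated dataset.

```python
# Toy learned detector sketch (assumes torch and librosa; placeholder files).
import librosa
import numpy as np
import torch
import torch.nn as nn

def mfcc_features(path: str) -> torch.Tensor:
    """Load a clip and summarise it as a fixed-length MFCC feature vector."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    # Mean and standard deviation over time -> 40-dimensional vector.
    return torch.tensor(
        np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)]), dtype=torch.float32
    )

# Hypothetical labelled training clips (1 = synthetic, 0 = genuine).
train_files = [("real_01.wav", 0), ("fake_01.wav", 1)]

model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 1))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(20):
    for path, label in train_files:
        x = mfcc_features(path)
        y = torch.tensor([float(label)])
        optimiser.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimiser.step()

# Score a new clip: values near 1 mean the detector suspects synthetic speech.
with torch.no_grad():
    score = torch.sigmoid(model(mfcc_features("suspect_clip.wav"))).item()
print(f"Synthetic-speech score: {score:.2f}")
```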
Verifying Audio-Based Deepfakes: Techniques for Fact-Checkers and Journalists
In today’s world of digital media, the exchange of information is overwhelmingly digital, from breaking news to intimate conversations with loved ones. However, the rise of deepfake video and audio has left people questioning the authenticity of these exchanges, effectively limiting their usefulness. Verifying audio-based deepfakes remains a significant challenge for fact-checkers and journalists worldwide.
According to Mr Logan Blue, a PhD candidate at the Florida Institute for Cybersecurity Research (FICS), University of Florida, persuading people that what they hear is fake is an enormous challenge as AI-powered deepfakes become more sophisticated.
“AI is attaining perfection, and audio deepfakes are obvious evidence. It is difficult to tell people what they hear is fake, especially if the voice or conversation fits a trending issue. AI is fast evolving our mere humanity, and deep fake is just one piece of evidence,” he said.
Despite the challenges, there are several techniques that fact-checkers and journalists can use to verify audio-based deepfakes.
- Audio Analysis: Audio analysis involves examining audio signals to extract meaningful information and can be used to check the authenticity of a voice recording or call. For example, banks use audio analysis to detect fraud in voice-based transactions, where it can help flag changes in a caller’s voice or other signs of deception.
- Voice Biometrics: Voice biometrics uses voice recognition to verify a person’s identity. It can confirm who a caller is and flag discrepancies in the voice. For instance, law enforcement agencies can use voice biometrics to identify criminals by their voice patterns, and call centres use it to authenticate customers.
- Speech Recognition: Speech recognition technology uses algorithms to recognise spoken words. It can help flag changes in a caller’s speech and can transcribe voice calls into text, which is useful for recording customer interactions in call centres and gives fact-checkers a transcript to compare against a speaker’s known statements (a short transcription sketch appears after this list).
- Voiceprint Analysis: This involves analysing a person’s voice to identify its unique characteristics. It can detect changes in a caller’s voice and help verify the authenticity of a voice call. For instance, financial institutions use voiceprint analysis to identify customers by their voice patterns, which helps prevent identity theft and fraud (a voiceprint-comparison sketch appears after this list).
- Acoustic Analysis: Acoustic analysis involves examining the sound waves of a recording to detect changes in a caller’s voice and help verify the authenticity of a call. It can also pick up background noise and other environmental factors that affect the quality of a recording, which can help improve the overall customer experience in call centres (an acoustic-analysis sketch appears after this list).
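As a simple starting point for the speech-recognition step, the sketch below uses the open-source openai-whisper package (an assumption on our part, not a tool mentioned by Mr Blue) to transcribe a clip so that its wording can be checked against the speaker’s known public statements; suspect_clip.wav is a placeholder file name.

```python
# Transcription sketch (assumes the openai-whisper package and ffmpeg).
import whisper

model = whisper.load_model("base")           # small multilingual model
result = model.transcribe("suspect_clip.wav")
print(result["text"])                        # transcript to fact-check
```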
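Voice biometrics and voiceprint analysis both come down to comparing a “voiceprint” of the suspect clip against one taken from verified recordings of the speaker. The sketch below illustrates the idea with the open-source resemblyzer package (an assumption; commercial systems work differently), using placeholder file names.

```python
# Voiceprint-comparison sketch (assumes the resemblyzer package).
from pathlib import Path

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

# Speaker embeddings ("voiceprints") of a verified recording and the suspect clip.
verified = encoder.embed_utterance(preprocess_wav(Path("verified_speech.wav")))
suspect = encoder.embed_utterance(preprocess_wav(Path("suspect_clip.wav")))

# Cosine similarity between the two embeddings: a low score suggests the
# suspect clip does not match the verified speaker, while a high score alone
# does not prove the clip is genuine.
similarity = np.dot(verified, suspect) / (
    np.linalg.norm(verified) * np.linalg.norm(suspect)
)
print(f"Voice similarity: {similarity:.2f}")
```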
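Acoustic analysis can likewise be approximated with free tools. The sketch below, again assuming librosa and a placeholder file name, extracts the pitch contour and short-time energy of a clip; unnaturally flat pitch, abrupt energy jumps, or a complete absence of background noise are the kinds of irregularities that may warrant closer scrutiny, though none is proof of a deepfake.

```python
# Acoustic-feature sketch (assumes librosa): pitch contour and energy.
import librosa
import numpy as np

audio, sr = librosa.load("suspect_clip.wav", sr=None)

# Fundamental frequency (pitch) track; synthetic voices sometimes show
# unusually flat or discontinuous pitch contours.
f0, voiced_flag, voiced_prob = librosa.pyin(
    audio, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Short-time energy; abrupt energy jumps or a total absence of background
# noise can also be worth a closer look.
rms = librosa.feature.rms(y=audio)[0]

print(f"Pitch std dev (Hz): {np.nanstd(f0):.1f}")
print(f"Voiced frames: {np.mean(voiced_flag) * 100:.0f}%")
print(f"Energy range (dB): {20 * np.log10(rms.max() / max(rms.min(), 1e-9)):.1f}")
```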
While these methods can effectively detect audio deepfakes, Mr Blue explains that the professional tools behind them are expensive and often beyond a fact-checker’s budget.
“The challenge is that these methods are expensive for the typical fact-checker. Most of these techniques for detecting fake audio are not created for fact-checkers. It is mostly for financial institutions and big corporate bodies, so it may be difficult for fact-checkers to access them easily.” However, Mr Blue adds that, as with Deepware (a tool for verifying video deepfakes), tools to verify audio deepfakes will soon be available and affordable for fact-checkers worldwide.
For everyone else, Mr Blue said, the practical approach when verifying audio deepfakes is to compare the voice in question with verified recordings and to consider the context of the conversation and whether it could plausibly be real. If the digital world is to remain a critical resource for information in people’s lives, effective and secure techniques for determining the source of an audio sample are crucial.