
The growing threat of audio deepfakes and why they are difficult to fact-check


The integrity of democratic elections rests on an informed electorate making decisions based on verifiable facts. However, the fast-evolving field of Artificial Intelligence (AI) presents a potent new threat: audio deepfakes.

These AI-generated audio clips, which can convincingly mimic a person’s voice and speech patterns, pose a significant risk because they can be particularly disruptive during election seasons.

In the 2023 Nigerian general election, various narratives emerged from audio claims circulating ahead of the vote, from a leaked recording of an alleged conversation between Peter Obi and David Oyedepo of Living Faith Church to an audio clip of Atiku Abubakar, Aminu Tambuwal, and Ifeanyi Okowa allegedly planning to rig the election.

Although fact-checkers debunked these claims here and here, the debunks did little to change the minds of voters who had bought into the narrative.

As AI tools for altering and creating audio advance, verifying such claims is becoming more difficult. Given the Nigerian experience, we anticipated similar claims in the December 7, 2024, Ghanaian election, and those expectations were borne out.

A few days before the election, we observed an audio clip going viral on WhatsApp (archived here) in which John Mahama purportedly urged his supporters to lie to the electorate in order to secure their votes.

Another audio clip, from TikTok (archived here), claimed that Mahamudu Bawumia, the NPP’s presidential candidate, called Ghanaians weak-minded and promised to deceive them in order to win the upcoming election. We found both audio claims false and misleading, as the clips had been manipulated.

The Nigerian and Ghanaian experiences show how audio deepfakes are becoming a potent form of electoral disinformation, yet the issue remains low on the agenda in conversations about election integrity.

Silas Jonathan, the Research and Digital Investigations manager at the Digital Technology, Artificial Intelligence, and Information Disorder Analysis Centre (DAIDAC) of the Centre for Journalism Innovation and Development (CJID), noted that there is a lack of public awareness about the problem. 

He stated that this lack of awareness among the populace is a significant challenge that enables these deepfakes to spread. Therefore, he believes the starting point for fighting deepfake audio is creating awareness and educating the public.

“Generally, there is a lack of awareness among the populace that there is, even in the first place, the possibility for AI to create a convincing voice of someone. The public is not well aware of the possibility and capacity of AI to do those kinds of magic. 

“What that means is we need to start from there to explain to people that there is something called an audio deepfake, and this is what it looks like. This is the problem we are having with the audio deepfake. People don’t know that it is possible or already exists, so we need to create awareness.”

AI tools fall short 

During the Ghana election, although we used some AI detection tools, such as Hive Moderation, Resemble AI, and Deepware, we could not rely solely on their results because they were inconsistent. We had to compare previous interviews of the politicians in question with the viral audio clips, focusing on their speech patterns and intonation, to reach a conclusion.
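For fact-checkers comfortable with a little code, the sketch below illustrates one way such a comparison can be made less impressionistic, using the open-source librosa library to compare the spectral features of a known interview with a viral clip. The file names are hypothetical, and a similarity score of this kind is only a rough signal to weigh alongside careful listening, not a verdict.

```python
# A minimal sketch: comparing two voice recordings by their MFCC features.
# File names are hypothetical; a low DTW cost only suggests similar vocal
# characteristics and is not proof of authenticity on its own.
import librosa
import numpy as np

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Load an audio file and return its MFCC matrix (n_mfcc x frames)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# Known-genuine interview clip vs. the viral clip under investigation.
reference = mfcc_features("known_interview.wav")
suspect = mfcc_features("viral_clip.wav")

# Dynamic time warping aligns the two sequences despite differences in
# speaking rate, then reports an accumulated distance along the best path.
D, wp = librosa.sequence.dtw(X=reference, Y=suspect, metric="euclidean")
normalised_cost = D[-1, -1] / len(wp)

print(f"Normalised DTW cost: {normalised_cost:.2f}")
# Lower cost = more similar spectral patterns; interpret alongside listening
# tests, context checks, and other investigative techniques.
```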

Kunle Adebajo, the head of Investigations at HumAngle, echoed this experience, noting that AI detection tools are inconsistent and unreliable.

“Unfortunately, many of the AI tools that have been developed to detect whether audio is AI-generated have not been consistent or reliable. Sometimes they return conflicting results, sometimes they return false positives, and many are not even free tools, so you would have to pay to access them. So these are the challenges journalists are facing.”

Silas agreed with Kunle, noting that the tools available for verifying audio deepfakes are inefficient. 

“The fourth challenge is that the tools that are available to verify deepfakes are not efficient enough. Sometimes you try some tools and you see the findings are false, especially when you try it with your voice clips.”

Kunle therefore advised that fact-checkers should not rely entirely on AI detection tools, but should instead use multiple sources to verify deepfake audio and combine them with different investigative skills.

He recommended that when fact-checkers use these AI detection tools, they test them by submitting one file they know is authentic and another they know is AI-generated, to assess the accuracy of the results. This test, he said, should be conducted across multiple tools to identify the one with the most precise and consistent results, before corroborating its findings through other investigative techniques.
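As a rough illustration of how that kind of test could be organised, the sketch below assumes each detection tool has been wrapped in a small function that returns a probability that a clip is AI-generated. The wrapper functions and file names are placeholders, not real tool APIs; each service has its own interface, and the point is simply to score every tool against clips whose provenance is already known.

```python
# A minimal sketch of benchmarking detection tools against clips whose
# provenance is already known. The detector functions are hypothetical
# placeholders standing in for whatever tools a newsroom has access to.
from typing import Callable, Dict, List, Tuple

# (file path, True if the clip is known to be AI-generated)
labelled_clips: List[Tuple[str, bool]] = [
    ("authentic_interview.wav", False),
    ("self_recorded_voice_note.wav", False),
    ("ai_cloned_sample.wav", True),
]

def detector_a(path: str) -> float:
    """Placeholder: return the tool's 'probability of AI' score for a clip."""
    raise NotImplementedError("wrap a real detection tool here")

def detector_b(path: str) -> float:
    raise NotImplementedError("wrap a real detection tool here")

def benchmark(detectors: Dict[str, Callable[[str], float]],
              clips: List[Tuple[str, bool]],
              threshold: float = 0.5) -> None:
    """Print how often each tool agrees with the known labels."""
    for name, detect in detectors.items():
        correct = 0
        for path, is_ai in clips:
            predicted_ai = detect(path) >= threshold
            correct += int(predicted_ai == is_ai)
        print(f"{name}: {correct}/{len(clips)} clips classified correctly")

# Example usage once real wrappers are in place:
# benchmark({"Tool A": detector_a, "Tool B": detector_b}, labelled_clips)
```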

“My advice would be to rely on multiple sources in fact-checking claims like that, and speak to experts with experience investigating audio claims. They should use reverse image search to see if another copy of that file is different; maybe it was manipulated. 

“Fact-checkers should also consult language experts to see if there are any red flags in the way the words are said, whether the person that is being claimed to say that would say those words in that way, whether the accent is different from that of the person, whether the dialect is different, etc. They should also check for signs of coordinated inauthentic behaviour. 

“If they find that a particular audio file is being shared by a closed circle of social media accounts that have a pattern of sharing disinformation or propaganda, then that can also be a red flag pointing to coordinated behaviour. This indicates an intention and a tactic that suggest disinformation. Fact-checkers should generally just use common sense and basic investigative techniques.”

He also pointed out that fact-checkers must have a basic understanding of audio files to effectively fact-check audio deepfakes. 

“I think journalists should not entirely rely on AI tools. They should develop a close technical understanding of what makes an audio file, what the potential red flags are to look out for if you suspect fabrication, and that would also guide the thinking process of the investigation. If you will use AI detection tools, do not entirely rely on them; use multiple tools and files to check how reliable that tool is.”

Chioma Iruke, the Program Officer, Digital Governance, at the Centre for Democracy and Development (CDD), said audio deepfakes are one of the hardest types of disinformation to detect because there are no dedicated tools for them, and alternative strategies are time-consuming.

“One of my experiences with audio claims is shortly after the elections, when that ‘yes daddy’ audio began to trend. It was difficult to do because there was no actual tool to track deepfake audio, except for those that are cut and joined. Those ones are easily trackable because you can hear the breaks and all that. AI-generated audio is quite easier to track if it was generated, but generally, audio is one of the hardest forms of disinformation to tackle because there is no tool, especially if you are trying to work with a deadline.

“This does not mean audio disinformation cannot be tracked. It can, but it takes time. One of the ways to track audio disinformation is that you have to start listening to the person’s speech, more like a dedication to the person’s speech. It takes you a week, sometimes it takes months to track it. How the person speaks and talks when they are tense. So you are tracking this and putting it against the disinformation put out. Compare and contrast to find similarities and differences.”

Audio deepfakes are easy to create but difficult to detect

Kunle shared his concerns about deepfake audio during elections. He noted that this is particularly worrisome because it is easy to create, seeing that political actors have their voices in the public domain. 

He said these voices can easily be fed to AI. However, audio deepfakes are challenging to detect, especially with advancements in AI.

“Everybody knows audio fakes are very cheap to create, especially compared to videos. They can be very dangerous, especially during elections, because the actors are people whose voices are already out there and can easily be fed into these generative AI tools, which would mimic them with amazing results, amazing similarity.”

Elizabeth Ogunbamowo, a fact-checker with DUBAWA, also emphasised this, noting that she had to let go of some audio claims during the 2023 Nigerian election because of this challenge. She added that fact-checkers sometimes resort to using tools meant for video deepfakes to verify audio fakes, which is not ideal.

“Audio claims can be very challenging to fact-check because there are limited tools available to fact-checkers to ascertain the authenticity of such claims. There have been cases where some journalists used a tool meant for video verification to fact-check an audio claim, with a verdict, but that didn’t really sit well with me because that’s more like the misuse of a tool.

“We faced challenges with audio claims during the live fact-check of the 2023 presidential elections and subsequent by-elections that took place in the country. There were claims I wanted to work on but had to leave out because I needed to be sure and transparent with the process as much as possible.”

Deepfake audio provides no clues

Silas stated that one of the challenges is that deepfake audio, unlike deepfake video, has no visual clues. He added that these audio clips can be subjected to further editing, where realistic sounds and effects can be added, making them appear more lifelike. 

“Audio deepfakes, unlike video deepfakes, do not give us more clues than what we already have. Another thing is that, unlike video deepfakes, audio fakes can also be subjected to further editing. In audio, there are actualities, and actualities are supposed to prove the reality of what the audio is saying. For example, if I say I am at Chicken House, I went to buy chicken, you can hear chicken noises in the background, and believe I am telling the truth. But because audio can be edited and actualities can be added, it has made deepfake audio, especially during elections, so convincing.”

Kunle agreed with this, noting that things like breathing patterns and background noises, which used to be clues, can now be edited into an audio clip.

“It is still difficult to detect whether an audio file is genuine or created using artificial intelligence, and many factors are responsible for that. Number one, AI has gotten good, number two, some of the things that you check out for, like breathing patterns, background noise, etc, can be added to the audio. Detecting audio deepfake also depends on how it is created, whether it is text-to-speech or voice conversion,” he said.

Elizabeth, agreeing with this, said the lack of clues and the sophistication of audio manipulation tools make the fact-checker’s job difficult.

She noted, however, that some AI tools, like DUBAWA Audio, come in handy for extracting claims from audio recordings, which makes the process faster for fact-checkers.

“Truth is, with a manipulated video or image, without the use of verification tools, merely by looking at the visual, you may be able to tell that it has been touched, but audio does not work that way.

“The pointers we used to look out for before, like if an audio recording sounds a certain way, maybe the tone or speed at which the person is speaking doesn’t sound human-like, then we conclude it is machine-generated. Times have changed, AI has made it more complex, such that the voice of a person can be well cloned and made to sound like they said those words they never uttered. The software used for this manipulation is now so sophisticated.”

Sam Ojo, an audio/video editor, noted that verifying deepfake audio is tricky because even when the audio is real, you can’t tell if further editing was done. 

He added that fact-checkers are not just verifying words; they verify the source, setting, speaker, and intention.

“Verifying audio fakes can be tricky because it’s not just about what was said, but also how and who said it. Sometimes, you’re stuck trying to figure out: Was this audio created from scratch with AI? Was it someone’s real voice that got edited or manipulated? Or is it a genuine recording, or is the content itself false? 

“All these layers make it hard to draw the line. Just because an audio sounds real doesn’t mean the content is true. And just because it was truly recorded doesn’t mean it wasn’t edited to mislead. That’s the challenge, you’re not just fact-checking words, you’re fact-checking the source, the setting, the speaker, and even the intention.”

Sam noted that detecting deepfake audio also depends on the type of claim made. Still, a good start is listening to the background sounds and comparing the audio with the actor’s original voice recordings. 

“The first thing I’ll say is this—it depends on the claim. But generally, listening to the background is one of the smartest ways to start. The ambience tells you a lot. If someone claims the audio was recorded in a market or a church, but the sound is too clean or quiet, that’s a red flag. Also, compare voices. If a known person is speaking, look for older clips or public recordings of them. AI-generated voices usually sound too perfect, but real humans make mistakes, pause, or have things like the “H factor” or “R factor” in the way they talk. Those minor imperfections are important.”
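One way to make that “too clean” judgement less subjective is to estimate the recording’s noise floor, that is, the sound level left in the pauses between words. The sketch below is a rough heuristic using librosa, with a hypothetical file name and an illustrative threshold; a studio-quiet background does not prove manipulation on its own, it is just one more cue to investigate.

```python
# A rough heuristic sketch: estimate a clip's noise floor from its quietest
# frames. A recording supposedly made in a market or a church should not
# have a near-silent background. File name and threshold are illustrative
# assumptions, not fixed rules.
import librosa
import numpy as np

y, sr = librosa.load("claimed_market_recording.wav", sr=16000, mono=True)

# Short-term energy (RMS) per frame, converted to decibels relative to
# the loudest frame in the clip.
rms = librosa.feature.rms(y=y)[0]
rms_db = librosa.amplitude_to_db(rms, ref=np.max)

# Treat the quietest 10% of frames as an estimate of the background level.
noise_floor_db = np.percentile(rms_db, 10)
print(f"Estimated noise floor: {noise_floor_db:.1f} dB relative to peak")

# In a busy public setting we would expect audible background sound; an
# extremely low floor is a cue to dig further, not a verdict.
if noise_floor_db < -60:
    print("Background is unusually clean for the claimed setting.")
```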

He also agreed with Kunle that fact-checkers need to learn audio, its structure, and how to create or edit it. That way, they can understand the various possible ways of manipulation. 

“Then there’s understanding audio itself. Learn how waveforms work, play with analysis tools, and train your ear. The more you listen to audio critically, the easier it is to spot when something feels ‘off’.”
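For anyone taking that advice, a practical starting point could be plotting a clip’s waveform and spectrogram and learning what natural speech, hard edits, and room tone look like. The sketch below uses librosa and matplotlib with a hypothetical file name; it only displays the audio, and the interpretation still has to come from a trained eye and ear.

```python
# A minimal sketch: visualise a clip's waveform and spectrogram so the eye
# can learn what pauses, edits, and room tone look like. The file name is a
# placeholder for whatever clip is under review.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("clip_under_review.wav", sr=None, mono=True)

fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(10, 6), sharex=True)

# Waveform: abrupt cuts or unnaturally even amplitude can stand out here.
librosa.display.waveshow(y, sr=sr, ax=ax_wave)
ax_wave.set_title("Waveform")

# Spectrogram: missing room tone or hard frequency cut-offs can show up here.
S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", ax=ax_spec)
ax_spec.set_title("Spectrogram (dB)")
fig.colorbar(img, ax=ax_spec, format="%+2.0f dB")

plt.tight_layout()
plt.show()
```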

Therefore, he advised fact-checkers to pay attention to details and not rush to conclusions. 

“My advice is simple: pay attention to the details. Audio can be sneaky—sometimes even more than video or text. It might sound real, but the context could be totally misleading. So take your time. Don’t rush to conclusions. Always check: Who is speaking? Where was it recorded? Is the content true, even if the voice is real? 

“Compare it with other recordings. Be patient. Be curious. And also, don’t stop learning. New tools are coming out daily, and audio manipulation will only get more advanced. So you need to equip yourself.”

Polarisation during elections promotes the spread of audio deepfakes

Silas highlighted that polarisation, a major player in African elections, aids the spread and acceptance of deepfake audio. 

He said that because people are divided during elections, confirmation bias plays a role, making the acceptance and spread of audio fakes swift. 

“The third challenge to audio deepfakes is that people are highly polarised during elections, that is, people are usually divided during elections, either by politics, religion or region. That division comes with confirmation bias, and that confirmation bias comes from that division because people want to believe what they want to believe. So, audio deepfakes, because there is not much to prove in terms of seeing, reinforce people’s biases, and that makes it difficult for them to look at it critically to identify it. Even if it is verified, sometimes they question the findings,” he explained. 

Conclusion

Audio deepfakes have become a part of our elections, but several challenges still make it difficult for fact-checkers to detect and debunk them quickly. Strategic conversations and action from all stakeholders are therefore needed to address the problem ahead of future election cycles.
