Six ways to identify visuals made with Google’s Veo 3, other AI video tools 

A video showing snowfall in Enugu went viral on social media, amassing views and dividing opinion among its audience.

Despite the video’s numerous glaring errors, some users in the comment section still asked whether it was created with Artificial Intelligence (AI). When DUBAWA conducted a fact check, we discovered that some individuals even claimed to have witnessed the snowfall a day before the claim went viral. Whether comical or deliberate, such claims further persuaded sceptics that snowfall was possible in Enugu State or elsewhere in Nigeria. However, DUBAWA’s findings revealed that the video was AI-generated.

The snowfall video was not the only viral visual confusing viewers: DUBAWA found that many similar claims centred on the weather, including snowfall in Delta State and flooding in Ogun State. Another set of videos accused the Federal Government of plotting to arrest single people in Nigeria and deport them to fight in Iran.

Another viral video claimed that authorities in Ankpa town, Kogi State, had announced that men could freely marry women from the town without paying a bride price.

AI has revolutionised video creation, and Google’s Veo 3 currently leads this transformation. Social media platforms now overflow with AI-generated videos, many of which look so authentic that they easily mislead viewers. Even professional fact-checkers, armed with advanced verification tools, often struggle to confirm a video’s authenticity. 

A chronology of AI video generation

The evolution of AI video generation has transformed video creation from a labour-intensive art into an accessible, prompt-driven process. Several key stages, each building on the breakthroughs of the era before it, gradually blurred the line between synthetic and authentic footage.

From 2015 to 2021, AI video generation was in its infancy. Technologies like Google’s DeepDream pioneered neural network-driven visual manipulation, exaggerating the patterns a network detected in images to produce dreamlike, pattern-heavy sequences.

By 2018, NVIDIA had introduced Vid2Vid, which employed conditional Generative Adversarial Networks (GANs) to convert sketches into video with basic motion consistency.

However, these early systems had obvious limitations: clips were brief, motion was incoherent, and, depending on the creator, narratives could feel disconnected from the visuals because the systems manipulated existing footage or images instead of generating original content.

Some of these challenges were addressed between 2022 and 2023, when innovative models like DALL-E and MidJourney advanced text-to-image generation. These models demonstrated an understanding of complex visual semantics by interpreting text prompts and translating them into coherent images.

Researchers then adapted these architectures to generate sequences of images, addressing the challenge of temporal coherence: characters and objects had to remain consistent across frames. Some tools that emerged during this period, like Pika Labs, could create short clips of three to five seconds. However, users had to upload source images, which the tools animated with basic motion interpolation.

Although these outputs showed some understanding of physics and rudimentary expressions, they still appeared artificial, with noticeable transitions and limited practical use beyond experimentation.

Two factors drove the leap in video generation between 2024 and 2025, delivering better audio synchronisation, 4K resolution, and finer cinematic controls.

Firstly, architectural innovations shifted the dominant framework from GANs to diffusion models. These models employ iterative noise reduction to generate more stable and high-fidelity video sequences. 

Then, a race among industry powerhouses propelled the technology forward. OpenAI’s Sora generated detailed, minute-long narratives from single prompts. Google’s Veo 3, building on its two predecessors, improved scene complexity to generate eight-second high-definition videos per prompt.

The update included realistic speech, sound effects, and music. Users could guide the model with reference images and detailed prompts, resulting in videos that rival professional productions. 

The advanced camera movements and prompt-based editing made these upgrades a favourite among content creators and, unfortunately, those seeking to spread misinformation.

Key Features that Define Veo 3 Videos

The upgrades in video generation caught many deepfake detection tools off guard, making detection more complex. Identifying videos created with the latest AI generation tools now depends on combining human judgement with detection software. Here are six easy ways to identify such videos:

  1. Short Video Duration

Veo 3 restricts each generated video to a maximum of eight seconds. If you encounter a video that lasts exactly eight seconds or consists of several eight-second segments stitched together, consider the possibility that it originated from Veo 3.
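This check is easy to script. Below is a minimal sketch using Python and ffprobe (part of the FFmpeg suite); the filename sample.mp4 and the 0.25-second tolerance are illustrative assumptions, not fixed thresholds.

```python
import subprocess

def video_duration_seconds(path: str) -> float:
    """Read the container duration in seconds with ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())

duration = video_duration_seconds("sample.mp4")  # placeholder filename
remainder = duration % 8
# Durations at or near multiples of 8 seconds fit the Veo 3 pattern,
# including clips stitched together from several 8-second segments.
if min(remainder, 8 - remainder) < 0.25:  # illustrative tolerance
    print(f"{duration:.2f}s is consistent with 8-second Veo segments")
else:
    print(f"{duration:.2f}s does not match the 8-second pattern")
```

A match is only a hint: plenty of genuine clips also run close to eight seconds, so treat the result as one signal among several.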

  2. Resolution and Visual Quality

While Veo 3 can output high-quality videos, most of its content appears at 720p. Some users might upscale these videos, but the original resolution often remains evident. If a video appears unusually sharp yet is capped at 720p, or its visual quality seems inconsistent with its context, that may indicate AI-generated content.
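If you can download the file, the resolution is equally simple to read programmatically. A companion sketch in the same spirit, again with a placeholder filename; the 720p threshold simply mirrors the observation above:

```python
import subprocess

def video_resolution(path: str) -> tuple[int, int]:
    """Read (width, height) of the first video stream with ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height",
         "-of", "csv=s=x:p=0", path],
        capture_output=True, text=True, check=True,
    )
    width, height = result.stdout.strip().split("x")
    return int(width), int(height)

w, h = video_resolution("sample.mp4")  # placeholder filename
if h <= 720:
    print(f"{w}x{h}: capped at 720p; weigh this alongside other signals")
else:
    print(f"{w}x{h}: above 720p, though it may have been upscaled")
```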

  3. Subtle Visual and Audio Anomalies

Despite its sophistication, Veo 3’s output can still betray its artificial nature. You might notice slightly unnatural facial expressions, robotic body movements, or overly smooth camera transitions. You can also look out for garbled text or incoherent subtitles. The audio, though synchronised, sometimes sounds too perfect or lacks the subtle background noise of real-world recordings. In complex scenes, objects may appear inconsistently placed or lit.

  4. Implausible Scenes

Veo 3 can create scenarios that defy logic or physics. Watch for improbable events, flawless characters, or backgrounds with repeating patterns in unusual contexts.

  5. Watermarks and Digital Signatures

Google embeds both visible and invisible watermarks in every Veo 3 video. Look for small logos or “AI-generated” text within the frame. More importantly, Veo 3 uses SynthID, an invisible digital signature detectable only with specialised tools. While the public cannot access SynthID scanners, some organisations and fact-checkers can verify a video’s origin this way.

  6. Metadata and Context

If you can access the original video file, check its metadata for references to Veo, Gemini, or Google AI. However, understand that users can strip or alter this information. Consider the video’s context as well: Veo 3 requires a paid subscription, so the sudden appearance of many short, high-quality, topical videos may suggest AI involvement. Be careful if a realistic video appears almost instantly after a news event.
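One way to automate the metadata check is to dump the container tags with ffprobe and scan them for telltale strings. In this sketch, the keyword watchlist and filename are assumptions for illustration; as noted above, a clean result proves nothing, because metadata can be stripped or altered.

```python
import json
import subprocess

KEYWORDS = ("veo", "gemini", "google")  # illustrative watchlist, not exhaustive

def ai_metadata_hits(path: str) -> list[str]:
    """Return container metadata tags whose values mention a watched keyword."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format_tags",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    tags = json.loads(result.stdout).get("format", {}).get("tags", {})
    return [f"{key}={value}" for key, value in tags.items()
            if any(word in str(value).lower() for word in KEYWORDS)]

hits = ai_metadata_hits("sample.mp4")  # placeholder filename
print(hits or "No obvious AI tags; the metadata may simply have been stripped")
```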

Conclusion

Traditional detection methods become less reliable when fact-checking videos created with artificial intelligence. Apart from forensic tools, public education and critical thinking help protect viewers from the risks of AI-generated misinformation.
