
Google has launched Veo 3, its most advanced video-generation AI yet, and for the first time the model can also create synced sound effects, ambient noise, and even dialogue to accompany the visuals.
From Silent Clips to Fully-Sounded Scenes
Announced at Google I/O 2025, the company’s annual developer conference, Veo 3 marks a significant leap in AI video generation by breaking the sound barrier. Unlike earlier models that produced silent clips requiring manual audio dubbing, Veo 3 natively generates both video and sound in response to user prompts. That includes environmental ambience, footsteps, character dialogue, and background music, all tightly synced with the generated visuals.
“For the first time, we’re emerging from the silent era of video generation,” said Demis Hassabis, CEO of Google DeepMind. “You can give Veo 3 a prompt describing characters and an environment, and suggest dialogue with a description of how you want it to sound.”
This marks a clear departure from the silent video outputs of Veo 2, which could render realistic 1080p clips but had no inbuilt audio functionality. Veo 3’s ability to generate both media types simultaneously is underpinned by multimodal training, allowing it to understand and translate visual scenes into contextually accurate sound.
Who Can Use Veo 3, And Where?
Veo 3 is now available through Google’s Gemini app for users subscribed to the AI Ultra plan, priced at $249.99 per month. As of now (early July), it’s rolling out across all countries where Gemini is active, including the UK and India. Users can access it via desktop or mobile and prompt the system using text, images, or a combination of both.
Up to 8 Seconds of Video With Audio
At launch, Veo 3 can generate up to 8 seconds of video with audio. For example, users can describe entire scenes, suggest character speech with tonal guidance (e.g. “a soft, nervous voice”), or request specific environmental sounds like birdsong, waves, or city traffic. Google says it plans to extend clip length and creative controls over time.
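To make the prompt structure concrete, here is a minimal, purely illustrative sketch of how a scene description, character dialogue with tonal guidance, and ambient sounds might be assembled into a single request, with the clip length clamped to the 8-second launch limit. The field names and helper function are hypothetical, not Google’s actual API:

```python
# Hypothetical sketch only: field names ("scene", "dialogue", "tone",
# "ambience") are illustrative and do not reflect Google's real Veo API.

def build_prompt(scene, dialogue=None, ambience=None, duration_s=8):
    """Assemble a video+audio prompt payload.

    dialogue: list of (character, line, vocal direction) tuples,
              e.g. ("Narrator", "Hello", "soft, nervous voice").
    ambience: list of environmental sound descriptions.
    """
    payload = {
        "scene": scene,
        # Clamp to Veo 3's 8-second cap at launch.
        "duration_seconds": min(duration_s, 8),
    }
    if dialogue:
        payload["dialogue"] = [
            {"character": c, "line": line, "tone": t}
            for c, line, t in dialogue
        ]
    if ambience:
        payload["ambience"] = list(ambience)
    return payload

prompt = build_prompt(
    "A quiet harbour at dawn, slow pan across fishing boats",
    dialogue=[("Fisherman", "Looks like a calm one today.",
               "soft, weathered voice")],
    ambience=["lapping waves", "distant gulls"],
    duration_s=12,  # over the cap, so it will be clamped to 8
)
```

The point of the sketch is simply that one prompt carries the visual scene, the speech (with how it should sound), and the ambient audio together, which is what distinguishes Veo 3 from pipelines that bolt sound on afterwards.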
What’s New and Different?
The most notable change from Veo 2 is Veo 3’s seamless integration of audio with video, something no other major model currently achieves at this level of fidelity and control. While earlier experiments with audio-generating AI exist, such as Meta’s AudioCraft and Google’s own SoundStorm, these tools typically treat sound and visuals as separate processes.
Veo 3, however, is built to generate both in parallel. It can understand raw video pixels and adjust audio timing accordingly, such as syncing a character’s footsteps with the terrain they walk on, or matching mouth movements to speech.
It also boasts significant improvements in visual realism. Google says Veo 3 now supports 4K resolution, more accurate physics, and refined prompt adherence. This means it’s better at understanding and sticking to the details users provide, even over multi-shot sequences involving actions and camera movements like pans or zooms.
Creators and Businesses
For video creators, advertisers, educators, and independent filmmakers, Veo 3 could remove one of the biggest barriers in AI content generation, namely having to source or manually create matching audio. With sound now generated natively, users can produce short-form content much faster, with minimal editing or post-production work.
For example, a marketing team could prompt Veo 3 to produce a product demo with a voiceover, or a teacher might generate an animated science explanation complete with relevant sound effects and narration.
Move to “Generative Cinema”
Google sees this as part of a broader shift toward “generative cinema,” where AI can help prototype, storyboard or even produce short-form entertainment. However, its reach could extend to gaming, AR/VR environments, and accessibility use cases such as auto-generating descriptive audio.
Google’s Position in a Crowded Field
Veo 3 arrives in an increasingly competitive video-generation space. For example, over the past year, tools like Runway Gen-3 Alpha, Pika Labs, Luma Dream Machine, and Alibaba’s EMO model have raised the bar for visual quality and scene consistency. However, very few models currently offer audio, and none do so at Veo 3’s level of native integration.
OpenAI’s Sora, which impressed with its photorealistic clips earlier this year, still outputs silent videos. While Runway allows users to add music and basic sound effects, this remains a separate, manually applied process. That gives Veo 3 a unique value proposition, at least for now.
Still, Google’s dominance is not guaranteed. As of now (July 2025), Veo 3’s capabilities are only available to high-paying subscribers through Gemini and haven’t yet been integrated into tools like YouTube Shorts, Google Ads, or enterprise APIs, though the company has confirmed that Veo 2 features are heading to the Vertex AI API in the coming weeks.
How Veo 3 Works
Though Google has not published technical papers on Veo 3, it builds on DeepMind’s earlier work in video-to-audio AI. In 2024, DeepMind revealed it was training models using paired video clips, ambient audio, and transcripts to learn audio-visual correlations. That foundational research likely informed Veo 3’s ability to match visual motion with appropriate audio output.
The model was almost certainly trained on large-scale datasets including YouTube material, though Google has not confirmed this publicly. DeepMind has said only that its models “may” use some YouTube content, raising questions about copyright and consent.
To address misuse risks, Veo 3 uses SynthID, Google’s proprietary watermarking system, which embeds invisible markers into every generated frame. It also includes visible watermarks for user-generated content and is subject to policy enforcement for unsafe or misleading material.
Criticism
Despite the impressive technology, Veo 3 has drawn scrutiny from some corners of the creative industry. For example, a 2024 study commissioned by the Animation Guild projected that AI tools like Veo could disrupt over 100,000 creative jobs in the US by 2026. Voice actors, sound designers, editors, and animators are among the roles most at risk.
Many artists also remain concerned about the lack of clarity around training data. Without formal consent or opt-out tools for creators on platforms like YouTube, Veo’s capabilities could be seen as drawing from (and replacing) the work of the very communities that power it.
Google says it is committed to responsible AI use and continues to test Veo with red-teaming exercises to identify abuse cases. It also relies on user feedback tools and policy enforcement to detect violations, though details on enforcement mechanisms remain limited.
That said, Veo 3’s creative potential is undeniable, and for businesses, creators, and Google’s own AI ambitions, it appears to mark a significant step forward in the race to multimodal dominance.
What Does This Mean For Your Business?
The arrival of Veo 3 appears to place Google at a clear technological advantage, at least temporarily, by addressing one of the most limiting aspects of AI video creation so far (i.e. the lack of audio). By combining video and sound generation into a single, prompt-driven process, it gives users far more flexibility and reduces the need for specialist editing tools or additional production stages. This will likely appeal to a wide range of professionals, from marketing teams to educators and indie content creators who want fast, realistic results without high production overheads.
For UK businesses in particular, the ability to generate short, full-sound videos in seconds could transform workflows across advertising, training, communications, and social media. SME marketing teams with limited budgets could produce explainers or campaign content in-house, while creative agencies may be able to build new service models around generative assets. However, the high monthly cost of access via Gemini’s AI Ultra plan may still limit uptake to larger firms or early adopters in creative sectors for now.
Competitively, Veo 3 puts pressure on OpenAI, Meta, and other major players who are still struggling to synchronise visuals and sound in a meaningful way. However, it also raises expectations. The moment Google delivers this feature set, users and clients may begin to assume it as standard. And as competitors catch up or release open-access alternatives, Google may need to expand Veo’s availability beyond Gemini and into more accessible developer platforms like Vertex AI or YouTube integrations.
The ethical questions are not going away either. Artists and voice professionals continue to challenge the use of training data that may have been scraped without consent. Even with SynthID watermarking, the risk of misuse or deepfake production remains a concern for regulators and rights-holders. Unless Google can offer greater transparency and clearer opt-out mechanisms, it may face mounting legal and reputational risks as adoption grows.
For now, though, Veo 3 appears to set a new benchmark in what multimodal AI tools can achieve. Whether it remains a premium creative niche or signals a broader shift in how visual content is produced will depend on how Google chooses to scale and integrate its technology in the months ahead.