Veo 2

by Google

Veo 2 is a text-to-video and image-to-video model that generates high-quality eight-second video clips with extensive camera controls and cinematic understanding. It interprets filmmaking language, so users can specify lens types, depth-of-field effects, genres, and camera movements such as zoom, pan, and dolly. The model shows an improved understanding of real-world physics and human movement, which reduces hallucinations common in earlier video generators, such as extra fingers or unexpected objects. It handles both straightforward and complex instructions and captures diverse visual and cinematic styles with temporal consistency. In text-to-video mode it turns detailed descriptions into dynamic scenes; in image-to-video mode it animates a static image, with optional text guidance for style and motion. Outputs reach up to 4K resolution and carry invisible SynthID watermarks for transparency, and the model provides person generation controls.
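Because the model is steered through filmmaking language, it can help to assemble prompts from explicit shot parameters. The following is a minimal sketch of that idea in plain Python. `ShotSpec` and `build_prompt` are hypothetical names introduced here for illustration, and the phrasing template is an assumption rather than a documented prompt format. The sketch calls no Veo 2 API and only produces a prompt string.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """Hypothetical container for the shot parameters mentioned in the description."""
    subject: str               # what happens in the scene
    genre: str = ""            # e.g. "film noir"
    lens: str = ""             # e.g. "35mm"
    depth_of_field: str = ""   # e.g. "shallow"
    camera_movement: str = ""  # e.g. "slow dolly in"


def build_prompt(spec: ShotSpec) -> str:
    """Join the non-empty fields into one comma-separated prompt sentence.

    The wording of each fragment is an assumption, not a required syntax.
    """
    parts = [spec.subject.strip().rstrip(".")]
    if spec.genre:
        parts.append(f"{spec.genre} style")
    if spec.lens:
        parts.append(f"shot on a {spec.lens} lens")
    if spec.depth_of_field:
        parts.append(f"{spec.depth_of_field} depth of field")
    if spec.camera_movement:
        parts.append(f"camera: {spec.camera_movement}")
    return ", ".join(parts) + "."


if __name__ == "__main__":
    spec = ShotSpec(
        subject="A lighthouse keeper climbing a spiral staircase",
        genre="film noir",
        lens="35mm",
        depth_of_field="shallow",
        camera_movement="slow dolly in",
    )
    print(build_prompt(spec))
```

Keeping each parameter in its own field makes it easy to vary one element, such as the camera movement, while holding the rest of the shot constant across generations.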