google/veo-3.1

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, and support for reference images and a last frame.

Input

*string

Text prompt for video generation

string

Video aspect ratio

Default: "16:9"

string

Video duration in seconds

Default: "8"

file

Input image to start generating from. Ideal images are 16:9 or 9:16 and 1280x720 or 720x1280, depending on the aspect ratio you choose.

file

Ending image for interpolation. When provided with an input image, creates a transition between the two images.

file[]

1 to 3 reference images for subject-consistent generation (reference-to-video, or R2V). Reference images only work with 16:9 aspect ratio and 8-second duration. Last frame is ignored if reference images are provided.

Default: []

string

Description of what to exclude from the generated video

string

Resolution of the generated video

Default: "1080p"

Generate audio with the video

Default: true

number

Random seed. Omit for random generations

Output

Generated in 101.8 seconds

README

Google Veo 3.1

Veo 3.1 is Google's flagship video generation model, delivering the highest-fidelity video output with context-aware audio generation, reference image support for subject consistency, and frame interpolation. Built on Google DeepMind's latest research, it produces cinematic-quality videos with precise motion control and natural sound design.

☀️ Why it stands out

  • Context-aware audio generation: Automatically generates synchronized audio that matches the visual content, including ambient sounds, dialogue, music, and sound effects.
  • Reference image support (R2V): Provide 1-3 reference images to maintain subject consistency across the generated video. Ideal for brand assets, character-driven content, and product videos.
  • Frame interpolation: Supply a start image and an end image (last frame) to create a smooth transition between the two.
  • Image-to-video: Start from any image and animate it into a video while preserving the original composition and style.
  • Up to 1080p resolution: Generate videos in 720p or 1080p with 16:9 or 9:16 aspect ratios.
  • Flexible duration: Choose 4, 6, or 8 second clips depending on your needs.
  • Negative prompts: Explicitly exclude unwanted elements from the generation for more precise control.

⚙️ How to use

  • Input: text prompt, optionally with start image, last frame, and/or reference images
  • Output: MP4 video with optional synchronized audio
  • Duration: 4, 6, or 8 seconds
  • Resolution: 720p or 1080p
  • Aspect ratios: 16:9 (landscape) or 9:16 (portrait)
  • Generation modes:
    • Text-to-video — prompt only
    • Image-to-video — prompt + start image
    • Frame interpolation — prompt + start image + last frame
    • Reference-to-video (R2V) — prompt + 1-3 reference images (16:9, 8s only)
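
The mode rules above can be sketched as a small helper that infers the mode from the inputs supplied and rejects combinations this page rules out. This is only an illustration: the function name and the parameter names (`image`, `last_frame`, `reference_images`, `aspect_ratio`, `duration`) are hypothetical, since this page does not list the actual API field names. Rejecting a last frame supplied without a start image is a choice made by this sketch, not documented behavior.

```python
from typing import Optional, Sequence


def infer_generation_mode(
    image: Optional[str] = None,
    last_frame: Optional[str] = None,
    reference_images: Sequence[str] = (),
    aspect_ratio: str = "16:9",
    duration: int = 8,
) -> str:
    """Return the generation mode implied by the inputs (hypothetical helper)."""
    if aspect_ratio not in ("16:9", "9:16"):
        raise ValueError("aspect_ratio must be '16:9' or '9:16'")
    if duration not in (4, 6, 8):
        raise ValueError("duration must be 4, 6, or 8 seconds")

    if reference_images:
        # R2V accepts 1-3 reference images and only 16:9 at 8 seconds.
        if len(reference_images) > 3:
            raise ValueError("at most 3 reference images are supported")
        if aspect_ratio != "16:9" or duration != 8:
            raise ValueError("reference images require 16:9 and 8 seconds")
        # Per the notes on this page, a last frame is ignored in this mode.
        return "reference-to-video"

    if last_frame and not image:
        # Assumption of this sketch: the page only describes a last frame
        # used together with a start image, so reject it on its own.
        raise ValueError("last_frame needs a start image")
    if image and last_frame:
        return "frame-interpolation"
    if image:
        return "image-to-video"
    return "text-to-video"
```

For example, `infer_generation_mode(image="start.png", last_frame="end.png")` returns `"frame-interpolation"`, while passing reference images with `aspect_ratio="9:16"` raises a `ValueError`.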

🔥 Pricing

| Config | RouteAny | Fal.ai | Replicate |
|---|---|---|---|
| 8s 720p no audio | $0.40 | $0.80 | $0.80 |
| 8s 720p with audio | $0.80 | $1.20 | $1.20 |
| 8s 1080p no audio | $0.60 | $1.20 | $1.20 |
| 8s 1080p with audio | $1.20 | $1.80 | $1.80 |

Shorter durations (4s, 6s) are proportionally cheaper.
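
As a rough illustration of that proration, the sketch below scales the 8-second RouteAny prices from the table linearly by duration. Strictly linear scaling is an assumption based on the word "proportionally"; actual billing may round or tier differently. The function name and its parameters are hypothetical.

```python
# 8-second prices in USD, copied from the RouteAny column of the pricing table.
BASE_PRICE_8S = {
    ("720p", False): 0.40,
    ("720p", True): 0.80,
    ("1080p", False): 0.60,
    ("1080p", True): 1.20,
}


def estimate_price(duration: int, resolution: str = "1080p", audio: bool = True) -> float:
    """Estimate the cost of one clip, assuming strictly linear proration."""
    if duration not in (4, 6, 8):
        raise ValueError("duration must be 4, 6, or 8 seconds")
    if (resolution, audio) not in BASE_PRICE_8S:
        raise ValueError("resolution must be '720p' or '1080p'")
    base = BASE_PRICE_8S[(resolution, audio)]
    # Scale the 8-second price by the fraction of 8 seconds requested.
    return round(base * duration / 8, 2)
```

Under this assumption, a 4-second 1080p clip with audio would come to half of $1.20, and a 6-second clip to three quarters of it.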

💡 Best Use Cases

  • Short-form social content — Generate TikTok, Reels, and Shorts with synchronized audio in 9:16 format.
  • Product demos & ads — Use reference images to keep brand consistency across video clips.
  • Cinematic B-roll — Create atmospheric establishing shots with natural sound design.
  • Storyboarding & previz — Quickly visualize scenes before committing to production.
  • Scene transitions — Use frame interpolation to create smooth transitions between two keyframes.

📝 Notes

  • Reference images (R2V) only work with 16:9 aspect ratio and 8-second duration. Last frame is ignored when reference images are provided.
  • Ideal input images are 1280x720 (16:9) or 720x1280 (9:16).
  • Audio generation adds processing time but produces context-aware sound that matches the visual content.
  • Please ensure your prompts comply with Google's Safety Guidelines.
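
Before uploading a start image, it can help to check its dimensions against the ideal sizes noted above. The helpers below are a minimal sketch with hypothetical names. They take a width and height in pixels, however you obtain them, and do no image decoding themselves.

```python
# Ideal input sizes from the notes above, keyed by aspect ratio.
IDEAL_SIZES = {"16:9": (1280, 720), "9:16": (720, 1280)}


def suggest_aspect_ratio(width: int, height: int) -> str:
    """Pick landscape for wide or square images, portrait otherwise."""
    return "16:9" if width >= height else "9:16"


def matches_aspect_ratio(width: int, height: int, aspect_ratio: str) -> bool:
    """Check whether the image has exactly the given aspect ratio."""
    ratio_w, ratio_h = (int(part) for part in aspect_ratio.split(":"))
    # Cross-multiply to avoid floating-point comparison.
    return width * ratio_h == height * ratio_w


def is_ideal_size(width: int, height: int, aspect_ratio: str) -> bool:
    """Check whether the image is exactly the ideal size for the ratio."""
    return IDEAL_SIZES[aspect_ratio] == (width, height)
```

A 1920x1080 image, for instance, matches 16:9 but is not the ideal 1280x720, so you may want to resize it before uploading.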

🌐 Where Veo 3.1 Fits In

| Feature | Veo 3.1 | Veo 3.1 Fast |
|---|---|---|
| Quality | Highest fidelity | High (optimized for speed) |
| Audio | Context-aware | Context-aware |
| Reference Images | Up to 3 | No |
| Frame Interpolation | Yes | No |
| Generation Time | ~100 sec | ~40-60 sec |
| Best For | Final production, cinematic quality | Rapid iteration, drafts |

Need faster turnaround for iterating on ideas? Try Veo 3.1 Fast.
