New — Google I/O 2026

Google Omni Prompt Generator

Generate optimized prompts for Google's new Gemini Omni Flash — the "any-to-any" world model that turns text, images, audio, and video into high-resolution video with synchronized native audio. Multi-modal references, conversational editing, world-knowledge grounding, and the official Omni prompting framework — all baked in.

Tip: Describe the motion and temporal progression of your scene. Think in terms of "what happens over time" rather than a static description.

Gemini Omni Flash Tips

  • Less is more — Omni reasons across modalities, so describe intent and a few key visual facts, not every detail.
  • Always include an explicit camera cue ("one continuous shot," "push in," "locked off") — outputs feel flat without one.
  • For edits, speak surgically: "Change the butterfly to a bee. Keep everything else identical." Repeat the preserve list.
  • Audio is native — always state audio intent, even if it's "natural ambient sound only, no music, no voiceover."
  • Omni Flash caps clips at roughly 10 seconds. Match aspect ratio to intent: 9:16 for vertical portrait framing, 16:9 for cinematic scenes, 1:1 for centered compositions.
  • Lean on world knowledge for explainers (quantum computing, anatomy, physics) — that's Omni's edge over Veo.
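Two of the tips above (an explicit camera cue and a stated audio intent) are easy to check mechanically before a prompt is submitted. The following is a minimal sketch of such a check, not part of any Google tooling: the function name `lint_prompt` and both keyword lists are assumptions made for illustration, and simple substring matching will miss many valid phrasings.

```python
# Camera phrases quoted in the tips above; illustrative, not exhaustive.
CAMERA_CUES = ("one continuous shot", "push in", "locked off")
# Hypothetical keywords used to guess whether audio intent was stated.
AUDIO_WORDS = ("audio", "sound", "music", "voiceover", "ambience")


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for a draft prompt (an empty list means no warnings)."""
    text = prompt.lower()
    warnings = []
    if not any(cue in text for cue in CAMERA_CUES):
        warnings.append("no explicit camera cue")
    if not any(word in text for word in AUDIO_WORDS):
        warnings.append("no audio intent")
    return warnings
```

A draft such as "A cat naps on a sofa." would trigger both warnings, while a draft that names a camera cue and says "no music" would pass.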

What is Gemini Omni?

Gemini Omni is Google DeepMind's new "any-to-any" foundation model family, announced at Google I/O on May 19, 2026. It collapses the previously fragmented multimodal stack — text-to-image, image-to-video, video-to-video, audio generation — into a single foundation model with a single editing surface. Google positions it as a world model, combining an intuitive understanding of physics (gravity, kinetic energy, fluid dynamics) with Gemini's knowledge of history, science, and cultural context.

The first model in the family, Gemini Omni Flash, is live now in the Gemini app, Google Flow, Google Flow Music, YouTube Shorts, and the YouTube Create app. Developer access via the Gemini API and Vertex AI Agent Platform API rolls out in the coming weeks. A higher-end Gemini Omni Pro variant is planned for later, once Google judges that it represents a step change over Flash.

Core Capabilities

Any-to-Any Input

Mix text, images, audio, and video references in one prompt — Omni reasons across all of them.

Conversational Multi-Turn Edits

Each instruction builds on the previous, preserving character identity, lighting, and scene geometry.

World-Knowledge Grounding

Generates accurate explainers for technical topics — physics, biology, history, chemistry, architecture.

Physics Reasoning

Improved understanding of gravity, kinetic energy, and fluid dynamics for realistic motion.

Native Synchronized Audio

Ambient sound, SFX, dialogue, and music generated in the same forward pass as the video.

Style Transfer

Apply claymation, anime, watercolour, risograph, or chalk-on-blackboard while preserving motion.

SynthID + C2PA Watermark

Every output carries an imperceptible pixel + audio watermark and C2PA cryptographic provenance.

Avatar Mode

Create videos featuring a digital version of yourself with your own voice for personal branding.

The Official Omni Prompting Framework

Based on Google DeepMind's official Omni prompting guide. Omni rewards less prescriptive, more conversational prompts — it reasons across modalities to fill in the details.

1. Six foundational dimensions

Every Omni prompt should cover: shot framing & motion, style, lighting, location, action, and (when relevant) text rendering. Cover them with a few well-chosen sentences — don't over-specify.
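One way to keep all six dimensions in view without over-specifying is to hold one short phrase per dimension and join them into a brief. The sketch below assumes a hypothetical `SceneBrief` container; the field names and the sentence order are illustrative choices, not something the Omni guide prescribes.

```python
from dataclasses import dataclass


@dataclass
class SceneBrief:
    """One short phrase per dimension; field names are illustrative."""
    framing: str    # shot framing and camera motion
    style: str
    lighting: str
    location: str
    action: str
    text: str = ""  # on-screen text, only when relevant

    def to_prompt(self) -> str:
        parts = [self.framing, self.action, self.location,
                 self.lighting, self.style]
        if self.text:
            parts.append(f'On-screen text: "{self.text}"')
        # Turn each non-empty phrase into a sentence ending with one period.
        return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())
```

Leaving `text` empty simply drops the text-rendering sentence, which matches the "when relevant" qualifier above.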

2. Camera cue is non-optional

Omni outputs feel flat without an explicit camera cue. Use the official vocabulary: "one continuous shot" or "oner" for unbroken takes, "static / locked off / fixed" for stillness, "push in / punch in / dolly zoom" for movement, "orbit," "tilt up," or "handheld" for energy.

3. Six prompt patterns

Trigger-Action ("when X, do Y"), Multi-Turn Refinement (sequential edits that preserve identity), Reference Stacking (bracketed inputs with assigned roles), Knowledge-Grounded (topic + medium + accuracy), Text-Synced Action (word-by-word with pacing), and Sketch-to-Video (drawing as motion guide).
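As an illustration of the Reference Stacking pattern, the sketch below prefixes an instruction with bracketed references and the role each one plays. The helper name `stack_references` and the exact bracket syntax are assumptions; the source only says that inputs are bracketed and given roles.

```python
def stack_references(references: dict[str, str], instruction: str) -> str:
    """Prefix an instruction with bracketed references and their roles.

    `references` maps a label such as "image 1" to the role it should play.
    The bracket syntax is an assumption, not a documented Omni format.
    """
    lines = [f"[{label}] defines {role}." for label, role in references.items()]
    lines.append(instruction.strip())
    return "\n".join(lines)
```

For example, passing `{"image 1": "the character", "audio 1": "the rhythm"}` yields one role line per reference followed by the scene instruction.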

4. Conversational editing discipline

Speak in surgical edits: "Change the ships to be made from white origami paper." Always restate the preserve list — "keep the character, lighting, and geometry identical" — to prevent drift across turns.
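The edit-plus-preserve-list habit can be made routine with a small helper that always appends the preserve sentence. This is a sketch under the assumption that a plain English sentence is enough; `edit_instruction` is a hypothetical name, and the fallback wording is borrowed from the tips earlier on this page.

```python
def edit_instruction(change: str, preserve: list[str]) -> str:
    """Return one surgical edit followed by an explicit preserve list."""
    change = change.strip().rstrip(".")
    if not preserve:
        # Fall back to the blanket phrasing when nothing specific is listed.
        return f"{change}. Keep everything else identical."
    if len(preserve) <= 2:
        kept = " and ".join(preserve)
    else:
        kept = ", ".join(preserve[:-1]) + f", and {preserve[-1]}"
    return f"{change}. Keep the {kept} identical."
```

Calling it on every turn, with the same preserve list, is one way to restate what must not drift.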

5. World knowledge is the moat

For explainers (physics, biology, history, abstract concepts) lean on Gemini's reasoning. Name the concrete visual metaphor — "stop-motion paper-craft of a classical bit flipping between 0 and 1, then a paper coin spinning to represent a qubit" — and the model fills in accurate details.

6. Audio is first-class

Omni generates synchronized native audio in the same pass as the video. Always state audio intent: ambient sound, SFX, music mood, or "no music, no voiceover, natural ambience only." Put dialogue in quotation marks, and use the Avatar feature only for the user's own voice.
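A prompt builder can guarantee that audio intent is never left implicit by always emitting an audio sentence, even when no sound is requested. The sketch below assumes a hypothetical `audio_clause` helper; the parameter names and output wording are illustrative only.

```python
def audio_clause(ambient: str = "", sfx: str = "", music: str = "",
                 dialogue: str = "") -> str:
    """Build an explicit audio sentence; absent channels are stated, not implied."""
    if not any((ambient, sfx, music, dialogue)):
        return "No music, no voiceover, natural ambience only."
    parts = []
    if ambient:
        parts.append(f"ambient sound of {ambient}")
    if sfx:
        parts.append(f"sound effects of {sfx}")
    parts.append(f"{music} music" if music else "no music")
    # Spoken lines are wrapped in quotation marks, as the framework suggests.
    parts.append(f'spoken line "{dialogue}"' if dialogue else "no voiceover")
    return "Audio: " + ", ".join(parts) + "."
```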

Example Omni Prompts

Three official-style prompts that apply the framework above, including the Knowledge-Grounded and Text-Synced Action patterns.

Cinematic single-shot (10s, 16:9)

“A 10-second 16:9 cinematic video in one continuous shot. A young product designer sits at a small desk beside a rainy window, opens a sketchbook, and a compact silver drone design rises from the page as a realistic hologram. The camera starts as a close-up on the pencil tip, slowly pulls back to a medium shot, then gently orbits left as the hologram rotates above the page. Warm desk lamp light, cool blue rain outside, shallow depth of field, realistic hand motion, no subtitles, no logos, natural room ambience only.”

Knowledge-grounded explainer (10s)

“A 10-second educational explainer about the difference between classical computing and quantum computing. Tactile stop-motion paper-craft style on a dark tabletop. Show a single classical bit as a small paper switch flipping between 0 and 1, then show a qubit as a glowing paper coin spinning with both states implied before measurement. Clear visual metaphors, accurate motion, soft overhead light, no human hands, no voiceover, no on-screen text except the labels ‘bit’ and ‘qubit’ placed beside the objects.”

Text-synced social video (9s, 16:9)

“A 9-second horizontal 16:9 social video for an AI video creation tip. A clean black studio background with a floating glass timeline interface stretched across the frame. Each word appears one at a time in perfect rhythm with soft electronic clicks: ‘prompt’, ‘reference’, ‘motion’, ‘lighting’, ‘sound’. Each word has a different tasteful animation style, but the timeline and camera stay stable. End with all five words arranged as a neat widescreen checklist. High contrast, crisp typography, no brand names.”

Frequently Asked Questions

What is Google Gemini Omni?

Gemini Omni is Google DeepMind's "any-to-any" world model, announced at Google I/O on May 19, 2026. The first model in the family — Gemini Omni Flash — accepts text, images, audio, and video freely mixed in a single prompt and outputs high-resolution video with synchronized native audio. It combines Gemini's reasoning and world knowledge (physics, history, science, culture) with conversational multi-turn video editing.

How is Gemini Omni different from Veo 3.1 or Sora 2?

Omni is "any-to-any" — you can mix images, audio, video, and text in one prompt and the model reasons across all of them. Its core differentiators are (1) conversational multi-turn editing that preserves character identity, geometry, and lighting across edits, and (2) world-knowledge grounding for accurate explainers (physics, biology, history). Veo leads on cinematic long-form polish; Sora leads on raw physics simulation; Omni leads on reasoning + editing.

What does the Reference Mode option do?

It tells the generator how to structure your prompt around your inputs. Text-Only writes a full scene brief. Image / Audio / Video Reference cites your input and describes what should change or stay. Multi-Modal Mix assigns explicit roles to each reference ("image defines character, audio defines rhythm"). Sketch-to-Video uses your drawing as a motion guide without showing it. Conversational Edit produces surgical edit instructions with a preserve list.

When should I turn on World-Knowledge Grounding?

Turn it ON for educational, scientific, historical, or technical explainers where accuracy matters — protein folding, quantum computing, anatomy, fluid dynamics, architecture. The generator will frame the prompt to lean on Gemini's reasoning and add accuracy directives ("accurate motion," "physics-grounded"). Keep it OFF for purely artistic or stylistic scenes.
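The toggle described above amounts to conditionally appending accuracy directives to the prompt. The following sketch shows that behaviour under stated assumptions: `apply_grounding` is a hypothetical name, the two directives are the ones quoted in this answer, and the generator's actual wording may differ.

```python
# Directive phrases quoted in the answer above.
ACCURACY_DIRECTIVES = ("accurate motion", "physics-grounded")


def apply_grounding(prompt: str, grounded: bool) -> str:
    """Append accuracy directives when grounding is on; otherwise leave the prompt unchanged."""
    if not grounded:
        return prompt
    base = prompt.strip().rstrip(".")
    return f"{base}. Prioritize factual accuracy: {', '.join(ACCURACY_DIRECTIVES)}."
```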

How long can Gemini Omni Flash videos be?

Currently capped at around 10 seconds per clip — described by Google as a deployment decision rather than a model limit. The aspect ratios available in this generator (16:9, 9:16, 1:1) match the formats supported across the Gemini app, Google Flow, YouTube Shorts, and YouTube Create.
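Because the cap and the aspect ratios are fixed choices, clip settings can be validated before a prompt is generated. The sketch below is an assumption about how such a check might look: `validate_clip` is a hypothetical helper, and the 10-second constant mirrors the approximate cap stated above, which may change.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # ratios offered by this generator
MAX_SECONDS = 10  # approximate cap; a deployment setting that may change


def validate_clip(seconds: float, aspect_ratio: str) -> list[str]:
    """Return a list of problems with the requested settings (empty if none)."""
    problems = []
    if seconds <= 0:
        problems.append("duration must be positive")
    elif seconds > MAX_SECONDS:
        problems.append(f"duration {seconds}s exceeds the ~{MAX_SECONDS}s cap")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {aspect_ratio}")
    return problems
```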

Does Gemini Omni generate audio?

Yes — synchronized native audio is generated in the same forward pass as the video. You can describe ambient sound, sound effects, dialogue (in quotation marks), or music mood. Speech manipulation of third parties is restricted; only the Avatar feature (user's own voice) or generic narrator voiceovers are supported.

Where can I use the generated prompts?

Paste them directly into the Gemini app (Google AI Plus, Pro, or Ultra), Google Flow, Google Flow Music, YouTube Shorts, the YouTube Create app, or — when the API rolls out — the Gemini API and Vertex AI Agent Platform API.

Is this tool free?

Yes! You get 3 free Gemini Omni prompt generations per day. For unlimited generations across all 28+ AI tools, sign up for a Promptslove membership.
