Gemini Flash 3.5 Review: I Built 5 Apps in One Go and Here's What Happened

Google I/O 2026 dropped a lot of announcements, but two things grabbed my attention immediately — Antigravity 2.0 and Gemini Flash 3.5. I spent time testing both, and the results genuinely surprised me. In this article, I'm going to walk you through everything: what Antigravity 2.0 is, why Gemini Flash 3.5 scored higher than Opus 4.7 on Terminal Bench, and exactly what happened when I built five real apps using this model in a single session with zero iteration.

Spoiler: four of the five apps earned a perfect score from me. The fifth got a three out of five — and that honest rating tells you more about the model than any benchmark does.

Key Takeaways

Gemini Flash 3.5 scored higher than Claude Opus 4.7 on Terminal Bench, which makes it one of the most capable lightweight coding models available right now.

Antigravity 2.0 is Google's answer to Claude Code — a desktop app for AI-assisted development with goal mode, skill management, and folder-level file access.

I built five apps in one go with zero iteration: a frontend landing page, a Python collage maker, an HTML airplane shooter game, a full SaaS link-in-bio builder, and a macOS Pomodoro timer.

Four apps scored 5/5. One scored 3/5 because the model produced flat SVG graphics instead of the 3D objects I specified in my prompt.

Antigravity 2.0 has two major problems — quota limits and rate limiting — that will interrupt your workflow mid-build if you push the model hard.

promptslove.com hosts 82 downloadable skills compatible with Antigravity 2.0 and Codex, including a detailed frontend UI skill I used for the landing page.

Gemini Flash 3.5 is a strong choice for solo developers who want to build production-ready MVPs fast without paying for a frontier model on every task.

What Is Antigravity 2.0?

Antigravity 2.0 is Google's Claude Code-style desktop application. If you've used Claude Code or OpenAI Codex, the interface will feel familiar. You select a working folder, attach media files, run slash commands, and give the model high-level instructions. The app handles the actual code generation, file creation, and iteration.

What sets Antigravity 2.0 apart from the command-line tools I've used before is goal mode. You set a goal, for example "build a fully working SaaS app with user profiles and analytics", and the model runs until that goal is finished. It doesn't stop to ask for approval on every step. If you don't want interruptions while the app is being built, you enable goal mode and come back when it's done.

There's also a scheduler built in, which opens up some interesting automation possibilities I haven't fully explored yet.

Skills: The Killer Feature

The skills system in Antigravity 2.0 is the feature I'm most excited about. You install skills into a dedicated skills folder, and the model can call them during any session. Think of skills as reusable instruction modules — you define a behavior once, and the model applies it every time you need it.

If you want custom skills, head to promptslove.com and sign up for an account. The site hosts 82 skills right now, all downloadable and compatible with both Antigravity 2.0 and Codex.

You can configure them to match your stack, your design system, or your preferred code style. When I signed in to promptslove.com, the full skillset installed automatically on first login, with no manual setup required.

I used their frontend UI skill for App 1, and it made a real difference. More on that in a moment.

The Two Problems I Can't Ignore

I have to be honest here: I ran into two serious issues with Antigravity 2.0, and I don't know when they'll be fixed.

The first problem is quota exhaustion. I built three apps and my quota was gone. The app displays a clear warning when your credits run out, but the quota depletes faster than I expected — especially when you're running goal mode on larger builds.

The second problem is rate limiting. In the middle of building an app, the session crashed and the model stopped generating. I had to manually tell it to continue, and even then it took several attempts before it picked back up cleanly. This is disruptive when you're trying to ship something fast.

Google hasn't resolved the rate limiting issue even for paid plans with active quota remaining. That's the part I find frustrating. I understand quota limits — those are billing constraints. But rate limiting while your quota is still active means the tool interrupts you for no billing reason. That's a product problem that needs fixing.

Despite these issues, the interface itself is strong. You can review all generated files in the app, see a complete overview of your project structure, rename assets, and manage folders directly from the UI. It feels like a Gemini-powered CLI in a clean desktop wrapper.

Gemini Flash 3.5: The Numbers That Shocked Me

Before I get into the apps, let me talk about the model itself.

Flash models have historically been fast but shallow. They handle simple tasks well and fall apart on anything complex. I expected Gemini Flash 3.5 to behave the same way. It didn't.

The benchmark that got my attention was Terminal Bench — a coding-focused evaluation that tests models on real terminal tasks, system-level code, and multi-step programming challenges. Gemini Flash 3.5 scored higher than Claude Opus 4.7 on this benchmark.

That result is striking. Opus 4.7 is a frontier model. Gemini Flash 3.5 is a lightweight, speed-optimized model. Flash 3.5 outperforming Opus 4.7 on a benchmark that is specifically about coding, not general intelligence, suggests that Google has done something genuinely different with this model.

The output speed is also exceptional. Flash 3.5 generates code noticeably faster than heavier models, which matters when you're in goal mode and the agent is working through a hundred-step build.

Google also added a feature called Spark inside the Gemini App. I'll cover Spark in a separate deep dive, but it ties directly into Flash 3.5's real-time generation capabilities.

The 5 Apps I Built in One Go

Here's the setup. I used Antigravity 2.0 with Gemini Flash 3.5 selected as the model. For each app, I wrote one detailed prompt. No follow-up prompts. No corrections. One shot, and whatever came out is what I evaluated.

App 1: Animated Frontend Landing Page — 5/5

For this build, I wrote a prompt that was roughly 200 to 300 lines long. I didn't keep it short and vague. I specified every major design decision upfront:

Full color system with hex codes and semantic token names

Typography scale with font families, weights, and line heights

Spacing system based on a consistent grid

Animation library preferences and specific interaction patterns

Component list: hero, features, pricing, blog cards, timeline, accordion, contact section

I also specified that the hero should include parallax scroll effects and animated 3D cards. Both were delivered exactly as described.

The output included a working light mode and dark mode toggle. When I switched to dark, every section maintained the right contrast ratios and visual hierarchy. The pricing cards had 3D hover effects. The timeline section animated as elements entered the viewport. The blog cards had clean hover states. Every interaction worked on first load.

The one thing that didn't fully land was a 3D model element in the hero. The model generated a CSS-based approximation instead of a WebGL-rendered 3D object. But given the rest of the output, I still gave it a 5/5. This was one go. No iteration. The quality is significantly better than what I got from previous Flash versions.

The frontend UI skill from promptslove.com is available to download if you want the exact prompt structure I use for builds like this. It defines the full design system in a format the model understands clearly.
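To make "specify every major design decision upfront" more concrete, here is a minimal Python sketch that keeps design tokens as structured data and renders them into a plain-text prompt section. It is an illustration only: the token names, the hex values, and the `render_design_prompt` helper are hypothetical placeholders. They are not the contents of my actual prompt or of the promptslove.com skill.

```python
# Hypothetical design tokens; every name and value here is a placeholder.
DESIGN_TOKENS = {
    "color": {"primary": "#4F46E5", "surface": "#FFFFFF", "text": "#111827"},
    "typography": {"font-family": "Inter", "base-size": "16px", "line-height": "1.5"},
    "spacing": {"grid-unit": "8px"},
}

# Component list taken from the landing page spec described above.
COMPONENTS = [
    "hero", "features", "pricing", "blog cards",
    "timeline", "accordion", "contact section",
]


def render_design_prompt(tokens: dict, components: list) -> str:
    """Render token groups and the component list as a plain-text prompt section."""
    lines = ["DESIGN SYSTEM"]
    for group, values in tokens.items():
        lines.append(f"{group}:")
        for name, value in values.items():
            lines.append(f"  {name}: {value}")
    lines.append("COMPONENTS: " + ", ".join(components))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_design_prompt(DESIGN_TOKENS, COMPONENTS))
```

Keeping the tokens in one structure means the same values can be reused across prompts, so the model sees an identical design language every time.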

App 2: Python Desktop Collage Maker — 5/5

This was my test for desktop application development. I wanted to see if Flash 3.5 could build a real, installable Python app — not a browser demo, not a script, but a GUI application you run on your machine.

The collage maker came out fully functional. Here's what worked out of the box:

Drag-and-drop image import from any folder on my machine

Border radius control for each image or globally

Custom background color picker with live preview

Style presets (warm, cool, minimal, high contrast)

Canvas size selection — I tested 4:3 and it applied correctly

Export functionality that saved the finished collage as a PNG

I added my own real images during the demo. The app handled them without any errors. I adjusted the background color mid-session and the preview updated immediately. When I clicked export, the file was there.

Five out of five. For a Python desktop GUI app built in one prompt with no iteration, this is a strong result.
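I did not dig into how the generated app lays out images, so the following is only a sketch of one common approach: splitting the canvas into a near-square grid of equal cells. The `grid_cells` function, its `gap` parameter, and the 1200 x 900 canvas (a 4:3 size) are assumptions for illustration, not code from the generated app.

```python
import math


def grid_cells(canvas_w: int, canvas_h: int, n_images: int, gap: int = 10):
    """Return (x, y, width, height) rectangles for n_images on a near-square grid.

    Hypothetical helper: cells are equal in size and separated by `gap` pixels.
    """
    if n_images <= 0:
        return []
    cols = math.ceil(math.sqrt(n_images))
    rows = math.ceil(n_images / cols)
    # Subtract the gaps (one more gap than cells per axis) before dividing.
    cell_w = (canvas_w - gap * (cols + 1)) // cols
    cell_h = (canvas_h - gap * (rows + 1)) // rows
    cells = []
    for i in range(n_images):
        row, col = divmod(i, cols)
        x = gap + col * (cell_w + gap)
        y = gap + row * (cell_h + gap)
        cells.append((x, y, cell_w, cell_h))
    return cells


if __name__ == "__main__":
    # Four images on a 4:3 canvas give a 2 x 2 grid.
    for cell in grid_cells(1200, 900, 4):
        print(cell)
```

A real collage tool would also crop or scale each image to fit its cell and apply the border radius, which this sketch leaves out.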

App 3: Neon Brush HTML Airplane Shooter Game — 3/5

This is the honest review.

My prompt specified that the game should include real 3D objects — 3D airplane models, 3D enemy designs, depth and perspective in the visual space. The model built the game, and the game works. But it built it using SVG graphics instead of 3D objects.

When I launched it, I hit space and waves of enemies came at me with sound effects playing through the browser. The wave system worked. Level progression worked. The main menu worked. The sound design was solid. I could return to the main menu and restart cleanly.

But the visuals were flat SVGs, not 3D. My prompt was detailed and the 3D requirement was explicit, yet the model took a simpler path.

I gave it three out of five. The game is playable and the mechanics are sound. But when my prompt specifies 3D objects and the output uses 2D SVGs, that's a meaningful gap between the instruction and the result. I'm rating the model's ability to follow my spec, not the game's playability in isolation.

This is useful data. Flash 3.5 handles 3D CSS tricks well, as we saw in the landing page. But when the prompt requires WebGL or Three.js-level 3D rendering inside an HTML game, the model defaults to a simpler approach. Keep that in mind when you're writing prompts for game builds.

App 4: BEACON — SaaS Link-in-Bio Builder — 5/5

This was the most ambitious build of the five, and it's the one that made me sit back and say "mind blowing."

BEACON is a full link-in-bio and personal page builder — a direct competitor to Linktree in concept. I gave it one prompt and it generated a complete SaaS application with:

A landing page with sample profiles and clean URLs

A login and signup flow (I logged in as a demo user, Alex Chen, and everything worked)

A profile editor where you can rename your page, add a bio, and upload a photo

Link management — I added a link with title, URL, and icon in under 30 seconds

Multiple page themes with live preview

A dashboard showing analytics: total views, link clicks, and click-through rates

Real-time tracking — I navigated to the live page, the view count incremented from 48 to 49. I clicked a link and the click tracker registered it immediately.

Zero errors. Zero broken flows. Every button worked. Every state updated correctly.

This is the result that defines what Flash 3.5 can do. A production-quality SaaS MVP, built in one go, with working analytics, user profiles, link management, and real-time tracking. Five out of five.
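For readers who want to see what the dashboard numbers mean, here is a minimal Python sketch of view counting, click counting, and click-through rate. It is not BEACON's generated code. The `PageStats` class and the `"portfolio"` link ID are hypothetical, and the sketch assumes click-through rate is total clicks divided by total views.

```python
from dataclasses import dataclass, field


@dataclass
class PageStats:
    """Hypothetical in-memory counters for one link-in-bio page."""
    views: int = 0
    clicks_by_link: dict = field(default_factory=dict)

    def record_view(self) -> None:
        self.views += 1

    def record_click(self, link_id: str) -> None:
        self.clicks_by_link[link_id] = self.clicks_by_link.get(link_id, 0) + 1

    @property
    def total_clicks(self) -> int:
        return sum(self.clicks_by_link.values())

    def click_through_rate(self) -> float:
        """Clicks divided by views; 0.0 when the page has no views yet."""
        return self.total_clicks / self.views if self.views else 0.0


if __name__ == "__main__":
    stats = PageStats(views=48)     # starting point from my test
    stats.record_view()             # visiting the live page: 48 becomes 49
    stats.record_click("portfolio")
    print(stats.views, stats.total_clicks, round(stats.click_through_rate(), 4))
```

A production app would persist these counters in a database and deduplicate repeat visits. The sketch only shows the arithmetic behind the dashboard.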

App 5: Tempo — macOS Pomodoro Timer in Electron — 5/5

For the final app, I wanted a real macOS application. Not a web app you run in the browser. An installable Electron app you launch from your Applications folder.

I specified:

Build in Electron for macOS

Global keyboard shortcut: Cmd + Shift + T to open and minimize

Standard Pomodoro settings: 25-minute focus sessions, 5-minute short breaks, 15-minute long breaks

Autostart breaks after each session

Sound effects for session transitions

Daily session goal tracking

Charts and visualizations for focus time over the day

Session history with work type labels

Antigravity 2.0 built it. The installable file was in the distribution folder. I installed it, went to my Applications, and found Tempo waiting.

I hit Cmd + Shift + T. A compact timer appeared on screen — minimized, non-intrusive, with the remaining time displayed. I set a task label: "building Mac app." I hit start. The timer began, a ticking sound played softly, and I could see exactly how much time was left.

In the settings panel, everything I specified was configurable — focus duration, break lengths, autostart, sound effects, and daily session goal. The statistics view showed a chart of daily focus time, session types, and a list of completed sessions with timestamps.

I can iterate on this and add features to get it closer to the famous Sessions app on Mac. But as a first-pass build from one prompt? Five out of five.
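The timer settings I specified reduce to a simple schedule, sketched below in Python. This is an illustration, not Tempo's generated code, which is an Electron app. The long break after every fourth focus session is a common Pomodoro convention that I am assuming here, not something stated in my spec. The `pomodoro_schedule` name is hypothetical too.

```python
FOCUS_MIN = 25
SHORT_BREAK_MIN = 5
LONG_BREAK_MIN = 15
# Assumption: a long break follows every fourth focus session.
SESSIONS_PER_LONG_BREAK = 4


def pomodoro_schedule(focus_sessions: int):
    """Return a list of (phase, minutes) pairs for the given number of focus sessions."""
    schedule = []
    for n in range(1, focus_sessions + 1):
        schedule.append(("focus", FOCUS_MIN))
        if n % SESSIONS_PER_LONG_BREAK == 0:
            schedule.append(("long break", LONG_BREAK_MIN))
        else:
            schedule.append(("short break", SHORT_BREAK_MIN))
    return schedule


if __name__ == "__main__":
    for phase, minutes in pomodoro_schedule(4):
        print(f"{phase}: {minutes} min")
```

With autostart enabled, an app would walk this list and begin each break as soon as the preceding focus session ends.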

How to Write Prompts That Get Results Like This

The difference between App 3's 3/5 and App 4's 5/5 is mostly in how I wrote the prompts.

For the landing page and the BEACON SaaS app, my prompts were specific. I named the components I wanted. I described the data flow. I specified the visual behavior. I gave the model enough context that it had one clear path forward.

For the game, I described the visual style I wanted (3D objects) without specifying the implementation approach (WebGL, Three.js, CSS 3D transforms). The model picked the safest path: SVGs.

My recommendation: when you're building with Flash 3.5, define both the what and the how. Don't just say "3D game" — say "HTML game using Three.js with perspective camera and 3D mesh objects for the player and enemies." The more specific the implementation path, the less room for the model to simplify.
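One way to enforce that habit is to keep the "what" and the "how" as separate fields and refuse to build a prompt when the "how" is empty. The Python sketch below is a hypothetical helper, not part of Antigravity 2.0 or any skill. The `BuildSpec` class and its output format are my own illustration.

```python
from dataclasses import dataclass


@dataclass
class BuildSpec:
    """Hypothetical prompt spec that separates the goal from the implementation."""
    what: str   # the outcome you want
    how: list   # explicit implementation choices: libraries, rendering approach

    def to_prompt(self) -> str:
        if not self.how:
            raise ValueError("Name at least one implementation choice")
        steps = "\n".join(f"- {item}" for item in self.how)
        return f"Build: {self.what}\nImplementation requirements:\n{steps}"


if __name__ == "__main__":
    spec = BuildSpec(
        what="HTML airplane shooter game with 3D visuals",
        how=[
            "Use Three.js",
            "Perspective camera",
            "3D mesh objects for the player and enemies",
        ],
    )
    print(spec.to_prompt())
```

The empty-list check is the point of the sketch: it forces you to name the implementation path before the model gets a chance to pick a simpler one.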

For frontend work, I use the frontend design system skill from promptslove.com. It gives the model a complete design language to work from — colors, spacing, typography, animations — and the output consistency is noticeably better.

Gemini Flash 3.5 vs. Heavier Models: When Should You Use It?

Flash 3.5 is the right tool when:

You're building an MVP and want to move fast

The task is clearly defined and the scope is bounded

You're doing most of the architectural thinking yourself

Speed matters more than deep reasoning

Flash 3.5 is the wrong tool when:

You need the model to architect a complex system from scratch with ambiguous requirements

The task requires deep reasoning across multiple interdependent files

You need precise 3D rendering or advanced graphics in a game build

The codebase is large enough that context management becomes critical

For the builds I ran, Flash 3.5 delivered frontier-level results on four out of five apps. That's a strong hit rate. For developers building prototypes, personal tools, or early-stage SaaS products, this model removes the argument for reaching straight for Opus or GPT-4 on every task.

Frequently Asked Questions (FAQs)

What is Gemini Flash 3.5 and how does it compare to other models?

Gemini Flash 3.5 is Google's lightweight, high-speed coding model released at Google I/O 2026. It scored higher than Claude Opus 4.7 on Terminal Bench, a coding-focused benchmark. For development tasks, it delivers results close to frontier models at faster output speeds.

What is Antigravity 2.0?

Antigravity 2.0 is Google's desktop AI coding assistant, similar to Claude Code or OpenAI Codex. It lets you select project folders, install custom skills, use slash commands, and run goal mode where the model works autonomously until a defined task is complete. Skills from promptslove.com are compatible with it.

Does Antigravity 2.0 have rate limiting issues?

Yes. My testing confirmed two major issues: quota depletion on heavy builds, and rate limiting that interrupts sessions mid-build even when quota remains. The session crashes and needs a manual "continue" prompt before the model resumes. Google hasn't resolved this as of Google I/O 2026.

Can Gemini Flash 3.5 build a real SaaS app in one prompt?

Based on my testing, yes. I built BEACON — a full link-in-bio SaaS builder with user authentication, profile editing, link management, themes, and real-time analytics — in a single prompt with zero iterations. It worked with zero errors on first run.

Where can I get skills for Antigravity 2.0?

Visit promptslove.com and sign up for an account. They currently host 82 skills compatible with Antigravity 2.0 and Codex. The full skillset installs automatically when you first log in, and individual skills are downloadable and customizable.

What type of apps does Gemini Flash 3.5 struggle with?

My testing showed Flash 3.5 defaults to simpler rendering approaches when prompts specify advanced graphics without naming the implementation technology. When I asked for 3D objects in a game without specifying Three.js or WebGL, it produced SVG graphics instead. For 3D game development, you need to name the exact library in your prompt.

Final Thoughts

Google I/O 2026 gave us two tools worth paying attention to. Gemini Flash 3.5 is not a toy. It's a capable coding model that outperformed a frontier model on a coding-specific benchmark and built four production-quality applications in a single session with no iteration.

Antigravity 2.0 has real potential — goal mode and the skills system are genuinely useful — but the rate limiting and quota issues will frustrate you unless Google ships a fix. I'd use it for scoped builds rather than long, multi-app sessions until those problems are addressed.

If you're a developer looking to test Flash 3.5 today, start with a well-defined project. Write a detailed prompt. Be specific about libraries and implementation. Then let the model run. The results will surprise you.