MiniMax M3 Review: I Built 5 Apps in One Go and Here Is What Happened

Ramanpal Singh

June 2, 2026 • 71 min read

AI Tips

Listen to this article

MiniMax M3 Review: I Built 5 Apps in One Go and Here Is What Happened

0:0030:58

onyx

MiniMax broke the internet on June 1, 2026, when they dropped M3. I watched the benchmarks come in and immediately pulled up OpenCode to start testing. Within the same session, I built five completely different apps using M3 as the sole model. No switching between providers. No iteration after the first prompt. Just one shot per app and whatever came out is what you are reading about right now.

The results genuinely surprised me. Three apps earned a perfect score. One scored four out of five. One scored three out of five. And the pattern I found across all five builds tells me more about M3's real-world capability than any benchmark table can.

I want to be clear upfront: M3 is available for free on OpenCode right now. If you use OpenCode, the model is included at no cost for the first seven days. That is the setup I used for everything in this article. Go check it out and start building while the free access is live.

Here is the full breakdown.

Key Takeaways

MiniMax M3 launched June 1, 2026, and it is the first open-weight model combining frontier-level coding, a 1-million-token context window, and native multimodality in a single package.

The MSA (MiniMax Sparse Attention) architecture cuts per-token compute at 1M context to one-twentieth of the prior generation. Prefill is 9.7x faster. Decoding is 15.6x faster.

M3 scores 59.0% on SWE-Bench Pro, beating both GPT-5.5 and Gemini 3.1 Pro. It approaches Claude Opus 4.7 on that benchmark.

I tested M3 on five different app types in one session using OpenCode. The stack is OpenCode plus M3, and the model is currently free to use through OpenCode.

Three out of five apps scored a perfect five out of five on the first prompt with zero iteration.

The pricing argument for M3 is strong: $0.30 per million input tokens against Claude Opus 4.7's $5 per million. That is a 15x difference on input cost alone.

After testing DeepSeek's GLM models extensively, M3 is now my favorite open-weight model at this capability level.

What Is MiniMax M3? The Specs That Actually Matter

MiniMax is a Shanghai-based AI lab founded in 2021. They listed on the Hong Kong Stock Exchange on January 9, 2026, making them one of the few AI labs in this class that is publicly traded. Outside China, they are best known for their Hailuo video model and the M-series of language models.

M3 is their newest flagship, and it represents a real architectural shift from the M2 line.

Here is the full technical picture:

Specification	Detail
Release date	June 1, 2026
Context window	Up to 1 million tokens (guaranteed minimum 512K)
Max output tokens	512,000
Architecture	MiniMax Sparse Attention (MSA)
Modalities	Text, image, video input; desktop computer operation
Output speed	Approximately 100 tokens per second
Open-weight	Yes (weights and technical report due within 10 days of launch)
API input pricing	$0.30 per million tokens (standard tier)
API output pricing	$1.20 per million tokens

The three capabilities MiniMax identifies as defining features for M3 are frontier-level coding, the 1M-token context window, and native multimodality. Their claim is that M3 is the first and only open-weight model to bring all three together at the same time. Based on everything I have seen from the launch, that claim holds up.

The model also supports desktop computer operation natively, which is what MiniMax Code uses when you ask it to interact with applications on your machine.

The MSA Architecture Change That Makes 1M Context Affordable

The most technically important thing about M3 is not the benchmark scores. It is what MiniMax did to make a 1-million-token context window actually usable in production.

With the M2 family, including M2, M2.1, M2.5, and M2.7, MiniMax used full attention across the context window. Full attention means every token can look at every other token. That is accurate but the compute cost explodes quadratically as the context grows. They even published an engineering blog post during the M2 generation explaining that sparse attention infrastructure was not mature enough yet.

With M3, they reversed course. The new architecture is MSA, short for MiniMax Sparse Attention.

Here is how it works in plain terms. Instead of computing attention between every pair of tokens, MSA uses a lightweight index branch to scan incoming tokens and select which blocks of past tokens actually deserve attention. It then runs attention only on those relevant key-value blocks. The base is Grouped Query Attention with block-level selection performed on real, uncompressed key-values rather than a compressed representation.

That detail matters. Previous sparse attention approaches often worked on compressed or approximated key-values, which introduced accuracy loss. MSA skips the compression and operates directly on the real data, which is how MiniMax gets the efficiency gains without sacrificing quality on standard benchmarks.

The practical result at 1-million-token context:

Metric	MSA vs Prior Generation
Per-token compute	One-twentieth of M2
Prefill speed	More than 9.7x faster
Decoding speed	More than 15.6x faster
Flash-Sparse-Attention comparison	More than 4x faster

At approximately 100 tokens per second output speed, M3 is roughly 3x faster than Claude Opus 4.7. That speed difference compounds over long agentic sessions where the model is generating thousands of tokens across many tool calls.

MiniMax demonstrated this in a CUDA kernel optimization task where M3 ran for 24 hours continuously, made 1,959 tool calls, and pushed NVIDIA Hopper FP8 hardware utilization from 7.6% to 71.3%. That is a 9.4x speedup. The fact that M3 made its best submission on attempt 147 out of 147 total attempts shows that the model kept exploring when other models had already stopped. Only Opus 4.7 showed similar persistence.

Benchmark Performance: Where M3 Stands Against the Giants

Before I get into the apps I built, I want you to understand what the benchmark data actually shows, because the numbers explain why I expected M3 to perform the way it did.

Benchmark	M3 Score	What It Measures
SWE-Bench Pro	59.0%	Real-world software engineering fixes
Terminal-Bench 2.1	66.0%	Command-line agent tasks
SWE-fficiency	34.8%	Efficient code changes
KernelBench Hard	28.8%	Low-level kernel optimization
MCP Atlas	74.2%	Tool use via MCP
BrowseComp	83.5	Web search and autonomous browsing
OmniDocBench	Above Gemini 3.1 Pro	Multimodal document understanding
Claw-Eval	First place	End-to-end autonomous agent evaluation
SVG-Bench	Above Opus 4.7	SVG generation quality
PostTrainBench	0.37	Autonomous model training
OSWorld-Verified	70.06%	Computer use tasks

The headline number is SWE-Bench Pro at 59.0%. That beats both GPT-5.5 and Gemini 3.1 Pro and comes close to Claude Opus 4.7. On BrowseComp, M3 actually surpasses Opus 4.7 with 83.5 against 79.3.

One important transparency note: several of these results were run on MiniMax's own infrastructure using agent scaffolding tools including Claude Code, Mini-SWE-Agent, and Terminus. That does not make the numbers meaningless, but it does mean independent verification is still pending. Wait for external benchmark runs before making final procurement decisions on production workloads.

For paper reproduction, M3 autonomously reproduced an ICLR 2025 Outstanding Paper in 12 hours, producing 18 commits and 23 experimental figures with no human intervention. That is the kind of long-horizon task where context management, coding, and multimodality all have to work together at the same time.

How I Set Up the Test: OpenCode Plus M3

My setup for all five apps was straightforward. OpenCode is an open-source coding agent similar in concept to Claude Code and Google's Antigravity 2.0. You point it at a working folder, give it a detailed prompt, and it generates files.

With OpenCode, you can select M3 as the model provider. Right now, M3 is available through OpenCode for free, and MiniMax's official statement is that you can also access it directly through their API and through MiniMax Code.

For each app, my approach was the same. I wrote one detailed prompt. I specified the tech stack, the component list, the design system, the behavior, and where applicable the schema and data model. I did not send follow-up prompts. I did not correct errors mid-build. Whatever M3 produced in one pass is what I evaluated.

One prompt tip that I use for everything: write the prompt before you build the app. Not a casual one-liner. A prompt that covers what files you want, what libraries to use, what the color system looks like, how each section should behave, what the data schema is, and what the expected user flow is. The difference between a detailed prompt and a vague one is not a minor quality improvement. It determines whether the app is usable or a skeleton.

Now here is what happened.

App 1: Vantage AI Portfolio Website (4 Out of 5)

0:00 / 0:00

The first test was a UI build. I asked M3 to create a personal portfolio and achievement dashboard called Vantage AI. The spec included six pages, a dark navy and electric lime color palette, a custom cursor, light and dark mode toggle, 3D card effects, scroll animations, a skills section with radar charts, a timeline section, a blog card section, a contact form, and an animated statistics counter.

The tech stack I specified was HTML, CSS, and JavaScript with no external image dependencies. No Unsplash, no placeholder images. Everything had to be coded.

What came out on the first run:

The custom cursor appeared immediately when I loaded the page. This was not the default browser cursor. It was the custom one I specified. Light and dark mode toggled cleanly with the switch in the header, and every section maintained the right contrast ratios in both modes. The 3D card hover effects on the portfolio items worked exactly as I described. The statistics counter animated on scroll. The timeline section built from start to end. Blog cards had clean hover states. The contact section included a start conversation button that functioned.

The frontend skills chart in the scroll animation section had a rendering issue. The radar chart for the frontend skills category was not rendering the hover state correctly on scroll. I had specified that hovering the front-end parameter should highlight it distinctly from the back-end parameters behind it. The model handled the animation but the hover state layering was off.

That one issue is why I gave it four out of five rather than five. Everything else, the full six-page structure, the scroll animations, the 3D effects, the mode toggle, the statistics, the footer, all delivered on the first prompt. For an open-weight model running for free on OpenCode, that result is better than most paid tools I have tested.

App 2: Vaultify SaaS Magic Link Authentication Service (3 Out of 5)

The second test was a SaaS product build. I asked M3 to create Vaultify, a Magic Link Authentication service for developers. The idea is a service similar to Auth0 or Clerk, where developers register their apps and use a hosted magic link flow to authenticate their users.

My prompt specified the full product architecture: a landing page, a sign-in flow with three subscription tiers (free, starter, and pro), an app management dashboard, a dev mode for testing without a live email server, live activity logging, a widget embed system for developers, and a PostgreSQL schema for the underlying data model.

Here is what worked on the first run:

The landing page loaded clean. The sign-in page showed three account types. I selected the free plan. The dev mode dashboard loaded and showed a live activity feed. I clicked through to create a new app, added a domain for promptslove.com, and the create app flow completed. When I tested the sign-in flow as a user entering an email address, the activity log showed the magic link send event in real time. The login success event followed when I clicked the link. Every state updated correctly.

Here is what failed:

The button color customization inside the widget settings panel did not persist. I changed the primary color, saved the settings, and tested the widget. The color did not change. Beyond that, several features I specifically listed in the prompt were present as UI elements but the underlying data connections were not wired. The doc page had no content because I had not prompted for it, but M3 added a navigation link to it anyway and left it empty.

The other issue is that for a complex SaaS build, I made a deliberate decision not to emphasize the landing page in my prompt. The reason is context window management. A detailed landing page specification pushed the generation toward spending tokens on marketing copy and design details rather than the core app logic. If you are building a complex multi-page SaaS with M3 in one session, build the app first. Add the landing page last.

My rating is three out of five. The core authentication flow worked. The magic link logic ran correctly in dev mode. The app management dashboard was functional. But the features I specifically requested that were missing, and the button color that refused to persist, tell me that M3 at this complexity level needs iteration to close the gaps. If I had run a second prompt targeting those specific issues, I am confident the score would have gone to four or five.

App 3: Neon Dungeon Crawler HTML Game (5 Out of 5)

0:00 / 0:00

This is the result that made me say this model is a serious option for development work.

I asked M3 to build a dungeon crawler game in HTML and JavaScript. I specified the game mechanics in detail: a player character with health and attack stats, enemy AI that tracks and attacks when you enter range, a loot system with items like health potions and attack orbs, an inventory system, multiple floor levels with progression, and a dungeon environment rendered in SVG with pillars, walls, and atmospheric details.

What M3 delivered on the first prompt:

A fully playable dungeon crawler. When I launched the game, the dungeon environment appeared with SVG-rendered floors, walls, and pillars. The enemy characters moved toward me when I entered their detection radius. My health bar reduced when enemies got close enough. I found items on the map, including health potions, and when I opened the inventory and used a health potion, my health went from a depleted state back to 100. I found an attack orb in a red box and picked it up. The inventory tracked both items correctly.

The game interface included a main menu, floor number display, health indicator, inventory button, and item use system. Every interaction I tested worked. The SVG graphics for the dungeon environment looked genuinely good for a browser-based game.

My rating is five out of five. For an open-weight model building a functional interactive game with enemy AI, item systems, inventory management, and floor progression in one prompt, this is a result I would not have predicted six months ago. I specifically noted that this is commendable for an open-source model to produce something like this in one go.

App 4: Sketch to UI Builder in Python (4 Out of 5)

The Python test was about building a native macOS desktop application with an unusual use case: a sketch-to-code tool. The concept is a drawing canvas where you sketch a rough wireframe of a UI and M3 interprets it into actual HTML code. You can also upload an image and the model converts it to HTML.

My prompt specified a native macOS look, a canvas with multiple background modes (white, grid, light grid), model provider selection between Minimax M3 directly and OpenRouter, a code view and preview view split, and both sketch-to-HTML and image-to-HTML generation.

What M3 delivered:

The app had a genuinely native macOS feel. The toolbar, the canvas, the panel layout all matched what you would expect from a macOS application rather than a web app wrapped in Electron. The settings panel showed two model provider options: Minimax M3 from their official API, and OpenRouter. I selected M3, clicked test, and confirmed the connection worked with the 3.42 model version displayed. I selected the dark grid canvas background and it rendered correctly. I drew a basic wireframe including a rectangle at the top for a navbar, a circle element, and a text block reading "Welcome to promptslove.com." I then generated the HTML from the sketch, which opened in the browser. The output matched what I had drawn, with a navbar, the circle element, and the text block.

What did not work:

The in-app preview panel did not render the generated HTML. The CSS was not picked up by the preview component. The browser fallback worked, but the split view preview that I had specified did not. The image-to-HTML feature was also incomplete. The image upload input was absent from the interface, so the image conversion path I had built into my prompt was missing entirely.

My rating is four out of five. The core sketch-to-code workflow ran end to end. The native macOS aesthetic was delivered. The model and provider selection worked. But two specific features I asked for were either absent or broken, which drops it below a five.

App 5: Canopy macOS Focus and Pomodoro App (4 Out of 5)

The desktop app test was my most ambitious individual build. I asked M3 to create Canopy, a macOS focus application that lives in the menu bar. The reference point I gave was the Sessions app, which costs around $100 per year. I wanted a competitive alternative built for free using M3.

My prompt specified: a menu bar icon that expands the app, a focus timer with configurable durations, Pomodoro-style session management with short and long breaks, auto-start break mode, a daily session goal tracker, ambient sound presets, task management, session notes, a statistics view with a heat map and project breakdown, and a GIF display during breaks.

What M3 delivered:

Canopy appeared in my menu bar after the build completed. I clicked the icon and the app opened in a compact overlay, exactly the pattern used by apps like Sessions. The settings panel let me configure focus duration, short break length, long break length, number of sessions, auto-start breaks after session ends, daily goal in minutes, tick sound during focus, and mute during breaks. All settings accepted input and saved correctly.

The ambient sound system was the feature that impressed me most. I had options for lo-fi music, ocean sounds, forest sounds, and coffee shop ambience. Each played immediately when selected and I could adjust the volume. I added tasks to the session. I added a note that read "creating and evaluating Minimax M3." The statistics view showed a seven-day heat map and a session breakdown by project type, rendered in the colors I had requested.

What did not work:

The menu bar timer display did not sync correctly with the running session inside the app. When I started a session inside the app window, the menu bar showed a different value. The two were not calibrating together. That is one deducted mark.

My rating is four out of five. A macOS menu bar focus application with ambient sounds, Pomodoro sessions, task management, notes, and a statistics heat map, all built in one prompt for free using M3, is exactly the kind of output that shows what this model class can do. The Sessions app equivalent that costs $100 per year was reproduced in a single session. The sync issue between the menu bar and the in-app timer is the only meaningful gap.

What the Results Tell Me About M3

Across five apps in one session, M3 delivered:

App	Type	Score
Vantage AI Portfolio	Frontend UI with 6 pages, animations, dark mode	4 out of 5
Vaultify SaaS	Magic link authentication service, dashboard, PostgreSQL schema	3 out of 5
Neon Dungeon Crawler	HTML/JS game with enemy AI, inventory, item systems	5 out of 5
Sketch to UI Builder	Python macOS app with canvas, sketch-to-code, model selection	4 out of 5
Canopy Focus App	macOS menu bar Pomodoro app with ambient sound, stats	4 out of 5

The pattern is consistent with what I have seen from other models at this capability level. Scoped builds with clear interaction patterns perform better than complex multi-component SaaS products that require precise data connections across layers. The dungeon crawler scored five out of five because every feature had a clear binary test: does the enemy chase me, does the health potion work, does the inventory open. The SaaS product scored three out of five because some of its features require precise wiring between frontend state and backend logic that M3 got partially right on the first pass.

The lesson I take from this: write prompts that define both the what and the how. For the game, I described the interaction logic and the model executed it. For the SaaS, I described some behaviors without specifying the exact data flow, and the model made choices that missed what I intended. The more specific you are about how data moves through the system, the closer the output will be to what you want.

Pricing Breakdown: What You Actually Pay

This is where M3 makes its most persuasive argument.

Option	Cost	Token Allowance
OpenCode (current)	Free for first 7 days	Included in OpenCode quota
MiniMax API (standard)	$0.30 per million input / $1.20 per million output	Up to 512K per call standard rate
MiniMax API (long context, above 512K)	Higher rate	For full repository and document analysis
MiniMax Plus	$20 per month	Approximately 1.7 billion tokens
MiniMax Max	$50 per month	Approximately 5.1 billion tokens
MiniMax Ultra	$120 per month	Approximately 9.8 billion tokens

For comparison, Claude Opus 4.7 runs $5 per million input tokens and $25 per million output. That makes M3 more than 15 times cheaper on input. With cache optimization, the blended cost on M3 drops further to approximately $0.06 per million tokens.

At the token plan level, $20 per month gets you 1.7 billion tokens of M3 usage. Text, image, speech, and music all share the same token pool, which means one subscription covers multiple modalities without separate billing.

The cost argument is strongest for developers running agentic workflows with long sessions and high token consumption. A 24-hour autonomous coding run like the CUDA optimization task MiniMax demonstrated internally would be economically viable at M3's pricing. The same task at frontier closed-model pricing would be significantly more expensive.

MiniMax M3 vs the Competition

I want to give you the honest comparison, not a promotional one.

M3 vs Claude Opus 4.7. Claude likely still edges M3 on raw reasoning quality for the most complex tasks. On SWE-Bench Pro, Opus 4.7 scores higher than M3's 59.0%. On BrowseComp, M3 wins with 83.5 against Opus 4.7's 79.3. The key difference is cost and open-weight availability. M3 at $0.30 per million input tokens against Opus 4.7 at $5 per million is a 15x gap. For developers who do not need the absolute ceiling of reasoning quality, M3 covers a large percentage of use cases at a fraction of the cost.

M3 vs GPT-5.5. On SWE-Bench Pro, M3's own numbers put it ahead of GPT-5.5. MiniMax states this explicitly in their technical release and the VentureBeat coverage confirms the claim. GPT-5.5 is closed-weight and priced for enterprise. M3 is open-weight and coming to Hugging Face within 10 days of launch.

M3 vs Gemini 3.1 Pro. M3 beats Gemini 3.1 Pro on SWE-Bench Pro and also scores above it on OmniDocBench (multimodal document understanding). Gemini 3.1 Pro remains strong in the Google ecosystem. M3 competes on cost and on specific coding tasks.

M3 vs DeepSeek V4 and Qwen3.7 Max. These are M3's closest competitors. All three are Chinese open-weight models chasing the same agentic use cases. M3's differentiator in this lane is the combination of all three capabilities at once: frontier coding, 1M context, and native multimodality. No single competitor currently matches all three simultaneously.

One transparency note: VentureBeat reported that M3 eclipses GPT-5.5 and Gemini 3.1 Pro on key benchmarks at 5 to 10 percent of the cost. The price comparison is accurate. The benchmark comparison is based on MiniMax's own infrastructure runs, and external validation is still pending.

The Real-World Tasks That Convinced Me

Beyond my five apps, the MiniMax team demonstrated M3 on three long-horizon tasks during the launch. I want to include these because they are the most honest signal of what the model can sustain over time.

The first was an ICLR 2025 Outstanding Paper reproduction. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and successfully reproduced the core experiments from the paper "Learning Dynamics of LLM Finetuning." This required the model to read and understand complex academic figures and formulas (multimodal), write and execute code over an extended session (coding and agents), and hold the entire paper and experiment log in context simultaneously (1M token context). All three of M3's headline capabilities had to work together at the same time.

The second was a CUDA kernel optimization task. M3 started from a Triton skeleton that could not run and had no reference high-performance implementation to copy from. Over 24 hours, it made 147 benchmark submissions and 1,959 tool calls, improving FP8 hardware utilization from 7.6% to 71.3%. Most other models stopped making progress within the first 30 submissions. M3's best result appeared on attempt 145.

The third was autonomous model training on PostTrainBench. M3 was given four base models and asked to complete the full pipeline from data synthesis to training to evaluation and iteration within 12 hours. It scored 0.37, below Opus 4.7's 0.42 and GPT-5.5's 0.39, but ahead of all other models tested.

These are not typical app builds. They are stress tests for long-horizon autonomy. The results tell me that M3 can sustain productive work over many hours, not just generate good output for short prompts.

Who Should Use MiniMax M3?

M3 is the right choice when:

You are running long agentic workflows where context accumulates across many tool calls and session length matters. The 1M-token window and MSA architecture make this category economical instead of painful.

You are building MVPs, personal tools, or early-stage products where the quality gap between M3 and frontier closed models is small enough to matter less than the cost gap.

You need multimodal understanding in your agent pipeline, including image analysis, video input, or computer use, and you want to do it without paying frontier model pricing.

You are working with open-weight models for customization, fine-tuning, or self-hosting and you want the most capable open-weight coding model currently available.

M3 is the wrong choice when:

You need the absolute ceiling of reasoning quality for complex multi-step inference tasks. Opus 4.7 still holds an edge on the hardest reasoning benchmarks.

You are in production and need independent benchmark validation before switching your core model. Wait for the external verification runs.

You are building complex multi-layer SaaS architecture where precise data connections across components are required on the first pass. Plan for iteration on large complex builds.

Frequently Asked Questions (FAQs)

What is MiniMax M3 and when was it released?

MiniMax M3 is a language model released on June 1, 2026, by MiniMax, a Shanghai-based AI lab. It is the first open-weight model combining frontier-level coding performance, a 1-million-token context window, and native multimodal capabilities in a single model. You can access it today through the MiniMax API, MiniMax Code, and through OpenCode where it is currently available for free.

How do I access MiniMax M3 for free?

MiniMax M3 is available for free through OpenCode for the first seven days of access. OpenCode is an open-source coding agent that lets you point the model at your project folder and start building. You can also access M3 through MiniMax's own platform and through OpenRouter. The open-weight model files and technical report are scheduled for release on Hugging Face within approximately 10 days of the June 1 launch.

What is MiniMax Sparse Attention and why does it matter?

MSA is the architectural innovation that makes M3's 1-million-token context window practical. Instead of computing attention between every pair of tokens (which gets exponentially more expensive as context grows), MSA selects which blocks of past tokens need attention and runs computation only on those. The result at 1M context is one-twentieth the per-token compute of the prior generation, with 9.7x faster prefill and 15.6x faster decoding. For developers running long agentic sessions, this makes million-token workflows affordable rather than prohibitively expensive.

How does MiniMax M3 compare to Claude Opus 4.7 on coding?

On SWE-Bench Pro, M3 scores 59.0% compared to Opus 4.7's higher score, but M3 beats Opus 4.7 on BrowseComp with 83.5 against 79.3. M3 also beats Opus 4.7 on SVG-Bench. The cost difference is significant: M3 inputs at $0.30 per million tokens against Opus 4.7 at $5 per million, a 15x gap. For most coding and agentic workflows, M3 delivers competitive quality at a fraction of the cost. For the most complex reasoning tasks, Opus 4.7 maintains an edge.

Can MiniMax M3 build a real working app in one prompt?

Based on my testing, yes for bounded, well-specified builds. The dungeon crawler game scored five out of five with zero iteration. The macOS focus app and the Python sketch-to-code tool each scored four out of five in one pass. The SaaS product scored three out of five because complex multi-layer applications with precise data connections require more iteration. My recommendation: write very detailed prompts that specify not just what you want but how each component should connect and behave. The more specific the implementation path, the closer the output will be to your vision.

What is MiniMax Code and how does it relate to M3?

MiniMax Code is MiniMax's agent product built specifically for M3. It was trained together with M3 and is designed to leverage M3's long-context, coding, and multimodal capabilities simultaneously. It includes an Agent Team framework for multi-stage concurrent workflows, a Producer and Verifier adversarial loop for self-correction, and computer use support for interacting with desktop applications. MiniMax Code is built on a harness based on the open-source OpenCode and Pi projects. You can download it at agent.minimaxi.com.

Final Thoughts

After testing GLM and other open-weight models over the past year, MiniMax M3 is now my favorite model in this class. The combination of frontier-level coding, the 1M-token context window powered by MSA, and native multimodality in a single open-weight model at $0.30 per million input tokens is a compelling package.

My overall rating across the five app tests is four out of five. That score reflects a model that consistently delivers functional, high-quality outputs in one pass on most build types, with room for iteration on complex multi-layer SaaS products.

The free access through OpenCode is where I would start. Write a detailed prompt. Specify the tech stack, the component list, the design system, and the data flow. Let M3 run. Whatever comes out in one pass will tell you more about what the model can do for your specific use case than any benchmark table.

If you push the model hard and discover the gaps in one-pass generation, those gaps are closeable with iteration. The architecture is strong enough, the pricing is low enough, and the context window is large enough that M3 deserves a serious place in any developer's model evaluation.

Try it. Start building.

Share this article

Ramanpal Singh

Ramanpal Singh Is the founder of Promptslove, kwebby and copyrocket ai. He has 10+ years of experience in web development and web marketing specialized in SEO. He has his own youtube channel and active on social media platform.