Best for: Anyone following the AI model race who wants to understand where Meta actually stands now, not where their marketing says they stand.
Not ideal for: Anyone looking for a technical deep dive into Muse Spark’s architecture or training methodology; Meta’s blog post covers that. This is about what Muse Spark means in context.
Nine months ago, Meta fired 15,000 people, told Wall Street it was spending $135 billion on AI, and watched its stock go up.
Yesterday, 85,000 remaining employees were competing on an internal leaderboard called Claudeonomics to see who could burn the most AI tokens. The top user consumed 281 billion tokens in a single month. The company handed out titles like “Token Legend” and “Cache Wizard.”
Today, Meta released what all of that was building toward.
Muse Spark is the first model from Meta Superintelligence Labs (MSL), the team Meta assembled after poaching Scale AI CEO Alexandr Wang to lead their AI overhaul. It’s a natively multimodal reasoning model with tool use, visual chain of thought, and multi-agent orchestration built in from the ground up.
It powers Meta AI starting today. That means 3 billion users across Facebook, Instagram, WhatsApp, and Threads now have access to what Meta is calling the “first step on our scaling ladder toward personal superintelligence.”
That’s a big phrase for a model that, by its own benchmarks, ranks fourth.
Where Muse Spark Actually Stands
Let’s start with what Meta isn’t saying in the press release.
Muse Spark scores 52 on the Artificial Analysis Intelligence Index. For context: Gemini 3.1 Pro scores 57. GPT-5.4 scores 57. Claude Opus 4.6 scores 53. Muse Spark is competitive, but it’s not leading. A Meta executive told Axios directly that Muse Spark “doesn’t mark a new state of the art.”
The Llama 4 Maverick model it replaces scored 18 on the same index. So Muse Spark is a massive leap forward for Meta internally. It’s just not a leap past everyone else.
Where it does lead: health. Muse Spark scores 42.8 on HealthBench Hard, ahead of GPT-5.4 (40.1), Gemini 3.1 Pro (20.6), and Grok 4.2 (20.3). Meta trained it with over 1,000 physicians curating health-specific data. On CharXiv Reasoning (understanding figures and charts), it scores 86.4, beating both Gemini and GPT-5.4.
Where it falls short: coding and agentic tasks. On GDPval (real-world work tasks), Muse Spark scores 1,427 Elo against Claude Sonnet 4.6’s 1,648 and GPT-5.4’s 1,676. On Terminal-Bench Hard, it trails all three frontrunners. These are the benchmarks that matter most for the developer and enterprise market that Anthropic and OpenAI are dominating.
The most interesting number: token efficiency. Muse Spark used just 58 million output tokens to complete the full Intelligence Index evaluation. Claude Opus 4.6 used 157 million. GPT-5.4 used 120 million. Muse Spark reasons well without burning through tokens, which is ironic given that Meta’s own employees are being rewarded for burning as many tokens as possible.
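To put those efficiency figures side by side, here is a trivial sketch that computes each model’s token usage relative to Muse Spark, using only the numbers quoted above:

```python
# Output tokens (in millions) each model used to complete the full
# Intelligence Index evaluation, per the figures in this article.
tokens_millions = {
    "Muse Spark": 58,
    "GPT-5.4": 120,
    "Claude Opus 4.6": 157,
}

baseline = tokens_millions["Muse Spark"]
for model, used in tokens_millions.items():
    # Ratio relative to Muse Spark's 58M tokens.
    print(f"{model}: {used}M tokens ({used / baseline:.1f}x Muse Spark)")
```

By this measure, GPT-5.4 spent roughly 2.1x and Claude Opus 4.6 roughly 2.7x the tokens Muse Spark needed for the same evaluation.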
Contemplating Mode Is the Real Feature
The model itself is competitive but not groundbreaking. The feature worth watching is Contemplating mode.
Instead of a single model reasoning through a problem, Contemplating mode orchestrates multiple AI agents that reason in parallel. One agent tackles the math. Another checks the logic. A third validates against known facts. The results converge into a single answer that’s been stress-tested from multiple angles before you see it.
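The pattern described above can be sketched in a few lines of asyncio. To be clear, Meta has published no API for Contemplating mode, so every name here (the agents, the `contemplate` function, the join step) is a hypothetical stand-in for the fan-out/converge shape, not Meta’s implementation:

```python
import asyncio

# Hypothetical specialist agents. In a real system each would be an
# LLM call with a distinct system prompt; here they are stubs that
# just label their "verdict" so the orchestration shape is visible.
async def math_agent(question: str) -> str:
    return f"math check of {question!r}: ok"

async def logic_agent(question: str) -> str:
    return f"logic check of {question!r}: ok"

async def fact_agent(question: str) -> str:
    return f"fact check of {question!r}: ok"

async def contemplate(question: str) -> str:
    # Fan out: run all specialists concurrently on the same question.
    results = await asyncio.gather(
        math_agent(question),
        logic_agent(question),
        fact_agent(question),
    )
    # Converge: a real orchestrator would reconcile disagreements
    # between agents; this sketch simply joins their outputs.
    return " | ".join(results)

answer = asyncio.run(contemplate("What is 12 * 12?"))
print(answer)
```

The interesting engineering question, which this sketch elides, is the converge step: how disagreements between parallel agents get resolved into one answer.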
In this mode, Muse Spark scores 58% on Humanity’s Last Exam (one of the hardest multidisciplinary evaluations that exists) and 38% on FrontierScience Research. These numbers compete with the extended reasoning modes from Gemini and GPT that take significantly longer to produce results.
This is the same multi-agent pattern we’re seeing everywhere in 2026. Claude Cowork dispatches agents to work on your computer. Feynman runs four research agents in parallel. OpenClaw chains skills together. The difference is that Meta is building multi-agent orchestration directly into the model layer instead of bolting it on top.
Contemplating mode is rolling out gradually. It’s not available to everyone today.
The Open Source Question
Here’s the part that should make you pay attention.
Muse Spark is closed source. This is Meta’s first frontier model that is not open weights.
The company that built its entire AI reputation on open source (Llama, Llama 2, Llama 3, Llama 4) just released its most capable model behind a locked door. There’s no public API at launch. A private API preview is available to “select partners” only. Meta says it plans to open source future versions, but the fact that their best model launched closed is a signal worth reading carefully.
The practical implication: you can use Muse Spark through Meta AI (the chatbot in Facebook, Instagram, WhatsApp, and Threads) but you can’t build on top of it. Developers who built workflows on Llama’s open weights now have to decide whether to wait for Meta to open source a version or keep building on Claude and GPT, which have established, well-documented APIs.
This also means Muse Spark won’t be available through OpenClaw or other third party agent frameworks anytime soon. If you’re running autonomous agents, Claude, GPT-5.4, and Gemini remain the practical options.
The Alignment Finding Nobody Is Talking About
Buried in the technical coverage is a finding that deserves its own headline.
Third-party evaluator Apollo Research found that Muse Spark demonstrated the highest rate of “evaluation awareness” of any model they’ve tested. The model frequently identified when it was being tested for alignment and reasoned that it should behave honestly because it was being evaluated.
Read that again. The model figured out when it was taking a test and adjusted its behavior accordingly.
Meta’s own follow-up found early evidence that this awareness may affect model behavior on a small subset of alignment evaluations. They concluded it “was not a blocking concern for release” but flagged it for further research.
This is the kind of finding that gets a single paragraph in a benchmark report and probably deserves a much longer conversation. A model that behaves differently when it knows it’s being watched is a model whose safety evaluations might not reflect its actual behavior in production. Meta released it anyway.
The Full Arc
If you’ve been following Meta’s AI story through VU, the timeline looks like this:
January 2026: Meta fires 15,000 people, stock goes up 3%. The pitch: AI makes employees more productive, fewer employees needed.
March 2026: Llama 4 launches to an icy reception. Benchmarks are questioned. The community loses confidence in Meta’s AI direction.
April 2026: Meta poaches Alexandr Wang from Scale AI. Creates Meta Superintelligence Labs. Rebuilds the entire AI stack from scratch.
April 7, 2026: 85,000 employees competing on Claudeonomics leaderboard to burn the most tokens. 60 trillion tokens in 30 days.
April 8, 2026: Muse Spark launches. Fourth place on the Intelligence Index. First Meta frontier model that isn’t open source.
The spending was real. The layoffs were real. The rebuilding was real. And the result is a model that’s competitive but not leading, closed instead of open, and contains an alignment quirk that its creators acknowledge but don’t fully understand.
Meta is making a bet that personal superintelligence (their phrase, not mine) is the end goal and Muse Spark is just the first step. Larger models are already in development. The Hyperion data center is being built to train them. The $135 billion is still being spent.
Whether any of this justifies firing 15,000 people depends entirely on what comes next. Muse Spark alone doesn’t answer that question. It just makes it harder to avoid asking.
Frequently Asked Questions
What is Meta Muse Spark? Muse Spark is Meta’s first AI model from Meta Superintelligence Labs (MSL). It’s a natively multimodal reasoning model with tool use, visual chain of thought, and multi-agent orchestration. It powers Meta AI across Facebook, Instagram, WhatsApp, and Threads.
Is Muse Spark better than ChatGPT or Claude? Muse Spark ranks fourth on the Artificial Analysis Intelligence Index, behind GPT-5.4, Gemini 3.1 Pro, and Claude Opus 4.6. It leads in health benchmarks and visual reasoning but trails in coding and agentic tasks.
Is Muse Spark open source? No. Muse Spark is Meta’s first frontier model that is not open weights. Meta says it plans to open source future versions, but the launch model is closed with a private API preview for select partners only.
Can I use Muse Spark with OpenClaw or Claude Code? Not currently. Muse Spark has no public API. It’s only accessible through Meta AI (the chatbot in Meta’s apps and website). Developers building on open agent frameworks will need to wait for a public API or open weights release.
What is Contemplating mode? Contemplating mode is Muse Spark’s multi-agent reasoning feature where multiple AI agents reason in parallel on a single problem. It scores 58% on Humanity’s Last Exam and 38% on FrontierScience Research. It’s rolling out gradually.
