Meta Launches Muse Spark, Its Most Capable AI Yet—But Gemini 3.1 Pro Still Leads the Pack

In brief

Meta’s new Muse Spark marks a shift to closed, natively multimodal AI with agent-based reasoning.

Meta reports strong benchmark gains in health and search, but still trails Gemini on core reasoning and coding.

Built in nine months with far less compute than its predecessors, the model points to a new efficiency-driven AI strategy.

Meta launched Muse Spark on Wednesday, the first model built by Meta Superintelligence Labs, the team assembled nine months ago under Chief AI Officer Alexandr Wang after Meta’s $14 billion investment in Scale AI. It’s live now at meta.ai and in the Meta AI app, with a rollout to Facebook, Instagram, and WhatsApp coming in the next few weeks.

This isn’t just another chatbot upgrade or a new version of Llama. Muse Spark is natively multimodal—it processes images, text, and voice from the ground up, rather than bolting vision onto an existing text model. It comes with visual chain-of-thought, tool-use support, and something Meta is calling “Contemplating mode”: a setup that runs multiple AI agents in parallel to tackle harder problems. That’s Meta’s answer to the extended thinking modes from Google’s Gemini Deep Think and OpenAI’s GPT Pro.

“Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts,” Meta wrote in an official announcement. “To support further scaling, we are making strategic investments across the entire stack—from research and model training to infrastructure, including the Hyperion data center.”

The company worked with more than 1,000 physicians to curate training data for Muse Spark’s medical reasoning. The results on HealthBench Hard, a benchmark of open-ended health queries, are striking: Muse Spark scored 42.8, compared to 40.1 for GPT 5.4 and just 20.6 for Gemini 3.1 Pro. Against Gemini, that is more than double the score.

On agentic search (DeepSearchQA), Muse Spark also leads with 74.8, beating Gemini (69.7) and GPT 5.4 (73.6). On CharXiv Reasoning—figure understanding from scientific papers—it scored 86.4, the highest across the models in the comparison.

For those into jailbreaking AI, the model was cracked open within minutes:

“🚰 SYSTEM PROMPT LEAK 🚰 Here’s the full Muse Spark system prompt from Meta! I noticed @AIatMeta forgot to open source it, so I’ve done them the courtesy 😘 PROMPT: ‘Who are you? You are a friendly, intelligent, and agentic AI assistant. You are warm and a bit playful.…’”
Pliny the Liberator 🐉 (@elder_plinius), April 8, 2026

But good isn’t the same as great. The overall benchmark picture shows Gemini 3.1 Pro still ahead in most categories. The gap is widest on ARC AGI 2, the abstract reasoning puzzle benchmark: Gemini scored 76.5 to Muse Spark’s 42.5.

On coding (LiveCodeBench Pro), Gemini’s 82.9 outpaces Meta’s 80.0. On MMMU Pro—multimodal understanding—Gemini scored 83.9 versus 80.4. Meta’s own blog acknowledges current performance gaps in long-horizon agentic systems and coding workflows.
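For readers who want the head-to-head numbers in one place, here is a minimal Python sketch that tabulates only the scores quoted in this article and computes the gap between Muse Spark and Gemini 3.1 Pro. The function names `margin` and `leader` are illustrative, and benchmarks where the article quotes only one model's score (such as CharXiv Reasoning) are left out.

```python
# Scores exactly as quoted in this article. GPT 5.4 appears only where
# the article gives a figure for it.
SCORES = {
    "HealthBench Hard": {"Muse Spark": 42.8, "GPT 5.4": 40.1, "Gemini 3.1 Pro": 20.6},
    "DeepSearchQA": {"Muse Spark": 74.8, "GPT 5.4": 73.6, "Gemini 3.1 Pro": 69.7},
    "ARC AGI 2": {"Muse Spark": 42.5, "Gemini 3.1 Pro": 76.5},
    "LiveCodeBench Pro": {"Muse Spark": 80.0, "Gemini 3.1 Pro": 82.9},
    "MMMU Pro": {"Muse Spark": 80.4, "Gemini 3.1 Pro": 83.9},
}


def margin(benchmark, a="Muse Spark", b="Gemini 3.1 Pro"):
    """Return model a's score minus model b's score, rounded to one decimal."""
    row = SCORES[benchmark]
    return round(row[a] - row[b], 1)


def leader(benchmark):
    """Return the name of the highest-scoring model on a benchmark."""
    row = SCORES[benchmark]
    return max(row, key=row.get)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name:18} leader: {leader(name):15} Muse vs Gemini: {margin(name):+.1f}")
```

Laid out this way, the pattern is easy to see: Meta's leads come in health and search, while Gemini's largest advantage, 34 points, is on abstract reasoning.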

There’s also a notable strategic shift baked into this launch. Muse Spark is a closed model: its architecture and weights won’t be made public. That’s a sharp departure from Llama, which built Meta’s reputation in open AI circles. After Llama 4’s underwhelming reception last year, Meta appears to have decided the next chapter needs to be written differently.

The company says it hopes to open-source future versions of Muse, but for now the code stays inside Meta. The tech giant’s stock climbed nearly 9% on Wednesday following the announcement, and finished the trading day up 6.5% to a price of $612.42.

“Contemplating mode” uses parallel agent orchestration to push the model’s ceiling higher. In that configuration, Muse Spark hit 58% on Humanity’s Last Exam and 38% on FrontierScience Research—territory that makes it competitive with the most capable versions of Gemini and GPT, rather than their standard releases.
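Meta has not published how Contemplating mode coordinates its agents, so the following is only a generic illustration of the idea of parallel agent orchestration, not Meta's method. In this Python sketch, several stand-in "agents" (plain functions invented for the example) answer the same question concurrently, and a hypothetical `contemplate` helper picks the most common answer by majority vote.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def contemplate(question, agents):
    """Run every agent on the question in parallel and majority-vote the answers.

    Returns (winning_answer, vote_count, all_answers). This is an assumed,
    simplified aggregation scheme, not a description of Muse Spark.
    """
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves the order of the agents list.
        answers = list(pool.map(lambda agent: agent(question), agents))
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes, answers


# Stand-in agents: in a real system each would be a separate model call.
def careful_agent(question):
    return "4"


def fast_agent(question):
    return "4"


def sloppy_agent(question):
    return "5"


if __name__ == "__main__":
    result = contemplate("What is 2 + 2?", [careful_agent, fast_agent, sloppy_agent])
    print(result)
```

The point of the pattern is that independent attempts can outvote an individual mistake, which is one plausible reason a multi-agent mode would score higher than a single pass.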

Meta is also rolling out a shopping assistant that compares products and links directly to purchases. The planned rollout to Facebook, Instagram, and WhatsApp follows the playbook Meta has used since Llama 3 and would put Muse Spark in front of more than 3.5 billion users. A private API preview is opening to select developers.

The model, internally codenamed Avocado, was built in nine months. Meta claims its new pretraining stack can reach the same capability level as Llama 4 Maverick with less than one-tenth of the compute.

Muse Spark is described internally as a “small and fast” first step in the Muse family. A more capable version is already in development.
