
Musk’s Grok 3 impresses but isn’t a major leap over OpenAI’s o3

February 18, 2025

After much anticipation and hype around xAI’s Grok 3, the next-gen model has finally shipped. Company CEO and billionaire Elon Musk touted it as the “smartest AI on earth,” claiming it outperformed proprietary models from top AI firms, including OpenAI, Anthropic, DeepSeek, and Google, across a wide range of benchmarks covering math, science, and coding.

The performance boost could be attributed to additional compute: Musk indicated that Grok 3 is “complete with 10X more compute” than its predecessor. During the product’s launch on X (formerly Twitter), he said:

“Grok 3 is an order of magnitude more capable than Grok 2…[It’s a] maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically correct.”

“We’re continually improving the models every day, and literally within 24 hours, you’ll see improvements,” added Musk. Notably, Grok 3 surpasses OpenAI’s GPT-4o across several benchmarks, including AIME, which evaluates a model’s math capabilities, and GPQA, which evaluates its capabilities in science.

However, Andrej Karpathy, OpenAI co-founder and former Tesla AI lead, shared some interesting insights about Grok 3’s performance:

“As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI’s strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented. Do also keep in mind the caveats – the models are stochastic and may give slightly different answers each time, and it is very early, so we’ll have to wait for a lot more evaluations over a period of the next few days/weeks. The early LM arena results look quite encouraging indeed. For now, big congrats to the xAI team, they clearly have huge velocity and momentum and I am excited to add Grok 3 to my “LLM council” and hear what it thinks going forward.”

“I was given early access to Grok 3 earlier today, making me I think one of the first few who could run a quick vibe check. Thinking ✅ First, Grok 3 clearly has an around state of the art thinking model (“Think” button) and did great out of the box on my Settler’s of Catan…” pic.twitter.com/qIrUAN1IfD (February 18, 2025)

Everything you need to know about Grok 3


Grok 3, trained at xAI’s Memphis data center, which features 200,000 GPUs, garnered higher ratings than its competitors on Chatbot Arena, a crowdsourced test designed to compare different AI models.

Grok 3 ships with two modes: Think and Big Brain. The former can be used for general queries, whereas the latter handles difficult queries by drawing on more compute resources for deeper reasoning.

According to xAI, Grok 3 Reasoning and Grok 3 mini Reasoning can think and reason through problems like OpenAI’s o3-mini or DeepSeek’s R1. The tool also ships with a new DeepSearch feature for better research, brainstorming, and data analysis when responding to queries, taking on OpenAI’s deep research and Perplexity’s Deep Research.

Grok 3 has already rolled out to X users subscribed to the Premium+ tier. xAI also plans to unveil a new subscription plan dubbed SuperGrok, which will include exclusive access to DeepSearch, better reasoning capabilities, and unlimited image generation.

Elon Musk also plans to open-source Grok 2 in the next few months:

“Our general approach is that we will open-source the last version [of Grok] when the next version is fully out. When Grok 3 is mature and stable, which is probably within a few months, then we’ll open-source Grok 2.”

Interestingly, Ethan Mollick, an associate professor at the University of Pennsylvania’s Wharton School, indicated that Grok 3 isn’t a leader in the AI space despite Musk’s claims:

“X has caught up with the frontier of released models VERY quickly, if they continue to scale this fast, they are a major player. That said, while their base model is currently leading the Chatbot Arena, their benchmarks are not clearly beating OpenAI’s o3
Grok 3 is closely following the OpenAI playbook, including using the same product mix
Not sure whether firms will use the Grok API at this point, given the enterprise partnerships (Azure, AWS, etc.), support and extensive sales & training efforts for the other big labs, I don’t know if Grok has a big opening.”

While Grok 3’s performance against OpenAI’s o3 remains debatable, Gary Marcus, founder of Geometric Intelligence, was more blunt (via Business Insider):

“Elon Musk promised that Grok 3 would be the smartest AI ever. Spoiler alert: it wasn’t.”

Marcus branded Grok 3’s launch a “carbon copy” of previous demos. He added that while the model shows great promise, its performance has yet to reach the heights of OpenAI’s models. “Sam Altman can breathe easy for now,” he said. “No major leap forward here.”
