# Grok 4 - 10 New Things to Know

## Metadata

- **Channel:** AI Explained
- **YouTube:** https://www.youtube.com/watch?v=dbgL00a7_xs
- **Date:** 10 July 2025
- **Duration:** 11:44
- **Views:** 179,244

## Description

Grok 4 is here, but did you know these 10 things about the new model? From benchmark caveats to soloing science, $300 a month secrets to Grok 5 promises, here's 10 new things to know in just under 12 minutes.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:22 - Benchmark Results
02:11 - Benchmark Caveats
02:59 - ARC-AGI 2 
03:35 - SimpleBench
04:49 - ‘Humanity’s Last Exam’
07:20 - SuperGrok Heavy Price
07:58 - API Price
08:12 - Grok 5, Gemini 3.0 Beta, GPT-5
09:12 - System Prompt Change + $1B a month, pollution
10:20 - Not soloing science, helping you solo code

Livestream: https://www.youtube.com/watch?v=1tQ_KrlHgfg&t=1s

Price: https://grok.com/#subscribe
https://x.com/ArtificialAnlys/status/1943166841150644622

Gemini DeepThink: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#deep-think

https://simple-bench.com/

ARC-AGI 2: https://x.com/arcprize/status/1943168950763950555

Humanity’s Last Exam: https://agi.safe.ai/

SmartGPT: https://www.youtube.com/watch?v=hVade_8H8mE

New Power Plant, 1m GPUs: https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musk-xai-power-plant-overseas-to-power-1-million-gpus

Gemini 3.0 beta: https://web.archive.org/web/20250709174548/https://github.com/google-gemini/gemini-cli/blob/b0cce952860b9ff51a0f731fbb8a7649ead23530/packages/cli/src/ui/utils/errorParsing.test.ts

Pollution: https://www.theguardian.com/technology/2025/apr/24/elon-musk-xai-memphis
https://www.youtube.com/watch?v=C8rU4dv2w8Q
https://www.youtube.com/watch?v=3VJT2JeDCyw

System Prompt: https://github.com/xai-org/grok-prompts/blob/535aa67a6221ce4928761335a38dea8e678d8501/ask_grok_system_prompt.j2

Burn Rate: https://www.bloomberg.com/news/articles/2025-06-17/musk-s-xai-burning-through-1-billion-a-month-as-costs-pile-up

Ron Johnson: https://x.com/jdcmedlock/status/1939814516503847259


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

## Contents

### [0:00](https://www.youtube.com/watch?v=dbgL00a7_xs) Introduction

Grok 4 is out and it's a pretty good AI model, but there is going to be more noise about this language model than possibly any other. So hopefully I can give you a little signal amid the chaos. Let's boil things down to just 10 things to know about the newest and possibly smartest AI model. Point one is that

### [0:22](https://www.youtube.com/watch?v=dbgL00a7_xs&t=22s) Benchmark Results

Grok 4 might just be the smartest model around, at least according to the benchmarks. In certain settings on high school math competitions, it beats out OpenAI's best model and Google's best model. The same is true for a fairly famous science benchmark, the Google-Proof Q&A (GPQA), where it again beats out Anthropic's best model and Google's. Likewise on at least one coding benchmark. But Elon Musk went much further, saying of Grok 4 that, quote, "it's smarter than almost all graduate students in all disciplines simultaneously." That quote is of course going to be picked up by everyone, but it needs three important caveats.

First, from me: Grok 4 is still a language model, which means it's still going to be prone to all those hallucinations you're familiar with. It's not a new paradigm of AI.

Second, we have heard that kind of hype before, notably from the Google DeepMind CEO Demis Hassabis almost 18 months ago, saying that Gemini, too, was better than almost all human experts:

"What's amazing about Gemini is that it's so good at so many things. As we started getting to the end of the training, for example, each of the 50 different subject areas that we tested on, it's as good as the best expert humans in those areas."

That was an exaggeration then and Musk is exaggerating now, because real-world performance doesn't always match up to benchmark performance. Expertise is way more than answering multiple-choice questions.

Hence the third bit of context, coming from Musk himself, the CEO of xAI, saying that the quote about being smarter than graduates was "at least with respect to academic questions":

"Grok 4 is at a postgrad level in everything. Some of these things are just worth repeating: Grok 4 is postgraduate, like PhD level, in everything. Better than PhD, but most PhDs would fail, so it's better. That said, I mean, at least with respect to academic questions."

Point number two

### [2:11](https://www.youtube.com/watch?v=dbgL00a7_xs&t=131s) Benchmark Caveats

is that I've been highly impressed by Grok 4, but these benchmark results are misleading for another reason. Note first of all that the y-axis doesn't begin at zero, so the differences between the models are somewhat exaggerated in terms of scale. xAI, makers of Grok 4, selectively choose which models to compare to. Notice that in one recent high school maths competition, Grok 4 Heavy (and I'll get to that later) way outperforms Gemini Deep Think, that's the soon-to-be-released Gemini 2.5 Pro Heavy, if you like. But on this coding benchmark, LiveCodeBench, Gemini Deep Think actually outperforms Grok 4 Heavy and yet is not in the chart. As always then, when these model providers show benchmarks, you've got to take them with a grain of salt, especially when the answers to the benchmarks are available online. But none of that quite

### [2:59](https://www.youtube.com/watch?v=dbgL00a7_xs&t=179s) ARC-AGI 2

explains Grok 4's brilliant performance on ARC-AGI 2, a semi-private evaluation. As you can see, this post on Twitter, or X, has got almost 3 million views and is climbing rapidly, because this is known to be a fairly rigorous test of so-called fluid intelligence, or IQ if you like. And Grok 4 genuinely does beat out other models. I've covered ARC-AGI in other videos, but suffice to say, Grok 4 can genuinely pick up on latent patterns in your data. Of course, that is of relevance to almost all disciplines. Next, is there a benchmark for how smart a model feels? Well, yes.

### [3:35](https://www.youtube.com/watch?v=dbgL00a7_xs&t=215s) SimpleBench

I tried to come up with one and it's called SimpleBench. It's a test of social intelligence, trick questions, and spatiotemporal questions. Now, because everyone is spamming the Grok 4 API, it's pretty tough to run the full benchmark today, but I've run about 20 questions to get a pretty good estimation. Take this question. It's a bit of a spin on a common logic puzzle, and Grok 4 actually sees through it. That's actually the first model not to pick the trap answer. Grok 4 will feel smart, but of course, if you draw it out of its comfort zone, for example with spatial reasoning, it can still fall apart. In this question, in common with all other models, Grok 4 doesn't notice that the glove will simply fall on the road. It also takes an extremely long time to answer fairly often, which could be a slight issue for many of you. Having said all of that, I strongly suspect that Grok 4 will be around the top of my leaderboard on SimpleBench. In other words, try not to be too tempted to explain away all those benchmark results as just benchmark hacking. Now, that doesn't mean Grok 4 is worth $300 a month, but I'll come to that in just a second, because there's one more benchmark I want to touch on, and that is, of course, the grandiosely

### [4:49](https://www.youtube.com/watch?v=dbgL00a7_xs&t=289s) ‘Humanity’s Last Exam’

named Humanity's Last Exam, in which, under certain settings, Grok 4 scores over 50%, by far the best performance of any model. However, you should know that this is a knowledge-intensive benchmark, and therefore performance is heavily dependent on the training data that goes into the model. To give you just one example, is it critical to your use case that a model know about hummingbirds having a bilateral paired oval bone? Now, I sound cynical, but I think it's actually really cool that models have such an incredible knowledge base. And so, genuinely, I will be using Grok 4 a fair bit. I said at the time of the release of that exam that it wouldn't be humanity's last exam. Whether you happen to have the requisite knowledge in your training data isn't so much a marker of how intelligent you are as a model. This is not hindsight. On my Patreon in September of last year, I called that the exam would fall sooner than many others. "With tools" means that, for example, Grok 4 can write code to perform certain computations.

But what is this Grok 4 Heavy? Well, here's Musk to explain:

"With Grok 4 Heavy, what it does is it spawns multiple agents in parallel, and all of those agents do work independently, and then they compare their work and they decide which one... it's like a study group. And it's not as simple as a majority vote, because often only one of the agents actually figures out the trick or figures out the solution. But once they share the trick, or figure out what the real nature of the problem is, they share that solution with the other agents, and then they essentially compare notes and then yield an answer. So that's the Heavy part of Grok 4."

Now, long-standing followers of the channel may note that this is the exact premise of SmartGPT, which I released around 18 months ago and which scored at the time a record performance on the MMLU, 89%.
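
The "study group" idea Musk describes can be illustrated with a toy sketch. This is not xAI's implementation (which has not been published in detail here), and everything in it is an assumption made for illustration: the agents are hard-coded stand-in functions, not model calls, and the "compare notes" step is reduced to checking each shared answer against a simple verifier. It only shows why a plain majority vote can lose the one correct answer while a share-and-check step keeps it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical question: a classic trap puzzle, answer in cents.
QUESTION = "A bat and a ball cost 110 cents. The bat costs 100 cents more than the ball. Ball price?"

# Hypothetical stand-ins for independent model runs.
Agent = Callable[[str], int]

def agent_naive(question: str) -> int:
    return 10  # falls for the trap answer

def agent_careful(question: str) -> int:
    return 5   # spots the trick

def check(question: str, ball: int) -> bool:
    """Toy verifier (an assumption): ball + bat == 110, with bat == ball + 100."""
    return ball + (ball + 100) == 110

def majority_vote(answers: List[int]) -> int:
    return Counter(answers).most_common(1)[0][0]

def compare_notes(question: str, answers: List[int],
                  verify: Callable[[str, int], bool]) -> int:
    """Prefer any shared answer that survives checking; otherwise fall back to voting."""
    verified = [a for a in answers if verify(question, a)]
    return majority_vote(verified) if verified else majority_vote(answers)

def run_heavy(question: str, agents: List[Agent],
              verify: Callable[[str, int], bool]) -> Tuple[List[int], int]:
    # Agents work independently and in parallel, then pool their answers.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    return answers, compare_notes(question, answers, verify)

if __name__ == "__main__":
    answers, final = run_heavy(QUESTION, [agent_naive, agent_naive, agent_careful], check)
    print("individual answers:", answers)
    print("majority vote:", majority_vote(answers))
    print("after comparing notes:", final)
```

In this toy run two of three agents pick the trap answer, so voting alone returns it, while the checking step recovers the single correct answer. In a real system the verifier would itself be fallible model reasoning, so the gain would be less clear-cut.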
Ironically, that exam was also authored by Dan Hendrycks, who is the lead author of Humanity's Last Exam. And yes, I can't resist plugging that Andrej Karpathy shouted out SmartGPT. One last thing on the benchmark that many might have missed is that the text-based performance of Grok 4 and Grok 4 Heavy is extremely good. But on the full benchmark, it's a more modest improvement over, say, Gemini 2.5 Pro. So Grok 4 must do really quite badly on the visual segment. In other words, you might not want to rely on it for decoding Roman inscriptions. Which brings me, of course, to SuperGrok Heavy

### [7:20](https://www.youtube.com/watch?v=dbgL00a7_xs&t=440s) SuperGrok Heavy Price

for $3,000 a year or $300 a month. xAI are promising that new features will come to SuperGrok Heavy, like video generation in October, but Gemini Ultra, for a lower price, already has Veo 3. Now, if your pockets are deep enough, you'll just subscribe to everything. But if this is your only maxed-out subscription, it's hard to look past the much cheaper $20 Gemini Pro. Let me know, of course, in the comments if you think it's worth this amount and why. I'm open to being persuaded. I just don't see it at the

### [7:58](https://www.youtube.com/watch?v=dbgL00a7_xs&t=478s) API Price

moment. Just quickly, if you're a developer, you'll know that Grok 4's pricing is at the same level as Claude 4 Sonnet: $3 input, $15 output (per million tokens), which is a decent price for a frontier model, but again, there are much cheaper alternatives. Next, if you did watch the

### [8:12](https://www.youtube.com/watch?v=dbgL00a7_xs&t=492s) Grok 5, Gemini 3.0 Beta, GPT-5

live stream, of course, Musk mentions repeatedly that they have new features and new models coming soon and that Grok 5 may be finishing training imminently. However, we also got leaks this week that Gemini 3 is coming and, of course, perennial leaks about GPT-5 coming this month. Now, it used to be the case that we would then have to wait 6 months for the actual release of the model because of safety checks. Would a model help with creating a bioweapon, for example? But that all seems to have gone out of the window at the moment. Which brings me to this fairly wild quote from Musk on safety:

"Will this be bad or good for humanity? I think it'll be good. Most likely it'll be good. But I've somewhat reconciled myself to the fact that even if it wasn't going to be good, I'd at least like to be alive to see it happen. So, yeah."

### [9:12](https://www.youtube.com/watch?v=dbgL00a7_xs&t=552s) System Prompt Change + $1B a month, pollution

Next, and you might have been wondering when I was going to talk about this, but yes, of course, Grok 4 may suffer at times from a similar issue to Grok 3, in that it seems to get sudden urges to praise certain historical figures or focus on a country, for example, South Africa. That behavior seems to have been caused by this addition to Grok 3's system prompt, which is that its response should not shy away from making claims which are politically incorrect. If such a small change to the system prompt causes such wild behavior, then anything could happen with Grok 4. System prompts aren't, of course, the only issue for xAI. They are apparently burning through $1 billion a month. Either Grok 4 or Grok 5 almost needs to bring in more revenue for xAI. Then of course there is the awkward pollution point, because while it is crazy impressive how fast xAI have caught up to OpenAI and Google DeepMind, bringing in the generators necessary to get competitive that fast did come at a local cost. And if you thought it was wild how quickly Musk's xAI got up to 100,000 GPUs, well

### [10:20](https://www.youtube.com/watch?v=dbgL00a7_xs&t=620s) Not soloing science, helping you solo code

they're planning to bring an entire overseas power plant to Memphis, with 1 million AI GPUs to be powered. I'm going to try to end on a positive though, because even though Musk said that Grok 4 couldn't be used to generate new scientific discoveries just yet, I do think there is an underrated point to be made that's demonstrated by this game, made with the help of Grok 4 in just 4 hours. And that's that while models like Grok 4 often struggle with current techniques to solo generate new science, what they are optimized for is making existing science or code easier for you to solo. We probably shouldn't underestimate the impact of allowing everyone to do much more on their own. Then again, you probably shouldn't be using Grok to analyze whether or not you should vote for the Big Beautiful Bill. However, if Grok 4 or Grok 5's edge comes from its access to X/Twitter data, then at least for Grok 5's sake, let's hope that X can clean up so much of the bot replies, spam, and clickbait that's on there at the moment. Thank you so much as ever for watching. I am certain that this won't be the last mention of Grok 4 on this channel. In fact, I think I mention Grok in a documentary coming up on Patreon. Either way though, have an absolutely wonderful

---
*Source: https://ekstraktznaniy.ru/video/12061*