# Elon Musk's Grok3 Just STUNNED The Entire AI Industry (Beats Everything)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=igDU0_R_oOM
- **Date:** 18.02.2025
- **Duration:** 19:00
- **Views:** 275,414

## Description

Join my AI Academy - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00 - Introduction to Grok 3
00:23 - Benchmark Performance of Grok 3
01:12 - Grok 3 vs Other AI Models
02:12 - Chatbot Arena Rankings
04:12 - Continuous Improvements in Grok 3
05:07 - Reasoning Model Capabilities
06:45 - Performance in Mathematics, Science, and Coding
08:26 - Generalization & Overfitting Concerns Addressed
10:25 - Demonstrating Grok 3’s Advanced Reasoning
12:22 - AI Generating a Mars Trajectory
13:34 - Grok 3 Enters the AI Agent Era
14:18 - Introduction to Deep Search
16:16 - Transparent AI Search Results
17:31 - Grok 3 Availability & New Website
18:52 - Conclusion & Final Thoughts


Links From Today's Video:


Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=igDU0_R_oOM) Introduction to Grok 3

So it's quite clear that when Elon Musk said Grok 3 is the world's smartest AI, he wasn't just jumping on the hype train. Today he unveiled Grok 3, which is by far the world's smartest AI across a range of different benchmarks. In today's video I'll go through all of the announcements you need to understand for Grok 3 and show you why this is actually the world's smartest AI in its current form. So one

### [0:23](https://www.youtube.com/watch?v=igDU0_R_oOM&t=23s) Benchmark Performance of Grok 3

of the first things most people are going to want to look at is, of course, the benchmarks. If we take a look at Grok 3, the non-reasoning model, we can see that its benchmark results are pretty incredible across the board: Grok 3 and Grok 3 mini surpass recent state-of-the-art models like Gemini 2, DeepSeek V3, Claude 3.5 Sonnet and the recently updated GPT-4o. This is truly incredible, and even if you're skeptical of benchmarks, later on the team shows why they looked at new benchmarks, tested Grok 3 on those, and it still manages to excel. So it seems like this huge training run has made the model that much smarter, and the scaling laws are still holding up very well. You know, the model

### [1:12](https://www.youtube.com/watch?v=igDU0_R_oOM&t=72s) Grok 3 vs Other AI Models

is still training, actually, so this is a little preview of our benchmark numbers. We evaluated Grok 3 in three different categories: general mathematical reasoning, general knowledge about STEM and science, and computer science coding. AIME, the American Invitational Mathematics Examination, is held once a year, and if we evaluate the model's performance, we can see that Grok 3 is in a league of its own across the board. Even its little brother, Grok 3 mini, is reaching the frontier against all the other competitors. Now you will say: well, at this point all these benchmarks are just evaluating memorization of the textbooks, memorization of the GitHub repos. How about real-time usefulness? How about we actually use those models in our product? Another thing that I do love

### [2:12](https://www.youtube.com/watch?v=igDU0_R_oOM&t=132s) Chatbot Arena Rankings

about the Grok 3 team is that they also managed to put their model into the Chatbot Arena. If you aren't familiar with the Chatbot Arena, it's not quite like a standardized test: any time you ask a question, two AI models each give you a response, and you pick the one you think is better. It's a blind test, so you don't know which model is which; you just receive the responses, click the one you prefer, and over time the arena ranks the models by how often they win these head-to-head votes. Currently the number one model on the Chatbot Arena is Grok 3, so even in blind tests where people aren't biased by the names of the models, Grok 3 is clearly winning, and remember, this is just the non-reasoning model. We actually kicked off a blind test of our Grok 3 model, code-named Chocolate. It's pretty hot. Yeah, hot chocolate. It's been running on this platform called Chatbot Arena for two weeks, and I think the entire X platform at some point speculated that this might be the next generation of AI coming your way. The way Chatbot Arena works is that it strips away the entire product surface: it's just a raw comparison of the engines of those AGIs, the language models themselves. The user submits one single query, is shown two responses without knowing which model they come from, and then votes. In this blind test, an early version of Grok 3 already reached an Elo score of around 1,400. No other model has reached that score in head-to-head comparison with all the other models, and it's not just one single category: it's 1,400 aggregated across all the categories, including chatbot capabilities, instruction following and coding. So it's number one across the board in this blind test, and it's

### [4:12](https://www.youtube.com/watch?v=igDU0_R_oOM&t=252s) Continuous Improvements in Grok 3

still climbing, so we actually have to keep updating it: it's about 1,400 and climbing. In fact, we have a version of the model that we think is already much better than the one we tested here. We'll see how far it gets, but that's the one we're working on and talking about today. Actually, one thing: if you're using Grok 3, I think you may notice improvements almost every day, because we're continuously improving the model, so literally even within 24 hours you'll see improvements. Now, of course, comes the juicy part: the reasoning model. These are basically models that think for an extended period of time rather than giving you an instantaneous response. If you aren't familiar with why these models do this, thinking for longer lets them produce higher-quality responses, enabling them to be more accurate and tackle more complex problems. This is

### [5:07](https://www.youtube.com/watch?v=igDU0_R_oOM&t=307s) Reasoning Model Capabilities

something we've moved toward as an industry, because it is providing promising results and will likely lead us to truly smart AI. Now, when we look at Grok 3's reasoning capabilities, the thinking models also managed to surpass even the recently debuted o3-mini, which many people heralded as the smartest AI on the planet but which is now, unfortunately, number two. Okay, so let's see how Grok does on those interesting, challenging benchmarks. Reasoning, again, refers to models that actually think for quite a long time before they try to solve a problem. Around a month ago, Grok 3 pre-training finished, and after that we worked very hard to put the reasoning capability into the current Grok 3 model. But again, these are very early days, and the model is still in training, so what we're going to show people right now is a beta version of the Grok 3 reasoning model. Alongside it, we are also training a mini version of the reasoning model. On this plot you can see Grok 3 Reasoning Beta and Grok 3 mini Reasoning. Grok 3 mini Reasoning is actually a model that we trained for much longer, and you can see that sometimes it performs slightly better than Grok 3 Reasoning. This also means there's huge potential for Grok 3 Reasoning, because it has been trained for much less time. All right, so let's look at how it does on those three benchmarks, which Jimmy already introduced. Essentially we're

### [6:45](https://www.youtube.com/watch?v=igDU0_R_oOM&t=405s) Performance in Mathematics, Science, and Coding

looking at three different areas: mathematics, science and coding. For math we're picking high school competition problems, for science we pick PhD-level science questions, and coding is also pretty challenging: it's competitive coding and also some LeetCode, the kind of coding interview problems people usually get when they interview at companies. On those benchmarks you can see that Grok 3 performs quite well across the board compared to other competitors, so it's pretty promising; these models are very smart. So, Tony, what are those shaded bars? Okay, I'm glad you asked this question. Because these models can reason, you can also ask them to think even longer. You can spend more of what we call test-time compute, which means spending more time reasoning about a problem before you spit out the answer. Here, the shaded bar means that we asked the model to spend more time: it can solve the same problem many times before it tries to conclude what the right solution is, and once you give the model this compute, this kind of budget, it turns out it performs even better. That is essentially the shaded part of those bars. I think this is really exciting, because now, instead of doing just one chain of thought with AI, why not do multiple all at once? Yes, that's a very powerful technique that allows us to keep scaling the model's capabilities after training. So they

### [8:26](https://www.youtube.com/watch?v=igDU0_R_oOM&t=506s) Generalization & Overfitting Concerns Addressed

also wanted to see whether, because they trained it for so long and on so much data, the model was simply overfitting, basically memorizing parts of the test. So they tested it on the newer AIME 2025, and the results were pretty surprising. People often ask: are we actually just overfitting to the benchmarks? What about generalization? Yes, this is definitely a question we ask ourselves: whether we are overfitting to the current benchmarks. Luckily, we have a real test. About five days ago AIME 2025 finished; this is the competition where high school students compete. So we got this very fresh, new competition and asked our two models to take the same exam. Very interestingly, it turns out that Grok 3 Reasoning, the big one, actually does better on this fresh new exam. This also means that the big model's generalization capability is much stronger than the smaller model's. On last year's exam it's actually the opposite: the smaller model kind of learned the previous exams better. So this shows some kind of true generalization from the model. That's right. Seventeen months ago our Grok 0 and Grok 1 barely solved any high school problems, and now we have a kid that has already graduated; Grok is ready to go to college, is that right? Yeah, I mean, it won't be long before it's simply perfect. The human exams won't be a challenge; they'll be too easy. And internally, as Grok continues to evolve (we're going to talk about what we're excited about), very soon there will be no benchmarks left. Now let's actually take a look at these reasoning capabilities in action to see what these models can do. Yeah, so like Jimmy said,

### [10:25](https://www.youtube.com/watch?v=igDU0_R_oOM&t=625s) Demonstrating Grok 3’s Advanced Reasoning

we've added advanced reasoning capabilities to Grok, and we've been testing them pretty heavily over the last few weeks. To give you a little taste of what it looks like when Grok is solving hard reasoning problems, we prepared two little problems for you: one comes from physics, and one is a game that Grok is going to write for us. For the physics problem, we want Grok to plot a viable trajectory for a transfer from Earth to Mars and then, at a later point in time, a transfer back from Mars to Earth. That requires some physics that Grok will have to understand, so we're going to challenge Grok to come up with a viable trajectory, calculate it and then plot it for us so we can see it. This is totally unscripted, by the way. That's the entirety of the prompt, we should clarify; there's nothing more than that. Yeah, exactly. This is the Grok interface, and we've typed in the text you can see here: "Generate code for an animated 3D plot of a launch from Earth, landing on Mars and then back to Earth at the next launch window." We've now kicked off the query, and you can see Grok is thinking. Part of Grok's advanced reasoning capabilities are these thinking traces you can see here; you can even go inside and read what Grok is thinking as it works through the problem. We should say that we are doing some obfuscation of the thinking so that our model doesn't get copied instantly, so there's more to the thinking than is displayed. All right, so this was the little physics problem. We've collapsed the thoughts here so they're hidden, and below them we see Grok's answer: it explains that it wrote a Python script using Matplotlib and then gives us all of the code. So let's take a

### [12:22](https://www.youtube.com/watch?v=igDU0_R_oOM&t=742s) AI Generating a Mars Trajectory

quick look at the code. It seems to be doing reasonable things here, not totally off the mark. It says "solve Kepler" here, so maybe it's solving Kepler's laws numerically. There's really only one way to find out if this thing works: let's give it a try and run the code. All right, and we can see Grok is animating two planets, Earth and Mars, and the green ball is the spacecraft transiting between Earth and Mars. You can see the journey from Earth to Mars, and it looks like the astronauts indeed return safely at the right moment in time. Now, obviously this was just generated on the spot, so we can't tell you yet whether it was actually a correct solution. We're going to take a closer look; maybe we'll call some colleagues from SpaceX and ask them if this is legit. It's pretty close. I mean, there are a lot of complexities in the actual orbits that have to be taken into account, but this is pretty close to what it looks like. Awesome. Now, Grok 3 also entered its

### [13:34](https://www.youtube.com/watch?v=igDU0_R_oOM&t=814s) Grok 3 Enters the AI Agent Era

agentic era, which doesn't surprise me: AI agents are essentially the theme for 2025 and beyond. I'm honestly surprised, though, that all of these companies have given the product the exact same name: once again it's Deep Research, or in this case DeepSearch. So today we're introducing a new product called DeepSearch, the first generation of our Grok agents, which don't just help engineers, researchers and scientists with coding but help everyone answer the questions they have day to day. It's kind of like a next-generation search engine that really helps you understand the universe. So you can start

### [14:18](https://www.youtube.com/watch?v=igDU0_R_oOM&t=858s) Introduction to Deep Search

asking questions like, for example: hey, when is the next Starship launch? So let's try that. Okay. On the left-hand side we see a high-level progress bar. Essentially, the model is not going to do just one single search like current RAG systems; it actually thinks very deeply about what the user's intent is, what facts it should consider at the same time, and how many different websites it should go and read. So this can really save hundreds of hours of everyone's Google time if you want to dig into certain topics. On the right-hand side you can see bullet summaries of what the model is currently doing: which websites it's browsing and which sources it's verifying. Oftentimes it actually cross-validates different sources to make sure the answer is correct before it outputs the final answer. At the same time we can fire up a few more queries. You're a gamer, right? Sure, yeah. So how about: what are some of the best and most popular builds in the Path of Exile hardcore league? Technically you can just look at the hardcore ladder; that might be a fast way to figure it out. Yeah, we'll see what the model does. And we can also do something more fun, for example making a prediction about March Madness. Yeah, this is kind of a fun one: Warren Buffett has a billion-dollar bet that if you can exactly match the entire winning tree of March Madness, you can win a billion dollars from him. It would be pretty cool if AI could help you win a billion dollars from Buffett; that seems like a pretty good investment. Let's go. Now, another neat

### [16:16](https://www.youtube.com/watch?v=igDU0_R_oOM&t=976s) Transparent AI Search Results

feature of DeepSearch is that you can look at the model's chain of thought, so if the model doesn't respond with what you wanted, you can look at how it reasoned through its search data and how it came to that conclusion. I think this one's really useful because they talk about making the model as transparent as possible: if you don't get the response you want, you can look into the model's thoughts and figure out why. In this case you can scroll through, actually reading the mind of Grok: which information the model thinks is trustworthy and which is not, and how it cross-validates different information sources. That makes the entire search and information-retrieval process a lot more transparent to our users. And this is much more powerful than any search engine out there: you can literally just tell it to only use sources from X, and it will try to respect that. So it's much more steerable and much more intelligent. It really should save you a lot of time: something that might take you half an hour or an hour of researching on the web or searching social media, you can just ask it to do, come back 10 minutes later, and it has done an hour's worth of work for you. That's

### [17:31](https://www.youtube.com/watch?v=igDU0_R_oOM&t=1051s) Grok 3 Availability & New Website

really what it comes down to. Now, if you are wondering how this AI is going to be rolled out, they talk about a new website called grok.com. As of recording this video the website is unfortunately down; I'm pretty sure the hype simply broke it, maybe they didn't expect that many visitors. But it's basically going to be on grok.com, where there is also SuperGrok, a subscription for the dedicated Grok app and the website that gives you the most advanced capabilities and the earliest access to new features, so feel free to check that out as well. So our new website is called grok.com. You'd never guess. And you can also find our Grok app in the iOS App Store, which gives you an even more polished, totally Grok-focused experience if you want Grok easily available, one tap away. The version on grok.com in a web browser is going to be the latest and most advanced version, because obviously it takes us a while to get something into an app and then get it approved by the App Store, and a phone format also limits what you can do. So the most powerful and latest version of Grok will be the web version at grok.com. So watch out for the name Grok 3 in the app; that's a dead giveaway. Yeah, exactly, that's the giveaway

### [18:52](https://www.youtube.com/watch?v=igDU0_R_oOM&t=1132s) Conclusion & Final Thoughts

that you have Grok 3, and if it says Grok 2, then Grok 3 hasn't quite arrived for you yet, but we're working hard to roll this out today.
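The Chatbot Arena segment (2:12) describes blind, pairwise votes being aggregated into a single Elo score of around 1,400. A minimal sketch of the classic online Elo update shows how such votes move ratings. The K-factor of 32, the starting rating of 1,000 and the model names are illustrative assumptions, and the live leaderboard reportedly fits a Bradley-Terry model over all votes rather than updating ratings one vote at a time.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score (win probability) of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one blind pairwise vote in place; the winner gains exactly what the loser loses."""
    expected_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - expected_win)  # an upset (low expected_win) moves ratings more
    ratings[winner] += delta
    ratings[loser] -= delta


if __name__ == "__main__":
    # Hypothetical models and votes, not real arena data.
    ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
    votes = [("model_a", "model_b"), ("model_a", "model_c"),
             ("model_b", "model_c"), ("model_a", "model_b")]
    for winner, loser in votes:
        update_elo(ratings, winner, loser)
    for name, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {rating:.1f}")
```

Under this model a 100-point gap corresponds to an expected win rate of about 64% for the higher-rated side, and scores are only meaningful relative to other models on the same leaderboard.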
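In the 6:45 segment, the shaded bars are explained as the model solving the same problem many times before concluding which answer is right. A common way to do that is majority voting over sampled answers, often called self-consistency. The presentation does not say exactly how Grok aggregates its attempts, so this is only a sketch of the general technique, with a hypothetical noisy solver and made-up probabilities standing in for the model.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Sample n_samples final answers and return the most common one.

    Counter.most_common orders equal counts by first occurrence, so a tie
    goes to whichever tied answer was sampled first.
    """
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def make_noisy_solver(p_correct: float, seed: int) -> Callable[[], str]:
    """Hypothetical stand-in for a model: returns "42" (correct) with
    probability p_correct, otherwise one of several scattered wrong answers."""
    rng = random.Random(seed)
    wrong_answers = ["17", "41", "43", "99"]

    def solve() -> str:
        return "42" if rng.random() < p_correct else rng.choice(wrong_answers)

    return solve


if __name__ == "__main__":
    trials = 1000
    single_correct = 0
    voted_correct = 0
    for seed in range(trials):
        solver = make_noisy_solver(p_correct=0.4, seed=seed)
        single_correct += solver() == "42"
        voted_correct += majority_vote(solver, n_samples=16) == "42"
    print(f"one attempt:    {single_correct / trials:.2f}")
    print(f"majority of 16: {voted_correct / trials:.2f}")
```

The gain depends on errors being scattered: here the correct answer appears only 40% of the time but is usually the plurality because wrong answers split four ways. If a model repeats the same mistake, voting reinforces it instead.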
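The physics prompt at 10:25 asks for a launch from Earth to Mars and a return at the next launch window. The textbook first approximation is a Hohmann transfer between circular, coplanar orbits. The sketch below assumes exactly that idealization (it is not the script Grok generated) and uses Kepler's third law in solar units, where the period in years equals the semi-major axis in AU raised to the 3/2 power.

```python
import math

R_EARTH_AU = 1.0    # Earth's orbit, assumed circular
R_MARS_AU = 1.524   # Mars's mean orbital radius, assumed circular and coplanar
DAYS_PER_YEAR = 365.25


def period_years(a_au: float) -> float:
    """Kepler's third law for a body orbiting the Sun: T[yr] = a[AU] ** 1.5."""
    return a_au ** 1.5


def hohmann_round_trip(r1: float = R_EARTH_AU, r2: float = R_MARS_AU) -> dict[str, float]:
    n1 = 2 * math.pi / period_years(r1)   # Earth's mean motion, rad/yr
    n2 = 2 * math.pi / period_years(r2)   # Mars's mean motion, rad/yr
    a_transfer = 0.5 * (r1 + r2)          # semi-major axis of the transfer ellipse
    t_transfer = 0.5 * period_years(a_transfer)  # half an ellipse, perihelion to aphelion

    # At launch Mars must lead Earth by this angle so that it reaches the
    # aphelion point (180 degrees from launch) when the spacecraft does.
    lead_angle = math.pi - n2 * t_transfer

    # Earth's lead over Mars at arrival. The return leg mirrors the outbound one,
    # so it can start once Earth trails Mars by that same angle.
    phi_arrival = (n1 * t_transfer - math.pi) % (2 * math.pi)
    wait_at_mars = ((-2 * phi_arrival) % (2 * math.pi)) / (n1 - n2)

    synodic = 2 * math.pi / (n1 - n2)     # time between successive launch windows
    return {
        "transfer_days": t_transfer * DAYS_PER_YEAR,
        "mars_lead_deg": math.degrees(lead_angle),
        "wait_at_mars_days": wait_at_mars * DAYS_PER_YEAR,
        "synodic_days": synodic * DAYS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in hohmann_round_trip().items():
        print(f"{name}: {value:.0f}")
```

With these inputs the outbound leg takes roughly 259 days, Mars leads Earth by about 44 degrees at launch, the stay on Mars before the return window is about 454 days, and launch windows recur about every 780 days. Real trajectories differ because the orbits are elliptical and slightly inclined, the kind of complexity mentioned after the demo.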
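The generated script at 12:22 contains a step described as "solve Kepler", which most likely refers to Kepler's equation, M = E - e sin E, relating the mean anomaly M to the eccentric anomaly E. It has no closed-form solution and is usually solved numerically. Below is a minimal Newton's-method sketch with hypothetical function names; it is not the code from the demo, and the Mars elements in the usage (a ≈ 1.524 AU, e ≈ 0.0934, period ≈ 1.881 years) are approximate.

```python
import math


def solve_kepler(mean_anomaly: float, e: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Return the eccentric anomaly E solving M = E - e*sin(E) for 0 <= e < 1."""
    if not 0.0 <= e < 1.0:
        raise ValueError("elliptic orbits only (0 <= e < 1)")
    M = math.fmod(mean_anomaly, 2 * math.pi)
    E = M if e < 0.8 else math.pi  # common starting guess; pi is safer for high e
    for _ in range(max_iter):
        f = E - e * math.sin(E) - M      # residual of Kepler's equation
        f_prime = 1.0 - e * math.cos(E)  # derivative dF/dE, positive for e < 1
        step = f / f_prime
        E -= step
        if abs(step) < tol:
            return E
    raise RuntimeError("Newton iteration did not converge")


def orbit_position(a: float, e: float, period: float, t: float, m0: float = 0.0) -> tuple[float, float]:
    """(x, y) in the orbital plane at time t, Sun at the origin, perihelion on +x."""
    M = m0 + 2 * math.pi * t / period   # mean anomaly grows linearly with time
    E = solve_kepler(M, e)
    x = a * (math.cos(E) - e)
    y = a * math.sqrt(1.0 - e * e) * math.sin(E)
    return x, y


if __name__ == "__main__":
    # Approximate Mars elements: a ~ 1.524 AU, e ~ 0.0934, period ~ 1.881 yr.
    for day in (0, 100, 200, 300):
        x, y = orbit_position(1.524, 0.0934, 1.881, day / 365.25)
        print(f"day {day:3d}: x = {x:+.3f} AU, y = {y:+.3f} AU")
```

Newton's method fits here because the derivative 1 - e cos E never vanishes for elliptic orbits, so each step is well defined and convergence is typically reached in a handful of iterations.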
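The billion-dollar bracket bet mentioned in the DeepSearch demo (14:18) is easy to put in perspective with arithmetic. A 64-team bracket has 32 + 16 + 8 + 4 + 2 + 1 = 63 games (play-in games excluded), so picking by coin flip gives 1 chance in 2^63, about 9.2 quintillion. The sketch assumes each pick is right with the same probability p; the values of p are hypothetical.

```python
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # 64-team field, play-in games excluded
TOTAL_GAMES = sum(GAMES_PER_ROUND)      # 63


def perfect_bracket_probability(p_correct: float, games: int = TOTAL_GAMES) -> float:
    """Chance of a perfect bracket if every pick is right with probability p_correct."""
    return p_correct ** games


if __name__ == "__main__":
    print(f"games to call: {TOTAL_GAMES}")
    print(f"coin flip: 1 in {2 ** TOTAL_GAMES:,}")
    for p in (0.6, 0.75, 0.9):  # hypothetical per-game accuracies
        print(f"p = {p}: about 1 in {1 / perfect_bracket_probability(p):,.0f}")
```

Even at a hypothetical 75% accuracy per game, the odds of a perfect bracket stay worse than one in ten million.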
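The 16:16 segment says DeepSearch cross-validates sources and can be told to use only sources from X. xAI did not describe how this works, so the following is a purely hypothetical sketch of those two behaviours: a claim counts as confirmed once it appears on a minimum number of distinct domains, and an optional allow-list drops all other evidence. The Snippet type, claim strings and domains are invented, extracting claims from pages is assumed to happen upstream, and the code needs Python 3.9+.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Snippet:
    """Hypothetical evidence record: a source URL and the normalized claim it supports."""
    url: str
    claim: str


def domain_of(url: str) -> str:
    """Host name with a leading "www." stripped, used as the unit of independence."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def cross_validate(
    snippets: list[Snippet],
    min_domains: int = 2,
    allowed_domains: Optional[set[str]] = None,
) -> dict[str, list[str]]:
    """Return claims backed by at least min_domains distinct domains.

    When allowed_domains is given, evidence from every other domain is ignored,
    mimicking an instruction such as "only use sources from X".
    """
    support: dict[str, set[str]] = {}
    for snippet in snippets:
        domain = domain_of(snippet.url)
        if allowed_domains is not None and domain not in allowed_domains:
            continue
        support.setdefault(snippet.claim, set()).add(domain)
    return {
        claim: sorted(domains)
        for claim, domains in support.items()
        if len(domains) >= min_domains
    }


if __name__ == "__main__":
    evidence = [
        Snippet("https://www.example-news.com/a", "claim A"),
        Snippet("https://example-blog.org/post", "claim A"),
        Snippet("https://x.com/someone/status/1", "claim B"),
    ]
    print(cross_validate(evidence))  # only "claim A" has two independent domains
    print(cross_validate(evidence, min_domains=1, allowed_domains={"x.com"}))  # only "claim B"
```

Counting domains rather than pages is a simple guard against one site repeating itself; a real system would also need to judge whether sources are truly independent.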

---
*Source: https://ekstraktznaniy.ru/video/13296*