# Googles NEW Gemini 2.5 Pro BEATS Everything! (Gemini 2.5 Pro)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=a9-_HqyjKEc
- **Date:** 26.03.2025
- **Duration:** 12:45
- **Views:** 29,903
- **Source:** https://ekstraktznaniy.ru/video/13154

## Description

Join my AI Academy - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/


Links From Today's Video:
https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Transcript

### Segment 1 (00:00 - 05:00)

So, Google just took over the AI industry. They just released Gemini 2.5, which is currently the most intelligent AI model, not by a little bit, but by far. Gemini 2.5 is a thinking model designed to tackle increasingly complex problems. And this model, Gemini 2.5 Pro Experimental, leads all benchmarks by a meaningful margin and showcases strong reasoning and incredible coding capabilities. Let's take a look at why this model truly has changed a lot of things for the AI industry and what is to come.

So, I'm going to get into benchmarks in just a moment, but benchmarks are the main thing companies show you to demonstrate how successful a model is. And currently, Gemini 2.5 Pro, the newest version, seems to excel in every single area that matters. When we take a look at all of these benchmarks, we can see that Gemini 2.5 Pro performs really, really well. I will say that one of the things I'm starting to see is that these benchmarks are getting pretty saturated. Benchmark saturation occurs when the models get so good that they all converge around a natural ceiling. Looking at Gemini 2.5 Pro, we can see that on GPQA all of these models are around 80%, with Gemini at 84%. On the math benchmark AIME 2025, all of the models here are around 85%, with Gemini at 86.7%. On AIME 2024, the models are around 90%, with Gemini at 92%. And on LiveCodeBench v5, all of the models are around 75%.

Now, one of the standout features for Gemini was actually the visual rankings, and I think the visual rankings are really important. Vision is something that is largely underexplored in AI, but Google seems to have won in that area, with MMMU, the exam based on multimodal visual understanding, coming in at 81.7%, far surpassing any other model. On the long-context benchmark, MRCR, it also gets 91%, and on other image benchmarks the model seems to do exceedingly well. So one of the key things about this model is that it is a really important model for the vision space, and I think it goes to show just how good it is at understanding multimodal context. For those of you wondering what Gemini 2.5's standout features are, I would point to the visual reasoning, and not just that, but also the coding. If you look at agentic coding, it falls only a little bit short of Claude 3.7 Sonnet, but interestingly enough, on the Aider Polyglot benchmark, which some would argue is a bit more demanding than other coding benchmarks, it actually performs at a state-of-the-art level. The Aider Polyglot benchmark is a comprehensive test of real-world software engineering tasks covering multiple different programming languages. This result is truly surprising, considering that Claude has for the longest time been the leader in coding capabilities.

Now, when we look at these benchmarks, I do want to show you guys something that I think is by far one of the most impressive results on reasoning and knowledge. One of the key areas where Gemini 2.5 Pro stands out is a benchmark where no other model has managed to break the mold yet. What we can see here is Humanity's Last Exam, a challenging benchmark designed to test the limits of AI systems, created collaboratively by the Center for AI Safety and Scale AI.
It consists of 3,000 unambiguous and easily verifiable academic questions spanning mathematics, the humanities, and the natural sciences. This exam was developed with contributions from nearly 1,000 subject experts affiliated with over 500 institutions across 50 countries, primarily professors, researchers, and graduate degree holders. And the benchmark was created in response to the rapid advancement of AI, which has already achieved 90% on many different benchmarks, which are of course reaching saturation.

Now, the reason I say this is so impressive is that only 10% of the questions require both image and text, while 90% are text-based, and they are expert-level difficulty. These questions are intentionally challenging, often at a level where most college students wouldn't even understand what is being asked. Yet Gemini 2.5 Pro scores 18.8%. And it's actually quite diverse: the benchmark contains chemistry, ecology, pure mathematics, ancient Hebrew, rocket science, Greek mythology, and nearly every organized field of study. The crazy thing is that while the questions have definitive answers, they cannot be answered quickly via internet retrieval; they actually require really good reasoning. And of course, it's called Humanity's Last Exam because it may be the last general-knowledge benchmark that humans can craft and score without AI assistance.

### Segment 2 (05:00 - 10:00)

So for Gemini 2.5 Pro to score 18.8%, above both o3-mini and Claude 3.7 Sonnet, tells me that they've done something really well on the reasoning side.

Now, we've taken a look at the official benchmarks, but let's take a look at some benchmarks that can't be faked. When we head on over to LMArena, the leaderboard that actually reflects what real users prefer on a day-to-day basis, we see something rather fascinating. Gemini 2.5 Pro actually excels across many of the arena categories when it comes to human preference. So whilst, yes, in the official benchmarks Gemini 2.5 Pro didn't have many standout areas, because of course we've reached benchmark saturation, in the arena, where users compare different AI models every single day, Gemini 2.5 Pro comes out on top. And the craziest thing is that the Elo jump is quite large, around 40 points, the largest score jump we've ever seen from a new model (see the worked example below for what a 40-point Elo gap actually means). So they've clearly done something new here. I'm not sure how they managed to get this model out so soon, but this Elo jump is seriously impressive.

When we also take a look at the vision area: like I said, vision is an extremely underrated form of AI usage, because a lot of people don't really want to analyze images. But trust me guys, in a few years this is going to have some wide-reaching implications. There are so many things you can do with vision. For example, ask it to analyze a video, ask it to point out certain things in an image, ask it to change things around. The possibilities are completely endless. I think vision models truly do lag behind where the text models are because people just don't see that many crazy use cases for them yet. But having, once again, another big jump compared to all of these other models is seriously impressive.

Now, another thing that I truly believe is impressive, and that actually shows me Google is coming for every other AI company's neck, is how far they've climbed on the WebDev Arena. This is of course the web development arena, where, unsurprisingly, the number one model is Claude 3.7 Sonnet by a clear margin, and what's quite surprising is that we've seen Google go from eighth to second in this coding area. The reason this is pretty impressive is because we've never really seen any other model come close to what Claude has been offering, and this is now an AI that I'm actually using in my workflows. This is definitely one that I'm going to be releasing some key AI workflows for in my community, showing everyone how to take advantage of it. I already did that earlier today, releasing the ultimate Gemini 2.5 Pro cheat sheet. Don't forget to check out my community; it's going to be the first link in the bio if you guys actually want to access this.

Now, one of the benchmarks that I'm really intrigued to see is SimpleBench. SimpleBench is essentially a benchmark designed to catch whether an AI is truly reasoning or not. One of the problems is that these AI systems talk in a way that can look as if they're reasoning. And a key problem one person discovered is that oftentimes you won't realize the model isn't actually reasoning, but retracing reasoning steps from other questions, meaning it's just pattern-matching the problem rather than truly analyzing what the correct answer should be.
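Before moving on, here is the worked Elo example flagged above. Under the standard Elo formula (the same 400-point logistic curve used by chess ratings and arena-style leaderboards), a rating gap maps directly to an expected head-to-head win rate. A minimal sketch; the function name is mine, and this is the generic Elo expectation rather than anything arena-specific:

```python
# What a ~40-point Elo gap means under the standard Elo model
# (the 400-based logistic formula used by chess and arena-style
# leaderboards). Generic illustration, not LMArena's own code.

def expected_win_rate(elo_gap: float) -> float:
    """Probability the higher-rated model wins a head-to-head matchup."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

print(f"{expected_win_rate(40):.3f}")   # ~0.557: wins ~55.7% of matchups
print(f"{expected_win_rate(100):.3f}")  # ~0.640, for comparison
```

So a roughly 40-point jump means the new model would be expected to win about 56 out of every 100 direct matchups against its predecessor, which is a substantial shift at the top of a crowded leaderboard.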
SimpleBench, then, isn't an ARC-AGI-style test, but a kind of test for human-level common-sense reasoning, where the human baseline is 83%. The current frontier, Claude 3.7 Sonnet with extended thinking, only scores around 46%. Many of the questions are ones that humans get easily, because we have a backbone understanding of physics. For example, one of the questions is along the lines of: if you left ice cubes sitting on a flame for around five minutes, how many ice cubes would be left? And oftentimes these AIs get tricked into thinking the ice cubes would still be there, and they start reasoning and doing a whole bunch of mathematics, when at face value you'd know there wouldn't be any ice left. So this is something I'm really intrigued to see, because the best-performing models here still only got around 46%, and I'm really curious how Gemini 2.5 Pro manages to perform on it.

Now, if we look at the coding area, Google released a bunch of different demos. In one of them, someone wants an HTML file with a simulation of a reflection nebula. They go over to Google AI Studio, they enter the prompt, and the AI is able to do this one-shot, generating a substantial amount of code. It first reasons, then codes the necessary app.
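If you want to try this kind of one-shot coding test yourself outside of Google AI Studio, here is a minimal sketch using the `google-generativeai` Python SDK. The model ID below is the experimental Gemini 2.5 Pro identifier from the March 2025 release and may change, and the prompt wording is a paraphrase of the demo, not Google's exact prompt:

```python
# Minimal sketch: one-shot code generation with the Gemini API.
# Assumes `pip install google-generativeai` and an API key from AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Experimental Gemini 2.5 Pro model ID at the time of the video (may change).
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

response = model.generate_content(
    "Create a single self-contained HTML file containing a p5.js "
    "simulation of a reflection nebula. Return only the code."
)

# Write the generated page to disk, then open it in a browser
# (or paste it into CodePen, as in the video).
with open("nebula.html", "w") as f:
    f.write(response.text)
```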

### Segment 3 (10:00 - 12:00)

If you then copy the generated code and paste it into an HTML file on CodePen, you can see that once you run it, the simulation is right there, and this is a really interesting ability, for the AI to be able to code this well.

We also have the Mandelbrot set demo. Once again, you can ask for p5.js code to explore a Mandelbrot set, and the individual requests this from Gemini 2.5 Pro. It thinks first, and really interestingly, you can actually see the chain of thought, unlike with OpenAI's models; you can see exactly what it is thinking about. And then, once you run this and click play, you get the Mandelbrot demo. This is of course a famous fractal in mathematics, and it just keeps on going, which is super interesting, but these things are not easy to code at all.

Then there is an interactive plotting demo that Gemini 2.5 Pro was able to do. You can see it created an animated bubble chart using Plotly Express showing how economic and health indicators have evolved over the years for each continent (a minimal sketch of this kind of chart appears at the end of this transcript). Once again, note the thinking part. And I would say, if you're going to be using these models, definitely take a look at how the model thinks, because that lets you check whether the model is reasoning with the same kind of logic you would use. You can see that once you click play, it produces this interactive chart, and it one-shot this with a single prompt. This is really interesting because it has a variety of use cases; maybe you want to quickly visualize some data you have on hand. This is something I'd recommend you definitely try, because it's something I use all the time.

Then you've got the boids in the hexagon demo: p5.js, no HTML, a swarm of 30 colorful boids swimming inside a rotating hexagon. Once again, you can use the model to do this. If you just click run and go over to the thinking, you'll see it quickly reasoning about different things. I think you guys get the gist: this is a model that is clearly comprehensive, in the sense that it is able to do a bunch of different things in a range of different ways. And like I said before, I'd definitely use this model to see if you can do a variety of things that you previously weren't able to, considering it's now at the frontier.

So overall, let me know what you guys think about Gemini 2.5 Pro. Do you think that Google has done it again? I certainly think they are at the state of the art when it comes to leading the AI race, but I do think there are a few changes coming in the AI industry that are definitely going to change how we use these tools. If you guys have enjoyed this video, don't forget to leave a like, and I'll see you in the next one.
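As a reference for the interactive plotting demo mentioned above: a bubble chart of economic and health indicators per continent over the years is essentially the classic Gapminder visualization, which Plotly Express ships as a built-in example dataset. A minimal sketch of the same kind of chart, using standard Plotly Express calls rather than Google's actual demo code:

```python
# Minimal sketch of the animated bubble chart described in the demo:
# economic (GDP per capita) vs. health (life expectancy) indicators
# over the years, one bubble per country, colored by continent.
# Requires `pip install plotly`; uses Plotly's bundled Gapminder dataset.
import plotly.express as px

df = px.data.gapminder()
fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    size="pop",              # bubble size = population
    color="continent",
    hover_name="country",
    animation_frame="year",  # play button steps through the years
    log_x=True,
    size_max=55,
)
fig.show()  # opens an interactive, animated chart in the browser
```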
