GEMINI 3.1 PRO DROPPED: Google Is BACK!

Universe of AI · 19.02.2026 · 2,966 views · 71 likes
Video description
Gemini 3.1 Pro is finally here. Google's most capable model yet drops today with a massive reasoning upgrade, insane benchmark scores, and some wild generations. Here's everything you need to know. For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: https://x.com/UniverseofAIz
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/

#Gemini31Pro #googleai #gemini #AI #AGI #ArtificialIntelligence #AINews #GoogleDeepMind #AIBenchmarks #NewAIModel

WebOS Demo: https://x.com/chetaslua/status/2024434225882238988

0:00 - Intro
0:30 - Gemini 3.1 Pro vs Claude Opus 4.6
4:20 - Early Reactions
4:48 - Windows WebOS
6:39 - Benchmark Results
10:35 - Outro

Table of Contents (6 segments)

  1. 0:00 Intro (115 words)
  2. 0:30 Gemini 3.1 Pro vs Claude Opus 4.6 (824 words)
  3. 4:20 Early Reactions (85 words)
  4. 4:48 Windows WebOS (387 words)
  5. 6:39 Benchmark Results (709 words)
  6. 10:35 Outro (66 words)
0:00

Intro

It's finally here. Gemini 3.1 Pro dropped today, and honestly, I've been waiting for this release for a while. Google has had a rough stretch recently. For a while, it felt like every time they released something, OpenAI or Anthropic would drop something better two weeks later, and everyone would kind of forget about it. That's just been the reality for the past year or so. But this one feels different, and I don't say that lightly. I've read through everything they published today and looked at the demos, and I think Google might actually be back in the conversation in a real way. So, let's get into it.
0:30

Gemini 3.1 Pro vs Claude Opus 4.6

What you're looking at right now is a voxel farm style game that I've asked both models to create: Gemini 3.1 Pro Preview, the newest model, and then Claude Opus 4.6, the newest model from Anthropic. And we're going to see results from both these models and compare what they look like, the pros and cons of each. And I'm going to say I'm genuinely impressed with both of these models, because I have been doing voxel farms and voxel style waterfalls and these types of tests for a while now. And what I notice is the evolution of how clear the worlds look, how interactive they have become, and the new features these models are adding. Because if you look at it, the first thing I said was "create a voxel style farm with animals." It created a farm, but I wanted to be able to interact with it, so then I said "help me interact or do something with the farm." So you can see these are lazy prompts, very simple prompts, and both of these models are able to create this. So we'll first take a look at the Gemini model. I'm going to open this up in full screen so you can see that from the get-go, this voxel farm looks very aesthetically pleasing and accurate. We have our cow over here. We have our chickens. We have our pigs. And then what is that? I guess it's some hay. And then we have our sheep. We have a farm. And if you look at it, the animation and everything is so smooth; nothing is glitching out or anything like that. We have our barn over here. And what I can do is I can click crops to plant and harvest, and click on animals to collect items. So, if I were to click on the cow, for example, you can see my milk score went up. So, I'll click a couple more times. And my cow is going crazy every time I'm milking it. And then I'll click on some chickens to get some eggs. So, you can see that over here. Then I have my sheep to get some wool, more wool from here. And then I have my pig to get some... what does my pig give me? Truffles. Interesting. I didn't know pigs give you truffles. But then what do I have? Can I click on the hay? I can't really click on the hay. But then I can also sell my inventory. So I sold it and I got $440. I'm not complaining, I'll take that money. And then we can plant stuff as well. So you can see all of this was created by the model just from me asking it, "hey, create a voxel farm, maybe do something I can interact with." So we can see my tree is growing and now I can get stuff from here. So I can plant stuff and everything like that. What am I getting? I'm getting crops. So now I can sell my crops. Can I plant them all over again? Does that work? That also works. So I can sell my crops. Oh no, since I planted my crops, I can't use them, which is nice. So I can use my money and I can sell it or I can plant it again. So this is cool. So this was the generation that 3.1 created. Now we'll take a look at what Claude Opus 4.6 created. So let's open this up in a new browser. "Click on any animal to select it, then feed them (costs one food), pet them, or collect their produce for coins. Click on any crop plot to interact, water growing crops." So we can see that Opus 4.6, at least at the moment, has added more elements when it comes to interaction. But let's look at how the map looks. The map doesn't look bad at all, I'm not going to complain. But we can see our cows are kind of floating around. The sheep is... it definitely added more animals compared to the other one. And then let's see if I can do something.
If I were to click on this... oh, I like this gamified feature, like happiness and everything. So, you can see the happiness is going down. So, if I pet it, it goes up, and it keeps going up if I continue petting it. If I feed it, I can feed the cow and everything. I can collect. So, this is cool as well; it added all these features, which I'm not complaining about. So, both of these models obviously have their pros and cons. We are reaching a level where all of these models are getting so capable, it's going to be hard for them to differentiate themselves. But I'm happy with both results.
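Before moving on: the interaction loop both models converged on is essentially a tiny resource game — click an entity, collect its produce, sell the inventory for coins. Here is a minimal sketch of that mechanic (my own reconstruction in Python for illustration; the demos themselves are generated web code, and the item prices here are made up):

```python
# Minimal reconstruction of the click-to-collect mechanic both demos implement.
# This is an illustrative sketch, not the models' generated code; prices are made up.

PRODUCE = {"cow": "milk", "chicken": "egg", "sheep": "wool", "pig": "truffle"}
PRICES = {"milk": 40, "egg": 10, "wool": 25, "truffle": 60}

inventory: dict[str, int] = {}
coins = 0

def collect(animal: str) -> None:
    """Clicking an animal adds its produce to the inventory."""
    item = PRODUCE[animal]
    inventory[item] = inventory.get(item, 0) + 1

def sell_all() -> int:
    """The demo's Sell button: empty the inventory and credit coins."""
    global coins
    earned = sum(PRICES[item] * count for item, count in inventory.items())
    coins += earned
    inventory.clear()
    return earned

for _ in range(3):
    collect("cow")
collect("chicken")
print(f"Sold for {sell_all()} coins")  # Sold for 130 coins
```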
4:20

Early Reactions

And if you're wondering how the AI community is reacting to this, well, they are pretty fired up. And it's mostly around the reasoning score on ARC-AGI-2, which was designed by François Chollet specifically to be the benchmark models couldn't cheat on: you either reason or you don't score well. The best models were stuck under 50% for a while. 3.1 Pro hit 77%, and people are starting to use the AGI word carefully, but still, the jump is hard to ignore.
4:48

Windows WebOS

What you're looking at right now is Gemini 3.1 generating a Windows OS in a web OS format. And what you're seeing is that, number one, this new generation looks way more professional and accurate to the Windows system compared to the other generations we have seen from other models. You can see that it's able to generate new desktops and create this sidebar that looks pretty similar to what Windows actually looks like. We have our Teams application and everything like that. We also have these new features at the bottom where we can adjust brightness, sound, and everything. So, this model is very capable when it comes to coding; we can see this clearly from this demo compared to other demos we've seen in the past. But just to show you guys more, I'm going to look at the file explorer. And what you will notice is that not only is this model capable of creating these components, they kind of look realistic. They look similar to what Windows would actually look like. And it has added small little Easter eggs throughout the project. If you look at the downloads and everything like that, we have this installer, a music application, pictures, and we can change our wallpapers. We have a photos application. What else can we do? We also have the weather application set to Kolkata, India, which is kind of cool. But we can see that this is cool; we can move it around or expand it. Then we also have a snipping tool, and a control panel which has the System and Security icon that's similar to, I think, Windows. I actually use a Mac, so I'm not 100% sure. Then we have our settings where we can personalize everything. We can personalize our background, light mode, dark mode, which is cool because this is something that I haven't seen other models add, at least at this capability level. So this is kind of nice to see. What else do we have? We also have our calendar feature. This one kind of looks like a macOS calendar compared to a Windows one, but I'm not complaining. This is not bad at all. And we have our calculator. The calculator, once again, looks very Windows-like compared to other versions I've seen.
6:39

Benchmark Results

All right, let's get into the numbers. And there's actually a lot more to cover here than just that one benchmark Google highlighted in the blog post. Starting with the headline one: ARC-AGI-2 is designed to test whether a model can handle entirely new logic patterns, things it hasn't seen before, that you can't just pattern match your way through. It's one of the hardest reasoning benchmarks out there. And Gemini 3.1 Pro scored 77.1%. Gemini 3 Pro, the previous generation, scored 31.1%. So that's not a small jump; that's going from a third of the test to three quarters of it. For context, Claude Opus 4.6 is at 68.8% and GPT-5.2 is at 52.9%. On this one benchmark, 3.1 Pro isn't just ahead of its older version, it's clearly ahead of the competition, too. And just for fun, on ARC-AGI-1, the older version of the same test, 3.1 Pro scored 98%. That one is basically solved at this point. On the science side, GPQA Diamond, which is a test of graduate-level scientific knowledge, the kind of questions that require you to synthesize across multiple fields and not just recall facts: 3.1 Pro scored 94.3%, which leads the field once again. Claude Opus 4.6 is at 91.3%, GPT-5.2 at 92.4%. So, it's not a massive gap, but Google's on top. On the coding side of things, the two benchmarks worth paying attention to are SWE-bench Verified, where a model has to autonomously fix real bugs in real GitHub repositories; 3.1 Pro scored 80.6%, essentially tied with Claude Opus 4.6 at 80.8%. That's a genuine contest between the two models. And LiveCodeBench Pro, which is competitive coding: think algorithms, problem solving under pressure. 3.1 Pro hit an Elo of 2,887. Gemini 3 Pro was at 2,439 and GPT-5.2 at 2,393. So on raw coding ability, this is a pretty clear upgrade. On agentic tasks, the Apex Agents one, this one is interesting because it's measuring long-horizon professional tasks: not one-shot questions, but multi-step workflows where the model has to plan and execute over time. 3.1 Pro scored 33.5%. Gemini 3 Pro was at 18.4%, GPT-5.2 at 23%, and Opus 4.6 at 29.8%. So again, Google leads here, and the jump from the previous Gemini version is pretty dramatic. There's also MCP Atlas, a benchmark for tool use, where 3.1 Pro scored 69.2%, and BrowseComp, a web browsing and research benchmark, where it scored 85.9%. Worth being honest here: it's not a clean sweep. Claude Sonnet 4.6, in its extended thinking mode, tied 3.1 Pro on long-context performance (both scored 84.9% on MRCR v2) and actually led on expert task evaluations using a separate Elo-style benchmark. So there are still areas where the competition holds up, but it's close at the top. One more thing worth mentioning: the pricing stayed the same as Gemini 3 Pro. So this is a major performance upgrade at no extra cost for API users, and compared to Anthropic's top-tier models, Google also comes in significantly cheaper per token. Bottom line, though: Google leads on 13 out of 16 benchmarks evaluated. The three it didn't top were mostly edge cases or categories where only partial competitor data was available. Across reasoning, science, coding, and agentic tasks, this is right now the strongest set of benchmark results Google has put out, and it's competitive with or ahead of the best from OpenAI and Anthropic. So yeah, Google is back, at least for now. Whether 3.1 Pro holds up in day-to-day use compared to what's out there from OpenAI and Anthropic, we'll see.
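A quick aside on those LiveCodeBench Pro numbers: if the leaderboard uses the standard Elo formula (an assumption on my part; the benchmark may scale ratings differently), you can translate a rating gap into an expected head-to-head score. A minimal sketch:

```python
# Minimal sketch: expected head-to-head score under the standard Elo formula.
# Assumption: LiveCodeBench Pro ratings use the conventional 400-point scale;
# the actual benchmark may compute ratings differently.

def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under standard Elo (0.0 to 1.0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings quoted in the video.
gemini_31_pro, gemini_3_pro, gpt_5_2 = 2887, 2439, 2393

print(f"3.1 Pro vs Gemini 3 Pro: {elo_expected_score(gemini_31_pro, gemini_3_pro):.2f}")  # ~0.93
print(f"3.1 Pro vs GPT-5.2:      {elo_expected_score(gemini_31_pro, gpt_5_2):.2f}")       # ~0.94
```

Under that assumption, a roughly 450-point gap means 3.1 Pro would be expected to win about nine out of ten head-to-head matchups, which is why "pretty clear upgrade" is fair.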
But on paper and in the demos, this is the most compelling thing they've put out in a while. I'll drop the blog post link in the description if you want to read through it yourself. And if you're a developer, AI Studio is free to try; it's worth running your own prompts against it and seeing how it feels.
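If you'd rather poke at it from code than from the AI Studio UI, here's a minimal sketch using Google's google-genai Python SDK. Note the model ID below is a placeholder guess on my part; check AI Studio for the actual Gemini 3.1 Pro Preview identifier.

```python
# Minimal sketch: one-off prompt against Gemini via the google-genai SDK.
# Assumptions: `pip install google-genai` has been run, GEMINI_API_KEY is set
# in the environment, and the model ID is a placeholder -- check AI Studio
# for the real Gemini 3.1 Pro Preview identifier.
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # hypothetical model ID
    contents="Create a voxel-style farm with animals I can interact with.",
)
print(response.text)
```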
10:35

Outro

Make sure to subscribe to our channel. We do real tests, not just headlines. Make sure you're also subscribed to The World of AI. And don't forget to check out our newsletter for deeper breakdowns you won't see on YouTube. And I'm growing my Twitter following, so make sure you follow me on Twitter as well. Hope you guys enjoyed today's video, and I'll see you in the next one.
