Introducing Gemini 3.1 Pro

Sam Witteveen

Segment 1 (00:00 - 05:00)

Okay, so it's been over 64 days since Google released one of the Gemini 3 models, and that was Gemini Flash, and about 30 days before that they released Gemini 3 Pro. So we're coming up on nearly 100 days of Gemini 3 being out, and you've got to think that in the current AI ecosystem that's the equivalent of about 100 years. So today, Google is introducing Gemini 3.1 Pro. In this video, I'm going to go through some of the updates with this model, talk a little bit about how it fits in with their release schedule, and we'll have a quick look at what this model can actually do, as it's now being rolled out to most of the Google apps that use a Gemini model.

So if we come in and look at the blog post about this, we can see some interesting things. First off, just the fact that this is a 3.1 is kind of interesting, right? Gemini has never had anything except a .0 release or a .5 release. This is the first time we've seen a .1 release. I do think that's kind of interesting, given that it's nearly 100 days since the very first version of this model came out. Clearly in that time they've made a lot of impact with the Gemini Deep Think models, and they are clearly taking some of those ideas and some of that technology and putting it into the main Pro model here.

Okay. So if we come in and look at the benchmarks, I think the real takeaway is not just comparing this to other models. They certainly compare it to Sonnet 4.6 and Opus 4.6 as well as the GPT models. But it's really in comparing to Gemini 3 Pro that we can see what's changed. If we look at, for example, Humanity's Last Exam, this is a huge bump over Gemini 3 Pro. Yes, it's more than Sonnet 4.6. Yes, it's more than Opus 4.6. But look at the bump that we're seeing compared to Gemini 3 Pro. And don't forget, this is just a .1 release, right? I'd say the main reason for that, as I'll show you in a second when we look at some examples, is that the thinking high mode here really is a Deep Think mini kind of thing that they've got going on.

We can see this also when we look at ARC-AGI: 77% compared to 31% with Gemini 3 Pro. Obviously the Anthropic models were doing a lot better than that, and Google probably hadn't really even thought about optimizing for this with Gemini 3 Pro, but we can see that this has got a huge bump here. And while Google's not saying this in their blog post, it definitely looks like they're starting to get really good RL environments for training these kinds of tasks, which then translate to better benchmarks. We see this also when we look at some of the other things which would have RL environments, like the coding benchmarks, and things like this MCP Atlas, where you could imagine that you're doing the agentic search, etc.

So clearly it's not just benchmarks that show this off. We can see here that when you get it to generate things for designs, 3.1 Pro is already looking a lot better than 3. Again, this is something that you could imagine being done with a really good RL environment. Same for graphic designs in coding and stuff like that. We can see that the model's gotten better with those sorts of things.

Now, apart from this, Google's not actually saying a lot here, right? So it is kind of interesting that, like I said, this is a .1 release. Perhaps in the past they would have just said, well, here's another new Gemini 3 Pro preview, like they did with, say, the 2.5 previews, where there were multiple previews before we got to GA. It does seem here that, rather than just roll out another preview with a newer date, they've actually decided to call this Gemini 3.1 Pro, and those benchmarks, I think, really do justify that.

So, let's jump into having a play with this and see how it performs. I want to show you how you can actually take advantage of the different thinking levels that this model has, so that it can go from very quick thinking right up to things where you're looking at 5 minutes plus before you've got a full answer back. Okay. So to get started, you come over here and you just select the latest model. If you don't see it, just click all. They're certainly rolling it out, so you should be able to see it pretty quickly in here. So, I'm going to start off with one of the

Segment 2 (05:00 - 08:00)

questions from the International Math Olympiad. Now, when I ran this question last year with the Deep Think model, it was able to give us the correct answer, but it took a long time, right? I think, from memory, it was 17 plus minutes before we were getting time to first token here. So, you can see here we've got this set up. I've got thinking level set to high, and we can see that it's definitely taking time to get to the answer. This is partly because we've got the thinking level set to high. With the previous version of Gemini 3 Pro, you could only have low or high. Now, you can have low, medium, or high as the setting in there.

So you'll notice as it's going through this, we're already 2 minutes into thinking, and the answer we're looking for here is 0, 1, and 3. So let's see if we're going to get the answer. Okay, so I paused it while it was going through. It did finally come to the right answer. It took over 8 minutes to get to this answer. That's roughly half of what Deep Think used to take. But this shows one of the things that is really interesting with this model, and that's that if you have thinking set to high, this acts almost like a mini version of Gemini Deep Think. This is one of the things that they've emphasized: this model has now taken lessons from Gemini Deep Think, both the earlier versions and the more recent versions.

Now, at the same time, if I set this to low, we should get something where the thinking is much quicker. Okay. And so while the thinking was much faster here, it actually didn't get the correct answer in this case. So you do want to make use of the thinking level when you're doing different tasks.

Another task that people have been doing a lot is creating SVGs. For this one, I've just asked it to make me the SVG of a cat riding a bicycle.
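Before rendering a model-generated SVG like this one, it can help to confirm that the output is well-formed XML with an `<svg>` root. The following is a minimal sketch using only the Python standard library. The `check_svg` helper and the `SAMPLE` drawing are hypothetical stand-ins for illustration; they are not the model's actual output or any tooling shown in the video.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def check_svg(svg_text: str) -> dict:
    """Parse SVG text and return a small summary; raise ValueError if it is not SVG."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    # Accept both namespaced and bare <svg> roots.
    if root.tag not in (f"{{{SVG_NS}}}svg", "svg"):
        raise ValueError(f"root element is {root.tag!r}, expected <svg>")
    # Count element types with the namespace prefix stripped.
    counts = {}
    for el in root.iter():
        name = el.tag.split("}")[-1]
        counts[name] = counts.get(name, 0) + 1
    return {"viewBox": root.get("viewBox"), "elements": counts}

# Hand-written stand-in drawing (two wheels and a frame), NOT model output.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">
  <circle cx="50" cy="90" r="25" fill="none" stroke="black"/>
  <circle cx="150" cy="90" r="25" fill="none" stroke="black"/>
  <path d="M50 90 L100 50 L150 90" fill="none" stroke="black"/>
</svg>"""

if __name__ == "__main__":
    print(check_svg(SAMPLE))
```

Saving the validated text to a `.svg` file and opening it in a browser is one simple way to render it.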
So if we render this out, you can see that, okay, it's perhaps not the best cat in the world, but it comes out quite good. It looks like the cat's wearing a scarf. We've got our bicycle, which looks pretty accurate. We've even got a chain. We've got the legs of the cat actually on the pedals, which is kind of good.

Okay, so if you want to play with the model yourself, you can just come into AI Studio, try out the model for free, and try out your own prompts. Remember, the big thing here is that you should be experimenting with the different thinking levels. If you've got it set to high, it can take a lot longer to give you an answer, but you're definitely getting something like a Gemini Deep Think mini out of this model when you do that.

So, the model is rolling out to the Gemini Pro plan. It's already out on Google Cloud, so if you want to try it out there, you can certainly use it there as well. And I'd say that while this is an incremental 0.1 step, it is a big update that gets the model back into the same competitive area as Opus 4.6 and the latest GPT models. So just as 3 Pro spurred a whole new takeoff in both the proprietary models and the open-weight models over the past 3 months, you've got to wonder: now that Gemini 3.1 Pro is out and has seriously bumped its performance, are we going to see other models release new versions to try and catch up? Anyway, let me know what you think in the comments and, as always, I'll talk to you in the next video. Bye for now.
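The advice about experimenting with thinking levels can be sketched as a small timing harness that runs one prompt at each level and records how long each answer takes. This is a sketch under stated assumptions, not the Gemini API: `compare_levels`, `ask`, and `fake_ask` are hypothetical names, and `fake_ask` merely sleeps to stand in for a real model call that you would supply yourself.

```python
import time
from typing import Callable, Dict, Tuple

# The three levels named in the video for Gemini 3.1 Pro.
LEVELS = ("low", "medium", "high")

def compare_levels(
    ask: Callable[[str, str], str],
    prompt: str,
    levels: Tuple[str, ...] = LEVELS,
) -> Dict[str, Tuple[str, float]]:
    """Run `ask(prompt, level)` once per level; return {level: (answer, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        answer = ask(prompt, level)
        results[level] = (answer, time.perf_counter() - start)
    return results

def fake_ask(prompt: str, level: str) -> str:
    """Hypothetical stand-in for a model call: higher levels just sleep longer."""
    time.sleep({"low": 0.01, "medium": 0.02, "high": 0.05}[level])
    return f"[{level}] answer to: {prompt}"

if __name__ == "__main__":
    for level, (answer, seconds) in compare_levels(fake_ask, "example prompt").items():
        print(f"{level:>6}: {seconds:.2f}s  {answer}")
```

Replacing `fake_ask` with a function that calls your own client would let you compare latency against answer quality for a given task, which is the trade-off the video demonstrates with the low and high settings.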
