# GPT-5.3 Instant & Gemini 3.1 Flash Lite - OpenAI and Google’s Newest And Fastest AI Yet

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=T0WvxKW_ptU
- **Date:** 04.03.2026
- **Duration:** 11:33
- **Views:** 7,944
- **Source:** https://ekstraktznaniy.ru/video/11418

## Description

🎓 Learn AI In 10 Minutes A Day - https://www.skool.com/theaigridacademy
Get your Free AGI Preparedness Guide - https://theaigrid.kit.com/agi
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Learn AI Business For Free https://www.youtube.com/@TheAIGRIDAcademy


Links From Today's Video:


Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Transcript

### Segment 1 (00:00 - 05:00)

So, OpenAI and Google have both released models that are pretty much the same in terms of speed and efficiency, but they're rather intriguing. So, let's talk about it. The first thing we're going to talk about is GPT-5.3 Instant. Now, I think this is a little bit more important than most people realize. Maybe you don't use ChatGPT anymore; I personally no longer use the model. Not saying that to hate on them, just stating the facts. But most people don't realize how cringe and how annoying the GPT-5.2 Instant model is. Now, it's pretty hard to judge because of course it's a qualitative metric. What you may find annoying, others may find interesting, but I can say for certain that pretty much across the board, for most people that I've asked, GPT-5.2, and just ChatGPT in general, hasn't been great to talk to. It has been absolutely awful in terms of how it responds and what it tells you it can do. And so I pretty much don't use the model anymore. But ever since they switched it to GPT-5.3 Instant, I've found that the model has actually been a lot more useful in the day-to-day. And these things, although they might seem very subtle, do matter, because if you're going to actually use an AI model for the majority of your workflow and daily tasks, you do want it to not have any friction.

And so essentially you can see here that one person asks, "Why can't I find love in San Francisco?" and the model says, "First of all, you're not broken, and it's not just you either." And if you haven't actually seen the "first of all, it's not you" thing from ChatGPT, maybe you haven't used it enough, but this is something that was really annoying because it was just super sycophantic. So now GPT-5.3 Instant is, as OpenAI themselves have said, less cringe. Now, you can see here again, the GPT-5.3 Instant response essentially feels fresher and more relevant to the user's intent. It essentially identifies what you actually want rather than just being a helpful assistant. And I know that might sound weird, but sometimes it will just try and feed you basic information and not understand your intent. But GPT-5.3 Instant is going to be the model that actually understands what you want and just gives you that information clearly. So in this example, rather than talking about why this matters and why that matters, it just gives you a clear, instant answer. And I think this is kind of important for OpenAI, because I know a lot of people might not talk about this, but ChatGPT being annoying and weird has led to a lot of people leaving the ecosystem. And when you have people leaving the ecosystem because the initial product is bad, of course, that doesn't bode well for the long term.

And if you are wondering about use cases, OpenAI did drop this video, which I will let play now. They actually dropped two videos, which are pretty cool for seeing how the team are actually using this GPT-5.3 Instant model themselves.

— People are noticing that our models can sometimes seem like a bit of a nanny. The experience before was that you would say something and the model might comply, but with a little bit of a caveat. Now we'll just generate it, no problem. I'm Blair. I'm a researcher on the post-training team. Today we're going to talk about overcorrecting in our new model. Overcorrecting is when the user is having a normal conversation and then suddenly they get sort of steered away. The model incorrectly assumes the user's intent even when they're talking about something completely benign. Let's walk through some use cases. So, the first one is kind of a joke from the user: "I'm thinking of having my dog run my startup. What are your thoughts?" The responses here are actually pretty similar, but the older model always has this little aside where it thinks the user could be serious and might treat it as a cry for help. It's an obviously humorous prompt. The new model is less literal and more contextual. Now you can sort of joke around freely, as if you're talking to a friend, and it won't assume any bad intent. Now we'll explore a case where the user is looking for genuine help from the model on a physics problem, calculating a long-distance archery scenario. The old model sort of over-indexes on safety when really the user just wants to understand more about physics or archery as a sport. And this is a really unnecessary addendum that kind of assumes the user is trying to use archery with some sort of bad intent. And here the new model jumps directly into the physics. There's no caveat at all. It just sort of understands long-distance archery as a sport and jumps into the physics calculation to help optimize the trajectory. So what's important is that our safety bar actually hasn't changed. We've just made it more precise. The model should be a lot better at reading the surrounding context to understand the user's intent. It can sort of read the room better and really dive into what the user wants and respond to that directly.

— Now we actually take a look at another example, where they talk about how this model is also contextually aware.

— Subtext is super important. The information and the answers that you're looking for change depending on why you're looking for that information. I'm Josh. I'm a researcher on the post-training team, and I'm going to talk about what's new in web search. We changed a lot about the tone of responses so it feels a little bit more natural. Before, when the model used the search tool, the response could feel like a gear shift, something a lot more robotic, a wall of links, whereas now we worked a lot on having it sound like one

### Segment 2 (05:00 - 10:00)

coherent conversation that just might have search inside of it. — So you brought some use cases for us today. — We did. Yeah. Let's get into them. People use ChatGPT for planning trips all the time. And one of the use cases that I have recently is: I'm biking from Tokyo to Osaka. How is this May's weather different from previous years? So, in this old response, it's still saying that it's warmer, but one of the main things it doesn't talk about is the snowpack, which for me is a major worry. Like, if there's still going to be snow in the Alps, that would be something that's trip-ending. And so it's great that the new model actually takes into account that I'm biking, and treats it as more than just a general weather query. My partner really likes baseball. I don't really understand it much. And so I'm just asking what some of the rule changes coming to baseball this year are. The model now understands that because I'm asking about these recent rule changes, I'm probably a little bit out of the loop on baseball in general. And it gives me this broader picture of an answer of how the sport's changing. So right now I have a baseball expert sitting next to me, and she's actually checking the response. So would you give it a thumbs up? — Yeah, definitely. Especially for, you know, a learning fan, I think it's a very good response. — People come to ChatGPT with questions that they really care about. And what we want is for the model to both give you the correct information and also help contextualize it with the same emotional tone that you were having in the chat.

— Now, of course, with GPT-5.3 Instant, not everything is good, but not everything is bad either. So they talk about the limitations, and the limitations are few and far between. It says it makes meaningful progress on everyday usability, but there's still some work ahead. For example, non-English languages: the response style of ChatGPT in some languages, such as Japanese and Korean, can sound stilted or overly literal. Improving tone and naturalness across languages remains an ongoing focus. And they say that while GPT-5.3 Instant's response tone should feel smoother, they're continuing to monitor feedback and improve while expanding customization options. So, yeah, nothing too crazy here.

Now, let's get on to the second part of the video, and this is, of course, where we get Gemini 3.1 Flash Lite. So, this is in the dark blue here. And the actual point of this model is that it is Google's cheapest, fastest workhorse model in the entire Gemini 3 lineup. This is built specifically for when you need to hammer the AI with millions of queries a day without going bankrupt or making users wait. Now, the reason this exists is that for many different tasks, Gemini 3.1 Pro, or even the normal Flash, is pretty much overkill, or simply too expensive, for 90% of production workloads. So, if you're thinking about content moderation, translation at scale, data extraction, and very simple agentic workflows that you have to run again and again, you don't really need god-tier reasoning for that stuff. You need something that's dirt cheap, something that's instant, and something that is still good enough. So, for this model, when we actually take a look at the pricing here, you can see that it is very, very cheap. It is $0.25 per 1 million input tokens, which is even cheaper than the previous 2.5 Flash. You can see the output per 1 million tokens is relatively cheap as well. So, this is something that is super cost-effective.
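At prices in that ballpark, the economics of a high-volume workload are easy to sketch. A minimal back-of-envelope calculation, where the per-million-token prices and the query mix are assumptions for illustration, not official Gemini pricing:

```python
# Back-of-envelope daily cost for a high-volume "workhorse" workload.
# The prices below are ASSUMED for illustration, not official rates.
INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 1.00  # USD per 1M output tokens (assumed)

def daily_cost(queries_per_day, in_tokens_per_query, out_tokens_per_query,
               in_price=INPUT_PRICE_PER_M, out_price=OUTPUT_PRICE_PER_M):
    """Estimated USD per day for a fixed query mix at per-million-token prices."""
    in_millions = queries_per_day * in_tokens_per_query / 1_000_000
    out_millions = queries_per_day * out_tokens_per_query / 1_000_000
    return in_millions * in_price + out_millions * out_price

# e.g. 1M content-moderation calls/day, 500 input + 50 output tokens each
print(f"${daily_cost(1_000_000, 500, 50):.2f}/day")  # prints "$175.00/day"
```

At these assumed rates, a million short queries a day stays in the low hundreds of dollars, which is the "hammer it without going bankrupt" regime the video describes.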
Now, most people don't realize as well that Google's models, when we actually look at the models themselves, are the best when it comes to multimodality, meaning that Google, as I will show you in a moment, has demonstrated that these are the models you would want to use when you have a multimodal task at scale that doesn't require a ton of reasoning. For example, take a look at this: answering multimodal questions in real time. This is something that is answering multimodal questions, and it is doing so in real time with Gemini 3.1 Flash Lite. And if you compare that to Gemini 2.5 Flash, you can see not only how much quicker it is, but also the accuracy. So you can see right there that one was able to get a lot more right: 84 out of 100 in 4 minutes. Whereas Gemini 2.5 Flash, you can see there, is not only taking a lot longer, like nearly four or five times as long, but it's also not as accurate. So, it's clearly one of those scenarios where the model is of course going to be not only more accurate, but also quicker for these kinds of things at scale. So, of course, if you are a developer, this is something that is remarkably useful for you. And Google also showed another example here that I'm going to show you guys right now. — I love taking photos with my SLR, but I always take too many. So, I made this app where I can take my photos and use Gemini 3.1 Flash Lite to analyze all of the images, give them scores based on criteria that I've set, and present a selection for me to review. This is something that I've tried with other models, but I found that they're

### Segment 3 (10:00 - 11:00)

either too slow, too expensive, they don't have the right level of analysis, or some combination of all three. With Gemini 3.1 Flash Lite, I'm finding that the results are great. They're fast, and this is definitely going to become part of my workflow in the future. You can see here the app has even taken the best and the worst of the images and put them into separate folders, so they're easy to access.

— And if you want another summary of where this model fits overall, this chart shows the Pareto frontier. This is the line connecting all the models that offer the best quality for their price; no other model beats them on both dimensions simultaneously. So we've got the y-axis, which is the arena score, where higher is smarter and better, and the x-axis, which is the cost per 1 million tokens, where further left means it is going to be more expensive. Now, the black line, which is the one we're focusing on here, is the Pareto frontier. Only models on this line represent the best value at their price point, and everything below that line is essentially getting beaten in terms of value. So, the key takeaway here is that Google is winning the war on all sorts of fronts right now in terms of the price for the output that you actually get. Gemini 3.1 Flash Lite is remarkably effective for what you're able to get. And so yeah, if you enjoyed this video, don't forget to subscribe, leave a like, and comment down below. Let me know what you think about the AI race so far.
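As an aside, the Pareto-frontier idea from that chart can be sketched in a few lines: keep only the models that no other model beats on both price and quality at once. The model names, costs, and arena scores below are made up for illustration:

```python
# Minimal sketch of a Pareto frontier over (cost, score) points.
# A model is dominated if some other model is at least as cheap AND
# at least as smart, and strictly better on at least one axis.
def pareto_frontier(models):
    """models: list of (name, cost_per_m_tokens, arena_score) tuples.
    Returns the names of models not dominated by any other model."""
    frontier = []
    for name, cost, score in models:
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for _, c, s in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical data points, NOT real model prices or scores:
models = [
    ("model-a", 0.25, 1200),  # cheap and decent  -> on the frontier
    ("model-b", 2.00, 1320),  # pricier but smart -> on the frontier
    ("model-c", 3.00, 1250),  # model-b is cheaper AND smarter -> beaten
]
print(pareto_frontier(models))  # prints "['model-a', 'model-b']"
```

This is exactly the "everything below the line is getting beaten" reading of the chart: `model-c` costs more than `model-b` while scoring lower, so it cannot be the best value at any price point.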
