OpenAI's New GPT-4.1 Model Is Even Better Than I Thought...
Duration: 12:30


TheAIGRID · 15.04.2025 · 13,564 views · 343 likes


Video description
Join my AI Academy - https://www.skool.com/postagiprepardness 🐤 Follow Me on Twitter https://twitter.com/TheAiGrid 🌐 Check out my website - https://theaigrid.com/ Links From Today's Video: Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed? (For Business Enquiries) contact@theaigrid.com Music Used: LEMMiNO - Cipher https://www.youtube.com/watch?v=b0q5PR1xpA0 CC BY-SA 4.0 LEMMiNO - Encounters https://www.youtube.com/watch?v=xdwWCl_5x2s #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (3 segments)

Segment 1 (00:00 - 05:00)

So, OpenAI just released their model GPT-4.1 and it's confusing everybody. In this video, I'll give you guys the deep dive on what exactly you need to know. First off, there's a caveat that most people won't realize: GPT-4.1 will only be available via the API. Now, you might be wondering why that's the case, and we're going to get into that right now. Essentially, GPT-4.1 is a model designed for developers. It's not really designed for the chat user interface, and the benefits of GPT-4.1 have been gradually incorporated into GPT-4o. You can see right here it says that in ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version of GPT-4o, and they will continue to incorporate more with future releases. So what they're essentially saying is that GPT-4.1 isn't that different from GPT-4o in certain areas, but in other areas it is quite different, which is why they made it specifically for developers. Now, let's say you ignore this and you still want to use GPT-4.1; you actually can use it in a chat interface. If you come over to openrouter.ai, you'll see that this is a unified interface for LLMs, and it's really cool because it has great prices, better uptime, and there's no subscription. It's basically pay-as-you-go. So just click Chat, then start a new room, then click this icon, and this is where all the models come up. Click Model, and this is where you can find GPT-4.1. Click Apply, then click X, and now we can literally talk to GPT-4.1. There are various prompts we can try here, for example, personal finance, and we can simply see what it says. So that's how we get access to GPT-4.1.
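Since the video stresses that GPT-4.1 is API-only, here is a rough sketch of what a call looks like against the standard OpenAI Chat Completions REST endpoint, using only the Python standard library. The model name "gpt-4.1" is taken from the video; the `ask` helper is a hypothetical convenience, not an official client, and no request is actually sent unless you call it with an API key set.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-4.1") -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text.

    Requires the OPENAI_API_KEY environment variable to be set.
    """
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Build (but do not send) a request, just to show the payload shape.
payload = build_request("Summarize GPT-4.1 in one sentence.")
print(payload["model"])  # gpt-4.1
```

The official `openai` Python package wraps exactly this endpoint; the raw-HTTP version is shown only so the example stays dependency-free.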
If you just want to test out the model, you can try various prompts and see where things lie in terms of just being able to talk to it. Now, GPT-4.1 wasn't the only model released. They also released two smaller versions for different use cases. The one most people will be using is the smartest model, for complex tasks: it has a context length of over 1 million tokens, a maximum output of 32,000 tokens, and we can see that the price is relatively cheap compared to other models. It takes text and images in and produces text out, and the expected latency is similar to GPT-4o. Now, the two other smaller models likely have the capabilities of GPT-4.1 distilled into them. They are much cheaper and also much faster. GPT-4.1 mini is 40% faster than GPT-4o and is the affordable model balancing speed and intelligence. Then there's GPT-4.1 nano, the fastest, most cost-effective model for low-latency tasks; this one is dirt cheap and can do a lot of the tasks you need. I suspect OpenAI did this because there are many models now that can do many different tasks relatively cheaply, so OpenAI wants to bolster its lineup. Now, let's look at one category of the benchmarks, and trust me, this won't be a benchmark-heavy kind of video, because OpenAI themselves have said they wanted to focus on models with real-world implementation. But we do have to look at this one specifically, because GPT-4.1 outperformed every other OpenAI model on SWE-bench, and this was the main discussion in the AI sphere. You can see here that GPT-4.1 is significantly better than GPT-4o in terms of software engineering, including agentically solving coding tasks, front-end coding, making fewer extraneous edits, following diff formats reliably, ensuring consistent tool usage, and more. This is really surprising because it's even better than OpenAI o3-mini, which is a really impressive model, and o1 (high). So overall, this is a super model when it comes to coding, compared to other models. Now, another chart, and this might be a little more interesting to developers, but trust me, the technical jargon ends in two more slides: this just shows how accurate GPT-4.1 is compared to GPT-4o. We can see a dramatic improvement, 31% there compared to 52% here. Of course, the o1 reasoning models are in a completely different category, but for a model that is much cheaper and much faster, I think this is really useful, especially for developers. Now, if you wanted to visualize what GPT-4.1 is, this graph is a little more helpful in terms of seeing where the model
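The limits quoted above (a context window of over 1 million tokens and a 32,000-token output cap) can be turned into a simple client-side budget check. This is only a rough sketch: the characters-per-token ratio is a crude heuristic, not a real tokenizer, and the constants just mirror the round numbers stated in the video.

```python
# Limits as quoted in the video; real per-model values may differ slightly.
CONTEXT_LIMIT = 1_000_000   # input context ("over 1 million tokens")
OUTPUT_LIMIT = 32_000       # maximum output tokens

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(document: str, reserved_output: int = OUTPUT_LIMIT) -> bool:
    """True if the document plus the reserved output budget fits the window."""
    if reserved_output > OUTPUT_LIMIT:
        raise ValueError("reserved_output exceeds the model's output cap")
    return estimate_tokens(document) + reserved_output <= CONTEXT_LIMIT

# A ~500,000-character document easily fits with room to spare.
print(fits_in_context("word " * 100_000))  # True
```

For production use you would count tokens with the model's actual tokenizer (e.g. the `tiktoken` library) rather than a character heuristic.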

Segment 2 (05:00 - 10:00)

is. We can see that the intelligence is on the left and the latency is at the bottom. With that said, GPT-4.1 nano is a really effective option balancing latency and intelligence, and GPT-4.1 mini is probably the most effective at balancing it all, because it's relatively smart while also being relatively fast and cheap. So I would say this is probably one of the biggest contenders to Gemini 2.5 Flash, as that model is dirt cheap and has remarkable intelligence. Now, like I said, the technical developer jargon ends here. One of the things OpenAI wanted to focus on was real-world utility. They state that while benchmarks provide valuable insights, they trained these models with a focus on real-world utility; close collaboration and partnership with the developer community enabled them to optimize these models for the tasks that matter most to applications. To this end, the GPT-4.1 model family offers exceptional performance at a lower cost, and these models push performance forward at every point on the latency curve. When it comes to real-world examples, we saw Windsurf report that GPT-4.1 scores 60% higher than GPT-4o on Windsurf's internal coding benchmark, which correlates strongly with how often code changes are accepted on first review. Their users noted it was 30% more efficient in tool calling and about 50% less likely to repeat unnecessary edits or read code in overly narrow, incremental steps. These improvements translate into faster iteration and smoother workflows for engineering teams. If you aren't familiar with Windsurf, it's basically an AI coding tool/assistant. We can also see GPT-4.1 being 53% more accurate than GPT-4o on one company's internal benchmark of challenging real-world tax scenarios. This jump in accuracy, key to both system performance and user satisfaction, highlights GPT-4.1's improved comprehension of complex regulations and ability to follow nuanced instructions over long context. Remember, one of the things we spoke about is that GPT-4.1 has an extremely long context window, which in real-world use cases is remarkably important, because we often have to reason over long documents. If you want to visualize just how good that retrieval is, they ran the needle-in-a-haystack accuracy test. This is essentially a test where they take 1 million tokens, hide a needle in there, perhaps a small phrase, and ask the model to retrieve it. We can see that it has pretty much 100% accuracy in terms of successful retrieval over the full 1 million context length. This is really useful for certain applications, and 1 million tokens is more than eight copies of the entire React codebase, so it's really good at fitting entire codebases in, because it's able to find small things and change them. This is one of the things they trained the model on, and I suspect it's going to have a lot of real-world use cases. In other long-context areas, the model was surprisingly good at long-context video. The model is actually decent in terms of its vision capabilities; right here we can see it performs a little better than GPT-4o. However, I don't think this is the current model, considering that OpenAI recently updated GPT-4o in 2025, so I presume GPT-4o is probably on par with this benchmark in terms of vision. But this one right here is long-context video: the model answers multiple-choice questions based on 30-to-60-minute-long videos with no subtitles. Then of course we have the vision benchmarks, and we can see that it isn't excelling in ridiculous areas. But for those of you looking to build applications with vision as a component, GPT-4.1 mini looks to be the one people will be choosing, considering the model scores 73% on MMMU, which is basically similar to GPT-4.1 in terms of vision capabilities, but at a fraction of the cost. Now, with all of these benchmarks, one thing that, I don't want to say it confused me, but I would have loved to see, was how this model fits in with every other AI model across a variety of benchmarks. And we do get to see here where GPT-4.1 fits on the coding benchmark: it is just behind Claude 3.7 Sonnet and just behind the new Gemini 2.5 Pro. So it's not far behind, and it comfortably sits above these other models in terms of coding ability. This model isn't a dud like some other companies have faced; it's a really decent model in terms of being able to code and do a lot of other things in the real world. So I would say this is more of a model that can be plugged and played into different applications, not so much the model you would want to talk about your daily life with; that would probably be GPT-4.5, but
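The needle-in-a-haystack test described above is straightforward to sketch as a harness: hide a phrase at a random depth inside a long filler document, ask the model for it, and score whether the answer contains it. This is an illustrative reconstruction, not OpenAI's actual eval; the model is a pluggable callable so a real API call can be dropped in, and `fake_model` is just a local stand-in.

```python
import random

def make_haystack(needle: str, filler: str, n_chunks: int, seed: int = 0) -> str:
    """Insert the needle line at a random position among n_chunks filler chunks."""
    rng = random.Random(seed)
    chunks = [filler] * n_chunks
    chunks.insert(rng.randrange(n_chunks + 1), needle)
    return "\n".join(chunks)

def run_retrieval(ask_model, passphrase: str, haystack: str) -> bool:
    """Score a single retrieval: did the model's answer contain the passphrase?"""
    prompt = (
        "A secret codeword is hidden in the document below. "
        "Reply with it only.\n\n" + haystack
    )
    return passphrase in ask_model(prompt)

def fake_model(prompt: str) -> str:
    """Stand-in 'model' that just scans the prompt; replace with a real API call."""
    for line in prompt.splitlines():
        if "passphrase is" in line:
            return line.split("passphrase is")[1].strip(" .")
    return ""

passphrase = "violet-anchor-42"
needle = f"The secret passphrase is {passphrase}."
doc = make_haystack(needle, "Lorem ipsum dolor sit amet.", n_chunks=1000)
print(run_retrieval(fake_model, passphrase, doc))  # True
```

A real run would sweep the needle depth and the haystack length (up to the 1M-token window) and plot accuracy over both, which is what the chart in the video shows.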

Segment 3 (10:00 - 12:00)

unfortunately, GPT-4.5 is actually going away. They mention that they're going to start deprecating GPT-4.5 in the API, as GPT-4.1 offers improved or similar performance on many capabilities at much lower cost and latency. It's going to be turned off in three months, on July 14th, 2025, to allow time for developers to transition. Now remember, GPT-4.5, if you didn't know, was a really big model. It was super expensive to train and the outputs were just as expensive, so I'm guessing this is why they had to essentially get rid of it: the outputs were just not worth the cost, and it was costing them a fortune. That is probably the biggest reason GPT-4.5 is going away. So if you did talk to GPT-4.5, you might want to get a lot of your conversations in before July 14th, 2025. And I'm a little sad about that, because it was actually my go-to model for reasoning about day-to-day things; it had a different ability when it came to looking at problems from a human perspective. Now, if you wanted to see visually how much better the coding is, they talk about how GPT-4.1 substantially improves on GPT-4o in front-end coding and is capable of creating web apps that are more functional and aesthetically pleasing. In their head-to-head comparisons, paid human graders preferred GPT-4.1's websites over GPT-4o's 80% of the time. That is a huge, huge improvement. We can see right here, this is relatively impressive: on the left is the old GPT-4o, and on the right is the new GPT-4.1. This shows us that the model is clearly better at coding frontends. However, I'd have to ask how this compares to other similar models, because I currently don't code a lot with these models; I simply build agents with them. And that's something you're about to see in a video a few hours from now.
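Given the shutdown date mentioned above, one practical pattern for API users is a client-side guard that swaps a retired model name for its suggested replacement. This is a hypothetical sketch: the July 14th, 2025 date and the GPT-4.1 suggestion come from the video, while the `gpt-4.5-preview` identifier and the mapping itself are my assumptions for illustration.

```python
import datetime
import warnings

# Deprecated model -> (shutdown date, suggested replacement).
# Entries here are illustrative, based on the deprecation described above.
RETIREMENTS = {
    "gpt-4.5-preview": (datetime.date(2025, 7, 14), "gpt-4.1"),
}

def resolve_model(name: str, today: datetime.date) -> str:
    """Return a usable model name, swapping in the replacement after shutdown.

    Before the shutdown date the original name still works, but a warning
    nudges callers to migrate ahead of time.
    """
    if name in RETIREMENTS:
        shutdown, replacement = RETIREMENTS[name]
        if today >= shutdown:
            return replacement
        warnings.warn(
            f"{name} is retired on {shutdown}; plan to move to {replacement}"
        )
    return name

print(resolve_model("gpt-4.5-preview", datetime.date(2025, 8, 1)))  # gpt-4.1
```

Centralizing model names like this means a deprecation becomes a one-line table change instead of a hunt through the codebase.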
But building frontends is of course something that people are doing, so it's definitely intriguing to see exactly how that works. Let me know what you guys think about these three different models. Are you going to be using them if you're a developer? Are you excited about this? I know I'm going to be using this to build agents pretty much today, so it's going to be super intriguing to see how I manage to get on. If you guys enjoyed the video, don't forget to like and subscribe.


