OpenAI’s NEW Codex Max Model Can Now Code for 24 Hours Straight!

Universe of AI · 24.11.2025 (updated 18.02.2026)
Video description
OpenAI just released GPT-5.1 Codex Max, their new frontier coding model built for long-running agent tasks, massive refactors, and real engineering work, and it’s a big step forward. In this video, I break down everything you need to know about Codex Max in simple, practical language. Codex Max isn’t just another coding assistant — it’s the first OpenAI model designed to operate like a real engineering partner. From multi-file projects to 24-hour agent loops, this release shows where AI-powered development tools are actually heading. This channel covers fast, clear updates on the biggest moves in AI, with breakdowns you can actually understand. For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: ‪@intheworldofai 🔗 My Links: 📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com 🔥 Become a Patron (Private Discord): /worldofai 🧠 Follow me on Twitter: /intheworldofai 🌐 Website: https://www.worldzofai.com 🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/ 0:00 - Intro 0:35 - What's New 1:38 - Benchmarks 3:44 - The Secret Sauce 5:01 - Crazy Efficiency! 9:33 - Outro gpt 5.1, gpt5.1, gpt5, gpt 5, codex max, gpt-5.1 codex max, openai codex max, codex max explained, openai new model, openai update, openai codex, codex cli, codex ide, long running agents, ai coding, ai coding assistant, coding agents, gpt for coding, software engineering ai, agentic ai, ai agents, autonomous agents, context window ai, compaction openai, universe of ai, ai news, tech news, ai models 2025, openai release, gpt update, gpt 2025 #OpenAI #CodexMax #GPT51 #AIcoding #AIAgents #ArtificialIntelligence #TechNews #UniverseOfAI #GPT5 #SoftwareEngineering #AIupdate #Developers

Contents (6 segments)

  1. 0:00 Intro
  2. 0:35 What's New
  3. 1:38 Benchmarks
  4. 3:44 The Secret Sauce
  5. 5:01 Crazy Efficiency!
  6. 9:33 Outro
0:00

Intro

Today we're looking at OpenAI's newest model, GPT-5.1 Codex Max. And this one is interesting because it's not just a better coding assistant. It's designed to handle real engineering work: multi-file projects, huge refactors, long agent loops, and tasks that run for hours without falling apart. OpenAI is basically pushing coding agents into a new tier. So let's break down what changed, why it matters, and how this model actually works in practice. GPT-5.1 Codex Max is their new frontier agentic coding model, and it's available today. So if you're watching this video
0:35

What's New

you can actually go and test it out. GPT-5.1 Codex Max is built on an update to their foundational reasoning model, which was specifically trained for agentic tasks across software engineering, math, research, and more. OpenAI claims that GPT-5.1 Codex Max is much faster, more intelligent, and more token efficient at every stage of development, which is key for models like these: you want the model to get faster, but at the same time you don't want to sacrifice cost and efficiency. Something like this is huge for developers. GPT-5.1 Codex Max is built specifically for long-running, detailed work, and it's their first model natively trained to operate across multiple context windows through a process called compaction; we'll explain what compaction is later in the video. You can use GPT-5.1 Codex Max now in the CLI, the IDE extension, and the cloud, and code review and API access are supposed to be coming soon. So stay tuned for that as well. All right.
1:38

Benchmarks

The first key capability of the new model is frontier coding performance. Codex Max was trained on real-world software engineering tasks like PR creation, code review, front-end coding, and QA, and it outperformed every previous model on many frontier coding evaluations. If we look specifically at the SWE-Lancer benchmark, Codex Max hit 80% accuracy, while GPT-5.1 Codex at high reasoning effort, the previous version, achieved only 66%. That's about a 14-percentage-point jump for the new model. On Terminal-Bench we see a similar, though less drastic, difference: Codex Max now sits at 58% accuracy while the older model sits at 52%. So the model has gotten much better at coding and is becoming more accurate on the specific tasks it's given. Now, the key concern with any new model is how it performs on speed and cost. Most of the time you expect cost to increase as models get better and better. However, GPT-5.1 Codex Max is actually becoming more token efficient, thanks to more effective reasoning. On SWE-bench Verified, GPT-5.1 Codex Max at medium reasoning effort achieved better performance than GPT-5.1 Codex at the same reasoning effort while using about 30% fewer tokens. In the breakdown, GPT-5.1 Codex Max uses about 9,246 tokens and reaches 73% accuracy, while the older model uses roughly 14,000 tokens, much more, and is less accurate than the new one. So the new model is clearly more efficient and, at the same time, more intelligent and more accurate. These are obviously the early stages of the model, but OpenAI expects the token efficiency to keep improving.
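As a quick sanity check, the headline "about 30% fewer tokens" can be reproduced from the two numbers shown. Note the 14,000 figure for the older model is approximate, read off the chart in the video:

```python
# Sanity check on the SWE-bench Verified efficiency numbers cited above.
new_tokens = 9_246    # GPT-5.1 Codex Max, medium reasoning effort
old_tokens = 14_000   # GPT-5.1 Codex, same effort (approximate chart value)

reduction = (old_tokens - new_tokens) / old_tokens
print(f"~{reduction:.0%} fewer tokens")  # in the ballpark of the ~30% OpenAI reports
```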
And this is going to be huge, especially for developers, who are going to save a lot of money when using these powerful models. All right, so the
3:44

The Secret Sauce

biggest update in Codex Max is how it handles long-running tasks. Normally, models hit a token limit: they run out of context and forget what happened earlier in the job. That's why older models would break when you asked them to refactor a huge repo or run a multi-hour agent loop. Codex Max solves this with something called compaction. Think of it like this: the model is constantly cleaning up its own workspace. Whenever it gets close to its context limit, it summarizes the important parts of what it has done so far, throws away the unnecessary details, and then continues with a fresh window. It keeps repeating that cycle over and over, so instead of hitting a wall, it keeps going for hours. That's why Codex Max can do things like full project refactors, huge code migrations, multi-hour debugging loops, and long agent workflows without getting lost or forgetting earlier steps. OpenAI says they've seen it work on a single task for 24 hours straight, fixing its own tests, retrying different approaches, and eventually shipping a working solution. So, in simple terms, the model doesn't run out of memory anymore. It cleans up after itself continuously, which lets it work for hours, even a full day, without losing track of the task. All right
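The compaction cycle described above can be sketched in a few lines. This is purely illustrative: OpenAI hasn't published the actual algorithm, and the `summarize`, `count_tokens`, and `MAX_TOKENS` names here are toy stand-ins (a real agent would have the model itself write the summary):

```python
# Toy sketch of a compaction loop (illustrative only, not OpenAI's algorithm).
MAX_TOKENS = 100  # stand-in for the model's real context limit

def count_tokens(history):
    # Toy tokenizer: one "token" per whitespace-separated word.
    return sum(len(msg.split()) for msg in history)

def summarize(history):
    # Stand-in summary; a real agent would produce a model-written recap
    # of decisions made, files touched, and open threads.
    return f"SUMMARY: {len(history)} earlier entries condensed."

def compact_if_needed(history, threshold=0.8):
    """Near the context limit, replace history with a summary (fresh window)."""
    if count_tokens(history) > MAX_TOKENS * threshold:
        return [summarize(history)]
    return history

# Simulate a long agent run: 50 steps never overflow the context,
# because the history is repeatedly compacted instead of hitting a wall.
history = []
for step in range(50):
    history.append(f"Step {step}: edited files and ran the test suite.")
    history = compact_if_needed(history)

print(len(history), count_tokens(history))
```

The point of the sketch is the shape of the loop: work accumulates, a threshold trips, the window is rebuilt from a summary, and the task continues without ever exceeding the limit.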
5:01

Crazy Efficiency!

let's take a look at specific examples of how GPT-5.1 Codex Max is much better than the previous versions. The first prompt is to create a solar-system gravity sandbox app: the goal is to visualize how objects move in 2D gravitational space. You should be able to place massive bodies, click again to set an initial velocity, and so on, with a bunch of extra features; the UI should be light, and the orbit trails should render with smooth animation. This prompt was given to both models, the older one and the new one, and these were the results. Codex Max, the new model, used about 16,000 thinking tokens versus 26,000 for the older model. Tool calls were 2 versus 9, and lines of code were 586 versus 933. So just from a snapshot like this, the new model is clearly much more efficient. But let's look at what it created. We have our orbit app here: I can add a mass, then launch a test particle, give it a velocity, and visualize all of it within the solar system. You can increase or decrease the mass of your new body, adjust the time scale to make things faster, and so on. That was the new Codex Max. The old Codex, on the other hand, is right from the get-go not as creative or visually appealing. You still have the features and everything, but they're tucked at the bottom, and it's just not as polished as what we saw from the new model.

A similar task was given to create a Kanban application. The new model used only about 8,000 thinking tokens while the older model used 12,000. Tool calls were about the same, but once again the lines of code were much lower, showing the efficiency of the new model. The Kanban board here looks pretty professional, like an app that's already deployed, something people would actually use. You can reset the data, create new tasks, and so on, and when you create a new task, the options for adding titles and details look very polished. The older one, by contrast, looks more like the generic AI slop that models often produce nowadays. It still gets the job done, you can add tasks and everything, but the UI once again lags behind the newer model.

Then we have another example where both models were tasked to explain Snell's law in a visual format. The specific prompt was: make a single-page web app, a refraction and Snell's-law explorer, that visualizes light refraction between two media; add features like sliders for the refractive indices, draggable incident rays, and a numeric display of the incident angles. Notice that the prompt didn't explain Snell's law to either model; it just said, this is what I want explained, and add these features. So let's see what they created. The new model took only about 16,000 tokens while the old model took 38,000, so almost half the tokens. Tool calls were 5 versus 14, which is massive, and lines of code, a common pattern we keep noticing, were again much lower than the previous model's. Looking at the application itself, the UI is nice, clear, and light, as the prompt asked. You can increase and decrease medium one's refractive index, and the visualizer shows what happens to a ray entering medium two at a given angle. The old model has a similar feature set, but it feels a little less professional. The old models aren't bad, but one of the biggest changes isn't on the front-end side so much as the back end: a commonality across all of these examples is fewer tokens used, the same or fewer tool calls, and every time fewer lines of code. So yes, GPT-5.1 Codex Max is definitely more efficient, more intelligent, and more cost-efficient for a lot of people. If
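The physics the refraction app visualizes is the standard Snell's law relation, n1·sin(θ1) = n2·sin(θ2). A minimal sketch of that computation (not code from either model's output, just the underlying formula):

```python
# Snell's law: n1 * sin(theta1) = n2 * sin(theta2)
import math

def refraction_angle(n1, n2, incident_deg):
    """Refraction angle in degrees, or None past the critical angle
    (total internal reflection)."""
    s = n1 * math.sin(math.radians(incident_deg)) / n2
    if abs(s) > 1:
        return None  # total internal reflection
    return math.degrees(math.asin(s))

# Light entering glass (n = 1.5) from air (n = 1.0) at 45 degrees
# bends toward the normal, to roughly 28 degrees:
print(round(refraction_angle(1.0, 1.5, 45), 1))
```

This is also why the draggable incident ray in the demo eventually stops producing a refracted ray when going from the denser medium to the lighter one: past the critical angle, the math has no real solution.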
9:33

Outro

you enjoyed this video, this is what we do here: fast, clear updates on the biggest moves in AI. If you want to stay ahead of everything happening in this space, make sure you're subscribed. And if you want the hands-on side, demos, tools, workflows, and everything developers can actually build, check out World of AI. We also run a simple, no-noise newsletter that gives you the most important AI tools and updates in just a couple of minutes. Subscribe here. Follow World of AI. Join the newsletter.
