What You Should Use GPT-5 For & More AI Use Cases

The AI Advantage · 08.08.2025 · 41,110 views · 1,342 likes · updated 18.02.2026
Video description
Start using Warp today and use code ADVANTAGE to get your first month of Warp Pro for only $5! 👉 http://go.warp.dev/advantage In this video Igor showcases the best use cases for GPT-5 that he and The AI Advantage team have found so far, plus showcasing plenty of new releases in the AI world like ElevenMusic, Claude Opus 4.1, Runway Aleph, Genie 3 and more. Enjoy! Free AI Resources: 🔑 Free ChatGPT Prompt Templates: https://bit.ly/newsletter-aia 🌟 Tailored AI Prompts & Workflows: https://bit.ly/find-your-resource Go Deeper with AI: 🎓 Join the AI Advantage Community: https://bit.ly/community-aia 🛒 Shop Work-Focused Presets: https://bit.ly/AIAshop Links: https://claude.ai/public/artifacts/46f12b47-6960-4383-9047-dc741d124297 https://openai.com/open-models/ https://x.com/chaitualuru/status/1952174534142067092 https://x.com/runwayml/status/1948786648537595911 https://blog.google/products/gemini/gemini-2-5-deep-think/ [https://openai.com/index/how-we're-optimizing-chatgpt/](https://openai.com/index/how-we%27re-optimizing-chatgpt/) https://www.kaggle.com/blog/introducing-game-arena https://app.runwayml.com/video-tools/teams/aiadvantage/ai-tools/generate?sessionId=8ad2027c-4bdd-4b89-a641-d40629348341 https://drive.google.com/drive/folders/15lwRnYTcvvdxFGyFw1_0aOsTncwpqG3n https://www.kaggle.com/game-arena https://www.kaggle.com/benchmarks/kaggle/chess-text/versions/1 https://drive.google.com/file/d/1kLOXtzXH9HrsE3eezbGKHmByDdWuXKxa/view https://www.anthropic.com/news/claude-opus-4-1 https://x.com/AnthropicAI/status/1952768432027431127 https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/ https://blog.google/products/gemini/storybooks/ https://elevenlabs.io/app/music Chapters: 0:00 Intro 1:19 GPT-5 Use Cases 11:50 Warp 13:50 GPT-oss 16:19 Grok Imagine 18:13 Runway Aleph 20:12 ElevenMusic 23:18 Gemini Storybooks 24:27 Genie 3 26:44 Claude Opus 4.1 28:49 Quick Hit News Stories 30:57 Outro Connect with Me: 💼 AI Advantage on LinkedIn: 
https://bit.ly/AIAonLinkedIn 🧑‍💻 Igor Pogany on LinkedIn: https://bit.ly/IgorLinkedIn 🐦Twitter/X: https://bit.ly/AIAonTwitter 📸 Instagram: https://bit.ly/AIAinsta #ai #gpt5

Table of contents (12 segments)

  1. 0:00 Intro 264 words
  2. 1:19 GPT-5 Use Cases 2308 words
  3. 11:50 Warp 432 words
  4. 13:50 GPT-oss 548 words
  5. 16:19 Grok Imagine 424 words
  6. 18:13 Runway Aleph 458 words
  7. 20:12 ElevenMusic 360 words
  8. 23:18 Gemini Storybooks 249 words
  9. 24:27 Genie 3 496 words
  10. 26:44 Claude Opus 4.1 479 words
  11. 28:49 Quick Hit News Stories 470 words
  12. 30:57 Outro 61 words
0:00

Intro

Hey, this week, what is even going on? It's August. Wasn't this supposed to be the slow phase of the year? We got so many super significant AI releases in just one week that, well, I'll just have to rush through some of the stories so this video is not an hour long. We finally got the GPT-5 release, but then also Claude released Opus 4.1 and Gemini released their Deep Think model. In other words, new and better flagship models out of the three biggest players. And then there's so much more interesting stuff, like Runway shipping a feature that is essentially Photoshop for video. What the heck? Didn't expect that. And a music generator that sounds incredible. You haven't heard anything like it before. All of that and more in this week's episode of AI News You Can Use, the show where I round up all the AI releases, filter for the ones that are usable now, and then show you real-world use cases or testing results. So, let's begin by following up with the GPT-5 story. I already uploaded a lengthy video covering the entire release and my first impressions of it. If you want to see comparisons between the competing models on both benchmarks and actual use cases, then that video is your friend. You can find it linked in this card. What we'll be doing here today is following up on that video with how other people have been using it and what the general sentiment around GPT-5 is so far. So let's switch to my screen and
1:19

GPT-5 Use Cases

look at some practical ways to use this new model, which is even available on the free plan now, in ways that you might not have considered yet. Okay. And in here I want to hit three main points. First of all, the interface, because it's a bit different than they actually presented. Look, on the Pro plan I have three different models to select from. Then I want to talk about some examples of its main use case, which is development. I have a really fantastic practical example here comparing it to Claude's results on a very big repo that I'm working on. And then lastly, I want to give you my take on what models I will be using for some of my main use cases day-to-day. Okay, so starting out with the model selector. In their presentation, they essentially said that there's going to be no more model selection, but this is not entirely true. If you're on the free plan, which is most people, then you only get GPT-5 and you get this additional option to think longer. So the claim that you're not going to be able to select its thinking depth is also not entirely true. Even on the free plan, you can always make it think longer, think a bit harder, which, by the way, caused a lot of confusion with some people, because in some cases it's referred to as reasoning, here as thinking. The technical term would be chain of thought. But then in reality, thinking and reasoning in humans is different from what the models are actually doing here. It's sort of just a term that has been used to express that the model is going to be circling back to its ideas and reviewing them in multiple rounds. Either way, there are more options when you upgrade through the plans. So, on the Plus plan, you're going to have access to GPT-5 Thinking, which just takes more time to think through the answer. And then on the Pro plan, which is the $200 plan, you get GPT-5 Pro, which they didn't even talk about. And actually, I've been using this ever since release. This model is insane.
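For what it's worth, the same "think longer" idea shows up when you call the model through the API instead of the ChatGPT UI, as a reasoning-effort setting. Here is a minimal sketch that only builds the request payload; the `gpt-5` model id, the `reasoning_effort` field, and its accepted values are my assumptions based on how OpenAI's reasoning models have been documented, so verify them against the current API reference before relying on this:

```python
# Sketch: mapping the UI's "think longer" toggle onto an API request.
# The parameter name and values below are assumptions -- check the docs.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a chat-completions-style payload asking for deeper reasoning.

    `effort` plays the role of the "think longer" toggle; "minimal", "low",
    "medium", and "high" are assumed values, not confirmed ones.
    """
    allowed = {"minimal", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "gpt-5",            # assumed model id for illustration
        "reasoning_effort": effort,  # assumed knob for thinking depth
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Review this module-management plan.", effort="high")
# The payload would then go to the SDK's chat-completions call.
```

The point is just that the thinking depth the UI hides behind a toggle is an explicit, per-request choice for builders.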
And I think a lot of the opinions you hear online are not based on this one. Just keep this in mind: if you're on Reddit, on Twitter, heck, even on YouTube, most people are not on the $200 plan. They're going to be judging this model with "think longer" probably turned off. And then, yeah, a lot of the opinions now are, hey, it's not as impressive as people think. Well, I would say two things. First of all, they're probably not using one of these options. And secondly, most of those opinions that I've seen over the past day come from people who don't do development at all. That is the biggest selling point and the biggest upgrade for power users of this model. And that's why the two examples I want to look at now will be development related. But I really wanted to make this point, because you've got to take the opinions of people on the internet with a grain of salt. I mean, if I just look at the video that I uploaded yesterday, the opinions range all the way from "Wow, great examples, thank you for showing how it actually performs" all the way to "the first 10 minutes are useless." But, you know, one of them probably comes from a person who watched the entire 90-minute presentation, so they have the basics and they do not need a summary, which literally is timestamped as a summary. So, that's the problem with the internet. But here we look at actual examples. So, let's do that. Let's move into the second part, which is evaluating its ability to help developers. In short, holy guacamole, it's so good. First of all, I managed to resolve an issue that no other model before this could solve. So, for a bit of context, I'm working on this application, which is essentially like an operating system for my company.
This is just a little preview, but it has these different modules, and then I can combine multiple automations and custom tools and prompts, everything that I have, into an operating system for my people to actually use without them being developers. And this project has become quite lengthy over the past weeks. Matter of fact, it's 27,000 lines of code, most of it generated with Claude Code, but I'm at the phase where I'm actually working with proper developers to harden it and make it solid so I can use it for myself and then also help other companies with actually implementing AI. Now, here's the thing. I took that entire repo, downloaded the zip file, and I wanted to harden the module management, something that I've really been struggling with inside of Claude Code over the past days. I tried Gemini, I tried Claude Opus 4.1, but none of the results were satisfactory. When I ran this prompt through GPT-5 Pro and gave it the entire application, it thought for 15 minutes and came up with a fantastic plan. If you're a developer and interested in this, you can pause and have a look at it. Point being, this is an excellent and very realistic plan considering where I'm actually at with the application. To compare, I then went ahead and ran the same prompt through Claude. Now, one problem there was that it couldn't even take the entire context of the repo, because it's so big by now, so I had to leave out certain files. It did end up working, and I used Opus 4.1 to create the same type of plan. Now, it created a good plan, but look, I think this is where it really gets interesting, because what I did is I went ahead and asked GPT-5 Pro to compare the plan it made to the second plan that I took from Claude. Can you compare it to the alternative plan that I have, which one is better, and what are the differences? Kind of purposefully leaving a bit of ambiguity around which one is better. Then I pasted Claude's entire plan, which is right here.
Again, you can pause and look at this if you want some of the details. And here's the difference between the two plans. GPT-5 met me where I'm at right now with the app. It gave me a plan that, with developers, would be executable with a small team in a few weeks or months. Whereas Claude gave me a plan that I would need 6 to 12 months for, and it completely rethought the entire application. It figured that, hey, I will want third-party modules, different module packs depending on the user, cross-module workflows, an entire marketplace. It basically thought of it as a startup and how it could take it all the way to production, which is not a bad thing in general, but it's a terrible thing for me, because I'm just trying to make this application work a little better rather than planning a project that I would execute with 10 developers. It's kind of sweet that GPT-5 Pro also gave me a plan that mixes those two approaches. But ultimately, what I was really looking for, without even specifying it, was this near-term plan to just harden the app, make it better, make the things that are there right now work properly so I can move on to the next phase, rather than thinking of the big picture, which might be like a one-to-two-year vision for this thing. And I say that because if I took the plan from Claude and gave it to Claude Code to actually execute on, it would never be able to implement that gigantic vision. Whereas right now, I'm working through this plan step by step with Claude Code to actually implement it. Look, I have it on my second screen; I'm working through this plan step by step. I'll leave a comment below on how that went and if it actually fully worked. Again, I've been struggling to resolve these issues with other models, but I think this is a really good illustration of how these models operate and what I've noticed in GPT-5 so far.
It's really good at inferring the instructions that you did not give to it, and it's really good at adhering to the parts that you did tell it. It respects every part of your request and executes on it at a level that I haven't seen before when it comes to code. So for development, I'm still using Claude Code, but I will absolutely be using GPT-5, especially this Pro mode, to plan new phases, to review the code that Claude Code creates, to create realistic plans. I mean, ideally I would have Claude Code with GPT-5, which would be a different product, and no, the OpenAI CLI doesn't work as well. Point being, it's insane at development. And to really drive that point home, one more quick example that Sam Altman actually tweeted: build a little beatbot. And I also learned something while testing this in both GPT and Claude. First of all, if you go to the Pro mode, you cannot create these interactive apps right in here. When I prompted this in the Pro mode, it did everything, even created the song for me that I can download, but it does not have this ability to create a little interface. So, that's just something to consider. And now let's have a practical look and compare it to the same prompt inside of Claude. Okay, so I suppose I could shift things here, huh? And add a little note. Okay, that's kind of sick. Nice. I like the snare drum there. So, this works super well. Beautiful little interface. Now, let's do the same thing inside of Claude here. They've been the best at these little interfaces, no doubt. And even this publish button is something that OpenAI doesn't have yet. But let's get to the app now. Okay, let's add a triple snare here. Okay, so I like the way this is self-contained and it doesn't go off screen. So I'll give that to the Artifact. But then also, I kind of prefer the simplistic interface here. No, this is more colorful. Both work equally well. Ah, there are even presets here. So there's a techno preset, a trap preset.
To be fair, I didn't ask for that. Usually Claude is ambitious like that and just creates these master plans that it tries to execute on. So, I think that reflects the first point that I made really well. But yeah, both of them did really well. If I had to pick one based on this prompt, with the ability to add new instruments here and the music sounding better, I've got to give it to GPT-5. Interesting. So, let's round out this segment for now. Obviously, I'll be following up in the coming days and weeks with more videos and more comparisons, but as of right now, when it comes to these five use case categories, what would I be using? For writing, I really like GPT-5. I think it sounds great. I haven't fully made up my mind on whether I like Opus better, but I just use GPT all the time and it's right there. So, I think for writing, it's going to be GPT-5. For business use cases, o3 was actually my go-to. Now, with GPT-5 being similar to o3 and this ability to use GPT-5 Pro in here, from what I've tested so far, I don't see this changing. For business, marketing, and sales use cases, it's still GPT-5; it's slightly better and it's free, and I love that. So, there you go. So, I'm going to be using that. For development, it's going to be a mix. I'm going to stick with Claude Code, because just nothing matches its agentic ability, its multi-agent orchestration, its tool use. I'm sure there are going to be more competitor products in the future, but as of right now, it is Claude Code. But as I showed and discussed here, I'm starting to kind of wish I had GPT-5 inside of Claude Code. And I think that's why a lot of people are raving about GPT-5 within Cursor and saying it's next level. Look, that's the top comment on yesterday's video, for example, with many more reflecting that sentiment all across the internet. So really, it's going to be a mix of GPT-5 and Opus 4.1 in Claude Code.
For research, from what I've seen so far, I think Gemini's Deep Research is still the best product on the market right now. And for coaching and psychological use cases, this thing is super empathetic, as I pointed out in yesterday's video. And this is the one where I don't feel fully confident in making a recommendation just yet, but I like the tone of it. I think most people are going to enjoy it, just like they enjoyed GPT-4o. And from what I've seen so far, with its ability to really meet you where you're at right now, having an understanding of your situation, and inferring some of the parts of the prompt that you might not have specified, I'm heavily leaning toward GPT-5 over the other competitors for coaching or psychological use cases. So that's my GPT-5 verdict for now. Now let's look at some other pieces of AI news that you can use, because there's a lot this week.
11:50

Warp

Okay, so next up we're going to talk about a term that was brand new to me, and that's because it's something that hasn't existed until recently. So you've probably heard of an IDE, an integrated development environment. That's what developers use to edit their code. But have you heard of an ADE? It's a new term that is going around for software that isn't just designed for humans but keeps agents in mind every step of the way. So instead of an IDE, we're talking about an ADE here, an agentic development environment. And one of the biggest companies in the space right now is Warp, the sponsor of today's video. Warp is a top coding agent with really impressive results on the benchmarks that matter: top five on SWE-bench Verified and number one on Terminal-Bench, which even beats out some of the big players like Claude Code and Gemini CLI. So the unique thing here is that Warp lets you develop with multiple agents, and in that sense it's a lot like Claude Code, but with way more user-friendly UX and code editing capabilities, because it's a standalone app rather than being locked into your terminal. So it's not an IDE with integrated AI like Cursor, and it's not a command line interface like Claude Code. It's a true ADE, combining these two worlds into one, and it's built from the ground up with agents and humans in mind. So let me tell you, over the past few months I've been using Claude Code a lot, and not only is Warp more beginner friendly, it also brings several things to the table that Claude Code doesn't. Most significantly, you can pick from various models. So if Anthropic happens to be having an outage, you can use other models, or you can use something like Gemini 2.5 Pro for its strengths in reviewing code. The entire visual interface is way richer because it's a native standalone application, and it's built for these multi-step workflows where you build, test, refactor, push to Git, and do it again. Want to check out what this whole ADE thing is about for yourself?
For a limited time, our friends over at Warp are giving away the premium Pro plan for only $5 for the first month. If you want to check it out, go to the link at the top of the description and use the code ADVANTAGE to redeem that offer and get started with Warp today. And now, let's look at the next piece of AI news that you can
13:50

GPT-oss

use. So the next story here is the open-source models from OpenAI that came out earlier this week. If you missed this, this is actually a big deal. And this one is kind of getting buried under all the other significant stories this week, but it actually matters. Because yes, we've seen a lot of open-source, open-weight models that you can download, run locally, and use fully privately, fully offline. We've seen this. I've done tutorials on how to use them on this channel. There's a plethora of them, with new ones coming out every week. But these are different, because they passed the vibe check, they passed the smell test, whatever you want to call it. People are absolutely in love with these models. The sentiment across the internet is extremely positive towards them. And rather than just going ahead and running a few test prompts in here, which you know I can do and you can do, and we could spend a few minutes looking at some results: these models don't do anything groundbreaking, but they do many things better than all of the competition. Namely, they're as efficient as ever, with one of them being 20 billion parameters and the second 120B parameters. But only a fraction of those are active parameters, which means that the small model even runs on some phones. And again, rather than just spending a few minutes prompting random stuff here and being like, "Yeah, they're all right": the bigger one is almost at o3 level when you look at the benchmarks, which is seriously impressive, because it's almost leading the pack in some of these benchmarks. And anecdotally, a friend of mine and a friend of the channel, Mark, who has been helping accountants build their businesses for most of his life now, was actually struggling with one question recently, and that is how to use a local model to actually perform data analysis and work with numbers reliably.
I was talking to him, and he told me that he had tried all these models out there, with the common recommendation being the Gemma models, but they just did not work reliably enough. And guess what? This week, when he tried the same use case he's been chasing ever since the release of ChatGPT, now that he's using the smaller model here, the 20B, he managed to get reliable mathematical results out of the models, actually calculating revenue correctly based on large tables. Now, this even surprised me, but when you look at the benchmarks, this thing has almost perfect scores on competition math, and if you crank up the reasoning, it has a chance to self-correct, and all of a sudden these models can do mathematics, whereas none of the other open-source models he was trying (and he said he tried almost all of them) have been able to do this reliably. So I just kind of wanted to leave that little story here for you to consider. And generally, the sentiment around these has been that, yeah, these are the best open-source models out there that you want to be using if you get the choice. So if you're going to be building something, definitely consider these. Okay, let's see what's
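To make a setup like Mark's concrete: local runners such as Ollama typically serve downloaded models behind an OpenAI-compatible HTTP endpoint, so the whole exchange, table and all, stays on your machine. The sketch below only builds the request; the localhost URL and the `gpt-oss:20b` model tag are assumptions about a typical local install, so check your runner's documentation for the real values:

```python
# Sketch: preparing a fully local request that asks a gpt-oss model to
# total a revenue table. Nothing here leaves the machine; the URL and
# model tag below are assumed defaults, not verified ones.
import json

LOCAL_URL = "http://localhost:11434/v1/chat/completions"  # assumed Ollama default

def revenue_prompt(rows: list) -> dict:
    """Turn a list of {'item', 'revenue'} rows into a chat request."""
    table = "\n".join(f"{r['item']}: {r['revenue']}" for r in rows)
    return {
        "model": "gpt-oss:20b",  # assumed local model tag
        "messages": [{
            "role": "user",
            "content": f"Sum the revenue column and show your work:\n{table}",
        }],
    }

req = revenue_prompt([{"item": "Q1", "revenue": 1200}, {"item": "Q2", "revenue": 900}])
body = json.dumps(req)  # this JSON would be POSTed to LOCAL_URL
```

The interesting part for accountants is exactly that last property: the financial data never touches a third-party server.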
16:19

Grok Imagine

next, which would be Grok Imagine. This is an image and video generator by Grok, and we did thorough testing on this, so let me just show you the results without wasting too much of your valuable time. So let's start with image testing. As you know, we put together this sheet and publish a monthly ranking of the best image generators over in the free area of our community. As part of this sheet, we now also have Grok Imagine, and you can see that on logo design, it doesn't even come close to some of the other models here. I think these are some of the most recent and significant ones. If you want to compare it with Flux, you can just scroll to the right here. I'll leave a link below. Point being, yeah, at that it's pretty bad. Portrait photography: super blurry, not even close to realistic. The cinematic still actually looks really good. That's the one where I would say, okay, good job, Grok. It does text well, as do all the other top models now. And the comic is okay. I don't know. I'd say just another pretty good image generator. Nothing that blows me away, but that is kind of hard at this point. So, let's have a look at some of these video clips on some of these benchmarking prompts that we run on every single video generator that comes out. We also publish a monthly rating, just as we do for LLM platforms like ChatGPT, too. You should check those out, but for now, we're looking at these videos. Actually, super realistic. Now, the weird thing is they do include sound, but the sound is just kind of trash, let's be real. I mean, they just added some random sound effect. This is nothing like the quality of Veo 3. The car looks good. I always look at the wheels; they show you the quality of the generator immediately. And even this robot looks good. I'm usually most critical of this shot, because it shows both water and human anatomy. Both of those did okay on here. And that's kind of the story of this entire video and image generator.
It's okay. It's good. It can keep up with some of the better models, but it's definitely not surpassing any of them. And that is sort of it for Grok Imagine. Nothing crazy, but okay. They're in the game now, too. Whereas the next story is way
18:13

Runway Aleph

more interesting, in my opinion. When I saw this, I immediately wanted to try it, and now I finally have the chance. So, I signed up for a premium account for Runway Aleph. And this is essentially Photoshop for video. Doing the stuff that this can do with some text prompts is extremely tricky with traditional visual effects. So, let's not waste any more time. Let me just select Aleph here and do something like "add rain to the scene." And I just took some random video of me presenting something here in the studio. We'll add rain to the scene. And then we'll do one more: "set the microphone on fire," and generate that. What I'm going to be looking out for is the physics of the rain. Is it going to pick up on the fact that there's a floor in the background? Is it going to be bouncing off my chair? I'm really curious to see this. The demo video of this looks super impressive. Have you seen this? Like, seriously, you can do different angles. It's kind of crazy. Attach the camera to the suitcase. That's great. You can change the lighting. Oh, I love this art style, actually. So, let's see for ourselves. Okay, they're done generating. Let's have a look. Oh, so it just added a window with rain. That's not bad. It actually looks like a window with rain. Maybe my set should have a window with rain. I kind of like the idea. It's not exactly what I expected from it here, but then I wasn't too precise with the prompt. I just said add rain, not where it should add the rain. And this one set the mic on fire. Look at that. Look at what it does to my face. There are reflections. I mean, there's a bit of morphing, but this is actually pretty damn good. Wait, let's do one more. Okay, I'll say "add heavy rain to the entire scene." Let's go. Ooh, here comes the heavy rain. That does look heavy. Whoa. Yeah, I mean, it's heavy rain. — I am under the water. Please help me. — Fair enough. I like that it kind of mirrors my body and reflects off of it. Interesting. I mean, this is certainly not bad.
Like this fire effect. You can really see the microphone melting here. It's not perfect, but I don't think it needs to be. It's visual effects for next to no money. That's kind of it. And except for the waiting time, it's almost instant and it's super low effort. I'm going to think about creative ways of using this in coming videos. Let's see what's next. So, for the next story, I want to
20:12

ElevenMusic

show you a category that we haven't been looking at at all over the past few months, and that's AI music generation. And before you skip to the next story because you feel like all the music generators kind of don't have a use for you, hear me out, or more like hear the new ElevenLabs music generation model out, because this thing is, well, I'll just say it: head and shoulders above anything that we've heard so far. The clarity of the voices is kind of unmatchable. — So, let's just quickly generate something here and have a listen together. Cinematic orchestral epic vocal opening scene with choir and female solo. What I'm looking for here is the vocals, because let's be real, sound-wise, a lot of these other music generators were almost flawless, I would say. But the vocals were the part where you could really tell. All the door singing noise from silence away. — Okay, sounded great. Maybe one more. Amy Winehouse style female singer on a G-funk style beat. Ooh, that's a combination I've always wanted to listen to. Okay, I needed to change the prompt a little bit. Fair enough. Cannot use popular celebrities, as I would expect. Now, let's have a listen to this. Ooh, funky. I like this. Night falls on the city, lights in my eyes. Cruising through memories, every turn. That just sounds good. Chasing shadows where you meet, every moment and feels the same. If it ever feels the same. Yeah. Hold me in the warmth of your midnight blow. Let that soulful river take me where I need to go. Keeping this groove, I'll find the truth untold. In your arms I'm anchored, never letting go. — I mean, let's be real. You hear this on the radio, you're not thinking, "Ooh, that must be AI." No way. It just sounds so good. — This is great. — Come on. So, look, I'm no music expert, but god damn, have I tried all of these generators. And yep, this one, to me, sounds like the best available today. Let's see what's next. So, next up, we
23:18

Gemini Storybooks

have a cute little app from Google. And this is something we haven't seen before, and while not revolutionary, I did want to show you. It's called Gemini Storybooks. And we did some testing here, so you can see some of the inputs that we tried. Very simple: create a storybook about why Pluto is no longer considered a planet. And the reason I say we haven't seen this before is because, sure, AI generates a whole bunch of things, but have you really seen text-to-storybook yet in this format, where it generates the full book with illustrations, cover, everything, sort of ready to be printed? I know I haven't, and I wanted to show you, because this little story about Pluto is actually great, as you would expect it to be. But the illustrations are even better. They're what really set this apart. And honestly, just reading through this little book right here, I kind of enjoyed it. And you can make this too, inside of Gemini. So Google has been really going strong with all of these experimental implementations. Last week Google Opal, now this Storybooks feature; NotebookLM also recently got Video Overviews. Just features we haven't seen out of any of the competition, and I always love to see that and share it back with you. So yeah, if you want to create little storybooks yourself, you can do that in Gemini now. And talking about Google
24:27

Genie 3

innovations, well, let me just add on this story. This does not really fit the show, but I do need to talk about it, because we usually only speak about news that you can use: applications that are available today and practical use cases that you could consider for yourself. In this case, we're talking about Genie 3, which is not available today, but this is a brand new category of AI product. And not just a little storybook generator; this is a text-to-world generator, literally. And these worlds are interactive. So, you might have seen the previous versions of this, like Genie 2. They got a lot of attention already, but this is on a whole new level. So, it generates these interactive worlds that you can then navigate with arrow buttons, and it essentially operates like a computer game, but all of it is generated. And the interesting thing about this, and I really thought about this, is that it's technology looking for a use case. We don't know what this is going to be good for yet. Sure, if you can do this in real time, VR headsets are kind of going to completely pop off, because, well, you would be able to generate worlds on demand. But as of right now, I can't really think of a way to implement this meaningfully into any workflow that I'm aware of. I mean, we're really at a phase where the technology is getting ahead of what humans can do with it. Even if AI development stopped right now, we would have years and years of figuring out all the different use cases for this tech. And this one concretely just goes beyond anything we've seen yet. If this interests you, I would strongly recommend you check out their blog post. There are a lot of little videos that will help you develop a feeling for what an AI world generator might look like. And while you might not be able to use this today, I think this is going to be a big product category once people eventually figure out what problem, which we can't even properly name today, this will eventually solve.
I mean, honestly, between the two of us, the not-safe-for-work category is obviously going to be huge and probably not so great for humanity. But in gaming, we kind of already have better versions of this, so I don't really see it catching on there. If you have an idea of what this could be used for, please leave a comment below. I'd be super curious to hear your takes on what a world generator could be used for. And I should add one limitation: I was just speculating when I said it could generate in real time for VR headsets. Right now, it can't. Nevertheless, these are some insane results that you should totally be aware of. Let's
26:44

Claude Opus 4.1

see what's next. And that is the Opus 4.1 release. I can't believe this is sort of a side story this week, but it is, because this is literally my most used model. Heck, if I look at my second screen, I have an instance of Claude Code running with multiple agents working in tandem for, I don't know, half an hour now. It's probably going to run for another 30 minutes until it completes what I gave it. I'm actually using the Opus model for this, and they just upgraded it. It gets slightly better results on software engineering tasks, and it's a welcome upgrade for anybody who uses the Claude models. It's probably not a reason to switch just because they went from 4 to 4.1, but for everybody using Claude Code or the web assistant, this is a very welcome update. I think the most important sentence in all of this is this one: "We recommend upgrading from Opus 4 to Opus 4.1 for all uses." So next time you're building something, make sure to use the new 4.1 model in the API; in the web interface, it's just the default. They also pointed out that search in particular has been upgraded, which is one of the biggest parts of Claude Code: it looks over your entire codebase, creates to-do lists, and uses multiple agents to get things done for you. But all of that starts with the agents actually understanding the codebase, which this should help with. And as I'm using Claude Code daily right now, by the way, it's absolutely insane, and I'll be teaching some of these multi-agent workflows in the community soon. I'll also be creating YouTube content on it, especially on how to get into it as a beginner, even if you're not a software engineer. I think there's a lot to be unlocked there. You can build multi-agent workflows like this: have a researcher, a writer, and a reviewer. The researcher goes out and finds data on the web, the writer writes something up, the editor comes back with notes, and then the writer rewrites based on those notes.
And all of that can be done in one prompt. It just works for half an hour, and the results are better than anything you will ever achieve from a single prompt in ChatGPT. And that's a writing use case. Development, obviously, is insane, and people even use it as a personal assistant. Again, more content on that both on YouTube and especially in the community, because this is more intermediate-to-advanced stuff. But yeah, Claude Opus 4.1 is here. Use it if you've been using Opus 4 before. It was already one of the best models out there. Now it's even better.
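To make the researcher/writer/editor loop described above concrete, here is a minimal Python sketch of that orchestration pattern. This is an illustration only: the three "agents" are stand-in functions returning canned strings, and in a real Claude Code setup each would be a sub-agent calling the model. All function names here are hypothetical, not part of any real API.

```python
from typing import Optional

def researcher(topic: str) -> str:
    """Stand-in for an agent that gathers source material from the web."""
    return f"Key facts about {topic}: fact A, fact B, fact C."

def writer(notes: str, feedback: Optional[str] = None) -> str:
    """Stand-in for an agent that drafts (or redrafts) text from notes."""
    draft = f"Draft based on: {notes}"
    if feedback:
        draft += f" [revised per feedback: {feedback}]"
    return draft

def editor(draft: str) -> str:
    """Stand-in for an agent that reviews a draft and returns notes."""
    return "Tighten the intro and cite fact B explicitly."

def pipeline(topic: str, revisions: int = 1) -> str:
    """Researcher -> writer -> (editor -> writer) x N, as one hands-off run."""
    notes = researcher(topic)
    draft = writer(notes)
    for _ in range(revisions):
        draft = writer(notes, feedback=editor(draft))
    return draft

print(pipeline("GPT-5 use cases"))
```

The point of the sketch is the control flow, not the stubs: one entry point kicks off the whole research-draft-review-revise cycle, which is what a single multi-agent prompt does in practice.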
28:49

Quick Hit News Stories

Okay, and for this week's quick hits, we have a few short but interesting stories. Starting out with ChatGPT shipping a feature where, if you're chatting for too long, it's actually going to check in on you and suggest a little break. I like the direction they're taking here. Some people spend too much time at their computer, and I don't think humans should be chatting with chatbots all day, although they are super helpful. Interesting update. Expect a little nudge in the right direction if you spend too much time on it. The second one should not really be a quick hit, but I'll be honest with you, I just don't want to upgrade to the Ultra account again, because I don't use Gemini outside of running deep researches here and there. But they finally shipped their flagship model, Gemini 2.5 Deep Think, which, if you remember from their keynote, was that super impressive model they teased at the beginning of the presentation but then said was coming later. Well, later is now, so it's out. On competition math it gets 99.2%, winning a gold medal at the math olympiad. That's better than humans perform. It's good on Humanity's Last Exam, but GPT-5 now has it beat on that and some of the other benchmarks here. I believe GPT-5 is around 40% on Humanity's Last Exam now, being state-of-the-art. So this model is certainly impressive, but honestly, with all this competition from Opus and GPT-5, it's just another really impressive thinking model that most likely does nothing so well that you just need to immediately drop the $200 to get access to it. I mean, GPT-5 is accessible for free today, so I think that should change the way these things are priced, but then OpenAI might also be shipping more specialized models in the future that they'll charge more for. Who knows. This thing is out, and if you're on the Ultra plan, you're going to want to use it. The next quick hit is just something I wanted to show you.
So, Kaggle introduced various game arenas where they let models battle it out in different games, starting with chess, and you can just look at the bracket they're playing through. Unfortunately, GPT-5 is not on here yet. Actually, this one finished: when I was doing our research it wasn't done yet, but apparently o3 won 4-0 against Grok 4. If you want, you can even watch the replay of the chess match between the two models. I love creative evaluations like this. Minecraft benchmarks, chess via text input. Great stuff. Now I want to see o3 battle against GPT-5. Maybe I'll follow up on that next week. And that's pretty much
30:57

Outro

everything we have for this week. Boy, this has been a bit of an exhausting one. I hope you found something that will upgrade your day-to-day workflows. If you enjoyed this video, don't forget to leave a like. It really does help the channel. And with that being said, my name is Igor and I hope you have a wonderful week.
