Make some sense of this AI Agent madness with Hubspot's "Master AI Agents in 2025" playbook! It's completely free, grab your copy today 👉 https://clickhubspot.com/d630d7
OpenAI and Google both put out updates to their flagship models this week, so we'll compare and contrast to figure out which you should use and why. ChatGPT and Claude both got some minor but much appreciated upgrades, Mistral released their first thinking model and it's FAST, and Apple...well Apple didn't really do anything, and we need to talk about that too. All that and more in the video! Enjoy.
Links:
https://x.com/OpenAI/status/1932530409684005048
https://help.openai.com/en/articles/6825453-chatgpt-release-notes
https://x.com/genspark_ai/status/1932473797548159006
https://www.genspark.ai/browser
https://x.com/sophiamyang/status/1932451856447586312
https://chat.mistral.ai/chat/664c928c-56b4-4279-beea-ee1fa2d45691
https://academy.runwayml.com/ways-to-use-gen-4
https://www.youtube.com/watch?v=NTLk53h7u_k
https://x.com/AnthropicAI/status/1930671235647594690
Chapters:
00:00 What’s New?
00:55 OpenAI o3 Pro
08:06 Hubspot
09:46 More OpenAI Updates
10:45 Advanced Voice Mode Update
15:10 Genspark AI Browser
19:26 Magistral
20:27 Claude Projects Update
21:07 Apple WWDC 2025
21:42 Runway Use Cases
#ai
Free AI Resources:
🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter
🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0
👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/
💼 AI Advantage LinkedIn: https://www.linkedin.com/company/the-ai-advantage
🧑💻 Igor's Personal LinkedIn: https://www.linkedin.com/in/igorpogany/
🐦 Twitter: https://x.com/IgorPogany
📸 Instagram: https://www.instagram.com/ai.advantage/
Premium Options:
🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community
🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/
So, this week in generative AI might not have been the busiest, but it was certainly the most relevant one in a while because both Google and OpenAI shipped new state-of-the-art models. And today, we'll be taking a closer look at the new o3 Pro and the brand new version of Gemini 2.5 Pro. And ChatGPT shipped so many small but significant updates this week. I can't wait to tell you about those, including an overhaul to their voice mode, projects, and more. And afterwards, we'll have a quick peek at a new agentic product that is essentially Google Chrome with an AI agent built in. Quite interesting. And a few more quick stories in this week's episode of AI news you can use, the show that rounds up all the AI releases of the past week and filters for the ones that you can actually use and the ones that actually matter. Just a quick side note before we begin. As you might already hear, my voice is a little deeper than usual, and that's because I'm a little sick right now. But hey, we haven't missed a single Friday upload in more than one and a half years now. So some stupid sickness is not going to stop me, is it?
And with that being said, let's get right into the first and the biggest story here. o3 Pro is finally here. Now, let me start with the key facts and then let's move into how this actually performs in the real world. So basically o3 Pro is their new state-of-the-art model. It's still o3, but it has a lot more compute, meaning it will run for way longer. And if I say way longer, I mean it. I'm going to present you some testing data, but for example, in some organizational tasks that we tested this on, o3 took 21 seconds and o3 Pro took 10 minutes. It's not like it's 10 times slower. Sometimes it's up to 50 times slower than what you might be used to from o3. But what you get is more reasoning, a little bit more reasoning. That's at least what the data suggests here. As you can see from these evaluations, and I won't be focusing on these too much, the jump from o3 to o3 Pro is a few percent in most cases. It's not this day and night type of difference. Right now, it's available to all Pro users, the $200 plan, and to all Team users, where you need multiple seats signed up. That's actually surprising. They've been shipping more and more to the Team plan, but you can use it through that, too. Normal Plus users at $20 a month do not get this as of now. And you can also access this through the API, although that is just prohibitively expensive. Okay, so let's talk about the actual model and how it performs. And I'm going to tie that together with the second story of this video, which is Gemini 2.5 Pro. Concretely, the 5th of June release, where they updated it to a new version. So that's Google's best model, and often people consider that the main competitor to OpenAI's models, the third one in that category being Anthropic's Claude 4. And the way it works is simple. If you're on the Pro or Team plan, you just select o3 Pro. This replaced o1 Pro, which is not available in the model switcher anymore.
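To put the runtimes quoted above in perspective, here is a minimal sketch of the arithmetic behind "way longer". The function name and constants are just illustrative; the only inputs are the 21 seconds and 10 minutes mentioned in this segment, and other prompts will give different ratios.

```python
# Runtimes quoted in the segment, converted to seconds.
O3_SECONDS = 21            # o3 on the organizational task
O3_PRO_SECONDS = 10 * 60   # o3 Pro on the same task (10 minutes)


def slowdown(fast_seconds: float, slow_seconds: float) -> float:
    """Return how many times longer the slow run took than the fast run."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


ratio = slowdown(O3_SECONDS, O3_PRO_SECONDS)
print(f"o3 Pro took about {ratio:.1f}x as long as o3")  # about 28.6x
```

So this particular example lands at roughly 29 times slower, which sits between the "10 times" and "up to 50 times" figures mentioned above.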
And if you run a prompt like "how many R's are in strawberry", this damn thing is going to run for 4 minutes straight just to figure out the answer, which is three. Yes, that's correct. But then immediately you might be like, okay, but I could get the same answer from o3. But hold up, that's not actually the case. I actually expected it to get it right. But okay, second try. As you can see, even from these two tries with o3, one of them got this simple question wrong. And that's because of the fact that they accelerated o3. There was a discussion on Twitter. They claim it's still the exact same model, just faster. But as you can see, o3 now answers in seconds, and then you're kind of running the risk of it thinking that, yeah, there's an extra R. Now, let's move beyond this basic example. I just wanted to demonstrate the runtime length with this. Now, let's talk about the interesting part, use cases. Let me just show you what we did here. For some time, we've been playing with the thought of mapping the main use cases that we and the community that watches this channel might have, and we're still developing the data set. But basically this is what we have right now. We mapped various prompts against the most common AI use cases, as recently published in Anthropic's study on how people actually use their product. So there's things in there like therapeutic use cases, enhancing learning, generating ideas, or organizational prompts. This data set is not really focused on coding, although we have one prompt in there. But the base idea is we take some real-world prompts and then we manually rate the results on relevancy and how well they adhere to the prompt that we gave. And by doing that for now with o3, o3 Pro, and the new Gemini, here are our findings. So let's first talk about o3 versus o3 Pro. In many cases, it actually produces the same result.
And by many cases, I would say four or five out of 10 times, you're just going to get the same thing out of o3 Pro as out of o3. The difference is going to be this, though: o3 Pro is going to run for 13 minutes while o3 runs for 1 minute. Now, what about the other cases? Well, there is a tangible quality difference there. If you have knowledge around what you're prompting for, you can really discern the difference between an "okay, this is pretty good" and a "wow" response. And often that's a fine line. It might be one fact that it incorrectly inserts or interprets. But what we found in our testing is that o3 Pro is, one, so much more reliable. Two, it fact-checks a bit more and hallucinates a bit less. And three, at the things that o3 was already best-in-class at, like I always used to say, all the business-related tasks, all the planning-related tasks, o3 Pro is the undisputed king in that category now. And I understand why some people are actually referring to this system as AGI already. If you define that by being smarter than a human, then I can't speak for everybody, but it's certainly smarter than me in many of these cases, especially when only given 10 minutes to work on something. Okay, so that's o3 versus o3 Pro. Let's switch gears into talking about o3 Pro versus Gemini. Now, there's certain things that Gemini is certainly better at, and I wouldn't even argue. That's why many people flock over there. I think for anything coding related, I would not be recommending either o3 or o3 Pro. I think the battle there is between Gemini 2.5 Pro and Claude Opus/Sonnet. It really depends. Different people have different preferences, and I guess if you're a developer, you just have to try those two for yourself. But here's the thing that surprised me: on some prompts that we ran for generating ideas, Gemini 2.5 Pro consistently performed better than both o3 and o3 Pro. And this has been the case before already with o3 for generating ideas.
My personal go-to was always Claude, although we haven't included it in this data set yet. I just personally took some idea generation prompts and then compared Claude to Gemini 2.5 Pro and o3 Pro. And yeah, when it comes to ideas, it's really between Claude 4 Opus and Gemini 2.5 Pro. I personally prefer Claude 4 Opus. But when it comes to brainstorming brand new ideas, well, I'm not saying o3 is bad. It is just not the best you can get. I think objectively, and multiple team members that worked on this agree, Gemini 2.5 Pro or Claude are the best for finding novel ideas. But, and here's the big but, and this is what I want you to take away from this video: when it comes to anything organizational, anything business related, anything that involves strategic planning, anything that involves creating schedules, anything that's business focused, o3 Pro is the king. So we have these organizational prompts here and you can see a clear difference between the rating of o3 and o3 Pro down here. Just from these results you can see the consistency: five out of five for o3 Pro in every case. Healthy living, by the way, is kind of giving it a bunch of personal context and then planning a lifestyle shift over the next few weeks. Again, if it's planning related, Pro just delivers. This is, for example, one prompt on designing the week. And here, this little change in a bit more quality and a bit more thoughtfulness from o3 to o3 Pro really made a difference. This weekly schedule is the best I've ever seen and the best compared to all the other models. So, that would be my conclusion for the model. If you're looking for the best model in the world to plan with, you've got to be using this. And if you can justify the Team plan, which I suppose starts at $60 a month because you need at least two seats and it's $30 each, then get that for you and another teammate and just start using this. Start tinkering with this.
And whenever you have some sort of business or planning challenge, just throw it at it and see what you get. It is incredible. And o3 was already incredible. This is even better. And the new Gemini 2.5 Pro is as good as always. Great at coding, great at generating ideas. So if I had to pick one overall model to use from now on forever, and only one model, I mean, that's a tough ask. My daily usage right now is between o3 Pro and Claude 4 Opus. But if I just had to pick one, o3 Pro really gives me confidence in the information it gives me. And in my super busy life, that extra bit of confidence, and knowing that it's not going to tell me that strawberry has two R's ever, is to me worth the money. But you've got to make your own decision. And I hope this segment was helpful in that. And I might follow up with a separate video just focusing on the model comparisons as I dive more into the singular prompts and how we rated them, etc.
If you've been keeping up with this channel or really any news in general, you surely came across the term AI agent. And if you were slightly or very confused by this entire AI agent madness, don't worry, you're not alone. This term has been the most confusing thing in the entire AI space. And that's essentially because definitions vary, and there's many reasons for that. But today, I wanted to share a playbook called Master AI Agents in 2025: The Strategic Advantage. This playbook is made by HubSpot, who I'm partnering with on this video, and it consists of two guides that actually complement each other. So, let me tell you about them briefly. The first guide is called AI Agents Unleashed, a playbook to success in 2025. And it gives you real AI agent implementation stories straight from HubSpot's C-suite. It's kind of like a sanity check on how AI agents can actually add value in 2025. And it also shows off common mistakes that people have already made with AI agents so you don't blow your own budget on trial and error. And it includes this decision tree that I personally really like, titled "Is this an AI agent job?", helping you quickly make a first decision on whether this is a task that you should automate with an agent or keep in the hands of a human. The second guide is titled How to Use AI Agents in 2025, and it's just a logical next step, turning the insights from the first part into actions with helpful repeatable checklists that include the essentials like assessment, implementation, integration, and measurement. This one's short and sweet and really focuses on implementation, but it builds on the first guide. So, make sure to start there. Both of these are 100% free. So, if this caught your interest at all, make sure to hit the first link in the description below to grab this playbook and to move you forward on this journey of turning AI agents into real strategic advantages. A big thank you to HubSpot for the free guides and the sponsorship of this video.
And now, let's get back to more AI news that you can use.
All right, so next up, we're going to talk about some extra updates to ChatGPT beyond the o3 Pro release, concretely to the projects feature. And this is my personal favorite. I use projects for most chats that I initiate in ChatGPT, and they added a few things. You can now run deep research inside of projects. The new and updated voice mode is also supported in projects, finally. It was a bit of a mystery why that wasn't there yet. And they have an improved memory function that also looks into the chats inside of the project. This wasn't the case before. So that's all fantastic. Plus, it works better on mobile now. These were all sort of quirky restrictions that made no real sense inside of projects. And I'm really happy to see they're gradually ironing these out. Plus, they shipped one more update, to Canvas. Holy moly. ChatGPT is getting so many upgrades this week. Incredible. They updated Canvas and now you can actually download the results, just like they added to the deep research function recently. You can pull a PDF or Markdown file, and if you want to use it as context for another project or a new chat, that's what you would want to use. And if you're using Canvas to generate code, you can pull it in a native format right away.
Had to pick up a quick honey and onion delivery. Okay, back to it. The next story is something that I'm really excited about too, because I love using these voice assistants and all voice-related features, just because it's a way faster way of inputting things into apps like ChatGPT than typing. And I often find that I think faster than I'm able to type, especially on my phone. But if I use my voice, I can keep up with my thoughts. And last week, ElevenLabs released these voices that were just above and beyond anything we've ever seen from voice assistants before. And it turns out that less than 48 hours later, OpenAI followed up with their own release, their updated voice assistant. So this is me trying that for the very first time, actually. So yeah, let's just give this a go. Hey, how are you doing today? Hey there. I'm doing great. Thanks for asking. How about you? How's your day going? Nah, I'm a little sick, but it's all right. Now, tell me this. How good are you at interrupting? Oh, no. I'm sorry to hear you're feeling under the weather. Uh, as for interrupting, I try not to interrupt people if at all. I'm all about listening. Yeah. What if I interrupt you? Well, if you interrupted me, totally... I'm glad you think so. It's all about having a fun and easy conversation. It got way better at interrupting. Okay, cool. So, last week we did a little multilingual test. Let's do that again. I want to see how you respond to me dynamically switching to another language. Ready? Absolutely. I'm ready whenever you are. Okay. Uh, I meant languages. And what we're going to do now is... Okay, that German is a bit broken, but it works. Okay. That got so much better. Okay, now let's get back to English. All right, back to English we go. That was really fun and you definitely kept me on my toes there. Let me know if you want to switch it up again. God damn. Now tell me, what's your hobby in all three languages? Your favorite hobby. Oh, that's a fun one.
In English, I'd say one of my favorite hobbies is reading. Uh, in German I'd say... and in Slovak... God damn, that's really good. So, they have all of these features built internally. They're just waiting for the competition to ship something that's similarly good and then they just release their own stuff. Also, did you notice that it sort of switched up the voice to be a bit more, how do you say, sandpapery, to kind of match my sick tone today? Was that a coincidence? I want to try that one more time to see if it does the same thing when I start a brand new chat. Hey, what's up? Hey there. It's going great. Thanks for asking. How about you? Oh, very well. Um, how you doing? I'm doing really well. Thanks. It's always nice to chat and catch up. Are you, uh, working on anything exciting today? Yeah, I think that's actually a thing. What do you think, though? I mean, there was this second of this sandpapery, raspy voice now, too. But I do think that it actually switched its tone to be more empathetic towards my current state. Interesting. Multilingual capabilities, incredible. The switching actually worked even better than ElevenLabs last week, I've got to say. Just a bit better. Both were really good. And finally, you can kind of hop into it talking and it just immediately stops rather than talking for another second, which just makes it awkward. That was my biggest gripe with this voice assistant in ChatGPT and all other apps, actually. Does this mean this will be a legitimate universal translator now? Because the interruption thing was really preventing it from being practical and the language switching was not as smooth. So yeah, if you want to speak to somebody with a different language, I think you could just prompt this to be a universal translator for you. I mean, just use that word. Just say: act as a universal translator. I speak English, they speak German. Let's go. And it works. Maybe that's worth following up with in a different week.
But yeah, impressive updates to the advanced voice mode. If you kind of abandoned it because it wasn't good enough, to me it seems like this is worth revisiting. Again, if you're getting something out of this coverage, I would really appreciate it if you hit that like button. Every time, I'm surprised by how much every single like actually helps the channel. And now, let's get back to the next piece of news that you can use.
All right, so for the next release this week, we have an interesting one because Genspark AI is shipping like crazy. Now, we started covering this company ever since the release of their agentic product. It's really sort of a new category. I think the first one showing off these capabilities that went really viral was Manus, if you remember that. But from all our testing, long story short, Genspark is the one that is actually the most useful. And they released a brand new thing this week called the Genspark browser, which right now seems to be randomly switching between dark and light mode. That's why you might see the lighting differences here on camera. But what this is, is a fork of Google Chrome that has this agent built into it, and you can do various interesting things. For a comprehensive list of everything they claim that it can do, you can check out their release post. What I want to try out here is a simple research workflow. So if you download the browser and you log in, I'm on a free account here by the way, it gives you a few tests right away. This is the most interesting thing that they ship now: MCP tool integration. So if you go to choose tools, you will not just see the standard integrations that we now have across most AI assistants, platforms, whatever you want to call them. We also have all of these MCP tools. And there's some recommended ones by them, like the Twitter content explorer or a browser automation tool or Reddit and Hacker News integrations. But there's also all of these community ones, over 600 of them, even with a server that helps you determine which MCP servers could work for your client. Haven't tried that yet, but that sounds interesting. Point being, there's just a bunch of these extensions that you can add to it that are all set up in here for you. And all it takes is a simple click.
So, what I did here is I enabled the Reddit MCP and the Hacker News MCP, and I gave it access to browser automation, the Twitter content explorer, and web search. In my first run, I tried Notion, but my database just has thousands of entries. And I think it got a little tripped up along the way, so I had to abandon that first run. But the second one actually worked. And within a few minutes of me running this prompt, "find this week's AI news and use cases from my Twitter feed, the ChatGPT Reddit, and Hacker News", it actually went to all those websites. And you can see this in great detail here. For example, here it used MCP to check out the Reddit board. And then you can see every post that it looked at, all the comments that it considered, etc. It did the same for Hacker News, ran a few more times, and then: based on the research across my Twitter feed, the ChatGPT Reddit, Hacker News, and recent AI news, here's a comprehensive overview of this week's AI news and use cases. And now you don't just get a simple web search, you get something that is enriched by the posts on Reddit, the posts on Hacker News, and by the accounts I follow, which I curated over years on my Twitter. It read all of that and gave me an overview here. So let's see, what are the product launches this week? Gemini 2.5 Pro, that's right. ElevenLabs is last week, but that still counts. It's less than a week ago. Okay, OpenAI image gen got better. I didn't know that. I need to look into that. And yeah, things that we covered in this video like enhanced voice features by OpenAI, and file access that we covered last week. Now, honestly, I don't think this is at a point yet where I could solely rely upon this. I think one of the biggest values we add here on the channel is multiple people actually researching these stories, and then we have a meeting where we discuss all of this too. It's not just about finding it, it's also about forming opinions on it. But initially, this looks really promising.
And then if you get onto one of these paid plans, you can do further things, like turn it into a podcast, etc., that they showed off in their demos. But this is a very interesting product category that is growing like crazy because players like this are just shipping like crazy. And then the last thing I want to point out before we move on to the next story is that it doesn't just limit itself to this agent that kind of does this research and can use Excel and download video files, analyze them for you, things like that. We covered that in previous episodes. It also includes a bunch of what I would call AI tools that previously were packaged in a Google Chrome extension. So, for example, if you're on YouTube and you check out any video, it gives you these tools where you can summarize. You can turn the video into slides with the click of a button, and the built-in tool will just do it for you. Now, after me playing with it for a bit, it wants me to pay, which makes sense. But, as you can see, it actually used this instance of Twitter to look for AI and machine learning news and use cases this week. And it used this actively, so you don't have to use the browser yourself. Now, again, big disclaimer: all of these tools, including this one, are not at 100% yet. But when it comes to generating a first draft of something like this research, certainly helpful. Creating slides, sheets, amazing. Downloading files in bulk, also amazing. But usually it's these internet research tasks that this really shines at. Some other use cases might work, but they're a bit experimental. But it's so interesting because you can kind of use this as an executive assistant that manages your calendar, looks at your emails, and does internet research. Also, it's just not super reliable yet, but probably one of the funnest things to play with in the AI space right now.
Let's start off this week's rapid fire segment. If you're not familiar, we kind of do a section on all of the pieces of AI news that you can either use or that are maybe relevant to your usage today, but these are the stories that we don't want to spend multiple minutes on. So, we just quickly go over them. And let's start this week's segment with the quickest release of them all: Mistral releasing their first reasoning model. Now, while this might not be crushing any benchmark or standing out in any category of use cases, I do want to show this to you in practice. So, I paste my prompt here, hit enter, three, two, one. And mind you, this is a thinking model. Okay, boom. Done. If this isn't the fastest output of a thinking model ever, then I don't know. These answers usually take 20 seconds to 20 minutes. You can enable it here on a free account. You get three of these. It's got some tool use like web search. Let's just do the classic. Well, that's a lot of reasoning. And that was under 10 seconds with the correct answer. Right, then that's Mistral for this week.
And another super quick update, which you should be aware of if you're an Anthropic Claude user, is that projects, which are probably the best way to organize your workspace whether in Claude or ChatGPT, can now hold 10 times more content, meaning you can add way more context. Now, as per usual, whenever these capabilities expand, don't expect them to retrieve every single little thing perfectly, especially as you max out these increased capabilities. So if there's a line on page 65 of the document, it can happen that it doesn't pick it up. And overall with AI tools, we kind of have this rule of thumb of, hey, somewhere between 15 and 25 pages of documents, that's where it can pick everything up. Beyond that, it starts getting diluted a little bit. Nevertheless, Claude shipped this and you can put way more docs into it.
Now, the next quick story is not really about an AI release, rather about the absence of any. I'm talking about Apple's WWDC. And as you might have heard, the expectations for them are super high, but they're not actually shipping anything yet because they state it's not ready yet. Now, I found this one video from the Wall Street Journal with the most ruthless journalist that I've seen in a long time interviewing two Apple execs, and I strongly recommend you check this out if you care about Apple's AI strategy. There's this moment here where she just doesn't give up and relentlessly keeps questioning them. And Craig is just the fake smile. Great interview, but no releases here yet.
Let's look at the next thing, which is quite the opposite. It's Runway compiling every single use case that people have been sharing on Twitter. So, if you wondered what different kinds of things you could use AI video for, well, this is one of the best inspiration pages I've ever seen. Pattern recreation, background removal, storyboarding, scene expansion, and so much more. If you care about this type of thing, check out this site. All right, and that's really everything for this week. I would really say that if you have access to o3 Pro or the new Gemini, go ahead and rerun some of your favorite prompts in it. You might just be surprised. And with that being said, I'll take some time off now, get better soon. But nevertheless, I hope you have a wonderful