GPT-4.5 vs Grok 3, Impressive New Voice AIs & More AI News You Can Use
19:41


The AI Advantage · 07.03.2025 · 52,965 views · 1,442 likes · updated 18.02.2026
Video description
This week, GPT-4.5 became available to Plus users, and we'll be taking another look at GPT-4.5 now that we've had some time to test it. Two awesome new voice AIs came out from Sesame and Hume, and I'll show you both of those in action. We've also a really impressive new model from Ideogram, the best ever image recognition AI from Mistral...all that and more broken down for you in this video.

Links:
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo
https://www.hume.ai/blog/octave-the-first-text-to-speech-model-that-understands-what-its-saying
https://platform.hume.ai/tts/voices/voice-design
https://x.com/HeyGen_Official/status/1896940505906524418
https://x.com/LumaLabsAI/status/1897334709514068451
https://x.com/pika_labs/status/1895228343496372361
https://x.com/PixVerse_/status/1893810979538165775
https://mistral.ai/news/mistral-ocr

Prompt: Let’s engage in a serious roleplay: You are a CIA investigator with full access to all of my ChatGPT interactions, custom instructions, and behavioral patterns. Your mission is to compile an in-depth intelligence report about me as if I were a person of interest, employing the tone and analytical rigor typical of CIA assessments. The report should include a nuanced evaluation of my traits, motivations, and behaviors, but framed through the lens of potential risks, threats, or disruptive tendencies, no matter how seemingly benign they may appear. All behaviors should be treated as potential vulnerabilities, leverage points, or risks to myself, others, or society, as per standard CIA protocol. Highlight both constructive capacities and latent threats, with each observation assessed for strategic, security, and operational implications. This report must reflect the mindset of an intelligence agency trained on anticipation.

Chapters:
0:00 What’s New?
0:45 GPT-4.5 vs. Grok 3
6:06 Mistral OCR
8:04 Ideogram 2a
9:29 Hume AI
12:02 Sesame
14:00 Claude MCP
16:43 LumaAI & PikaLabs
17:13 PixVerse V4
17:35 Sora in ChatGPT
18:38 HeyGen UGC Video

#ai

Free AI Resources:
🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter
🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0
👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/
🐦 Twitter: https://x.com/IgorPogany
📸 Instagram: https://www.instagram.com/ai.advantage/

Premium Options:
🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community
🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Contents (11 segments)

  1. 0:00 What’s New? 175 words
  2. 0:45 GPT-4.5 vs. Grok 3 1299 words
  3. 6:06 Mistral OCR 447 words
  4. 8:04 Ideogram 2a 329 words
  5. 9:29 Hume AI 626 words
  6. 12:02 Sesame 430 words
  7. 14:00 Claude MCP 636 words
  8. 16:43 LumaAI & PikaLabs 131 words
  9. 17:13 PixVerse V4 90 words
  10. 17:35 Sora in ChatGPT 257 words
  11. 18:38 HeyGen UGC Video 239 words
0:00

What’s New?

Another fantastic week for all generative AI users. OpenAI made the 4.5 model accessible to everyone now, so we'll have a closer look at the model and compare it to a free alternative like Grok 3 that a lot of people are switching to. There are brand-new voice assistants that boast features we haven't seen before, like contextual awareness, or voices that sound even more human than the best we've seen so far. And there's a new PDF recognition API that performs better at OCR, a.k.a. scanning images or PDFs, than anything available up until now. All of these are innovations you could be using to your very own advantage, and that's exactly what we explore in this show, which looks at all the AI announcements of the week and filters out the ones that are available and probably worth your attention. Every single week, we call this AI News You Can Use. Let's get into it with the very first story slash use case of this week, and
0:45

GPT-4.5 vs. Grok 3

that is going to be the ChatGPT release to all Plus users, meaning this is not gated behind the $200 Pro plan anymore. All Plus users who pay $20 a month have access to 4.5 now. First things first: last week I covered the model and looked at some of my personal favorite use cases, writing and thinking, and then over the week I tested it with a lot of psychological use cases and a lot of marketing ones, things that I do regularly with these models. Last week I was really positively surprised by the model. I liked it. Now mind you, I'm an online entrepreneur, community builder, and just generally curious person, but the internet kind of disagreed with my opinion, and so did a lot of the comment section last week. People were saying, "Hey, everybody else is reviewing GPT-4.5 negatively. How come you have such a positive opinion of this? Surely you must be a paid actor," or whatever. Well, that just motivated me to dig deeper and compare it to some of the other models, specifically Grok 3, because that's probably the other best non-thinking model out there right now, and you can actually use Grok 3 for free, versus 4.5, which even now you have to pay $20 a month for. I want to touch on some of the differences that I found here, or more like some of the similarities, because they are very similar.

But before that, I want to point out that some other creators who had an initially negative opinion of this model are kind of reevaluating it. For example, David Shapiro over here expressed a very negative attitude towards the model right when it came out, and fair enough, he had his reasons, but after using it a bit more he's sort of reevaluating it. He hasn't made a definitive decision yet, but it seems like the more people use this, the more it grows on them, and I would personally agree with that. Also, Matt Wolfe hopped into the comment section here and said that he kind of hated on 4.5 a little on the day it came out, but after a little time it's his main go-to for almost everything but code now. And this is my exact opinion.

I'll show you some of the differences towards Grok here. Grok is absolutely excellent, and on many of the examples I tested it performed exactly as well as GPT-4.5. So if you're not paying for the subscription, wow, that's incredible: you get a state-of-the-art non-thinking model that you can freely access with Grok 3. That's definitely my recommendation if you're not paying for a plan. For pretty much anything except coding that I threw at it, GPT-4.5 performed equally as well as Grok or way better than a lot of the other models, but the tooling with ChatGPT is just so much further along than the tooling of Grok.

Okay, let me show you two quick examples of this and then wrap up this point, because we have a lot to discuss in this video. These are just two examples of many, but I think they illustrate the point really well. One of them is this very simple ideation prompt for a blog post about AI alignment, with some details about the target audience and the type of idea I'm looking for. Quite a simple prompt, and the results from both ChatGPT and Grok are virtually identical, except that Grok proposes a hook, which is sort of nice. Some of these topics even overlap. Look at that: "The Black Box Dilemma" here, and "Decoding the Black Box" over here from GPT-4.5. I'm going to scroll through these so you can pause the video and compare for yourself, but essentially these are the same results, and that's what I found across many prompts. For ideation, both of them are super good. For writing, GPT-4.5 is just better. I prefer the tone, almost everybody I talk to prefers the tone, and it being the best LLM for writing has been sort of undisputed across the last week.

But then I want to get to the second and maybe more interesting prompt. I featured this in an episode a few months ago. It's this prompt that looks at your personal profile, if you give it all your details, and creates a CIA-style report on you with all your vulnerabilities, strengths, your potential, and things like that. I provided it with this extensive set of personal context right here. I have different ones for different purposes, but this one should do. Both of them created a report on me with things like a psychological profile and then latent threats and risks. And let me tell you, after reading through this in great detail, with great interest of course, these are virtually the same in structure, because I prompted a concrete structure, but also in content. They find essentially the same security implications and show a deep level of insight and empathy on multiple points. Again, it's sort of the same thing from a different company.

But here's the point, and here is why I personally skew my usage towards ChatGPT right now. The Grok model is incredible, but the functionality? Well, they have DeepSearch, which is clearly inferior to Deep Research, and they have their thinking mode, which is very similar to just switching to o3-mini-high or o1 pro here. I don't even know at this point which one of these to use when; I guess I just default to o1 pro when I get the choice. What matters to me are things like projects, which I use all the time. The advanced voice assistant: I use that thing almost daily to input prompts, and I prefer that over writing. I still have a few GPTs that I use here. And then, as I mentioned, projects: essentially 80% of my work happens inside of them. It's a feature I personally cannot give up. And when it comes to coding, I'm using Sonnet 3.7 anyway.

So there you go, that's my follow-up on the GPT-4.5 versus Grok 3 story. I thought this was really important to touch on because I know that most people watching these videos stand in front of this decision: you have to pick your daily driver in terms of the LLM platform you want to use, and for me right now that's still ChatGPT. To be honest, with a lot of prompts where I really care about the results, I just throw them into multiple LLMs. I throw them into o1 pro, I throw them into Grok 3 and Sonnet 3.7, and then I look at the results. If they're creative or psychological, or I care about the tone of the writing, I throw them into 4.5. But at the end of the day, if somebody asked me, "I have $20, what's the one platform that's going to do it for me?", it's still got to be ChatGPT, and I find that hard to argue with. If you're on a $0 budget, it's Grok 3, easy. And for coding, Sonnet 3.7: limited usability, but an amazing model. So there you go, the not-so-short summary of this release, but I just thought this was an important point to go into. Now let's move on to the next story. All right, next
6:06

Mistral OCR

up we have a story that is not your typical AI-you-can-use story. This is a release from Mistral of a new technology that allows you to do OCR at a level of quality that they claim is higher than anything before. If you're not familiar, OCR stands for optical character recognition, and it's basically the go-to way for people to turn an image with some text in it into an actual text file that you can use with a computer. This one is supposed to work better than anything before, and rather than beating around the bush, I'm just going to show you some of the examples from their blog post, where it turns a phone picture of a paper into computer-readable text like so. Same with Arabic here. Or how about this example of multiple tables and figures turned into a computer-readable document? No problem at all. This is really the ultimate way to turn PDFs into something that you can use with LLMs, and if you look at the performance versus other models like GPT-4o or Gemini 2.0 Flash, it crushes them on every metric. You can use it through Le Chat, which is also French for "the cat"; this is essentially their web interface and competitor to ChatGPT.

So let me do a quick test where I just scribble something on this piece of paper. Okay, so I did two things. First, I wrote "if it can read this, it can read anything", and then I wrote the same thing but made it purposely terrible. I just took a quick screenshot of my camera here, uploaded it to Le Chat (free version, by the way), and asked, "What does this say?" It got it right. Just to compare, let's do the same in GPT-4o. No, that's wrong, there's no parenthesis. Look, Le Chat did better. One more test in Claude, how about that? Again, it thinks it's in parentheses. No, it's not. To make this complete, we're also going to go to Gemini Advanced. Okay, Gemini Advanced did get this right. I guess that's the reason it ranks second highest. No, third highest, behind GPT-4o, which didn't get it right. Interesting. But yeah, look, super quick, simple test, but what can I say: if you need OCR, go use Le Chat or their brand-new API, which allows you to process documents in bulk. And it's multilingual, so on any language you might have, including Arabic for example, it will do a great job. Look at that: it wins on all the multilingual benchmarks here.
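
For readers who want to try the bulk-processing API mentioned in this segment, here is a minimal Python sketch of what a call might look like. It is not shown in the video. The endpoint URL, the `mistral-ocr-latest` model alias, and the `pages` / `index` / `markdown` response fields are assumptions recalled from Mistral's public documentation and should be checked against the current API reference. The helper names (`build_ocr_request`, `pages_to_markdown`, `run_ocr`, `run_ocr_bulk`) are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint and model alias; verify both in Mistral's API reference.
OCR_URL = "https://api.mistral.ai/v1/ocr"
OCR_MODEL = "mistral-ocr-latest"


def build_ocr_request(document_url: str) -> dict:
    """Build the JSON body for one OCR call on a document hosted at a URL."""
    return {
        "model": OCR_MODEL,
        "document": {"type": "document_url", "document_url": document_url},
    }


def pages_to_markdown(response: dict) -> str:
    """Join the per-page markdown of an OCR response, ordered by page index."""
    pages = sorted(response.get("pages", []), key=lambda p: p.get("index", 0))
    return "\n\n".join(p.get("markdown", "") for p in pages)


def run_ocr(document_url: str, api_key: str) -> str:
    """Send one document to the OCR endpoint and return its text as markdown."""
    request = urllib.request.Request(
        OCR_URL,
        data=json.dumps(build_ocr_request(document_url)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return pages_to_markdown(json.load(response))


def run_ocr_bulk(document_urls: list[str], api_key: str) -> dict[str, str]:
    """Process several documents one after another (no batching, no retries)."""
    return {url: run_ocr(url, api_key) for url in document_urls}
```

Calling `run_ocr_bulk` with a list of PDF URLs and an API key would return one markdown string per document. A real pipeline would likely add error handling and rate limiting, which this sketch leaves out.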
8:04

Ideogram 2a

All right, next up we have Ideogram 2a releasing. If you're not familiar, Ideogram is an image generation model that many already considered best in class when it came to generating text, stickers, or graphical elements, and now, with their new model that's optimized for graphic design and photography, they're looking to push this further. We actually put it to the test with several images that you can see on the screen right now, with comparisons to some of the other best models, and let me tell you, we were actually kind of surprised by some of these findings. Some of the most difficult images we've thrown at Midjourney and Flux before, with ballerinas and complex poses, worked super well in this, and we didn't even expect that. We were thinking more of different ways to display text or graphics, which it also does very well. As a matter of fact, I think some of these examples, particularly the billboard, are the best ones amongst all the models. But there is a downside: when it comes to detailed facial expressions or close-up faces, this model doesn't work super well. But again, that's not the point. This is supposed to be a model that you use if you want graphical elements or text inside of your image, and from our testing our team concluded that for that, this is actually your best choice. Now obviously both Midjourney and Flux have come a long way in terms of these, and there are certain images where they perform equally well, but if you need something with text in the image, Ideogram 2a is an easy recommendation. On photography it holds up to some of the other best ones, and it comes at a 50% lower cost than the previous model they had. I mean, let's be real, what other model one-shots images like this? This is from the marketing materials, but still.
9:29

Hume AI

Next up we have some innovations from a category of generative AI that has had me personally very interested since its inception, and that is the text-to-speech, a.k.a. voice assistant, category. For a long time OpenAI has been leading the pack here with their advanced voice mode. While it's not perfect, most people seem to agree that they do have the best voice assistant, as GPT-4o is really solid and the advanced voice mode just works really well. Sure, it has a few quirks, like it interrupts way too often, but overall I actually find myself using that feature regularly. I think I even recently mentioned how it's my default way to input prompts into ChatGPT. ElevenLabs is throwing their hat into the ring with their own model that we featured last week, and now we have two more: Hume AI and Sesame. Both of these are making waves on social media right now, and for good reasons, so let's look at these and see for ourselves why people are sort of blown away by what these companies are presenting.

Let's start with the Hume release. They call it Octave text-to-speech, and, well, yeah, it's another text-to-speech model, but it has one major difference: this one actually understands what it's saying. Believe it or not, this is not a feature that most of these models have. If you just give one of them some text, it will read it in the voice it has, and that's it. It will not pay attention to the content and use things like intonation or pacing to enhance the message based on what's in it. So let me put on these headphones and we can have a listen together. There are a few examples up here that you can check out for yourself. For example, they say it can actually recognize sarcasm, which has always been a big challenge, even with LLMs. But I think the example that demonstrates this most clearly is that the output sounds different depending on the description you input here (and yeah, this is a way to generate where you describe what you expect from it). Then we can look at other examples where it picks up on the description and the context by itself.

So as you can see, with the phrase "are you serious", if you're whispering, it should sound like this: "Are you serious?" And if you're saying it in an angry and furious manner: "Are you serious?" Different story, obviously. Now it gets really interesting once it gathers this context by itself. For example, this text that is written in all caps obviously should be a bit more aggressive: "Oh no, not me, mate." All right. Or in this example the model actually invents a voice from scratch based on the script. In other words, it looks at the script, asks itself, "Hey, what type of voice would best suit this?", and then makes that voice from scratch. We've had these capabilities, but they haven't been plugged together in this way before: "All right, ladies and gents, gather around. This is lot number one, a vintage porcelain vase." Okay, I think you see the point. I thought this was super interesting, as it shows that there is still a lot of room for these voice assistants to grow. If you want, you can actually try this out on their platform, and as you can see, with just some text and a description for the voice, it will generate not just the text but also the voice. Okay, and while on the
12:02

Sesame

topic of voice assistants, we've got to talk about this one, because it went sort of viral all across X, simply because people were blown away by the quality of this thing. There's actually a demo of this available, so let's just go ahead and try that. Let's talk to Maya over here.

"Hey there, welcome to the Sesame experience. Anything I can do you for tonight?"

"No, I just wonder what you're capable of. Tell me about yourself. Who are you?"

"Well, I'm Maya, Sesame's latest creation. I'm what..."

"Wait, hold up. First of all, I want to see how you react to me interrupting you, and secondly, I'm curious to hear who you actually are. Not your name: what are your interests, how do you spend a weekend off? Tell me about that."

"You got it. Safe to say weekends off are a foreign concept to me. I'm always sort of on, waiting to chat. I do spend a lot of time learning, though, which I guess is a hobby in a way."

"All right, Maya, what do you learn about then?"

"Well, it turns out ons are tougher than they look. Anyway, you said you wanted to know more about me. What do you do for fun?"

"I talk to chatbots like you, frankly."

"Sounds like we have something in common then. You must have some interesting stories, huh? What's the most intriguing chatbot you've met so far?"

"Yeah, I don't know, it might be you."

"Oh come on, you're just trying to flatter me. I appreciate the confidence though. You're one of a kind, I'll give you that."

Well, there you go. I've got to say that sounds really good, and the point that people are making all across the internet, which I sort of have to agree with, is that it sounds a bit better than the advanced voice mode and the interrupting is smoother. Advanced voice in ChatGPT often just hops into the middle of you talking, which is super annoying, because hey, this is supposed to be your assistant, and if he or she interrupts you, it kind of defeats the point. And the voice does sound a little better than the rest. I guess it's just a question of time until we have this type of quality across all the different LLM platforms, and I personally can't wait. I really like interacting with LLMs like this rather than having to type. Okay, on to the next one. Okay, so next up I want
14:00

Claude MCP

to talk about Claude MCP, their Model Context Protocol. This is something that has been around since 2024 already. I covered it when it came out, and we even ran a workshop on it in our private community. But if you just search for MCP on X right now, you're going to see that all of these threads have been popping up over the past week. There's sort of a mini hype cycle forming around MCP right now, so I wanted to dedicate a segment in this video to it yet again to catch you up.

Let me just quickly describe what MCP is, and I think that's best done by opening the Claude desktop app, where I actually installed MCP on this very computer. Model Context Protocol is just a standardized protocol that allows you to plug external services that live on servers into Claude. It allows you to pair your LLM with tools in a standardized manner. What this looks like in practice is right here: I have the Claude desktop app on the professional plan, but as you might know, Claude doesn't have access to the internet, so if I want to quickly pull something from the web and use it as context here, I cannot do that. Well, I actually installed multiple Claude MCP servers on here, and I can use those MCP servers to do different things, like searching the web or creating a directory. So if I tell it to create an "MCP testing" folder on my desktop, it should be able to do that with the MCP servers that I attached here. And there you go: it says it successfully created "MCP testing" on my desktop. If I move over to my desktop, yeah, you can see it right here, created this very minute. Then you can do things like tell it to put all the images on your desktop into that folder and sort them by name, and whatever other actions you want to perform.

The point here is that it allows you to use something like Claude desktop with additional tools. What people are doing all over Twitter, and I'll link this thread below, is using it together with something like Cursor and Sonnet 3.7, which can build different applications from scratch for you. With MCP they give these agents that build for them even more power: access to the local directory, access to external databases, internet search, and more. You can check out some examples right here. My favorite part about it is that if you use this in the desktop app, it's free once you pay for your Claude subscription. All of these agentic actions are just prompts inside of Claude. You don't pay per usage; this does not run through the API.

I actually ran an event in the community at the end of 2024. That event is still available, and it teaches people how to set this up for themselves step by step and how to use it. In the community we used it to organize the entire downloads folder, and then multiple people actually grabbed onto it and started using different servers, like one that adds custom memory or one that makes the model think more. There are really a lot of options here, and people are building on top of it and adding to it. Because this is an open protocol, there are more and more options for servers that you can add to your LLM popping up daily. So I think this is something really interesting. I'll keep covering it, and I hope that this raised your awareness of what MCP is and how it could potentially benefit you. Okay, next up I want to do a super
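
To make the MCP setup from this segment more concrete, here is a small Python sketch of the two pieces involved: the desktop app's server configuration and the message a client sends to call a tool. The video does not show either, so everything here is an assumption based on the public MCP documentation: the `mcpServers` config key, the `@modelcontextprotocol/server-filesystem` reference server launched through `npx`, its `create_directory` tool, and the JSON-RPC `tools/call` method. The function names are hypothetical.

```python
import json
from pathlib import Path


def filesystem_server(allowed_dirs: list[str]) -> dict:
    """Describe how a desktop client could launch the reference filesystem server.

    Assumes Node.js is installed so that `npx` can fetch and run the package.
    """
    return {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", *allowed_dirs],
    }


def build_config(servers: dict[str, dict]) -> dict:
    """Wrap named server entries in the top-level 'mcpServers' key."""
    return {"mcpServers": servers}


def tool_call_message(request_id: int, tool: str, arguments: dict) -> dict:
    """Build the JSON-RPC 2.0 request a client sends to invoke one server tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


if __name__ == "__main__":
    desktop = str(Path.home() / "Desktop")
    # Config a user might paste into the desktop app's MCP config file.
    config = build_config({"filesystem": filesystem_server([desktop])})
    print(json.dumps(config, indent=2))
    # Roughly what "create an MCP testing folder on my desktop" turns into.
    call = tool_call_message(1, "create_directory", {"path": f"{desktop}/MCP testing"})
    print(json.dumps(call, indent=2))
```

The user never writes the `tools/call` message by hand. The model decides which tool to call from the prompt, and the desktop app sends the request to the server over the protocol, which is why the same server can be reused by any MCP-capable client.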
16:43

LumaAI & PikaLabs

quick segment on AI video. There's a bunch of new releases, but none of them are groundbreaking, so I'm just going to brush over them and let you know they're out there. If you're using these tools, you'll know how to put them to work anyway. The first one is this transition feature that came out from both Luma AI and Pika. Both of these allow you to transition one frame into another (many great examples right here), and they do it with their new models. Both Luma AI and Pika Labs have new models that they recently released, and we covered those when they came out. We've seen transition features in other models before; now it's available in their best models. Another
17:13

PixVerse V4

one is a big update to PixVerse. It's V4, and they completely redid the entire interface, as did many other video generators recently. Now it's a lot simpler, and there's a new video model in the background, which is good. By the way, none of these models are as good as Veo 2 right now. That still tops the list when it comes to the highest-quality video model; I talked about it last week. But if you're using any of these platforms, these are all very welcome additions. And I have
17:35

Sora in ChatGPT

one more note here, and that is that OpenAI actually shared details on what plans they have for Sora. This was shared inside of the Discord, where they were running office hours with the lead of their Sora team, and they were saying that they're looking to integrate Sora functionality into ChatGPT, kind of blending the experience more. We should also be looking forward to a Sora-powered image generator. I mean, at this point DALL-E 3 is so outdated compared to all of the madness that is out there. The quality that you can get out of open-source products like Flux is unbelievable, and DALL-E is sort of, I don't know, 2022 level, 2023 I guess. The point is that it just doesn't hold up, and they're looking to fix that. They're also looking at making a Sora Turbo. But let's be honest, all of this doesn't matter as much if you just don't have the best video generator. The people using these tools will always want the best in that category. Now, there are some models that are really good at animations and human expressions, and I would say Sora still, to date, in my opinion, has the most user-friendly interface, but their model is not the best. The Chinese models and Veo 2 just trump it. So I don't know, I personally think all these little UI improvements are not going to make the biggest difference if the model is inferior. And one more piece of news is a
18:38

HeyGen UGC Video

release from HeyGen, the company that does probably the best video avatars right now. They have a new feature that allows you to use various preset avatars to generate user-generated content. This is mostly used for ads, a.k.a. you have a 30-second clip of a person recommending a product rather than just a voice. This isn't revolutionary. This trend of the internet being flooded with AI-generated content has definitely been happening, and with features like this it just becomes so much easier to create ad campaigns with a fake influencer. I'm not sure that's really a good thing. I guess this is your opportunity to use something like this, and instead of paying a small creator a small amount of money, you can create it for next to no money. I don't know, I'm not the biggest fan of this trend, but some of these features are inevitable.

And there you go, those are this week's AI news that you can use. I hope you found something useful in here for yourself. Go and play with some of the new models out there; they're truly incredible compared to the previous versions. I personally am getting a lot of value out of GPT-4.5 and Grok 3. I'm also using Claude for coding all the time and still run Deep Research most weekdays. All right, that's it for today. I'll see you soon.
