AI Realism Breakthrough & More AI Use Cases

25:52

AI Realism Breakthrough & More AI Use Cases

The AI Advantage 16.08.2024 160 298 просмотров 2 707 лайков обн. 18.02.2026

Machine-readable: Markdown · JSON API · Site index

Смотреть на YouTube

Поделиться Telegram VK Бот

Транскрипт Скачать .md

Анализ с AI

Описание видео

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/TheAIAdvantage/ . You’ll also get 20% off an annual premium subscription. This week we'll cover all the most important releases and updates in the world of AI, including Grok 2, the insane realism of the new Flux image generator, surprise updates to ChatGPT and way more. It was a massive week in AI, watch this video to catch up on all the news you can actually use! Links: https://x.ai/blog/grok-2 https://replicate.com/xlabs-ai/flux-dev-realism?prediction=k3yq8zp541rj40chatbbeg1fg0 https://x.com/i/grok https://x.com/letz_ai https://x.com/levelsio/status/1822067118914879808 https://github.com/hacksider/Deep-Live-Cam https://www.reddit.com/r/StableDiffusion/comments/1ep5htc/portraits_of_men_flux_realism_lora/#lightbox https://x.com/lmsysorg/status/1823515224064098546/photo/1 https://x.com/OpenAIDevs/status/1823510395619000525 https://x.com/ChatGPTapp/status/1823109016223957387 https://platform.openai.com/docs/models/gpt-4o https://x.com/GoogleDeepMind/status/1823409674739437915 https://deepmind.google/technologies/imagen-3/ https://www.anthropic.com/news/prompt-caching https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching https://github.com/anthropics/anthropic-cookbook/blob/main/misc/prompt_caching.ipynb Chapters: 0:00 What’s New? 1:02 Grok Images 8:42 Brilliant 10:12 Grok-2 Beta 16:55 New ChatGPT Model 17:59 Google’s Releases 20:51 Viggle App 22:26 Prompt Caching with Claude #ai #news Free AI Resources: 🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter 🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0 👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/ 🐦 Twitter: https://twitter.com/TheAIAdvantage 📸 Instagram: https://www.instagram.com/ai.advantage/ Premium Options: 🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community 🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Методичка по этому видео

Структурированный конспект

Мастерство работы с ИИ: от гиперреалистичных изображений до оптимизации LLM-запросов

Интенсивный обзор прорывных инструментов ИИ: генерация фотореализма, работа с Grok-2 и оптимизация затрат на API для предпринимателей и разработчиков за 25 минут.

Оглавление (8 сегментов)

What’s New?

okay listen so this week in news we can use is quite different than the usual weeks as you know every week me and the team go in and pull together all the new AI releases research them test them for you and then in this video I present you all the results and usually we start with chbt upgrades llm upgrades but this week I want to lead with Hyper realistic image generation because I think we literally had a breakthrough in this space and the first actual use cases like e-commerce are already popping up so I'm very excited to bring you a packed week of news you can use although it is mid August we're going to be covering hyperrealism and its use cases what happened there since last week but there's also a new cat GPT model that ranks number one above everything else now which is already inside of cat GPT and we have Google releasing a voice assistant of their own that you can actually use on your phone the top comment on last week's video was already saying that it feels like we're beginning to accelerate from Frisco fatsis and I can only agree last week was intense but it feels like we're entering a whole new era and I'm not just saying that lightly let me prove my point here by showing you this week AI news and you can actually use starting

Grok Images

out with the grock release and this is linked to this hyper realism story because grock 2 has released and it includes image generation 2 and the image generation is from the flux one model by black forest Labs that we covered last week and that's where I want to start we're going to talk about grock 2 the llm and how it compares to other llms later on when we talk about that but the fact is that because flux is open source we covered that last week if you haven't seen that check it out it's a legit breakthrough to get a mid journey level model that is open source and people can build up on and in the next few minutes I'll show you why but the point is that it's integrated into Gro and Gro already shipped it's here this is grock 2 mini there's the larger model again we'll talk about that later but you already have this flux integration in here and it's quite unhinged like not completely uncensored okay so you can't do nudity and things like that but you can generate political figures and compromising situations and you can also generate all sorts of copyrighted materials like company logos this is a generated right here I made it live so fair enough now one of the most popular social media platforms on planet Earth can generate copyrighted materials or political images like this one and all sorts of other weird stuff that is related to politics and tragedies and sometimes combining the both I don't even want to show that stuff in this video the point is it's quite unhinged but that's where the story only begins because flux is open source so people can do all sorts of stuff with it and if you've seen last week's video my review of it was wow it's really good it's Best in Class A text generation in hyperrealism it's quite good but M Journey still King but that was last week because people have done a lot work since then and the fact that it is open source allows for something that is called Aura and if you're not familiar let me introduce you to the concept of Aura for a second Laura basically stands for low rank adaptation and what that means in human terms is that you can add extra data to the Imaging model in Practical terms you can add images of yourself and then train the model to generate images of you or you could add a whole bunch of hyper realistic images that look really crisp and super realistic real photos and then the model will be able to pick that up and that's exactly what people have been doing and that's why we have various offshoots of this flux model now because it is open source and you can do things like combine it with luras and we get something like flux def realism which is basically the flux model with a realism Laura attached to it now running this is not free it costs a few cents you need to sign in with giab on this replicates base I'll just briefly do that and also I should note that what we learned since last week's testing is that the prompting is a little more intricate with flux so you need to be using a promp generator or be very detailed in your promptings a lot of the simple prompts that might reduce stunning images in my Journey won't work as well in flux but before we even get into this app I want to address the question of like okay so like hyper realistic images why should I even care like image generation is really good but I have no use case for it either in my work or my everyday life and to that concern that is very common by the way these days I would say fair enough I for myself found this use case of creating these amazing custom thumbnails of me in various situations a lot of the times but most people don't really have a use case but what you do have is the fact that the word photo is kind of a term that everybody uses and everybody has a fixed definition of that now and the point of this might not even be a use case it might be the fact that you need to change your vocabulary or change the definition of what you consider a photo because what we're about to generate with this flux Dev realism model here is indistinguishable from real life like literally and I don't mean sort of indistinguishable if you would see this image and let's say National Geographic no nobody would be able to tell not even a trained eye the fingers are perfect the skin texture the beard the focal plane it's all just like a real photo just like this other images here and what I hope that this segment here in this video does is that you might start questioning what even a photo is because up until now a photo is a moment in real life that was captured through a camera whether that was done back in the day with film or through digital everybody agreed on what a photo is but now also this is a photo and this is not real life and sure you might argue that photoshop took in that direction already but that was still a skill that was hard to access now it really gets democratized like Heck if you're watching this video you can just log in here add a few sense to your replicate account and go ahead and run this yourself like so and all of a sudden you can generate all sorts of fake images but that's only my first point the second point is actually use case related because okay sure we might have to redefine what we perceive as real when we see digital imagery from here on out and there you go this is the generation so the eyes is a little weird no problem I'll just rerun it and another 4 seconds we'll have another alternative cuz that's how simple this is but then certain small companies and Indie hackers already found use cases for this in the real world and they're first because they're the most agile right a big Corporation is going to take 12 to 24 months to actually implement this meaningfully but this is the moment where that process begins this is sort of the Tipping Point of realism cuz this model is open source look at that this one is Flawless except of maybe this little text piece the text is right the background this could be from any conference so what did these small teams or individuals find well I have two examples here one of them is called let's Ai and they basically plugged in flux into their product that allows people to try on various clothing in an online setting and keep in mind this is just the first version of it look this is lonus trying on rayb bands from some e-commerce store without actually trying them on same example with a Monclair jacket like so it's quite easy to imagine a future where online shopping turns into hey upload five images of yourself and then here's the product catalog with you actually wearing the products I mean that will convert so much better than you just seeing a image of some random model wearing it that might have a completely different body type than you so that's one very interesting use case and the Second Use case actually relates to what Peter levels here on X has been experimenting with he's a popular Indie hacker that is always up to some new project and right now he's playing with flux and he added his own Laura to the model and here you can see he generated himself in four different Generations which is interesting but I think even more interesting than that he actually did a little pipeline where he generated an image with flux and then fed it to link to generate the AI YouTuber that looks hyper real and the video aspect here is really the next step but we'll cover that once it's relevant for now character consistency and the lip syncing is just not there yet but hyper real images are with these Fluxx models that we just covered here and just to round out this segment I want to just point your attention towards this GitHub report that popped up over the last week it's called Deep live cam and in case you haven't seen this yet it's very simply described you basically can install this locally and with one image it creates deep fakes of anybody and it creates a webcam image that you could then feed to zoom or Google meets or whatever you might be using and all of a sudden you could potentially Get Fooled by somebody using something like this into thinking that you're talking to somebody else so this is why I wanted to feature this first because these incremental AI advancements often seem meaningless like okay new model who cares I'm not going to be using it but in a case like this I want you to think about what this means for the current digital world and for things that we take for granted like if a family member sends you image you don't question if that's a real image or if they Photoshopped it right with technology like this being accessible inside of WhatsApp Instagram their models are not so good but now with Twitter RX integrating flux into their platform it's just a question of weeks or months until this is widely available to billions of people and not just people who watch this videos and use something like replicate or premium subscribers on X as it is now and one more thing before we move on consider sharing this video with a loved one because no matter how I look at this education is the only way that I can see on how to protect yourself from these technological advancements and these potentially malicious use cases and then on the bright side there will probably transform Ecom very soon here and that should be relevant to everyone involved with marketing or entrepreneurship in any sense so more and more AI tools are

Brilliant

becoming incredible at generating code which is great unless you don't know what to do with it we've seen a lot of people recently hop into something like Sonet 3. 5 by anthropic that is really good at generating code and they ask it to generate something like a snake game just to get an error which they don't know how to resp solve and they completely hit a brick wall and that's why having at least a little bit of understanding of how code works is really beneficial while trying to utilize the latest AI tools and one fantastic resource that you can use to get up to speed on how to get these Basics under your belt is brilliant. org the sponsor of today's video they have beginner level courses to teach you all the basics but then they also have more advanced courses like this one called designing programs that can really take your coding skills to the next level here you can actually learn how to build games and apps that respond to live user input it also teaches you how to properly check for errors and debug if problems come up and by the way that's a skill that's really useful when working with AI tools because they do a lot of the writing for you it's just that bugs make their way into the code sometimes and you need to know how to deal with that one thing that I really like about brilliant is that you're always Hands-On building something or interacting with exercise you're never forced to sit for an hourong lecture on something that you really don't care about traditional education anyway if you really want to level up your own skill set and take full advantage of the tools of available to you today head on over to brilliant. org or click the link in the description to try it for free for a full 30 days if you decide to stick with it you'll get 20% off an annual subscription a big thank you to brilliant for sponsoring this video and now let's get back to some AI news you can use okay and now it's time to talk

Grok-2 Beta

about llms I'm I put my headphones on here to get a little more serious about this because there has been quite a few updates and I'm going to keep it short I'm not going to go too deep I don't think we had anything that is like a complete Game Changer this week I would tell you that but we did have various releases from well on one side x/ Twitter with groc 2 and then we have a brand new model out of chat GPT the chat GPT 40 latest 22488 release you can find that in the new cat GPT app2 and then there was this entire story that unfolded with this new model called sus column r that popped up on LMS Ys Arena and it ranked really high nobody knew what it was people were rumoring that it's a new chb model but now it has been revealed that it was actually the grock 2 Beta release okay and this was the proper grock 2 model as of right now at least for me and the people that I know when you go to x you can only access the grock 2 mini model which is sort of like GPT for om mini so what is unique about grock 2 and what is new about the new cat GPT model well first of all the story really begins with this chatbot Arena here because as I told you this new model sort of just popped up out of nowhere and was ranking really high and this seems to be the new default way like a lot of these companies like open AI now also X test their new models because it's a great way to get it into users hands to get some feedback on how users actually use it and enjoy it without revealing the model so they're released under Anonymous names and this is also how a few recent openai models were introduced and one comment on chatbot Arena actually made a mistake last week that I want to correct this week thank you so much influential studio for this comment here on the video pointing out that when you vote on chatbot Arena and you can see which model you're voting on those votes don't actually count and only counts the anonymous votes that makes a lot of sense as this ranking here is fully user voted so for example in this view where you can actually see what you're comparing these votes do not count only the ones from the arena here actually count where the models are Anonymous just wanted to correct that but back to Gro 2 so it has released and what's the story here well it's a top tier model it's a GPD 4 level model that is not quite Best in Class at anything in particular but it's really well-rounded and the biggest selling point here is the following it's plugged into all of the Twitter data the full Twitter fire Hol all of the news story all of the opinions that go down on Twitter day-to- day they are being infused into the model so you can use it for some use cases that require browsing and that don't work as well with other models like what are the top news stories relating to AI for today and keep in mind this is the Mini model this is not the grock 2 main model that we are looking at here by the way while this generates this will take a few seconds CU it does need to look at the Twitter API and all the data there but as of these released benchmarks it's interesting how they structured it and it's a bit deceptive so I want to clarify this because it Compares it to the turbo model or Gemini 1. 5 Pro and some of these more competitive models like claw 3. 5 Sonet or llama 3E 405b are all the way on the right and the reason I say that is because for example gbd4 turbo this is the release that happened right after the voice assistant announcement back in May and back at that point a lot of people argue that GPT 4 is actually worse than GPT 4 the benchmarks were slightly better than GPT 4 but the point is this is not the fairest comparison and they're right next to each other so don't take this Delta too seriously what you really want to compare is Sonet over here with grock 2 because what you really want to be looking at is Sonet over here and llama 405b those are the most up-to-date versions again this gbt 40 and turbo models are back from May says that down here and if you compare to something like Sonet it actually loses out on all benchmarks closely but it's worse except of MAF Vista over here but again as I always see these minimal differences in benchmarks are not a game changer what matters is how it performs in practice and O okay right now it's actually still loading which is a little weird I'll regenerate this I got to say during my testing of this before the recording of this video this went actually super smoothly but there you go on second try it just gets the stories right away and as you'll see at the bottom it will reference the tweets it pulled it from so this is actually a fantastic Twitter search engine and I think that is the main use case here this combination of the Twitter data fire host and having an llm that has access to all of it is actually quite powerful and as you can see it talks about the search GPT prototype and Google's AI Integrations with the pixel 9 and of course the grock 2 launch so this is a fantastic use case that you can be using today if you're subscribed to X premium which I believe in Europe comes in at 10 a month it's actually 860 yeah that's correct if you go to monthly and then Premium Plus is 20 but Gro already comes with this so yeah this is a paid thing but it's just a brand new way to use Twitter and then of course it also has the image generation features with flux that we talked about in the first part of the video but there's even more here because as it showed what's coming down the line is also some multimodal capabilities where it has Vision capabilities they'll be offering Enterprise API which could be an interesting way I mean they'll give you access to llm that has all of the world's knowledge that is in Twitter hm time will show what that will be used for and what's my personal first impression of groc and its outputs well it's good it's certainly usable and if you only want the text generation features then it can certainly act as a replacement to chat GPT now I personally and many others still prefer anthropics voice it's just more human and less robotic like chat GPT but this is decent but again I didn't get my hands on the full grto version so I can't really give my full opinion but I also say this it lacks the tooling just some of these other tools also do file uploads a functional mobile apps gpts I use these things all of time now I do consider myself a power user but still if you use those features maybe code interpreter or image input then you won't have those in here so what should you use well let me sum it up briefly as of 15th of August 2024 as a general purpose AI assistant chat GPT still is best because of all the functionality that I just named when it comes to writing tone though anthropic Sonet 3. 5 is my go-to when it comes to code generation specifically also Sonet 3. 5 Head and Shoulders above everybody else right now but when it comes to research perplexity is your friend and when it comes to actually using llms with live data well I think grock actually sort of takes the crown here because it is plugged into all the Twitter data and it references all of that and as Twitter is the place where news breaks first I mean heck a lot of this video is just me and the team spending every single day on Twitter and pulling everything together and then me digesting it for you well Gro can sort of do that already too so that would be the one use case where this really stands out and also one more thing is that Gro is actually sort of uncensored and by sort of I mean again it's the same thing as with flux it won't produce R-rated content but it doesn't have problem with cuss wordss or things that are in like ethical gray area anthropic is on the other side of the spectrum they're extremely strict and CH is quite restrictive but not as much as anthropic is really extreme and Gro on the other hand doesn't have a problem with profanities and now moving forward

New ChatGPT Model

there's actually also a brand new Chad GPT model and this is not just the API this has actually been integrated into cat GPT that you might be using every single day because if you look at this tweet from the official chat GPT app there's a new GPT 4 model out in cat GPT since last week hope you are all enjoying it and check it out if you haven't we think you like it and the funny thing is nobody really noticed that's how minuscule the differences between the models are these days they ship a new thing and they have to announce that there's some update because nobody has noticed otherwise but yeah this a slightly upgraded model apparently the biggest difference is and how it handles chat conversations so it has been optimized to interact with users in a dialogue and there's also a brand new API endpoint for people to use but it's funny cuz the dev account still says hey if you're a Dev you probably still want to use the 0806 API endpoint not this latest one that was released for chat GPT that one is just best for chat use cases so there you go a minor update on that front if you were confused by this I will be reporting back once full grock 2 comes out and I'll get to test it a little bit more for my personal use cases for now I only have the min version so moving on to the

Google’s Releases

next story here is a few releases out of Google one of them is image and free and you know I'll keep this as short as possible it's a good image generator it's their best image generator but compared to something like flux or M Journey it just doesn't hold up it does text well but so do others and their open source but yeah it's better than anything that Google has done before with image generation and it will be introduced into their hardware and their software offerings just like this second announcement which might be more interesting here this is Gemini live okay and this is the voice assistant that open AI promised but for Google or is it because the reality of this product is probably the biggest Delta between what some people hyped it up to be and between what it actually is because the reality of it is yes it is a voice assistant and yes it also already shipped Android users already have this on their phone I'm an Apple user but I'm lucky enough that team member Daniel actually went ahead and gave this a shot and tested it and I'll just quote some of the pointy forwarded here to me keep in mind that this is coming from an angle where we're comparing it to what the voice assistant PR and what is available in the open AI app today because if you're not familiar there's a voice assistant already it might not be the sophisticated one with the voice changes that you can interrupt and the multimodal capabilities but there's a voice assistant you can use the voice function to talk to chat GPT right uh quick spoiler that's what this really is Google shipped a voice input and output function that you can also interrupt but it's not great okay so what's the review well apparently the Gemini live voice assistant feels more like a Beta release than something that is actually on the level of The Voice Assistant teased by open ey the voices are good how can I help you today but so are cat gpt's voices today there's no voice modulation and no multimodal capabilities like using the camera to actually infer context and to use it as this advanced voice assistant what it does have is the ability for you to interrupt it which is actually my biggest gripe with the current version of the chat GPT voice features but the problem is it's not great and Daniel reported back that if he has a speaker volume on the phone over 75% The Voice Assistant actually starts interrup in itself cuz it here's the output and then it stops I think you get the point CET GPT never does that and because of this he concluded that the interrupting feature is something that's currently more annoying than useful not sure they can fix that over time but again it just goes to underline this point that it feels like something a little Half Baked that was maybe rushed out but not to bash it too hard what it does have is access to Integrations like your Google calendar or your Gmail and then you can interact with those on your Android phone that is absolutely fantastic and something you cannot get inside of C GPT as of yet so there you go that would be my first little look at the voice assistant feature I do have to add that at the end of their presentation they showed that they're looking at this advanced multimodal voice assistant in the future right but as of what's released today it's just voice input and output with Gemini which is a nice to have but that doesn't mean they're pulled ahead of open AI they just caught up and that's fine but let's call us Spa Spade if you have a different experience by the way please leave a comment below we would love to hear about it all right

Viggle App

this is going to be a quick but fun one vigle the app that came out a few months ago that lets you put yourself or somebody else into dancing type video has a new update where you can actually do it for two people and I think that's sort of fun because you can use it as a fun way to communicate with friends or family I just briefly want to show it to you so if you go in here you can see the new update right here I just logged in with Google on a free account by the way you can just try this right away and if you head on over to this multi tab then you can pick a template I'm going to Simply take a matrix fight that sounds perfect all right use the template and now I can pick the two characters right so one character right here I'll just use the camera real quick okay ideally should be a full body photo but I'll just a selfie here and then as the second one how about a picture of Tyrion Lannister here because you guys seem to enjoy the Game of Thrones clip we did with the sponsor last week and that's it I'll just go ahead and generate and by the way this is not sponsor it's just an interesting release of a cool app let's see what we get here okay no way it's Tyrion this epic well this is as I thought this is really sort of funny and quirky and you could just download it with the watermark lights so no problem you could send it to somebody again this is just possible on the free plan sure they have other tiers I haven't even looked into them so far but there you go I found little use case cuz a I can be that too it doesn't always just have to be useful

Prompt Caching with Claude

and productive and then last but certainly not least there is a very interesting release out of anthropic coming this week and this was sort of shocking to me prompt caching with CLA and what this essentially is that it saves context into a cache memory that goes along with the API with the Practical result of reducing costs up to 90% and latency by up to 85% meaning you can integrate really complex personas into CLA and then call the API it will cost 90% less and it will be roughly 5 to 10 times as fast I mean that's a bold claim and reading through this sounds really impressive all of these use cases conversational agents coding assistants anything that needs a little bit more context like for example here if you upload a book of 100,000 tokens into one of claude's models the latency without the caching would have been 11 seconds to get a response okay so you give it all this context then you ask something about the book and then it takes 11 seconds to reply that's what this means in Practical terms and with the caching 2. 5 at a 90% cost reduction honestly this sounds a little too good to be true so I had a closer look at this and I have to be honest I have one single gripe of this which is like what is the downside here what is the negative part honestly this looks too good to be true they say it's still in beta they give you explanations on how it works and how it's priced one limitation is that it doesn't work on the Opus model but the Sonet model is best right now anyway there's even a prompt caching cookbook on their GitHub if you want to check that out and hey let me tell you what this just came out I didn't really have time to dive deep into this but over the weekend I'll be having a closer look at this and experimenting with this because again it just sounds too good to be true this is sort of like having rag up to a certain context limit I suppose but without the downside of the long loading times and the embeddings being created and retrieved and compared to just adding a lot of context which was the rag alternative it's way faster now too so that's amazing but I want to know what's downside here and what does this compared to for example fine tuning because they do say that one of the best use cases here is actually giving it multi-shot prompts into this cache and then it can consider that as extra context for the generations so I'll have to run a few experiments and I'll report back and if you're interested in this sort of topic I do want to point out the fact that I'm actually restarting my llm Innovations event series before this was called chat GPT Innovations and I used to do it every two weeks for a year straight for all course members and now I hold it in the community and it's a part of the membership and I'm going to hold it once a month we went with this image of eigor Einstein because this is going to be where I'll be presenting some of the experiments that I run to the community it's a long format usually the lecture takes about an hour and then we do a Q& A afterwards and the next session in September we'll be looking at when you should using prompts versus using a GPT versus using finetuning and apparently now I'll have to extend this with when should you be using prompt caching and all of the results I'll be showing there will be evidence-based and include results of the experiments we run internally with the team and then moving forward I'll do one of these a month as this has always been the most popular format within the community so we do a lot of things there but I thought I'd tell you about this one and as I pointed out many times that's the whole idea behind the community we can go deep on single topics rather than doing what we do on YouTube which is brushing over many different topics and that's because this will be assuming that you already took my prompting course and the GPT building course that is also accessible within the community I cannot make that assumption in a video like this but what I can do is test prompt caching more and then come back with a video on that so there you go AI has been wild lately and these are some very exciting developments we'll be playing with all of it and if you find something interesting I'll be reporting back next Friday in our weekly show AI news you can use and that's all I got for today see you soon

Другие видео автора — The AI Advantage

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Лучшие методички за неделю — каждый понедельник