The First AI That Can Analyze Video (For FREE)
Duration: 15:39

The AI Advantage · 28.03.2024 · 72,681 views · 1,341 likes · updated 18.02.2026
Video description
Receive Tailored AI Prompts + Workflows for FREE: https://v82nacfupwr.typeform.com/to/cINgYlm0 Today, we explore Google's AI Studio's latest offering: Gemini 1.5, an AI with an unmatched context length of one million tokens and the exclusive ability to analyze video files. Links: https://aistudio.google.com/app/ Chapters: 00:00 Introducing Google AI Studio 00:30 Google AI Studio interface 01:58 Chat prompt 04:38 Freeform prompt 07:44 Structured prompt 10:49 Gemini 1.5 Use Cases #ai #google #gemini Free AI Resources: 🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter 🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0 👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/ 🐦 Twitter: https://twitter.com/TheAIAdvantage 📸 Instagram: https://www.instagram.com/ai.advantage/ Premium Options: 🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community 🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Table of contents (6 segments)

  1. 0:00 Introducing Google AI Studio (125 words)
  2. 0:30 Google AI Studio interface (355 words)
  3. 1:58 Chat prompt (679 words)
  4. 4:38 Freeform prompt (800 words)
  5. 7:44 Structured prompt (774 words)
  6. 10:49 Gemini 1.5 Use Cases (1198 words)
0:00

Introducing Google AI Studio

Google's AI Studio just came out of early access, meaning everybody can use this, except Europe, but we'll talk about that. And this is packed with features that I haven't seen anywhere else, and it includes access to the Gemini 1.5 Pro model, which actually really matters because it has 1 million tokens of context. I'll show you two really interesting use cases that you could be doing immediately with this. So as you can see, I'm not in my home studio. I'm in Las Vegas for an AI conference right now. But that's not going to stop me from showing you why you should care about this and what features are not obvious, but really useful here. Okay. So first things first, this is
0:30

Google AI Studio interface

a developer interface, but this shouldn't stop you from using it, even if you don't care about developing your own apps at all. Matter of fact, developer interfaces, just like OpenAI's Playground, offer way more features than chat interfaces like ChatGPT. You get to do things like switch models quickly, set temperature, and you get advanced features, which you still don't have in ChatGPT, like these prompt presets. I'll cover all of this in a second, but the most important part here is the underlying model. If you go to the AI Studio, you will get access to the Gemini 1.5 Pro model. Okay, there's also the Gemini 1.0 Pro model, but we won't be talking about that today because the 1.5 with the 1 million tokens of context is really the star here. If you're confused by the naming, they essentially have the Pro model, which is like GPT-3.5. And then they have the Advanced model, which is like GPT-4. In between lives the Gemini 1.5 Pro. And hey, I already see all the haters in the comments typing, oh, their AI is biased. Remember what happened with the images? Yes, yes, I know. But matter of fact, there's one setting in here, which I've been asking for ever since the release of ChatGPT, and they actually did it. So kudos to them. And it directly counters most of the comments that people are going to have around what happened. And don't get me wrong, I'm not trying to sweep that under the rug. That was a big deal. And if you follow the channel, you know that my stance generally is: let the people decide, let the market decide, give us more options. Don't make these moderation decisions for us. And they did exactly that in here, which I really like. So let's get into touring the interface here. And I'll do this as I usually do on the channel. And this tour will be focused on non-developers. If you're a developer, everything we cover is still relevant, because these are the first steps. So first things first, you
1:58

Chat prompt

can go to Create New and right off the bat, you get three different types of prompts that you could run here. If you're familiar with ChatGPT, which you probably are if you're watching this video, the chat prompt is the simplest to understand. This is a simple prompt that you type in, and it generates a result with the model that you select over here. Very simple. So you know, let's run the classic penguin essay with Gemini 1.5 Pro in here. You can see it loads for a bit, and then boom, we should get a result in a second here. All right, and by the way, I'm logged in with my Google account. This is not necessary, it just works, you can just generate results like this. Now here's the first interesting part: at the top you can upload different multimodal file types to add to your prompt. Now one thing is super unique here: you can add video. If you're not aware, there's no other model out there that does video. ChatGPT doesn't do it, Claude doesn't do it, the open source models don't do it. Here you can straight up upload a video and work with it. It's gonna recognize what's in the video visually, and it's gonna recognize the audio. But more on this in a second, because beyond these multimodal features and this chat interface that you're probably familiar with, here on the right we get a temperature setting, as we do in OpenAI's Playground. If you're not aware, this is essentially the creativity of the model, so when you have a high setting, it's gonna be super creative, but it's also gonna be more prone to hallucination. These two are really linked with LLMs, and if you tone down the temperature, you really limit the creativity, and you're gonna get similar results every single time. But as you can see, this is not always editable, so it really depends on what model you use. For example, if I go to Gemini 1.0 Pro, I get to alter this; with Gemini 1.5 Pro, this is set in stone.
A stop sequence means that if the model outputs certain words, it's gonna stop at that point. So in other words, if you're creating a list of, let's say, YouTube titles, you could set the stop sequence to 11, and when it arrives at point 11, it will stop the prompt and just give you the output. To be honest, this one is not very useful, but here we get to the one that I talked about: the safety settings. They actually give you control over how you want the model to behave. Now they don't give you all the control, but this is a step in the right direction, I think. And for me, for all four of these, I'm gonna set these to Block few. I want the model to give me its output, so I don't want it to censor it for me. I'm a grown man, I can handle that. So I'm just gonna set this to Block few, and there you go. And then if we go to advanced settings, top P controls the cutoff for the token probabilities. Now this is something I don't really ever use, but it does work together with temperature, and I know I told you that temperature controls creativity, but it really controls the probability inside of the model. So a temperature of one is gonna let it access the full probability spectrum, and a top P of 0.4 is gonna say, take only the top 40% of those results and get me an answer from within there. But in giving you the full probability spectrum of what these LLMs can generate, it's gonna result in more creative stuff. That's why I communicated it that way. Anyway, we're just gonna close top P, and now let's get into some new features here and the use cases, because there are some. So if we go over here to Create New,
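The top-P cutoff described above can be sketched in a few lines. This is a toy illustration of how a nucleus (top-P) filter works in principle, not Google's actual sampling code, and the probability numbers are made up for the example:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. This is the cutoff top-P applies
    before a token is sampled; temperature would reshape `probs`
    before this step."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}

# Toy next-token distribution (made up):
probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "qux": 0.05}
print(top_p_filter(probs, 0.4))  # only "the" survives the 0.4 cutoff
print(top_p_filter(probs, 0.9))  # "the", "a", and "banana" survive
```

A low top-P like 0.4 leaves only the most likely continuations, which is why combining it with a high temperature still yields tame output.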
4:38

Freeform prompt

we talked about the chat prompt, right? But next up, let's talk about the freeform prompt, because this is quite interesting. From day one of this channel, I've been communicating prompts in a formula format. If you subscribe to the newsletter and you got the free ChatGPT templates, you're gonna see that every single prompt in there has square brackets around certain words. Square brackets are there for you to replace the words with your very own variables or use cases or words, whatever you wanna call it. And I didn't just make that up out of thin air. That's how you use these things. That's how you program applications with LLMs and prompts on the backend. Certain parts of the prompt are variable; they're gonna differ. And in Google AI Studio, you can actually set that and really easily add multiple examples and run multiple prompts. This is something I've been wanting inside of ChatGPT for a while. So let me show you how it works. And I pulled up this example that I created with the help of this getting started guide. But if you're watching this video, you're not gonna really need the getting started guide; I cover everything in there and more. If you wanna get into developing with this, it's gonna be very helpful. There's a lot of good guides in here. But back to this freeform prompt. So what is a freeform prompt? A freeform prompt is a prompt with a variable in it. And that variable can be defined here at the bottom. Okay, so if I say, look at the following picture and tell me who the architect is, you can see that "who is the architect" is a test input. And the way I created this is very simple. I just went to a new freeform prompt. This is the prompt. Then I said test input. And then what I'm gonna do is I'm just gonna rename the input from "inputs" to "fact," let's say. And then here, I'm gonna say who was the architect, and I'm gonna add a new example.
And then I'm gonna add three examples that make sense in the context of an architectural image. And then I can go ahead and go to image here. You can also take it from your Google Drive. But what I'm gonna just do here is take one of the sample images here. Let's say this pyramid from Egypt here. Amazing. And when I say run here at the bottom, you're gonna see that we're using the Gemini 1.5 Pro model to run these three prompts on top of the image. And I get all three results down here. And now here's the best part: you get to save all of this, okay? All of this work. When I do this in ChatGPT, I would need to take it and put it inside of a GPT and then access that. Here, you can just quick-save prompts like this. You just say save. I'm just gonna say "architecture analyst." Fair enough. And I connected my Google Drive here, and this allows me to save the prompt in my Google Drive. As you can see, architecture analyst has been added over here. And I can access it anytime. And you're gonna see the three variables are down here. You could delete them like so, add new ones. And this is really a great environment to test your prompts and build them out. In all the Notion templates I've ever given away or sold, I always gave multiple examples of what you could put in there. But here, you can actually put it to work and create a template where you have multiple variations of one prompt and you can get the results in a heartbeat like so. And then you could even go ahead and add another variable. Variable two. There you go. And here, I could write a second part. So as you can see, you can make this as complex as you want, because I could go in and create multiple variables. Very useful stuff where you can get quite complex results and you can get a lot done in a short time.
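The freeform-prompt idea, a prompt with bracketed variables you swap out per run, can be mimicked outside AI Studio with plain string substitution. A minimal sketch under my own assumptions; the template text and variable names below are made up for illustration, not what AI Studio stores internally:

```python
def fill_prompt(template, **variables):
    """Replace [name]-style placeholders, the square-bracket
    convention the channel's prompt templates use, with concrete
    values supplied as keyword arguments."""
    for name, value in variables.items():
        template = template.replace(f"[{name}]", value)
    return template

# One template, several test inputs, like the freeform prompt's
# variable table at the bottom of the AI Studio interface:
template = "Look at the following picture and tell me [fact]."
for fact in ["who the architect was",
             "what era it was built in",
             "what materials were used"]:
    print(fill_prompt(template, fact=fact))
```

Each run swaps the variable while the rest of the prompt stays fixed, which is exactly what the freeform prompt's example rows do.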
And the next time you just want to prompt   on top of a different image, you just come in  here, you switch out the image, and you don't need   to rebuild the whole thing. There you go. I'm just  gonna add this. Now, these variables are obviously   empty, so I'm just gonna delete that. But rerun  it, and I have my new examples right here. All   right. So now we talked about the chat prompt and  the freeform prompt. Now, let's talk about the
7:44

Structured prompt

structured prompt. And this one will be familiar to people who've been watching the channel closely, because I talked about multi-shot, also called few-shot prompting, many, many times on here. As a matter of fact, I always say it's one of the most useful techniques you could deploy when using a large language model. If you're not familiar, the way it works is essentially you have a prompt, and then you provide multiple examples of what you would like the output to look like, ideally three or more. Because essentially what these LLMs do is they just recognize patterns, and then they recreate those patterns. If you tell it, this is the pattern that I want you to recreate, it's gonna recognize that, and it's gonna give you more of the same. So in other words, providing multiple examples is a really good idea if you want predictable and consistent outputs. So how does this work? I set up a little profile bio generator. This is as simple as it could be. But again, you could plug in any prompt in here; I'm just showing you how this interface works. And basically what I did is I said, write me an Instagram profile bio based on an Instagram profile name. Here are some examples. Now, I do recommend including this block, otherwise the resulting prompt is gonna give you all the results every single time. But what I'm doing is I changed the input to profile name and the output to profile description. Maybe I should rename this to bio; that would be even more accurate, as I use that in the prompt. All right, so if I say AI Advantage, simplifying AI every day, let's just say this would be the profile bio that I expected to generate, that I wanted to generate. Now this could be way more complex; this is just a quick example. But as you can see, I get three of these examples. So for Butter Chicken Enthusiast 9000: enjoying life one naan at a time. Excellent. By the way, the Butter Chicken Enthusiast, that's me.
I eat that dish everywhere I go. It has become kind of a must for me. I just love it so much. Anyway, the point here is this. Now that we have the prompt plus three examples, we essentially created something where, if I go here to the bottom and just say Cat Enthusiast and I run this, you're gonna see that I get a result here that is very comparable to the three examples that I provided. Now you're gonna see that if I delete these examples, it's gonna be very different. And look at that. The format doesn't resemble the one before. And that's the point of giving it the examples. Now the cool thing here is this. While you watched this tutorial, you actually kind of learned how to fine-tune a model, because that's what a fine-tuned model is. You just need way more examples, like over 100 examples. I'm just gonna cancel out here, not save my progress, because what I already did is I saved this little structured prompt and I named it profile bio generator, as you can see right here. And now if I go to new tuned models, you can see that I can actually pick the profile bio generator, because now I've effectively fine-tuned a model, because all fine-tuning is, is providing it with input/output pairs. If I input this, I want this to come out. If you do that 100 times, the model is going to sort of add that to its training data and now be aware of all your input/output pairs. So this is a really good strategy if you're trying to get the model to do one specific thing. If you're only generating Instagram bios, you really want to give it a lot of pairs of, hey, if I input this profile name, I want this bio to come out. And if you do that a lot of times, you're going to find that the model performs so much better in that particular use case. But you can see down here, the minimum is 20 and I only provided three. So for fine-tuned models, you really need more; with OpenAI, you need over 100 examples for this to work super reliably.
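Under the hood, a structured prompt boils down to stitching the input/output example pairs above the new input, which is all few-shot prompting is. A rough sketch of how such a prompt could be assembled as text; the field labels and examples are my own, not AI Studio's actual internal format:

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble a few-shot prompt: the instruction, then each
    input/output example pair, then the new input with the output
    field left open for the model to complete the pattern."""
    lines = [instruction, ""]
    for profile_name, bio in examples:
        lines.append(f"profile name: {profile_name}")
        lines.append(f"bio: {bio}")
        lines.append("")
    lines.append(f"profile name: {new_input}")
    lines.append("bio:")  # the model continues from here
    return "\n".join(lines)

examples = [
    ("AI Advantage", "Simplifying AI every day"),
    ("Butter Chicken Enthusiast 9000", "Enjoying life one naan at a time"),
]
prompt = build_few_shot_prompt(
    "Write an Instagram profile bio based on the profile name.",
    examples,
    "Cat Enthusiast",
)
print(prompt)
```

Because the model completes patterns, ending the prompt on an open `bio:` field strongly constrains it to answer in the same shape as the examples.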
But there you go. This is probably the easiest way I've seen to fine-tune a model, and now you know how it works. So yeah, that is the structured prompt. So now we covered it all. Now the last thing that is left to talk about
10:49

Gemini 1.5 Use Cases

here is use cases for the Gemini 1.5 Pro model, because it really is a unique model. A million tokens of context is something that is unheard of; no other model does this. Claude Opus has 200k, GPT-4 has 128k. If you watched last week's use case episode that we do every single Friday, you will have heard about GPT-4.5, which is rumored to have 256k tokens of context. This has a million; that is a lot, okay. So I'm going to show you two ways you can take advantage of that. Because in most everyday use cases, this is not the most useful thing; you don't need that much context. But here are two where you could use it. Well, the first one is maybe a little unexpected. You could upload every manual there is, right? And this doesn't really work with ChatGPT, because the context isn't long enough to accommodate something like this fridge's manual. I don't know, I just took a random fridge from the LG site. You could take literally any manual. If you didn't know, pretty much every manual is available online, so you could just do this for any appliance. I'm just going to go here, go to file, file upload. As you can see, you could also do it directly from your Drive. I'm just going to drag this over, and it shows the token count here, so no need for an external tokenizer. That's another neat feature. And now, from the 63-page manual, I could ask something concrete. So for example, ice will not dispense if any of the refrigerator doors are left open. I'll just ask, why is the ice not dispensing? And while it does that, I mean, just think about the fact that this is a very long document, yet only 37,000 tokens. So you could easily use all the other models we talked about for this. I'm just saying some manuals will be way too long for something like this. If you have a 500-page PDF, well, you know how to use it now. And there you go.
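If you want to sanity-check whether a document fits a context window before uploading it, a crude rule of thumb is roughly four characters per token for English prose. AI Studio shows the exact count for you; this heuristic is only a ballpark I'm assuming here, and real tokenizers vary by model and language:

```python
def rough_token_estimate(text):
    """Crude heuristic: English prose averages roughly 4 characters
    per token. Only a ballpark for judging whether a document fits
    a context window; use the model's real tokenizer for exact counts."""
    return len(text) // 4

# A ~37k-token manual (like the fridge manual in the video) is far
# below Gemini 1.5 Pro's 1,000,000-token context window.
manual = "x" * 148_000  # ~148k characters of stand-in text
estimate = rough_token_estimate(manual)
print(estimate, estimate < 1_000_000)
```

For a 500-page PDF at a few thousand characters per page, the same arithmetic quickly tells you whether only a million-token model can hold it.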
LG refrigerator: it gives me all the different reasons why that could be the case, based on the PDF. Okay. So that's one. And here's another one. I'm just going to go to the latest episode of my favorite podcast here. And what you can do is you can pull the transcript. You don't even need an external app; it's inside of YouTube. So if you go down here, show transcript, then here on the left side, you can toggle timestamps. And what you can do now is just select all of this, like so, and copy. And now in a new chat prompt, I can simply add all of this as context. Here's a podcast I listen to, paste. Now, one tricky thing is that if you just press Command V (Ctrl V on Windows), it won't work right away. You need to add Shift to remove the formatting. So Command Shift V on a Mac here, and it's going to paste it without the formatting. And there you go. Look at the super, super long transcript of this very long podcast. I mean, this is over one hour, right? So just by default, you get a summary here. But you could do this with things that are way longer, because I only used 18,000 tokens here, meaning we could even do a Lex Fridman podcast or multiple Lex Fridman podcast transcripts in here. And then you can talk to them. Every single prompt you've ever discovered will work with this. And you can do it on long-form content like this and do deep research. What about this one with Bill Ackman? I actually listened to this monster of a podcast on my trip to the US. Show transcript. This is where the scroll bar comes in really handy, because there's just so much of this. And look, if you want to go beyond this and want to use this regularly in your company, there are totally programmatic ways of doing this, of pulling the transcripts off YouTube. I'm just showing you the manual way to get your feet wet. Here's another one. I'm just gonna paste this, Command Shift V.
Oops, I just realized I copied the same one in again. But it doesn't matter; we have a lot of context. So we can just simply send a new message. Yep, this is the Bill Ackman interview here. All right, this is gonna take a second. But as soon as it loads it in, we have power at our disposal that simply wasn't possible a month or two ago. And everybody has this now. Oh, actually, everybody except Europe. If you're in Europe, you've got to use a VPN. Now that I'm in Vegas, I can actually access it for the first time without having to use VPNs. So that's not the biggest workaround in the world, and I think it's worth it. This is completely free. And playing with long context like this turns out to be very valuable. I mean, I guess whether it's valuable depends on what you do. But I would just say give it a shot. Once you experience this yourself, being able to copy a three-hour podcast and another one-hour podcast into this, you're going to realize that you could be doing this with speeches of your favorite people. And you could be prompting on top of all the wisdom that is contained inside of that. So even now, we're just at 85,000 tokens of context. So that is less than one tenth of what's possible here. But either way, I'm just going to go ahead and try and ask a specific question about a point that was covered in the Lex Fridman and Bill Ackman podcast, about a big conflict here with Carl Icahn. So I'm just going to ask who opposed Bill Ackman's short position. Now, this is something that won't be in the training data. It was in the podcast; it was concretely discussed in there. I won't get into the details, but the answer we want here is Carl Icahn. Let's see if it gets it. There you go. Just think about the fact that we have five hours in here and we're only at 85,000 tokens.
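Stuffing several transcripts into one prompt, as done above, is really just concatenation with labels so the model can tell the sources apart. A sketch under my own assumptions; the titles and text are placeholders, and for the programmatic route the video alludes to, a third-party package such as youtube-transcript-api is one option (my suggestion, not something the video names):

```python
def stitch_transcripts(transcripts):
    """Concatenate several (title, text) transcripts into one context
    block, labeling each source so the model can attribute answers.
    Fetching the text itself could be done by hand (YouTube's
    "show transcript" button) or programmatically."""
    parts = []
    for title, text in transcripts:
        parts.append(f"=== Transcript: {title} ===\n{text}")
    return "\n\n".join(parts)

# Placeholder transcripts; in practice each would be tens of
# thousands of tokens of pasted transcript text.
context = stitch_transcripts([
    ("Lex Fridman x Bill Ackman", "...full transcript text..."),
    ("Another podcast episode", "...full transcript text..."),
])
prompt = context + "\n\nQuestion: Who took the other side of Bill Ackman's short position?"
print(prompt)
```

With a million-token window, the question at the end can reference any detail buried in any of the labeled transcripts.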
So you could do multiple episodes of a podcast and search all that wisdom without having to spend tens of hours listening to all of it, paying attention and taking notes. This is an incredible way of researching topics, and now you know how to do it too, for free. All right. That's essentially all I got today. If you have any other use cases that a long context window like this unlocks that are interesting, please share them in the comments below. And other than that, check out this video that gives you some tips on how to effectively prompt, because everything you learn there, you're going to be able to apply on a long-context model like Gemini 1.5 Pro. All right.
