The First AI That Can Analyze Video (For FREE)
Duration: 15:39

The AI Advantage · 28.03.2024 · 72,681 views · 1,341 likes · updated 18.02.2026
Video description
Receive Tailored AI Prompts + Workflows for FREE: https://v82nacfupwr.typeform.com/to/cINgYlm0 Today, we explore Google's AI Studio's latest offering: Gemini 1.5, an AI with an unmatched context length of one million tokens and the exclusive ability to analyze video files. Links: https://aistudio.google.com/app/ Chapters: 00:00 Introducing Google AI Studio 00:30 Google AI Studio interface 01:58 Chat prompt 04:38 Freeform prompt 07:44 Structured prompt 10:49 Gemini 1.5 Use Cases #ai #google #gemini Free AI Resources: 🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter 🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0 👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/ 🐦 Twitter: https://twitter.com/TheAIAdvantage 📸 Instagram: https://www.instagram.com/ai.advantage/ Premium Options: 🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community 🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Table of contents (6 segments)

  1. 0:00 Introducing Google AI Studio (125 words)
  2. 0:30 Google AI Studio interface (355 words)
  3. 1:58 Chat prompt (679 words)
  4. 4:38 Freeform prompt (800 words)
  5. 7:44 Structured prompt (774 words)
  6. 10:49 Gemini 1.5 Use Cases (1198 words)
0:00

Introducing Google AI Studio

Google's AI Studio just came out of early access, meaning everybody can use this, except Europe, but we'll talk about that. And this is packed with features that I haven't seen anywhere else, and it includes access to the Gemini 1.5 Pro model, which actually really matters because it has 1 million tokens of context. I'll show you two really interesting use cases that you could be doing immediately with this. So as you can see, I'm not in my home studio. I'm in Las Vegas for an AI conference right now. But that's not going to stop me from showing you why you should care about this and what features are not obvious, but really useful here. Okay. So first things first, this is
0:30

Google AI Studio interface

a developer interface, but this shouldn't stop you from using it, even if you don't care about developing your own apps at all. Matter of fact, developer interfaces, just like OpenAI's Playground, offer way more features than chat interfaces like ChatGPT. You get to do things like switch models quickly, set temperature, and you get advanced features, which you still don't have in ChatGPT, like these prompt presets. I'll cover all of this in a second, but the most important part here is the underlying model. If you go to the AI Studio, you will get access to the Gemini 1.5 Pro model. Okay, there's also the Gemini 1.0 Pro model, but we won't be talking about that today because the 1.5 with the 1 million tokens of context is really the star here. If you're confused by the naming, they essentially have the Pro model, which is like GPT-3.5. And then they have the Advanced model, which is like GPT-4. In between lives the Gemini 1.5 Pro. And hey, I already see all the haters in the comments typing, oh, their AI is biased. Remember what happened with the images? Yes, yes, I know. But matter of fact, there's one setting in here, which I've been asking for ever since the release of ChatGPT, and they actually did it. So kudos to them. And it directly counters most of the comments that people are going to have around what happened. And don't get me wrong, I'm not trying to sweep that under the rug. That was a big deal. And if you follow the channel, you know that my stance generally is: let the people decide, let the market decide, give us more options. Don't make these moderation decisions for us. And they did exactly that in here, which I really like. So let's get into touring the interface here. And I'll do this as I usually do on the channel. And this tour will be focused on non-developers. If you're a developer, everything we cover is still relevant, because these are the first steps. So first things first, you
1:58

Chat prompt

can go to Create New and right off the bat, you get three different types of prompts that you could run here. If you're familiar with ChatGPT, which you probably are if you're watching this video, the chat prompt is the simplest to understand. This is a simple prompt that you type in, and it generates a result with the model that you select over here. Very simple. So you know, let's run the classic penguin essay with Gemini 1.5 Pro in here. You can see it loads for a bit, and then boom, we should get a result in a second here. All right, and by the way, I'm logged in with my Google account. This is not necessary, it just works, you can just generate results like this. Now here's the first interesting part: at the top you can upload different multimodal file types to add to your prompt. Now one thing is super unique here: you can add video. If you're not aware, there's no other model out there that does video. ChatGPT doesn't do it, Claude doesn't do it, the open source models don't do it. Here you can straight up upload a video and work with it. It's gonna recognize what's in the video visually, and it's gonna recognize the audio. But more on this in a second, because beyond these multimodal features and this chat interface that you're probably familiar with, here on the right we get a temperature setting, as we do in OpenAI's Playground. If you're not aware, this is essentially the creativity of the model, so when you have a high setting, it's gonna be super creative, but it's also gonna be more prone to hallucination. These two are really linked with LLMs, and if you tone down the temperature, you really limit the creativity, and you're gonna get similar results every single time. But as you can see, this is not always editable, so it really depends on what model you use. For example, if I go to Gemini 1.0 Pro, I get to alter this; with Gemini 1.5 Pro, this is set in stone.
A stop sequence means that if the model outputs certain words, it's gonna stop at that point. So in other words, if you're creating a list of, let's say, YouTube titles, you could set the stop sequence to 11, and when it arrives at point 11, it will stop the prompt and just give you the output. To be honest, this one is not very useful, but here we get to the one that I talked about: the safety settings. They actually give you control over how you want the model to behave. Now they don't give you all the control, but this is a step in the right direction, I think. And for me, for all four of these, I'm gonna set these to Block few. I want the model to give me its output, so I don't want it to censor it for me. I'm a grown man, I can handle that. So I'm just gonna set this to Block few, and there you go. And then if we go to advanced settings, top P controls the cutoff for the token probabilities. Now this is something I don't really ever use, but it does work together with temperature, and I know I told you that temperature controls creativity, but it really controls the probability inside of the model. So a temperature of one is gonna let it access the full probability spectrum, and a top P of 0.4 is gonna say, take only the top 40% of those results and get me an answer from within there. But in giving you the full probability spectrum of what these LLMs can generate, it's gonna result in more creative stuff. That's why I communicated it that way. Anyway, we're just gonna close top P, and now let's get into some new features here and the use cases, because there are some. So if we go over here to Create New,
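The top-P cutoff described above can be sketched in a few lines. This is a toy illustration of how a nucleus (top-P) filter works in principle, not Google's actual sampling code, and the probability numbers are made up for the example:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. This is the cutoff top-P applies
    before a token is sampled; temperature would reshape `probs`
    before this step."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}

# Toy next-token distribution (made up):
probs = {"the": 0.5, "a": 0.3, "banana": 0.15, "qux": 0.05}
print(top_p_filter(probs, 0.4))  # only "the" survives the 0.4 cutoff
print(top_p_filter(probs, 0.9))  # "the", "a", and "banana" survive
```

A low top-P like 0.4 leaves only the most likely continuations, which is why combining it with a high temperature still yields tame output.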
4:38

Freeform prompt

we talked about the chat prompt, right? But next up, let's talk about the freeform prompt, because this is quite interesting. From day one of this channel, I've been communicating prompts in a formula format. If you subscribe to the newsletter and you got the free ChatGPT templates, you're gonna see that every single prompt in there has square brackets around certain words. Square brackets are there for you to replace the words with your very own variables or use cases or words, whatever you wanna call it. And I didn't just make that up out of thin air. That's how you use these things. That's how you program applications with LLMs and prompts on the backend. Certain parts of the prompt are variable; they're gonna differ. And in Google AI Studio, you can actually set that and really easily add multiple examples and run multiple prompts. This is something I've been wanting inside of ChatGPT for a while. So let me show you how it works. And I pulled up this example that I created with the help of this getting started guide. But if you're watching this video, you're not gonna really need the getting started guide; I cover everything in there and more. If you wanna get into developing with this, it's gonna be very helpful. There's a lot of good guides in here. But back to this freeform prompt. So what is a freeform prompt? A freeform prompt is a prompt with a variable in it. And that variable can be defined here at the bottom. Okay, so if I say, look at the following picture and tell me who the architect is, you can see that "who is the architect" is a test input. And the way I created this is very simple. I just went to a new freeform prompt. This is the prompt. Then I said test input. And then what I'm gonna do is I'm just gonna rename the input from "inputs" to "fact," let's say. And then here, I'm gonna say who was the architect, and I'm gonna add a new example.
And then I'm gonna add three examples that make sense in the context of an architectural image. And then I can go ahead and go to image here. You can also take it from your Google Drive. But what I'm gonna just do here is take one of the sample images here. Let's say this pyramid from Egypt here. Amazing. And when I say run here at the bottom, you're gonna see that we're using the Gemini 1.5 Pro model to run these three prompts on top of the image. And I get all three results down here. And now here's the best part: you get to save all of this, okay? All of this work. When I do this in ChatGPT, I would need to take it and put it inside of a GPT and then access that. Here, you can just quick-save prompts like this. You just say save. I'm just gonna say "architecture analyst." Fair enough. And I connected my Google Drive here, and this allows me to save the prompt in my Google Drive. As you can see, architecture analyst has been added over here. And I can access it anytime. And you're gonna see the three variables are down here. You could delete them like so, add new ones. And this is really a great environment to test your prompts and build them out. In all the Notion templates I've ever given away or sold, I always gave multiple examples of what you could put in there. But here, you can actually put it to work and create a template where you have multiple variations of one prompt and you can get the results in a heartbeat like so. And then you could even go ahead and add another variable. Variable two. There you go. And here, I could write a second part. So as you can see, you can make this as complex as you want, because I could go in and create multiple variables. Very useful stuff where you can get quite complex results and you can get a lot done in a short time.
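The freeform-prompt idea, a prompt with bracketed variables you swap out per run, can be mimicked outside AI Studio with plain string substitution. A minimal sketch under my own assumptions; the template text and variable names below are made up for illustration, not what AI Studio stores internally:

```python
def fill_prompt(template, **variables):
    """Replace [name]-style placeholders, the square-bracket
    convention the channel's prompt templates use, with concrete
    values supplied as keyword arguments."""
    for name, value in variables.items():
        template = template.replace(f"[{name}]", value)
    return template

# One template, several test inputs, like the freeform prompt's
# variable table at the bottom of the AI Studio interface:
template = "Look at the following picture and tell me [fact]."
for fact in ["who the architect was",
             "what era it was built in",
             "what materials were used"]:
    print(fill_prompt(template, fact=fact))
```

Each run swaps the variable while the rest of the prompt stays fixed, which is exactly what the freeform prompt's example rows do.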
And the next time you just want to prompt   on top of a different image, you just come in  here, you switch out the image, and you don't need   to rebuild the whole thing. There you go. I'm just  gonna add this. Now, these variables are obviously   empty, so I'm just gonna delete that. But rerun  it, and I have my new examples right here. All   right. So now we talked about the chat prompt and  the freeform prompt. Now, let's talk about the
7:44

Structured prompt

structured prompt. And this one will be familiar to people who've been watching the channel closely, because I talked about multi-shot, also called few-shot prompting, many, many times on here. As a matter of fact, I always say it's one of the most useful techniques you could deploy when using a large language model. If you're not familiar, the way it works is essentially you have a prompt, and then you provide multiple examples of what you would like the output to look like, ideally three or more. Because essentially what these LLMs do is they just recognize patterns, and then they recreate those patterns. If you tell it, this is the pattern that I want you to recreate, it's gonna recognize that, and it's gonna give you more of the same. So in other words, providing multiple examples is a really good idea if you want predictable and consistent outputs. So how does this work? I set up a little profile bio generator. This is as simple as it could be. But again, you could plug in any prompt in here; I'm just showing you how this interface works. And basically what I did is I said, write me an Instagram profile bio based on an Instagram profile name. Here are some examples. Now, I do recommend including this block, otherwise the resulting prompt is gonna give you all the results every single time. But what I'm doing is I changed the input to profile name and the output to profile description. Maybe I should rename this to bio; that would be even more accurate, as I use that in the prompt. All right, so if I say AI Advantage, simplifying AI every day, let's just say this would be the profile bio that I expected to generate, that I wanted to generate. Now this could be way more complex; this is just a quick example. But as you can see, I get three of these examples. So for Butter Chicken Enthusiast 9000: enjoying life one naan at a time. Excellent. By the way, the Butter Chicken Enthusiast, that's me.
I eat that dish everywhere I go. It has become kind of a must for me. I just love it so much. Anyway, the point here is this. Now that we have the prompt plus three examples, we essentially created something where, if I go here to the bottom and just say Cat Enthusiast and I run this, you're gonna see that I get a result here that is very comparable to the three examples that I provided. Now you're gonna see that if I delete these examples, it's gonna be very different. And look at that. The format doesn't resemble the one before. And that's the point of giving it the examples. Now the cool thing here is this. While you watched this tutorial, you actually kind of learned how to fine-tune a model, because that's what a fine-tuned model is. You just need way more examples, like over 100 examples. I'm just gonna cancel out here, not save my progress, because what I already did is I saved this little structured prompt and I named it profile bio generator, as you can see right here. And now if I go to new tuned models, you can see that I can actually pick the profile bio generator, because now I've effectively fine-tuned a model, because all fine-tuning is, is providing it with input/output pairs. If I input this, I want this to come out. If you do that 100 times, the model is going to sort of add that to its training data and now be aware of all your input/output pairs. So this is a really good strategy if you're trying to get the model to do one specific thing. If you're only generating Instagram bios, you really want to give it a lot of pairs of, hey, if I input this profile name, I want this bio to come out. And if you do that a lot of times, you're going to find that the model performs so much better in that particular use case. But you can see down here, the minimum is 20 and I only provided three. So for fine-tuned models, you really need more; with OpenAI, you need over 100 examples for this to work super reliably.
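Under the hood, a structured prompt boils down to stitching the input/output example pairs above the new input, which is all few-shot prompting is. A rough sketch of how such a prompt could be assembled as text; the field labels and examples are my own, not AI Studio's actual internal format:

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble a few-shot prompt: the instruction, then each
    input/output example pair, then the new input with the output
    field left open for the model to complete the pattern."""
    lines = [instruction, ""]
    for profile_name, bio in examples:
        lines.append(f"profile name: {profile_name}")
        lines.append(f"bio: {bio}")
        lines.append("")
    lines.append(f"profile name: {new_input}")
    lines.append("bio:")  # the model continues from here
    return "\n".join(lines)

examples = [
    ("AI Advantage", "Simplifying AI every day"),
    ("Butter Chicken Enthusiast 9000", "Enjoying life one naan at a time"),
]
prompt = build_few_shot_prompt(
    "Write an Instagram profile bio based on the profile name.",
    examples,
    "Cat Enthusiast",
)
print(prompt)
```

Because the model completes patterns, ending the prompt on an open `bio:` field strongly constrains it to answer in the same shape as the examples.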
But there you go. This is probably the easiest way I've seen to fine-tune a model, and now you know how it works. So yeah, that is the structured prompt. So now we covered it all. Now the last thing that is left to talk about
10:49

Gemini 1.5 Use Cases

here is use cases for the Gemini 1.5 Pro model, because it really is a unique model. A million tokens of context is something that is unheard of; no other model does this. Claude Opus has 200k, GPT-4 has 128k. If you watched last week's use case episode that we do every single Friday, you will have heard about GPT-4.5, which is rumored to have 256k tokens of context. This has a million; that is a lot, okay. So I'm going to show you two ways you can take advantage of that. Because in most everyday use cases, this is not the most useful thing; you don't need that much context. But here are two where you could use it. Well, the first one is maybe a little unexpected. You could upload every manual there is, right? And this doesn't really work with ChatGPT, because the context isn't long enough to accommodate something like this fridge's manual. I don't know, I just took a random fridge from the LG site. You could take literally any manual. If you didn't know, pretty much every manual is available online, so you could just do this for any appliance. I'm just going to go here, go to file, file upload. As you can see, you could also do it directly from your Drive. I'm just going to drag this over, and it shows the token count here, so no need for an external tokenizer. That's another neat feature. And now, from the 63-page manual, I could ask something concrete. So for example, ice will not dispense if any of the refrigerator doors are left open. I'll just ask, why is the ice not dispensing? And while it does that, I mean, just think about the fact that this is a very long document, yet only 37,000 tokens. So you could easily use all the other models we talked about for this. I'm just saying some manuals will be way too long for something like this. If you have a 500-page PDF, well, you know how to use it now. And there you go.
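If you want to sanity-check whether a document fits a context window before uploading it, a crude rule of thumb is roughly four characters per token for English prose. AI Studio shows the exact count for you; this heuristic is only a ballpark I'm assuming here, and real tokenizers vary by model and language:

```python
def rough_token_estimate(text):
    """Crude heuristic: English prose averages roughly 4 characters
    per token. Only a ballpark for judging whether a document fits
    a context window; use the model's real tokenizer for exact counts."""
    return len(text) // 4

# A ~37k-token manual (like the fridge manual in the video) is far
# below Gemini 1.5 Pro's 1,000,000-token context window.
manual = "x" * 148_000  # ~148k characters of stand-in text
estimate = rough_token_estimate(manual)
print(estimate, estimate < 1_000_000)
```

For a 500-page PDF at a few thousand characters per page, the same arithmetic quickly tells you whether only a million-token model can hold it.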
LG refrigerator: it gives me all the different reasons why that could be the case, based on the PDF. Okay. So that's one. And here's another one. I'm just going to go to the latest episode of my favorite podcast here. And what you can do is you can pull the transcript. You don't even need an external app; it's inside of YouTube. So if you go down here, show transcript, then here on the left side, you can toggle timestamps. And what you can do now is just select all of this, like so, and copy. And now in a new chat prompt, I can simply add all of this as context. Here's a podcast I listen to, paste. Now, one tricky thing is that if you just press Command V (Ctrl V on Windows), it won't work right away. You need to add Shift to remove the formatting. So Command Shift V on a Mac here, and it's going to paste it without the formatting. And there you go. Look at the super, super long transcript of this very long podcast. I mean, this is over one hour, right? So just by default, you get a summary here. But you could do this with things that are way longer, because I only used 18,000 tokens here, meaning we could even do a Lex Fridman podcast or multiple Lex Fridman podcast transcripts in here. And then you can talk to them. Every single prompt you've ever discovered will work with this. And you can do it on long-form content like this and do deep research. What about this one with Bill Ackman? I actually listened to this monster of a podcast on my trip to the US. Show transcript. This is where the scroll bar comes in really handy, because there's just so much of this. And look, if you want to go beyond this and want to use this regularly in your company, there are totally programmatic ways of doing this, of pulling the transcripts off YouTube. I'm just showing you the manual way to get your feet wet. Here's another one. I'm just gonna paste this, Command Shift V.
Oops, I just realized I copied the same one in again. But it doesn't matter; we have a lot of context. So we can just simply send a new message. Yep, this is the Bill Ackman interview here. All right, this is gonna take a second. But as soon as it loads it in, we have power at our disposal that simply wasn't possible a month or two ago. And everybody has this now. Oh, actually, everybody except Europe. If you're in Europe, you've got to use a VPN. Now that I'm in Vegas, I can actually access it for the first time without having to use VPNs. So that's not the biggest workaround in the world, and I think it's worth it. This is completely free. And playing with long context like this turns out to be very valuable. I mean, I guess whether it's valuable depends on what you do. But I would just say give it a shot. Once you experience this yourself, being able to copy a three-hour podcast and another one-hour podcast into this, you're going to realize that you could be doing this with speeches of your favorite people. And you could be prompting on top of all the wisdom that is contained inside of that. So even now, we're just at 85,000 tokens of context. So that is less than one tenth of what's possible here. But either way, I'm just going to go ahead and try and ask a specific question about a point that was covered in the Lex Fridman and Bill Ackman podcast, about a big conflict here with Carl Icahn. So I'm just going to ask who opposed Bill Ackman's short position. Now, this is something that won't be in the training data. It was in the podcast; it was concretely discussed in there. I won't get into the details, but the answer we want here is Carl Icahn. Let's see if it gets it. There you go. Just think about the fact that we have five hours in here and we're only at 85,000 tokens.
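Stuffing several transcripts into one prompt, as done above, is really just concatenation with labels so the model can tell the sources apart. A sketch under my own assumptions; the titles and text are placeholders, and for the programmatic route the video alludes to, a third-party package such as youtube-transcript-api is one option (my suggestion, not something the video names):

```python
def stitch_transcripts(transcripts):
    """Concatenate several (title, text) transcripts into one context
    block, labeling each source so the model can attribute answers.
    Fetching the text itself could be done by hand (YouTube's
    "show transcript" button) or programmatically."""
    parts = []
    for title, text in transcripts:
        parts.append(f"=== Transcript: {title} ===\n{text}")
    return "\n\n".join(parts)

# Placeholder transcripts; in practice each would be tens of
# thousands of tokens of pasted transcript text.
context = stitch_transcripts([
    ("Lex Fridman x Bill Ackman", "...full transcript text..."),
    ("Another podcast episode", "...full transcript text..."),
])
prompt = context + "\n\nQuestion: Who took the other side of Bill Ackman's short position?"
print(prompt)
```

With a million-token window, the question at the end can reference any detail buried in any of the labeled transcripts.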
So you could do multiple episodes of a podcast and search all that wisdom without having to spend tens of hours listening to all of it, paying attention and taking notes. This is an incredible way of researching topics, and now you know how to do it too, for free. All right. That's essentially all I got today. If you have any other use cases that a long context window like this unlocks that are interesting, please share them in the comments below. And other than that, check out this video that gives you some tips on how to effectively prompt, because everything you learn there, you're going to be able to apply on a long-context model like Gemini 1.5 Pro. All right.
