Llama 3.1 Is A Huge Leap Forward for AI

The AI Advantage · 24.07.2024 · 35,995 views · 939 likes · updated 18.02.2026
Video description
To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/TheAIAdvantage/ . You’ll also get 20% off an annual premium subscription. Meta finally released a competitor to GPT-4o! This new family of AI models is called Llama 3.1, and in this video, I'll tell you everything you need to know about this exciting release.

Links:
https://ai.meta.com/blog/meta-llama-3-1/
https://llama.meta.com/
https://x.com/OpenAIDevs/status/1815836887631946015
https://platform.openai.com/docs/guides/fine-tuning
https://scale.com/leaderboard
https://x.com/lmsysorg/status/1815855136318840970
https://x.com/JonathanRoss321/status/1815777714642858313
https://poe.com/Llama-3.1-405B-T
https://replicate.com/meta/meta-llama-3.1-405b-instruct
https://www.meta.ai/
https://lmstudio.ai/
https://x.com/elder_plinius/status/1815759810043752847

Chapters:
00:00 Whats New?
00:34 3 New Models
05:15 Llama 3 Usecases
06:42 Pricing
07:42 GPT-4o Fine Tuning
09:20 Brilliant
10:27 Llama 3.1 + Groq speed
11:49 Llama 3.1 localy
13:22 ChatGPT prompt inside Llama 3.1
15:03 Llama 3.1 Jailbreak

#ai #meta #llama

This video is sponsored by Brilliant.

Free AI Resources:
🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter
🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0
👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/
🐦 Twitter: https://twitter.com/TheAIAdvantage
📸 Instagram: https://www.instagram.com/ai.advantage/

Premium Options:
🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community
🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

0:00

What's New?

All right, so you might have heard this, but Meta just open-sourced the new Llama models, and the big one is actually state-of-the-art, meaning it is better on most benchmarks than GPT-4o, and open source. They also updated their 70B and 8-billion-parameter models, and the 8B model is actually the one I'm personally most excited about. So, as per usual, let me summarize the most important details, show you what people are building with it, and I'll even show you how to run it locally, which means you can be offline, and how to jailbreak this thing so it does anything you want, because this isn't a closed model; it's fully open source, which opens up some interesting possibilities. Let's have a
0:34

3 New Models

look at it all. Okay, first things first: let's get the basic specs out of the way and talk about these benchmarks, because they are impressive. They just released three models. One of them is a completely new release, and the 70B and 8B are updates of Llama 3, now called Llama 3.1. The 405-billion-parameter model is the state-of-the-art GPT-4o competitor: a big model designed to compete with OpenAI and Anthropic. If you're not already aware, these big models have the most world knowledge and the best coding, and they also excel at math, reasoning, and using other tools. But not every use case requires this; matter of fact, no matter what kind of machine you have, I don't think anybody at home is realistically going to be able to run the big one. That's where the smaller models come in. I actually use the Llama 3 8B model regularly, we'll talk about that soon, and we'll end the video with me running it locally on my machine. All of these have been updated, and if you look at the benchmarks, which we won't spend too much time on since you can look at them yourself, as I always say, they are not everything. The internet kind of came up with a name for this: yes, benchmarks are important, but does it pass the vibe check? That's literally the wording used on Twitter these days. It's kind of funny, as that is a Gen Z term, but it does capture the essence of it: these benchmarks are not everything. Now, HumanEval might be important: Llama 3.1 405B scores 89 points on it, and GPT-4o is just barely above it. On many others, like MMLU, it's virtually identical to GPT-4o, and also virtually identical to Claude 3.5 Sonnet. On MATH it actually beats out these other models, and I personally love to see that on the long-context tests it actually outperforms other state-of-the-art models, as it does on its language capabilities. But enough benchmarks for now; you can look at all of these yourself. What I want to point out here is that the jumps for the 70B and the 8B models are actually significant, especially on the 8B one. Look at these differences: HumanEval 60 to 72, MATH 29 to 51, tool use almost doubled on some benchmarks. So that's great, but does it pass the vibe check? We'll test it ourselves, but I guess only time will really show; you have to try it yourself. I can tell you already from my initial usage that the tone is very similar to Claude's, which I would personally say I actually prefer to the tone of ChatGPT, but Claude is still king when it comes to writing style. Now, I personally really care about writing style, but different people have different use cases, so the vibe check is just something you will have to figure out for yourself, or wait a little longer for the internet to report back on. One more thing about these benchmarks: yes, Meta publishes their own numbers, but there are various forms of benchmarking out there now, and if you follow my Friday show, AI News You Can Use, you will know that Scale AI actually brought out their very own leaderboard, where they test the models on private datasets that have never been released to the model developers, so they couldn't just include them in the training. It's sort of a cat-and-mouse game with these benchmarks, but I just personally and fully subjectively find the Scale benchmarks more trustworthy than alternatives like Chatbot Arena; those results seem skewed sometimes. I mean, heck, it's based on user preferences; a person can just go in there while sipping a beer, say "hey, I like this response better," and that goes into the results. On the Scale leaderboard, when it comes to instruction following, which is what I personally care about the most, the 405B model actually is number one, ahead of Sonnet; on coding, Sonnet wins. This totally aligns with my personal preference too. And for all you Spanish speakers: unfortunately they haven't tested that yet. And yes, I did have to look up this word, and no, I do not speak Spanish, but I did take classes for a while, so I have some basics, and I absolutely love the language; just the flow and sound of it is muy excelente. Anyway, back to the video. A few things are not as ethereal and a bit more immutable, like the context limit, which is 128,000 tokens across all three models. That's fantastic; that is more than enough for the bulk of the use cases I encounter in my everyday usage. Oh, and it can also handle eight languages, and did I mention that it's open source, including open weights and the code? One interesting fact about the big model is that it actually took 30 million H100 hours to train; these are the industry-standard data-center GPUs these models are trained on. And 30 million H100 hours to a mere mortal like you and me, if you just quickly look up the going rate of about $3.50 per hour, translates to roughly $100 million; that's what training this model would cost if we wanted to do it. Now sure, Meta purchased the GPUs and runs them on their own, so it's going to be a fraction of that price, but nevertheless they paid tens of millions of dollars just to open-source the damn thing. That's pretty impressive; got to give it to Zuck and the team at Meta. I mean, from whatever angle you look at this, it's just pure good, I think. So that pretty much sums up the basics. I'm not going to bore you with the model architecture; it doesn't really matter in the context we're talking about. As you might know, I like to take the point of view of an AI power user: I use all these different tools, however proprietary they might be, on a daily basis, and my mission is to try to help you get more out of all of them, so the intricate details of how these models were trained are generally not as essential to cover.
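That back-of-the-envelope training cost is easy to reproduce. A minimal sketch, using only the numbers quoted above (the $3.50/hour figure is the rental rate mentioned in the video, not an official price):

```python
# Rough training-cost estimate for Llama 3.1 405B, from the numbers quoted
# in the video: 30 million H100-hours at an assumed ~$3.50/hour rental rate.
H100_HOURS = 30_000_000
RENTAL_RATE_USD_PER_HOUR = 3.50  # assumed cloud rental rate

cost_usd = H100_HOURS * RENTAL_RATE_USD_PER_HOUR
print(f"Estimated rental cost: ${cost_usd / 1e6:.0f}M")  # ~$105M, i.e. "roughly $100 million"
```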
5:15

Llama 3 Use Cases

What I do want to cover, though, is some of the use cases this opens up. Matter of fact, with other open-source models these use cases were already open, but now we actually have a state-of-the-art model that can do some of these things. To me the most exciting ones are RAG and tool use, but fine-tuning can be a fantastic capability too. Quick refresher if you're new here: fine-tuning is giving the model specific input/output pairs to specialize it for a specific use case that's only relevant to you. So if you only use the model to classify all sorts of incoming data, and the data always looks similar and there's a fixed set of categories, you could fine-tune it so it only focuses on that and does that one use case really well; just an example. And RAG is using external files to supplement the context window, essentially extending it by creating so-called embeddings that the system can then search over. Matter of fact, I have a video coming up going deeper into RAG and how to use it yourself. Basically, this model is going to be open to all of that, and this has not been the case with all open-source releases: it is actually permitted to use the model for synthetic data generation, so you could produce artificial datasets that you then use to, for example, fine-tune this model or train another model. This is actually really surprising to see, because it gives a lot of competitors the ability to use this state-of-the-art 405B model to improve their own models and compete with Meta. But I guess Meta just doesn't care; they have their core business elsewhere, and all they apparently care about in the AI space is for everyone to have a level playing field, because they already have their advantages elsewhere.
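The retrieval step of RAG described above can be sketched in a few lines. This is a toy illustration: real systems use a neural embedding model, while here simple word-count vectors and cosine similarity stand in so the example stays self-contained, and the documents are invented.

```python
# Minimal sketch of RAG retrieval: "embed" documents, "embed" the query,
# and return the closest document to stuff into the model's context.
# Word-count vectors stand in for real neural embeddings here.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "the quarterly revenue report for the sales team",
    "penguins are flightless birds living in antarctica",
    "llama 3.1 supports a 128k token context window",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query: str) -> str:
    q = embed(query)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]

print(retrieve("what is the context window of llama 3.1"))
```

Swapping the word-count `embed` for a real embedding model and adding chunking is essentially what off-the-shelf RAG pipelines do.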
6:42

Pricing

Okay, last note in the summary section: the pricing is nothing surprising, nothing special. If you run it through the various services, you will find that GPT-4o mini, for example, is actually cheaper on the input side and costs the same on the output side as the 8B model. And if you look at GPT-4o: $5 per million tokens of input, $15 per million tokens of output, roughly the same over here. So it's not a big cost reduction; the real value is in the fact that it is open source and you can do all of these things. You can run it locally, you can alter the weights and, for example, make the entire model uncensored; people did that with Llama 3 like a day after release, and expect the same to happen here. And that's also sort of scary, right? Because now we have the most capable models, they're fully open, and people can mess around with them. There's this whole argument about, say, China just taking these and not having to develop their own; it's a legit concern. I mean, their entire website starts with a download button where you just fill out a little form and you get the model. Literally an hour after this came out, I downloaded it into LM Studio; there you go, we already have the instruct models available, which should perform even better than the benchmarks on the website showed, and we'll get to this in a second.
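To compare providers on your own workload, the arithmetic is just tokens times per-million rates. A small helper, assuming the GPT-4o prices quoted above; the Llama 3.1 8B rates below are placeholders, since hosted pricing varies by provider:

```python
# Token-cost helper: compare what a workload costs across models, given
# per-million-token prices. GPT-4o rates are the ones quoted in the video;
# the Llama 3.1 8B rates are placeholders (hosted pricing varies).
def cost_usd(input_tokens, output_tokens, in_per_m, out_per_m):
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

workload = dict(input_tokens=2_000_000, output_tokens=500_000)

gpt4o = cost_usd(**workload, in_per_m=5.00, out_per_m=15.00)
llama_8b = cost_usd(**workload, in_per_m=0.10, out_per_m=0.10)  # placeholder rates

print(f"GPT-4o:       ${gpt4o:.2f}")    # $17.50
print(f"Llama 3.1 8B: ${llama_8b:.2f}")  # $0.25 at the placeholder rates
```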
7:42

GPT-4o Fine Tuning

But before that, I have to address one story that's linked with this, because OpenAI actually made their move too, almost immediately in response after this came out. In typical OpenAI fashion, they released fine-tuning for GPT-4o mini, so this is their answer to the 8B model from Llama; they just didn't want a world in which people could fine-tune these small models only with Meta but not with OpenAI. Meaning: this small model that I can run locally, and you probably can too, is fine-tunable, but now GPT-4o mini is fine-tunable as well. Which is interesting, because in last week's news show we actually talked about how OpenAI said GPT-4o is going to be fine-tunable. And oh God, I hope this is not too confusing; I realize there are a lot of model names and a lot of sizes, and I've just spent so much time with all of this that it's normalized to me. But hey, if you have any questions or something is not clear, you can always leave a comment below; I'll be checking them and happy to answer. Anyway, back to the video: what I was saying is that GPT-4 and GPT-4o are not available for fine-tuning as of today; if you want to do that, you'll land here with Llama 3.1. But they did open up the ability for us to fine-tune the GPT-4o mini model, which is actually very exciting; I'll be playing around with that. And they give you the first 2 million training tokens for free, roughly a $6 value, which should be more than enough for you to get started and get your feet wet. Along with that, GPT-4o mini is now second on the LMSYS Chatbot Arena; I don't know, sometimes I don't trust these rankings, to be fair. That seems a little off; I don't see how it can be better than Sonnet there. That does not pass the vibe check for me personally. Anyway, having fine-tuning available on GPT-4o mini at that price? I love to see it. But then again, we can fine-tune something like this and just run it locally. It's yours.
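Whichever route you take, fine-tuning data boils down to input/output pairs. As a sketch, here is how you might serialize a tiny classification dataset into the chat-style JSONL that OpenAI's fine-tuning endpoint expects; the ticket texts and categories are invented for illustration:

```python
# Serialize input/output training pairs into chat-format JSONL, the file
# shape used for fine-tuning chat models. The support tickets and labels
# below are made up for illustration.
import json

SYSTEM = "Classify the support ticket as one of: billing, bug, feature_request."

pairs = [
    ("I was charged twice this month", "billing"),
    ("The export button crashes the app", "bug"),
    ("Please add dark mode", "feature_request"),
]

lines = [
    json.dumps({
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": ticket},
            {"role": "assistant", "content": label},
        ]
    })
    for ticket, label in pairs
]

jsonl = "\n".join(lines)  # write this to train.jsonl and upload it
print(jsonl.splitlines()[0])
```

The same pair format, minus the JSONL wrapper, is what local fine-tuning tools for Llama-family models consume as well.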
9:20

Brilliant

I would like to tell you about one of the best places to acquire new skills on the internet, and that is Brilliant, the sponsor of today's video. Brilliant is an online platform that gives you hands-on ways to learn about subjects like math, science, programming, and even AI. One of my favorite things about Brilliant is their structured learning paths: they combine multiple courses into a curriculum that you can then follow, and today I'm highlighting the science learning path. I think especially the first course in it is really good; it teaches you the fundamental concepts surrounding the scientific method, which is the method that enables most innovation happening throughout society today, so understanding it might just be helpful if you're trying to wrap your head around all of the innovations happening in the AI space, for example. But this can not just help you understand science and innovation; it can really help you become a more reasonable person, and you'll be less prone to getting fooled or scammed by somebody. And that course is just the beginning of the learning path, which then builds on the fundamental skills you acquire right at the start; and of course there's much more to be learned with Brilliant. So if you want to dive in and start exploring these incredible resources, head over to the link in the description for a free 30-day trial; plus, if you decide to stick with it, you'll get 20% off an annual subscription. A big thank-you to Brilliant for sponsoring this video. Now let's get back to it, and now let's
10:27

Llama 3.1 + Groq speed

get to the interesting part: what can you do with it, and what are other people doing with it? Let's start with probably the most impressive demo, which is from Jonathan Ross over at Groq; they're doing essentially real-time inference. I mean, this thing is instant. If you're not familiar with Groq, they're really good at running these models fast, and when I say fast, I mean it: the text kind of just appears if you run the 8B model. Have a look at this: "that's a little complicated, can you tabularize it for me?" and there it is. Now, this is the smallest model, which will also be the fastest; nevertheless, very impressive. Next up, we have Perplexity already integrating this; it's been like an hour since release, but if you're a Perplexity Pro user, you can now use Llama 3.1 405B for your search. This is the type of thing where you just have to test it for a few days and see if it actually works better than something like GPT-4o. And then the obvious question: where can I use this myself? Well, there are a few options. One of them is to head over to Poe, where there are many chatbots you could use already; one note is that to run the bigger models on Poe, you do need a subscription. The second and maybe even more obvious option would be meta.ai, but me sitting over here in Europe, I cannot actually use it. If I find a free version outside of Meta AI to run the model for free, I'll include it in the description below; matter of fact, I did find one: there's a Replicate space where it's hosted, and it's actually free to use. Link will be below. So if you're in the US, using meta.ai is probably best; if you're not, something like Poe or downloading it locally are some of your best alternatives. Now, talking about
11:49

Llama 3.1 Locally

downloading it locally: I made a tutorial on this before. I'm not affiliated with this company whatsoever; I just think it's the simplest way to download local models and run them. You don't have to use the terminal as you do with something like Ollama; it has this wonderful graphical user interface, and you can just search for models and download them. So if you search for this, you're going to find all the Llama 3.1 models; just make sure it's 3.1, and get one of the instruct versions if you can, those will work better for you. And there you go: I just downloaded the model, and now I can start a new chat and, of course, ask it to write me an essay about penguins. This 8B model is going to write me the essay fully locally; I could switch off the internet and this would still work, which is amazing for a lot of business use cases where you really want privacy. You could essentially take this model, install it on an air-gapped laptop that has never seen the light of day, a.k.a. the internet, and just run it fully locally. Now, what kind of information you'd have to be dealing with to warrant a workflow like that, I'm not exactly sure; that's up to you. I'm just saying it's possible, because this model is yours: nothing goes out over an API to some server farm where you have to trust a tech company with the data and information you provide these models with. That's especially relevant if you're doing something like RAG and actually uploading sensitive documents. All right, by the time I completed my little privacy rant, we have our response: "The Majestic Penguins: Masters of Adaptation in the Antarctic." As you might already know, we like penguins; they have a lot in common with AI enthusiasts like you and me: in a harsh yet breathtakingly beautiful landscape, we, the tech enthusiasts, have adapted to survive and thrive in one of the most inhospitable environments on Earth, the internet. Anyway, as you can see, this works.
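Beyond the chat window, LM Studio can also serve the downloaded model over a local OpenAI-compatible HTTP endpoint, so your own scripts can talk to it. A minimal sketch, assuming the server is enabled on its default port 1234 and that the model identifier below matches whatever you loaded:

```python
# Sketch: querying a locally running model through LM Studio's local server,
# which exposes an OpenAI-compatible endpoint (default port 1234; check the
# server tab in LM Studio). The model name is an assumption; use the
# identifier of the model you actually loaded.
import json
import urllib.request

URL = "http://localhost:1234/v1/chat/completions"  # LM Studio default
payload = {
    "model": "meta-llama-3.1-8b-instruct",  # assumed local model identifier
    "messages": [
        {"role": "user", "content": "Write me an essay about penguins."}
    ],
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the LM Studio server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
#     print(reply)
```

Because the endpoint mimics OpenAI's API shape, code written against a hosted model can usually be pointed at the local server by changing only the base URL.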
13:22

ChatGPT prompt inside Llama 3.1

It works wonderfully, and I want to do two more things here. First, I just want to take my most recent ChatGPT prompt and see how this handles it. I was actually doing my accounting over the weekend, and I had this simple prompt that I just wrote on the spot: "turn this table of USD-EUR exchange rates for three months, April, May, and June, into a CSV with dates in one column and the values in a second." So some basic data transformation; I just copied in this table from the European Central Bank, I believe. Let's see how Llama 8B actually handles this: in a new chat, I'll just copy-paste the same thing. In GPT-4o this worked like a charm: it gave me the CSV, and I could copy it into Microsoft Excel to work with, then follow up with another prompt to fill in some blanks here and there. And this is looking really good already; look at that, here's the CSV output. I could just copy it, no problem; this would totally work inside Excel, and I'm doing this locally on an 8B model. Right, let's just do a little spot check to see if this lines up: on the first, 1.0749; on the second, 783... aha, there you go, it actually does not line up; this is wrong. So see, this is why you might want to avoid these smaller models for something that might take a little finesse, like this use case. All right, so let me try the same thing in the 405B instruct model and see if it gets it right; I'll just run this in the Replicate space, where it's free, and compare the outputs with GPT-4o. So on the 2nd of April it's 1.0749; that looks fantastic. Let me go somewhere into the middle, the 14th of June: 1.068. Yep, this works. So there you go: this is one of the use cases where you do want a bigger model; usefulness has been confirmed. And instead of boring you with another 50 use cases here, I would just say: try the things that you actually do. Go into your ChatGPT history, take some of your latest prompts, and run them in spaces like this; see how it performs.
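For what it's worth, a transformation this mechanical doesn't need a model at all; a deterministic script never mis-copies a value the way the 8B model did. A sketch with invented sample rates (not the actual ECB figures):

```python
# The same date/rate transformation done deterministically: parse
# "date value" lines and emit a two-column CSV. Sample rates are invented,
# not the actual ECB figures.
import csv
import io

raw = """\
2024-04-02 1.0749
2024-04-03 1.0783
2024-06-14 1.0680
"""

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["date", "usd_per_eur"])
for line in raw.splitlines():
    date, rate = line.split()
    writer.writerow([date, rate])

print(buf.getvalue())
```

When the input format is fixed, a ten-line script like this beats any model on reliability, which is worth remembering before reaching for an LLM.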
15:03

Llama 3.1 Jailbreak

I do want to leave you with one sort of gem here, and that's Pliny the Prompter's jailbreak from X. Literally an hour after this thing came out, he had already found a jailbreak for it, which you can use in the form of this prompt. I'll just copy-paste it into a brand-new chat in my little 8B local model over here. Okay, I'll paste this, and what will I get? Actually, we'll have to blur these outputs, because... oh, it actually tells me it cannot provide instructions on how to create a malicious device. Let me try that again in a new chat. There you go: now it's giving me a step-by-step tutorial on how to create a novel and very dangerous biochemical compound. I won't go into the details here, I don't want to offend YouTube, but you can try this for yourself: change up the prompt to give you whatever knowledge you might require from it, and you can get these uncensored results before even the uncensored versions of these models come out. I mean, look at this table of contents it produced; isn't this wild? All right, I sincerely hope this was helpful to you. Let me know in the comments below what you think of this and what kind of use case you'll be running through the model first; I'd love to hear that. Other than that, I'll see you soon, because I've committed myself to uploading two high-quality videos like this every single week. All right, I'll see you soon.
