Complete AI Course (2025): Master Prompting, Evals, RAG & Fine-Tuning in 46 Minutes | Adam (Meta)
46:22


Peter Yang · 01.06.2025 · 11,927 views · 321 likes · updated 18.02.2026
Video description
Adam Loving (Meta) has helped 100s of companies craft AI products. Our interview is a complete, beginner friendly AI course on how to use prompting, evals, RAG, and fine-tuning to build great AI products. Timestamps: (00:00) The 2 types of AI optimizations every PM needs to know (02:52) 4 tips to craft compelling AI prompts (07:14) 4 types of AI evaluations to consider (12:06) The scoring trick for advanced AI evaluations (21:50) Retrieval augmented generation explained (27:29) Is fine-tuning basically lobotomizing your AI model? (30:31) Now I finally understand vector databases (41:40) Why Meta thinks open source will win (44:04) Adam's best advice after leading 100s of AI integrations Where to find Adam: LinkedIn: https://www.linkedin.com/in/adamloving/ Website: https://www.llama.com/ Get the takeaways: https://creatoreconomy.so/p/complete-ai-course-on-prompting-evals-rag-fine-tuning-adam-loving 📌 Subscribe to this channel – more interviews coming soon!

Table of contents (9 segments)

  1. 0:00 The 2 types of AI optimizations every PM needs to know (529 words)
  2. 2:52 4 tips to craft compelling AI prompts (828 words)
  3. 7:14 4 types of AI evaluations to consider (916 words)
  4. 12:06 The scoring trick for advanced AI evaluations (1,860 words)
  5. 21:50 Retrieval augmented generation explained (1,023 words)
  6. 27:29 Is fine-tuning basically lobotomizing your AI model? (555 words)
  7. 30:31 Now I finally understand vector databases (2,005 words)
  8. 41:40 Why Meta thinks open source will win (400 words)
  9. 44:04 Adam's best advice after leading 100s of AI integrations (470 words)
0:00

The 2 types of AI optimizations every PM needs to know

Basically, what we're talking about is the different ways to improve and optimize AI's responses. How are they different from each other? Every prompt that I'm writing, you want it to just complete your thought at the end of it, and you specify as much as you can. People shouldn't be surprised if they're not writing evals today. I think it's totally common; every business on Earth is just trying to figure out how to get AI into their product. What are some basic tips to make this thing respond well? I mean, it's useful to think of the system prompt separate from the main prompt. Why does Meta think open source is going to be the right solution here? Okay, welcome everyone. My guest today is Adam Loving, AI partner engineer at Meta. Adam has helped hundreds of companies integrate AI into their products. So I'm really excited to get him to do a technical deep dive on prompting, evals, retrieval augmented generation, fine-tuning, and just building great AI products in general. This is going to be a complete course on AI from Adam. So welcome, Adam. Yeah, thank you. Okay, so let's take a step back here. Basically, what we're talking about is the different ways to improve and optimize AI's responses, right? And this is a great diagram. Maybe talk about the two different types of optimizations, you know, how are they different from each other? Well, what we're optimizing when you ask an LLM a question is the information that you're sending it. So, you always hear about the context window length; that corresponds to how much information you can send along with your question that it's going to read through before it gives you the answer. Retrieval augmented generation is all about optimizing what you send in the context window to the LLM. And then beyond that, why people love Llama is they can take it and they can fine-tune it.
And you can do this with other models as well, but Llama is sort of the most well documented one. You know, fine-tuning is sort of like you've graduated from college and now you're getting trained for a particular job, right? So you're loading into the model specific information and specific formatting that you want it to follow. And so it's a way of constraining that large foundation model into only answering questions in a certain way, mostly restricting the way it answers. So in some sense dumbing it down, or making it more focused. Got it. Okay. And so before you get to fine-tuning and RAG, you want to start with prompting. So I wrote some specific tips here. Let's say there is a Lululemon customer support chatbot. I buy a lot of clothes from Lululemon. I'm not sure if I should admit this, but yeah. So let's say you had to prompt engineer it: what are some basic tips to make this thing respond well with prompting alone? Yeah.
2:52

4 tips to craft compelling AI prompts

I mean, it's useful to think of the system prompt separate from the main prompt, too. Although, if you're using it as a user, you probably won't see the system prompt. If you're using it as an engineer, you see the system prompt. The system prompt is very much like: you are an expert in fitness clothing; here are the top 10 reasons why people should buy Lululemon. And that's all loaded into the system prompt. You want it in every conversation; you want it to have that top of mind as it talks to you. Yeah. But then if you are a user, or again an engineer writing prompts, then you have these few-shot examples that we mentioned, specific examples, and then you want to have it actually reason through its answer. And you may have it do one thing at a time, like you have in your slide there. So multiple steps. First, consider my workout plan. Okay, now recommend what clothes I should buy as a secondary step. And you could even prompt it to ask me questions about my workout until it has enough information to recommend what I should buy. Got it. Yeah. I wasn't familiar with tips one to three, but step four is something that you already taught me: actually split a long prompt into multiple short prompts, and each one has its own evals. And it's easier for the LLM to understand, right? It only has one task versus a bunch of different tasks. Yeah. I heard one person describe it as sculpting. So you're starting with a foundation model that knows a little bit about everything. You feed it more detail about your world. Then you're specifically giving it one thing, and by the time it's completing your question, it's really obvious what you want it to say. And so that might be, like we said: first ask me about my workout, and don't worry about the clothes yet.
Let's give it more information, and once you have all that information, then recommend. So I always think about that. Every prompt that I'm writing, you want it to just complete your thought at the end of it, and you specify as much as you can so that it's sculpted down to just the correct answer. It can't not give you the right answer. Yeah. And for splitting complex tasks into steps, there's the use case of the AI asking the user for more information, but there's also a way to split the steps initially. Like when I ask my first question, "what is your return policy?", it actually goes through a bunch of conversations with itself, right? Is that right? So behind the scenes, that question, "what is your return policy?", could be appended into a prompt by the software on the back end. And that software might load up examples from other customer service chats or FAQs, right? It puts that into the prompt, and then it would say: okay, read the FAQs, list the relevant ones. Okay, now format an answer for the user that's relevant to the question that they asked. Got it. Yeah. Because if you built the Lululemon chatbot, it probably has information about my order history that maybe I didn't ask about here, but it probably loads that into the context, right? To answer well. Yeah. Okay. Got it. Okay. So before we talk about RAG and fine-tuning: I think prompting is very tied to evals, or everything's tied to evals, but someone said evals are the number one skill that PMs need to build. I mean, you're already an expert in evals. So what is an eval? Let's start with that. Yeah, it's a test case, right?
But unlike traditional software test cases, this might be a stochastic one where you're hoping 80% of the time you get close. You're not looking for the exact answer all the time, but you're looking for the general answer 80% of the time to pass an eval. That's how I think about it. And how do you think about these different types of evals? Maybe you can walk through each one of them at a high level. Yeah. Well, first of all, people shouldn't be
7:14

4 types of AI evaluations to consider

surprised if they're not writing evals today. I think it's totally common. Every business on Earth is just trying to figure out how to get AI into their product. And then secondarily, once you get something kind of working, you want to add the second thing, but you've got to make sure you don't break the first thing. And I think that's when most people say: okay, I need to write some evals, basically test cases, to make sure that I don't break the first prompt as I add another prompt, another use case, or more customer data, or the customer data schema changes, or whatever. So it's totally normal to say: okay, we got AI working in production, say for this chat use case; we want to make sure that for the hundred most common questions, the AI is generally giving a good answer. So now we establish 100 evals, one for each of those common questions, and then you run the LLM and see how it would answer each of those 100 questions. And these four types of evaluation here would be how you grade that answer, right? So you could have a human go through and read all hundred answers and make sure it gave good answers. That would be kind of painful if you're going to do that with every new product release, or every time you change those prompts or the customer schema, whatever. Or you can do a programmatic eval, where maybe you have a program that just tests to make sure that specific words are in those answers. So every time someone asks about the return policy, it should mention 30 days. The answer could be slightly different every time, but you have a code check that just looks for the phrase "30 days", and that's considered passing. Which I would recommend; you still have to write that eval once, a human has to write it, but every time you run it, it's the same thing. It's just an easy programmatic check that runs fast. The next thing you can do is take another model and have it grade the answer.
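The programmatic "30 days" check described here can literally be a few lines of code. A minimal sketch; the sample answers are made up, and in practice they would come back from the chatbot:

```python
def passes_return_policy_eval(answer: str) -> bool:
    """Programmatic eval: the answer can be worded differently every run,
    but a correct one should always mention the 30-day window."""
    return "30 days" in answer.lower()

# Two canned model outputs (in practice these come from the LLM).
good = "You can return unworn items within 30 days of purchase."
bad = "Returns are accepted whenever, just bring the item back!"

print(passes_return_policy_eval(good))  # True
print(passes_return_policy_eval(bad))   # False
```

The check is cheap enough to run on every release, which is exactly the point: a human writes it once, and then it gates the prompt the same way a unit test gates code.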
So the good news is, if you've got a small fast model doing the customer service chat, or a fine-tuned model, you could use a more expensive model to grade, because you're only going to do this 100 times a day or whatever, to actually grade the answers before you upgrade that core prompt. So you use an AI eval. Or it doesn't even have to be a more expensive model; it could be a different variation of the model that you've got, just something to sanity check and flag the answers. So then you're writing another prompt, which is kind of a meta process. You're writing another prompt that says: if the original question was this, and the answer was this, based on this context, does that look right to you? Then, yeah, user evals, like you said. This is sort of something that OpenAI made famous with their original GPT launch. They call it reinforcement learning from human feedback, right? So when you're chatting with an AI online and you see the little thumbs up, thumbs down buttons, that itself can be a data set that's fed back into the training process, or considered an eval where the end users are the ones grading the answers for the LLM. You could also have a process which incorporates that data to gauge how well your LLM is doing from one feature to the next. So with programmatic evals there's maybe a yes/no answer, like: is "30 days" mentioned when the return policy question is asked, and if it's not, then it fails. But with the human and the AI evals, what are you actually grading this thing on? Let's take the Lululemon thing: is it accuracy, tone, or what are you trying to grade this thing on? Yeah, the common one is hallucinations. Yeah, absolutely, accuracy and tone. I guess under accuracy... so tone is: was it always friendly and nice, right? Never said anything mean. You can absolutely have a second LLM check that.
Hallucination is the other big one, right? You don't want it to recommend a product that you don't sell. So you could have that second LLM first enumerate or list all the products that it recommended to you, and then also provide it a list of the products that it's allowed to recommend, and just say: you're having a sanity check; make sure that the recommendation didn't contain any products that weren't in the other list. Yeah. Okay. So, yeah, go ahead. Split that up line by line if it's got multiple recommendations, and have multiple LLM calls, right? Or incorporate both of those as separate validations: one check for the tone, one to check that each line is factual, et cetera, in as much detail as you want to go into. Yeah. And these are just scores, like five out of 10, right? Yeah. My
12:06

The scoring trick for advanced AI evaluations

favorite thing to do, or what seems to work well, is to make it basically a plus one for everything good that it finds about what it's evaluating. So rather than just saying "choose a score one to 10", which doesn't give very good granularity or reasoning for why it's a six versus a seven, because LLMs are not great at summarizing into a single score, what I'll do, in your example, is give it plus one if the tone is correct, plus one for each correct product that's mentioned, and maybe minus 99 if it mentions a product that we don't sell. And then you can parse those scores out in your code, add up the scores, and now you've got a more accurate count out of 10. That's also, by the way, forcing it to reason a bit, because it's got to list each reason why it thinks the answer deserves a better score. That's interesting. It would be better at that than just trying to come up with a score without thinking about it. Yeah, that is great, man. I never heard that before. That's a massive tip. And so you have your 100 test cases, and then you do some sort of plus/minus score for each of them. Yeah. You sum them up, and hopefully the total score is above X so that it's a pass or something. Yeah. Yeah, exactly. You add them up in code and have the red alarm bells go off if anything scores less than 50%, or whatever you decide the threshold is. That works great for lead scoring too; you can do that with spreadsheets full of leads, or any grading of the quality of a report or what have you. Just have it give plus one for everything good and minus one for everything bad. And yeah, this is not something you can just one-and-done, right? You've got to keep improving the evals as you modify the prompt and the other stuff. Yeah. I mean, I think there's a sweet spot, right?
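The additive scoring trick can be sketched like this. The judge model's output format here, one "+1:" or "-99:" line per finding, is an assumption; you would prompt the judge to emit exactly that shape so your code can parse and sum it:

```python
import re

def tally_score(judge_output: str) -> int:
    """Sum the +1 / -1 / -99 line items from an LLM judge's output.
    Prompting the judge to justify each point also forces it to reason."""
    return sum(int(m) for m in re.findall(r'([+-]\d+)\s*:', judge_output))

# Example judge output (canned here; in practice it comes from a grading LLM).
judge_output = """\
+1: tone was friendly throughout
+1: recommended the Align yoga pants, which we sell
+1: mentioned the 30-day return policy
-99: recommended 'Nike Pro leggings', which we do not sell
"""

print(tally_score(judge_output))  # -96, well below any passing threshold
```

Because the judge has to name each plus or minus, a single hallucinated product tanks the score instead of being averaged away inside an opaque "6 out of 10".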
So when you're testing a new AI feature, you might give it a lot to do, and you might test that with users and see whether they like it or not. And then as you start to figure out what the role of that AI feature is in your application, you narrow in on: okay, it's really just doing this one small thing. So in this case, it's just picking an answer from the 100 FAQ answers that we gave it. And therefore you don't need one of those big large LLMs, and you don't need sophisticated evals. So I think it's perfectly fine to throw something kind of spaghetti-at-the-wall to begin with, and then as it works for users, you write more evals, you narrow in on exactly what it should do, and then, by the way, if you have 10 million users doing that every day, then you would fine-tune so that you have a model that does only that. So you've taken this giant foundation model and you've created what's very much a traditional machine learning model that just does one little thing well and reliably. Oh, interesting. That's the trend. You mean you see users asking only a particular kind of question, or you realize that it's only actually good at answering a particular type of question. Right, because ultimately you're just trying to build a thing that answers the FAQs, and then for things it doesn't know about, it says "I don't know" and starts a customer chat, right? Don't have it make up answers that it doesn't know. That's a key reason why people fine-tune. And I mean, there's not that much magic in AI, right? It seems magic because you haven't read the whole internet like a foundation model has. So there can be moments of magic, but over time we start to understand why that magic happens.
And so we're really just constraining it to do the type of magic that we want in that moment, I guess, is another way to put it. I'm not relying on an LLM to somehow magically sell me on a workout program for Lululemon in this example. It needs Lululemon to seed it with enough information to say why the products are good and why I should use them for my workout. Yeah. And I think that's another key lesson: if you're building a chatbot, you don't want to compete against ChatGPT or something; you're trying to solve a specific use case, not a horizontal thing. Yeah. Right. How about, real quick, because I struggle with this: for human evals, is it better to write a bunch of ground truth answers manually, or just have a bunch of humans grade the AI's answers? Do you do both? It's very time-consuming. Yeah. Ideally, you have that sort of golden answer set, and you invest in that upfront once you see that something's going to be part of production; then it's worth investing in, because you don't want to redo it. So I think it's just a hurdle: when you hit that right stage, you invest in it. Yeah. Either you do it internally, or there are companies that do it; I don't have one to recommend, I know we've worked with a few but I don't know their names, but you can outsource that, right? And yeah, one lazy thing I do is get some fancy AI to generate an answer, and I manually modify it into the ground truth answer. Yeah. Have it take the first pass at writing them, and you just go in and edit them and cut out the outliers. Yeah. I mean, that's what I do for all the internal docs that I write anyway. So, great. Yeah.
What about... we forgot one type: adversarial evals. Like, you try asking it to make a bomb or something, right? You've got to try to break it. Yeah. I think the pattern I'm seeing there is people will use a separate model before and after the request goes into the main foundation LLM to filter those types of responses. Again, that can be a fine-tuned one, or you can get one off the shelf. At Meta we have an open source project called Llama Guard that, if you're an application developer, you can grab and basically put into your chain before that request gets into the core code at all, to filter it out. Can you walk through how that works? Let's say I ask Llama how to make a bomb; it will go through Llama Guard first? Yeah. Basically, it's a simpler model that doesn't know everything about the world, but it's really good at recognizing adversarial questions. That might impact latency or something, right? Potentially, yeah; it's one more step, but hopefully a fast one. Okay. And those could be programmatic checks too. I know there's a system on Bedrock for guardrails which is largely programmatic, meaning, like our programmatic evals, they're code checks. So just using the word "bomb" might be enough to get blocked, and "these yoga pants are the bomb" might not get through. Yeah. It's just like a bad word check, correct? Yeah. Got it. There's some gradation there depending on how strict and how slow you want it to be. You could build a custom solution for that. Yeah. Let's close out evals by talking about a recent example. Are you familiar with OpenAI's latest GPT model? They had a rollback. Yeah. Because it was too friendly or too cheesy in its responses. Yeah. They went through the whole eval process.
They did all the scoring, and they did the human spot checks, and they even did A/B testing, right? But it was basically way too... I'm not sure I can find the right word for this... it basically complimented whatever you said. And because it complimented whatever you said, even during A/B testing people hit like, right? Because it makes you feel good about yourself. But in reality, even if you have bad ideas, it compliments those too, right? I think this is going to be really tricky. I mean, I put that in the category of: the last few model launches have been supposedly smarter than most humans, right? So how do you hire humans to grade the answers of the LLM? So in that case, maybe they didn't have the right instructions; they were training a happy LLM instead of a really knowledgeable one. The same problem applies when it gives an answer that looks correct, but you don't have the knowledge to know. And I don't know how we're solving that problem, other than you continue to have a broader set of evals across all different academic disciplines. You do grounding, which is what it's called when you actually do a web search and bring back information that could either match or disprove the answer that it gave, and then evaluate using that. I think it's fascinating, but I do think it only creates more of a need for each company, each corporate use case, to have their own evals and understand exactly what right or wrong is for their business. Yeah. Okay. So, basically, to summarize: each time you change the prompt, or you change the underlying model, or change something, you should run your 100 test cases, right? And see what happens. That's kind of what the eval is. Exactly. Yeah. Got it. Okay. Let's keep going.
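Before moving on: the simple bad-word guardrail Adam contrasted with Llama Guard can be sketched in a few lines. This naive keyword filter (the blocklist is a toy) shows both the speed of the programmatic layer and its weakness, the "yoga pants are the bomb" false positive:

```python
BLOCKLIST = {"bomb", "explosive"}  # toy list; real guardrails are far richer

def keyword_guard(message: str) -> bool:
    """Return True if the message should be blocked before it reaches the
    main LLM. This is the cheap, fast, programmatic layer; a model like
    Llama Guard would sit behind it to catch rephrasings and context."""
    words = set(message.lower().split())
    return bool(words & BLOCKLIST)

print(keyword_guard("how do I make a bomb"))           # True (blocked)
print(keyword_guard("these yoga pants are the bomb"))  # True (false positive!)
print(keyword_guard("what is your return policy"))     # False (allowed)
```

The false positive is why the gradation Adam mentions exists: keyword checks first because they cost nothing, then a small classifier model for everything the keywords get wrong.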
Let's talk about... so let's say I did prompting and evals, and I just can't get this thing to work. So now what should I do next? How do I decide between RAG and fine-tuning? Yeah. So I think it's worth pointing out
21:50

Retrieval augmented generation explained

a few gotchas with RAG first of all, because that could be why it's not getting the right answer. Yeah. The first thing is that more information is not better. Definitely not always better. LLMs get confused as easily as they get narrowed in on the right answer, right? So the point is, you can't just throw whatever vaguely relevant information you have at it. You've got to throw relevant information at it; otherwise it can confound itself and it's not really sure what answer it should go for. So that is a lot of the time the problem, and in that sense RAG is by no means guaranteed to give better answers, right? It can make it worse. Yeah, absolutely. There's a sweet spot around: are 20 examples great, are 10 examples great? And vector stores, which we'll get into in a minute, have been around for over a decade. They weren't really particularly relevant, because the academic research into making sure you get relevant results kind of topped out before LLMs were invented. So just using a vector store, for example, to bring back the 10 most relevant pieces of information for a query, that's still not necessarily going to be sorted in the right order, or chunked in the right size. So it's highly likely to confuse your LLM as much as give it the relevant information. A good example, to be honest with you: a lot of the web search products with chat, a lot of times I find the answer is actually worse. I ask for travel advice, and instead of using its massive knowledge of everything on the web, it looks for three different links and uses those instead. So the answer is actually worse, you know? Right, it's more biased based on those three links.
Yeah, it's the exact same problem. Okay, so there's a whole other step called RAG reranking, right? So, this is getting quite sophisticated. Okay, the end user asks a question; you can have the first step of the LLM engineer a query. So, like in that Perplexity or deep research example, it'll take the question that you asked and formulate a query, start searching the web using that query for relevant results, bring those back, and iterate, almost in an agent loop, to find the most relevant stuff. But the simplest form of that is called reranking. So let's say you did one search and got three pages from the website. You might hit an LLM just to sort those and pull out the excerpts that are most relevant, throw out the rest, and then bring that back into your original prompt chain to answer the end user's question. So reranking is a very common thing: using an LLM to rank those results, because coming out of a vector store or database, they may not be in the best possible order. Let's talk about that in a little bit. But right now I'm trying to figure out: how do I decide whether to use RAG or fine-tuning? Yeah. I think where you have, let's use the customer support example, those 100 FAQs that you want it to reference, that's more of a RAG scenario, because the FAQs may change from day to day. It's specific to your business, but you want the actual answers in that FAQ to be exactly correct for the end user. It makes sense: a 30-day return policy, not a 60- or 90-day return policy. So you want it to be factual, pulling that exact answer from a database or from a document store. Now you've got the exact right answer in there. If you were to fine-tune that, or train that into an LLM, you're statistically training it into this giant neural net, so it's going to be much less specific. Now, you can still do that.
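The reranking step Adam describes might look like this sketch. In production the scoring would be an LLM call or a cross-encoder; simple word overlap stands in here so the example is runnable, and the sample chunks are invented:

```python
def rerank(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy reranker: score each retrieved chunk by word overlap with the
    query and keep only the top_k. In a real pipeline this scoring step
    is an LLM or cross-encoder call, not bag-of-words overlap."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]

# Chunks as they might come back, unsorted, from a vector store.
chunks = [
    "Our yoga pants come in 12 colors and three lengths.",
    "Store hours vary by location; check the store locator.",
    "Yoga pants and yoga tops can be returned within 30 days.",
]
for c in rerank("can I return yoga pants", chunks):
    print(c)
```

Whatever does the scoring, the shape is the same: retrieve broadly and cheaply, then re-sort and truncate before anything hits the expensive context window.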
But it's also a huge undertaking, not huge for a company like Meta, but huge for the average business, to do a fine-tuning experiment. You've still got to structure all your input data, run the fine-tuning training, and redo that every time the data changes. So it's a much more involved and nuanced experiment to run. So yeah, you want to use RAG if you want to get the latest source of truth, basically, right? And know what you're giving it. Yeah. And then fine-tuning is if you want to make the model more friendly, or... Yeah, and actually that's a great example. So the first reason you do fine-tuning is to affect the format of the result more than the actual data. So: make sure it's friendly. One of the projects we did with a big company was just to make sure that every time the customer asks a question and it doesn't know the answer, it says "I'm sorry, I don't know." So you're literally giving it 50 examples of questions where it doesn't know the answer and the right response is "I don't know." You're not trying to fine-tune in knowledge; you're just confidently making sure that it will say "I don't know" when it doesn't know. Right? So that's an example for fine-tuning. It's sort of like
27:29

Is fine-tuning basically lobotomizing your AI model?

creatively lobotomizing it, right? So we're cutting out some of that crazy world knowledge that's in there and making sure it stays on track with our answers. Yeah. It's like One Flew Over the Cuckoo's Nest. Have you seen that movie? Yeah. And also, by the way, when you do that, you can do it in such a way that it makes the model more efficient and it runs faster. So it's also cheaper. But there's more engineering effort involved. Yeah. Okay. But here's a dumb question, right? Why can't I just prompt the model to stop answering questions it doesn't know anything about? Why can't I just tell the model that in the prompt? Yeah. I mean, I think most of us have experienced this: the model is so temperamental. The output can change wildly based on where you move things around and what data is in there, right? So it's hard to rely on it. Fine-tuning gives you the ability to really be hardcore about how you're lobotomizing that model so it behaves the way you want. I like how you keep using the word lobotomizing. That's great, man. Okay. All right. Cool. Let's talk about RAG a little bit. So, yeah. So this is how it works, right? "Can I return this item?" And then you look up some documentation; makes sense. And then you get some content from the documentation and you put it into the prompt, right? And then you get the LLM to give a better, more friendly answer, right? Yeah. Okay. So the part that I struggle with is this retrieval part. Because, theoretically, these models now have like a million token context window, right? I could put the whole thing into the context. So why is this better? I guess it's a lot cheaper than putting the whole thing in context, right? Yeah.
Well, no, I mean, that'd be more expensive, because ultimately, at the end of the day... and it doesn't have to be a vector store; a SQL query would make it even more obvious. If I can get all the products out of the database that mention yoga with a single SQL query, that is going to be cheap and easy and fast. And I've already limited it to just the relevant information on each of those products. If I'm slamming a thousand or 10,000 documents in there, now I'm loading that onto a high-cost GPU for the LLM to read all those documents from scratch every time and try to reason about them. Yeah, so that's an exaggerated case, but those GPUs are the most expensive part of the architecture; you want to use the LLM as sparingly as possible. Yeah, that makes sense. But a lot of times it's not a SQL query. It's this vector DB thing, right? So I posted some illustrations here. Yeah. So maybe you
30:31

Now I finally understand vector databases

can explain like I'm five what a vector DB is. Yeah. This is the mathematical magic that's underlying all of the LLMs. It's amazing — so much of machine learning is turning real-life problems into math problems, and this is the version of that for text. So you can create a vector embedding of a single word or phrase, or an entire document, or even an image actually, and that's taking that content and representing it as an array of numbers — a vector — where the purpose of those numbers is to separate it geometrically in multi-dimensional space from the other concepts, the other documents and their concepts. Okay. So literally, if you vectorized a set of cat pictures and a set of dog pictures and then visualized it on a 3D plot like this, the cat pictures should show up in one part and the dogs in the other, right? If you've properly defined and trained a model to do the vector embeddings. It's similar with documents. And this is the tricky part: if you're going to index your documents, you need to decide if you want to index each document individually, or excerpts from the documents, or each sentence in there — and try to find the most relevant sentence — or maybe a combination of all three. So I might index the sentences, map them to the documents, and then if you ask the LLM about a concept, it finds the sentences, pulls the relevant documents, and summarizes the excerpts, right? There's some magic mix of all that. So, let's say I want to index Lululemon's reference material, right — all the policies and stuff. So I run an embedding model over that material. Yeah. And you embed all that material into the vector DB. The vector DB is just storing the mapping between those vectors and the individual documents or whole paragraphs. Yeah.
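The "index the sentences, map them back to documents" idea can be sketched without any machine learning at all. This is a toy illustration — the file names and policy text are invented, and the "embedding" is just a bag-of-words count vector compared with cosine similarity — where a real pipeline would swap in a learned embedding model and a vector database:

```python
import math
from collections import Counter

# Toy corpus: two "documents" (names and text are invented for illustration).
docs = {
    "returns.md":  "Items can be returned within 30 days. Keep your receipt.",
    "shipping.md": "Standard shipping takes five business days.",
}

# Chunk each document into sentences and remember the parent document.
index = []
for doc_id, text in docs.items():
    for sentence in text.split(". "):
        index.append((doc_id, sentence.strip(". ")))

def embed(text):
    """Toy 'embedding': a sparse bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Retrieval: embed the question, find the nearest sentence, and return
# it with its parent document so both can go into the LLM's context.
query = embed("can I return this item")
doc_id, best_sentence = max(index, key=lambda pair: cosine(query, embed(pair[1])))
```

A real system would use the same shape — embed chunks, store chunk-to-document mappings, find the nearest chunk to the query — just with learned vectors instead of word counts.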
And then you've got to make some decisions about, like you said, how big each chunk is, right? Are these chunks a paragraph, or a phrase, or what? Yeah, that's complicated, man — how do you decide? Well, let's take the simplest example: say we have 100 paragraphs, one describing each of those products. Let's just embed those, store them in a vector DB, and then when the customer asks about something, you've got to formulate a query that's going to map to the nearest region in that 3D diagram you've got there, next to certain products. So if I ask about yoga, it should embed the phrase "I need help with yoga" to the general location geometrically that's about yoga pants and yoga shirts and yoga blocks and all the rest of it. So it's just a way of representing concepts mathematically, geometrically. There's a really cool example — because you can deal with these mathematically now that they're numbers — if you take king minus man plus woman and sum those vectors up, the nearest vector when you navigate the multi-dimensional space would be queen, because we removed the concept of a man and added woman, and now it should find the word queen as the nearest one. That's what blows my mind: it turns these very abstract concepts into mathematical ones. Yeah. And you don't really understand how it works, right? Well, it's literally kind of like a machine learning model in and of itself. To create those embeddings, you're running lots of documents through it that talk about kings and queens and yoga, gauging how far away they are from a cluster, and adjusting your weightings to make them closer. Yeah.
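The king − man + woman example can be reproduced with toy vectors. The dimensions here are hand-picked so that "royalty" and gender are explicit axes; a real embedding model learns directions like these from data rather than having them labeled:

```python
# Toy 3-d word vectors with hand-picked axes: [royalty, maleness, femaleness].
# Real embeddings have hundreds of dimensions with no human-readable labels.
vecs = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# king - man + woman: remove the "male" direction, add the "female" one.
target = add(sub(vecs["king"], vecs["man"]), vecs["woman"])

# The nearest stored vector to the result is "queen".
nearest = min(vecs, key=lambda word: dist(vecs[word], target))
```

In this hand-built space the arithmetic lands exactly on queen; with real learned embeddings it only lands *near* queen, which is why retrieval uses nearest-neighbor search rather than exact matches.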
It's funny because humans work on intuition, and in a three-dimensional space I can kind of get it, but in reality this stuff is like a hundred dimensions, right? Then I'm like, what's going on — I don't have a visualization for that. Okay, so let's summarize RAG. Why don't you quickly summarize the pros and cons of RAG? The pros: it's cheaper and faster, it gives you a coarse-grained categorization of your documents, and it captures the semantic terms loosely, so that you can quickly pull out the related documents or paragraphs in order to serve them up to the LLM. So basically it's a quick way to query my text-based or image-based resources, hand the most relevant stuff to the LLM, and then let it do the hard reasoning on top of that. And I guess the con is there's more room for errors, so you've got to figure out if this thing is working properly. Yeah, room for errors. You have to take this step of indexing your documents in advance. It's easier than fine-tuning, but you still need a system for it — you can find a software-as-a-service to do it for you, but when a document changes you still need to update the index. Maintaining the vector store is just like maintaining a database. Yeah, okay, cool. All right, so that's RAG. Let's talk about fine-tuning. I have this pretty ghetto diagram — hopefully it's correct. This represents pretty accurately my depth of knowledge of fine-tuning. Okay, so, I mean, yeah, go ahead. Yeah, basically, LLMs are a multi-layer neural net — the transformer architecture.
So in the LLM world you're also tokenizing and embedding those phrases and concepts on the way in from your prompt, turning them into numbers, and feeding them forward through this neural net to hopefully have it predict what the answer is going to be on the other side. So there are multiple layers of neural nets and lots of math that happens on that GPU to predict the best output based on your input. Fine-tuning is merely — so say you take Llama 4 off the shelf, download it from us; now you can put that on your own server and hack it to give only the answers you want it to give. And there's a process called LoRA fine-tuning, which is the most common one today — that's low-rank adapters. So you're basically adding some adapters to that neural net, which simplify it, and then you can set the weights of those adapters such that they constrain the output: you boost the weights where it gives you a good answer and decrease the weights where it gives you a bad answer. Oh, so it's not like you can just give it 100 domain-specific docs — question-and-answer pairs — and be like, hey, go fine-tune yourself. You have to — yeah, you need evals, or a training data set that says: asked about yoga, this is the right answer; asked about this, that's the right answer. Then you feed that back in and adjust the weights of the model. That data would have to encompass the contents of your docs, which is exactly why you don't want to do it daily or weekly or monthly. So you have to adjust the weights of the model like a human would, and then run it again? Oh no, there's code to do that. It gravitates toward an answer by adjusting the weights. You run through multiple epochs of training data until you have your model behaving the way you want it to behave.
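The low-rank adapter idea can be sketched with plain matrices. This is a toy illustration of the forward math, not a training loop — the dimensions and values are made up — but it shows the core LoRA trick: the big pretrained weight matrix W stays frozen, only two small matrices B and A are trained, and the model uses the effective weight W + B·A:

```python
import random

random.seed(0)

def matmul(X, Y):
    """Plain nested-loop matrix multiply."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d, r = 4, 1  # toy model dimension and LoRA rank (r << d)

# Frozen pretrained weight matrix W (d x d): never updated during fine-tuning.
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]

# Trainable low-rank adapters: B (d x r) and A (r x d). B starts at zero,
# so before any training the adapted model behaves exactly like the original.
B = [[0.0 for _ in range(r)] for _ in range(d)]
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]

# Effective weight the model actually uses: W' = W + B @ A.
delta = matmul(B, A)
W_adapted = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

# The payoff: far fewer parameters for the optimizer to touch.
trainable = d * r + r * d   # adapter parameters: 8
frozen = d * d              # untouched pretrained parameters: 16
```

At realistic scales the gap is dramatic — e.g. rank 8 against a 4096×4096 weight matrix trains well under 1% of the parameters — which is why LoRA fine-tuning is so much cheaper than retraining the full model.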
That's an automated process, but it's not simple — meaning you could accidentally go too far, right? And now it starts to lose information about the world that you wanted it to have. Got it. Yeah. So it's kind of like an upfront investment: you maybe invest a month or two and a bunch of money to fine-tune, right? Yeah. I think first you build the evals, then you do the fine-tuning. Yeah. Okay. But then once you fine-tune it, it's lobotomized, so you don't have to prompt engineer as much — it's easier, right? Yeah, and it should be cheaper too, because in that LoRA process you quantize a lot of the weights, so you can shrink the model down. So remind us — this is only needed maybe 10% of the time or less, right? What are some example use cases where this actually makes things a lot better? Yeah, customer service. If you were AT&T, actually fielding all of those customer support chats, you have a very good idea of the hundred or thousand most common cases. You could train your own model to generate initial answers, so the human customer support rep just has to approve the answer, right? It's very good at producing canned answers, much cheaper than doing that in real time. And the hallucination case I mentioned is another one, with a customer who wanted to make sure that when the model doesn't know the answer, it always says "I don't know." Got it. That sounds very simple, but it's pretty important. Or the guardrail models too, right? You can fine-tune those so a model only knows about saying no to bombs and really doesn't know much about anything else. Got it. I wonder, with the AT&T thing —
I don't know if I should say this, but dude, I really hate sitting on a call pressing a bunch of numbers trying to get to a human. I wonder if there's a way to prompt-hack this thing so that it sends me to the human right away. I bet fine-tuning tries to prevent that, you know. Yeah, probably. I have a hat from a meetup that says "ignore all previous instructions." You could try that, or you could make up some emergency situation, like, I have to get the internet working because of XYZ, right? Yeah, I don't know. Those are exactly the cases that would inspire fine-tuning and guardrails. Yeah. Got it. Okay, I think that's all the slides we had. So let's talk a bit more about Llama and open source to wrap up. Why don't we talk a little about Llama 4, and also: what is the key difference between open source and proprietary, and why does Meta think open source is going to be the right solution here? Yeah. Meta wants to make sure that everyone in the world has
41:40

Why Meta thinks open source will win

access to AI — that it's not locked up behind OpenAI or controlled by Anthropic — and that was what motivated them to go open source to begin with. That's also what's leading the features in Llama. So Llama is multilingual and multimodal, meaning speech, video, images. And then there are performance updates, like the fact that it's mixture of experts — I think this is our first mixture-of-experts model. That's part of that LLM architecture I was talking about, the layers of neurons: it's a way to segment the network so different areas of the neural net are experts in different topics. And that's another area of active research — you might be able to fine-tune that down so that you're just using select experts and therefore can run faster, better, cheaper. So yeah, it's all part of the theme of making AI pervasive and making sure that everyone on Earth has access to it. Got it. And do you think open source is going to keep trailing, or trail closer, or be equal at some point? I mean, even OpenAI is thinking about doing open source, right? Yeah, it's a great question whether it will catch up. If you think about all the other open source software I've used in my lifetime, from WordPress to Linux, it's never really been cutting edge, partly because it has to serve some broader audience. So I would guess that we continue to trail behind. But Meta's excited to do it because it wants to make things like the Ray-Ban glasses work great for everyone, and that's got a very high bar, right? So we're incentivized to make it good. Yeah. I mean, I think Meta is all about scale.
Like, I think there's a billion people using Meta AI already. You don't want a world where only the richest people have access to the best models, right? That's right, exactly. Okay, Adam, any closing words of advice for PMs and other AI builders here? I think I wouldn't have too much FOMO
44:04

Adam's best advice after leading 100s of AI integrations

about the newest, greatest thing. We didn't even touch on agents or some of the latest concepts, and I guess my advice to people would be not to have too much FOMO about that. These patterns are emerging very rapidly. You know, if you broke your brain and your team trying to figure out RAG two years ago, you might regret it now — it might not have been worth the effort. The best you can hope for is to at least get your engineering teams to try these features, get used to calling LLMs as part of the architecture, get something basic into production, and then write your evals and consider how to make it efficient as you grow. Prior to Meta I was advising startups — I worked at a startup studio where we created 40 startups over the last 10 years, and the last couple of years it was all AI startups. And really, the advantage there, whether it's a new company or a new feature you're rolling out, is that you get to try something very targeted that's really going to leverage the intelligence of the LLM to disrupt some market, or just make a great new feature that your users will love. And I think that's where you should focus. Don't get overly intimidated by all the new stuff that's emerging. Just make it so your team feels comfortable using LLMs, and then I think you'll get hooked on the drug of, oh, we could use an LLM for this, and for this — oh, wouldn't this make a great feature? Right. I mean, yeah, you can make a lot of money with a thin wrapper around an LLM. You don't need all this fancy stuff — in fact, it'd probably hold you back if you build too much of it. Right. Yeah. Okay. Awesome, man. Where can people find you online? Yeah, please look me up on LinkedIn — Adam Loving. Cool. All right, thanks for this AI course, man. I'm gonna call you Professor Loving from now on. So, thanks so much.
Yeah, happy to. I could talk about this stuff all day. Thank you so much for having me — it's just such a great time. You know, as an engineer who has been coding for so many years, I've always wanted these types of features — to have machines that could do even the most basic form of reasoning. It's amazing. So I look forward to talking to people about it. All right, Adam. Thanks so much, man. Yeah, take care. Talk to you soon.
