How to Fine Tune GPT3 | Beginner's Guide to Building Businesses w/ GPT-3
14:42

Liam Ottley · 26.01.2023 · 99,710 views · 2,650 likes

Video description
📚 Join the #1 community for AI entrepreneurs and connect with 200,000+ members: https://bit.ly/skool-ov
📈 We help industry experts, entrepreneurs & developers build and scale their AI Agency: https://bit.ly/aaa-accelerator-ov
🤝 Need AI Solutions Built? Work with me: https://bit.ly/morningside-ai-ov
⚒️ Build AI Agents Without Coding: https://agentivehub.com/
🚀 Apply to Join My Team at Morningside AI: https://bit.ly/ms-youtube-lo
🚀 Apply to Join My Team at AAA Accelerator: https://bit.ly/aaa-youtube-lo
My Vlog/BTS Channel: https://bit.ly/LiamOttleyVlogs

Super simple guide on How to Fine Tune ChatGPT, in a Beginner's Guide to Building Businesses w/ GPT-3. Knowing how to fine-tune GPT-3 is one of the most important skills you can have as an entrepreneur to build your own ChatGPT site or get into the AI industry and build a valuable business. While limited in scope, this video gives you a clear breakdown of the stages in fine-tuning these models. #chatgpt #artificialintelligence #openai

Kaggle Dataset: https://www.kaggle.com/datasets/thedevastator/unlocking-the-secrets-of-nba-player-performance
Prompts & Code HERE: https://docs.google.com/document/d/1jJbk61grxYZV2qbBNtNnVaeEFsqMskWY5Uh6ELlqiB4/edit?usp=sharing

Timestamps:
0:00 - Intro
1:07 - Generating Pairs
7:11 - Fine Tuning w/ API
10:18 - Running Our App
12:38 - Outro

Contents (5 segments)

Intro

My recent video about how to fine-tune GPT-3 and build an AI startup in a few minutes got a lot of questions about the nitty-gritty details of how you actually do the fine-tuning. Rather than try to answer you all in the comments, I thought I'd pop back on here and make a quick video. It shows the step-by-step process of going from data, to prompt and completion pairs, to a fine-tuned model that you can interact with.

I'm going to break it down a lot more granularly in this video and go from A to Z. Any beginner can come in, take some data, and use it to get a fine-tuned version of GPT-3 for their own purposes.

If you're an entrepreneur looking to start making money and building businesses in the AI gold rush, you need to be aware of how these fine-tuning processes work. This video is exactly what you need to get to a basic level of understanding. Then you can move forward, start hiring people, know what the process is, and be aware of what's going on.

We're going to jump back in where we left off, with the NBA players performance dataset. I'm going to do this all again, so we're starting off by downloading the dataset; the link will be in the description below.

Fine-tuning these models is one of the biggest business opportunities of 2023, and I'd say of the next few years, because of how powerful these models are and because you can build on top of them. One of the most powerful skills you can have as an entrepreneur right now is understanding how this process works. Then you can start to see opportunities where you can apply data to these models and build a valuable business. So don't go anywhere: I'm going to go step by step through this whole process, and you're not going to want to miss any of it.

Generating Pairs

Enough talk, let's get stuck into it. First, a quick overview of the process from start to finish. To start off, we need a set of data, which in this case is the NBA players performance dataset from my previous video. Then we're going to take that data and create prompt and completion pairs, which is the format we need to give GPT-3 in order to fine-tune it. Finally, we're going to run a basic app on our computer that uses a fine-tuned version of GPT-3 that understands the data we fed to it.

To start off, we're going to download the NBA player performance data from Kaggle; the dataset link will be in the description. Next, we're going to import the data into Google Sheets so that we can manipulate it a bit. There's an index column here on the left that we can remove. Now I'm just going to put a filter on this so that we can remove the blanks, as I did in the previous video. Then I copy this, create a new sheet, and paste it in. Now we have it all formatted nicely, compact, and ready to go. We're going to download this data in CSV format and then open it in Visual Studio Code so we can have a look at it.

You don't have to use Visual Studio Code; your favorite code editor will do. I've already got it open, but to help you guys out I'm going to close it and show you how to open it. I go to Open Folder, and here is the folder I've created. You need to create a new folder (anywhere on your desktop will work) and then open that folder by clicking Open here. Inside it, I've dragged in the CSV file that we just downloaded. There's also all the other data from the Kaggle dataset, which we don't really need at the moment because we've got the one we need, which is the scoring data. Then you're going to want to copy the header row and take it over to ChatGPT, and then we're going to give it just an example row of data so it kind of
knows what format works. Actually, I'll give it a couple; let's give it four rows of data to look at. So this is the prompt: "I'm ready to get started. I have a spreadsheet of basketball data. These are the column headings in CSV format," and I've pasted in the column headings. Then I've written "Here are the rows of data in CSV format" and pasted in a few example rows for it to use. Finally, I'm asking it "Do you understand?" I'm hoping that after this prompt is sent it's going to say, "OK, I understand what you're trying to do here. I've read the CSV and I understand the format of the file." Great, ChatGPT has understood what we've put into it, and now we need to ask it for some prompt and completion pairs.

The method I'm going to show you here is different from the one in my previous video. The way I showed you last time to generate these prompt and completion pairs with ChatGPT isn't very scalable. So I've done a bit of research and figured out how to get ChatGPT to write us a script that generates the pairs, so we can do it with hundreds of thousands of them. Let's get stuck into that.

I'm telling ChatGPT that I want to fine-tune GPT-3 using this data and asking it to write me a script that creates prompt and completion pairs in this format. Then I'm pasting some examples of the format I want back. I'm also going to insert "Python" in here just to be extra clear. It took a bunch of messing around and tweaking to get this just right, so that you only have to do it once: this is the exact prompt you'll need to put in. I had to really coach it to give the right script the first time, so that you guys don't have to mess around with it. If you're following along, head down to the description. I'm pasting these prompts into a Google Doc and leaving the share link below, so if you want this entire prompt, which I've had to tweak quite a
lot, and follow along, just head down below and get it from the Google Doc. After I submitted that prompt, it gave me back the script I need to turn this data into prompt and completion pairs.

Once you have this code, all you need to do is copy it and head back over to VS Code. Right-click in here, create a new file, and call it main.py. Paste the code in and save the file. Then make sure the basketball data CSV file name up here in the script is the same as the name of your CSV file. You can just click on the file, press Enter to rename it, paste the name in, and save. Now the file name is ready to go, and the script is ready to go as well.

If you're not too familiar with programming and this sort of basic Python scripting, I'm going to give you a quick run-through of what's happening here so that you're not completely blind. Here we have our CSV data. CSV stands for comma-separated values: at the top you have the header, and a comma separates each column from the next. So it's basically a condensed version of a spreadsheet that a computer can read. We have all of the headings up here separated by commas, so the computer can read along and say, "OK, this is a column." I've got an extension installed that lets me see these a lot more easily; it's called Rainbow CSV. If you install it quickly, it'll help you visualize the CSV a lot more easily, like I am here. It's pretty straightforward: we have the header row, and underneath it each of the data points, the same as a spreadsheet but not formatted as nicely.

The script uses a technique called string interpolation, with f-strings, as you can see here. This f means that anything you put inside these curly brackets gets replaced with a value from the data. What's inside the brackets here indexes the player column, so it's referencing the player column right at the end, so
this player one: when you see that player key in the script, it's referencing that column. The script works down every single row through the whole sheet. For each row it takes the value in the player column (the green one here) and writes the prompt with it: "Write a summary of [player]'s statistics." Then it starts building the completion for that row. Once again it inserts the player's name and writes "[player] played [games] games, starting [games started] of them," and so on, building a long sentence that summarizes the player's data. Every time it comes to one of these blocks with curly brackets, it reaches into the CSV file and plucks out the right value, as the script was written to do. Then all it needs to do is collect these prompt and completion pairs into the format we asked for back in ChatGPT, which is this format. Finally it saves them to a JSON file that it dumps out for us to look at.

If you're following along, I'm assuming you've got Python installed on your machine; if not, head to python.org and download the latest release. Now that you understand the script, let's run it and see what we get. The command to run a script is python followed by the name of the script, so in this case python main.py, and hit Enter. Just like that, we have a prompt and completion pairs JSON file. It looks like a complete mess, but it's a ton of data, all formatted exactly how we wanted it.

If you'd like to see it a bit prettier, search the extensions for Prettier and install this one here. Then head back to your prompt and completion pairs file and press Option+Shift+F, and it will format everything nicely like this. It looks really good, and it color-codes it because it understands it as JSON. You're probably going to need to do this, so grab that extension, come back, and press Option+Shift+F.

Now we can see the result of our hard work. For every player in the spreadsheet we started with, we have the prompt, which is "Write a summary of Luka Dončić's statistics," and the completion, which uses ChatGPT's summary structure. The script has simply done this programmatically, using string interpolation (a Python feature) to pluck data out of the CSV file and put it in the correct place to create each completion. I've taken all of the code out of main.py, put it into a new file called generate.py, and saved it, so that we can play around in main.py for the next step.

Fine Tuning w/ API

Now for the fun part: taking all of these pairs, funneling them into GPT-3, and fine-tuning it. To get started, head to beta.openai.com, go to your personal section, and select View API keys. Create an account first if you haven't already; it's free. Then create a new API key. I'm creating a new one here. You can see it on screen, but I'm going to delete it, you cheeky guys in the comments. I'm just going to paste it here to save it for later.

Now we're on OpenAI's fine-tuning documentation page. Head down to the installation section, copy this command, go over to your terminal, and paste it in. I've already got everything installed, of course, so the requirements are already satisfied for me, but it should start installing and show a progress bar for you. Next, copy this export string and bring it over to your main.py file. Copy your entire API key and paste it between these quotation marks. Then copy that entire line, go back to the terminal, and paste it in. This means it's worked: we've exported the OpenAI API key, so it's ready to use later.

Next, we have to prepare our data. Copy this command, head over to your terminal, and clear what's there. The LOCAL_FILE placeholder means we need to reference our prompt and completion file. So come over here, copy its full path, head back, paste it in, and hit Enter. I've just told it to prepare my data. It says my file appears to be in JSON and will be converted to JSONL, which is the format it needs. It also says the file contains 250 prompt-completion pairs, which is a pretty good little starting batch of data.

It then gives me a whole bunch of tips on how to make the data better and get better results. You'll probably want to read through these whenever you do this and follow the instructions, because they will make your model a lot better. For example, every prompt should really end with a unique suffix, like a run of slashes or hash signs. Every completion should start with a whitespace character. Every completion should also end with a unique ending, such as pound signs. All of these are really important, but we don't have time for them in the scope of this video. If you'd like more on that, I can shoot a quick Loom for you guys and put it in the comment section.

For the purposes of this video we're good to go, so we can hit Enter on this. It converts the file from JSON into JSONL and adds a whitespace character to the beginning of each completion for us, which is great. Yes, and just like that we have it all in JSONL format, ready to go into the fine-tuning process.

Now we can actually fine-tune our model. You need to
head over here and copy this command, and note that you can change the base model you're starting from. I've got openai api fine_tunes.create in here, and I've put curie at the end to specify the model we want to use. I also need to put in the path to the training file, which holds all the data it's going to use for fine-tuning. So I go over here, copy the entire path including the _prepared suffix, paste it in here, and hit Enter. This uploads all of that data and puts you in the queue for fine-tuning. Then OpenAI runs the data through its fine-tuning process. The result is a fine-tuned version of GPT-3 that is familiar with all of this basketball information. In just a few minutes our fine-tuned model is complete, and down at the bottom of the screen you can see the model's name. Copy that and save it for later. Now we need to head back to ChatGPT to get a graphical user interface, or GUI, so that we can interact with this.
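
The ChatGPT-generated script lives in the linked Google Doc and isn't reproduced in the transcript. Below is a rough sketch of the same generate-then-prepare idea in plain Python, not the script from the video. It assumes hypothetical column names (Player, GP, GS) and illustrative separator and stop strings chosen to follow the preparation tool's tips: a fixed suffix on every prompt, a leading space on every completion, and a fixed ending on every completion. f-strings fill each row's values into a prompt and a completion, blank rows are skipped like the Google Sheets filter, and the pairs are serialized as JSONL, one JSON object per line.

```python
import csv
import io
import json

# Assumed formatting conventions, following the preparation tool's tips:
# a fixed separator ending every prompt, a leading space on every
# completion, and a fixed stop text ending every completion.
# The exact strings are illustrative choices, not required values.
SEPARATOR = "\n\n###\n\n"
STOP = " END"


def make_pairs(csv_text):
    """Build prompt/completion dicts from CSV text using f-strings.

    'Player', 'GP' and 'GS' are hypothetical column names; rename them
    to match the header row of your own CSV file.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("Player") or "").strip()
        if not name:
            # Skip blank rows, like the filter applied in Google Sheets.
            continue
        prompt = f"Write a summary of {name}'s statistics{SEPARATOR}"
        # The completion starts with a space and ends with the stop text.
        completion = (
            f" {name} played {row['GP']} games, "
            f"starting {row['GS']} of them.{STOP}"
        )
        pairs.append({"prompt": prompt, "completion": completion})
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(pair) + "\n" for pair in pairs)
```

Writing the output of to_jsonl to a .jsonl file gives a training file in the prompt/completion format that the legacy fine_tunes.create command accepts. The preparation tool may still offer to strip the shared "Write a summary of" prefix, as happens later in the video.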

Running Our App
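
The app in this section comes down to a single completion request against the fine-tuned model. The sketch below is not the GUI script from the previous video, which is in the linked Google Doc. It is a minimal, assumed helper that builds the request parameters discussed here: the fine-tuned model name and max_tokens set to 150. It assumes the pre-1.0 openai Python package used at the time, where the request is sent with openai.Completion.create(**params). That call appears only as comments because it needs an API key and network access.

The stop sequence is an addition not used in the video. A stop sequence matching the training data is the usual way to keep a fine-tuned model from running on into another example, which may explain the extra player summary that shows up in this section. The prompt text itself must match the training format after preparation, including any prefix the preparation tool removed.

```python
# Sketch only. Assumes the pre-1.0 `openai` package, where the request
# would be sent with openai.Completion.create(**params).
SEPARATOR = "\n\n###\n\n"  # must match the separator used in training
STOP = " END"              # must match the stop text used in training


def build_completion_params(model, user_text, max_tokens=150):
    """Return keyword arguments for a legacy completion request.

    max_tokens: if omitted, the legacy Completions endpoint defaults
    to 16 tokens, which cuts summaries off early; 150 follows the video.
    stop: an addition not used in the video; it ends generation at the
    training stop text instead of running on into the next summary.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "model": model,
        "prompt": user_text.strip() + SEPARATOR,
        "max_tokens": max_tokens,
        "stop": [STOP],
    }


# Example call (not executed here: it needs an API key and network access).
# import openai  # pre-1.0 package; reads OPENAI_API_KEY from the environment
# params = build_completion_params("YOUR_FINE_TUNED_MODEL_NAME",
#                                  "Player A statistics")
# response = openai.Completion.create(**params)
# print(response["choices"][0]["text"])
```

Keeping parameter building separate from the network call makes it easy to check the prompt format and limits before spending any API credit.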

To make things super easy, I'm just going to grab the GUI script from my previous video. It will be available in the Google Doc linked in the description, so head over there and grab it. It has all the code you need to run a basic GUI so that you can interact with your fine-tuned model. Back here, we have the name of our model: cut it out of there and paste it between these quotation marks. Before you try to start this app up, make sure you save your main.py file. Then run python main.py, and just like that we have a fine-tuned GPT-3 window up and can start giving it prompts.

If I paste in one of these prompts, for Jayson Tatum's statistics, it gives me only the beginning of the completion. It seems to be having an issue where it doesn't write out the entire completion. I'm not sure whether that's an issue with my API key, a limit on the API request, or something within the GUI itself. I'm going to dig into it over the next couple of hours and get back to you guys.

Hey guys, I just took a look into why it's not completing the rest, and it turns out it's a pretty simple fix. In the completion call here, openai.Completion.create, there's a parameter called max_tokens. If you don't set it, it defaults to a small number (16 tokens), which limits how long each completion is and how much you're charged through the API. All you need to do is come in and add a comma and max_tokens=150; I've done 150, and that's about right for what we need. So I'll just run the Python program again and bring it over.

Another thing to note: I didn't notice it at the time, but the preparation step trimmed out the "Write a summary of" part. As you can see here, my prompt says "Write a summary of [player]'s statistics." That text was shared by all the different prompts, so it was cut out during preparation. All our prompts really are now is the player's name followed by "statistics." So I'll take that over to our app here and put it in; it loads for a bit.

I've actually retrained this on a Davinci model off camera, because Davinci is a lot better at recognizing what text you want. If you trained with Curie, make sure you go back and retrain with Davinci. It cost me about three bucks to retrain, but it was definitely worth it. As you can see, we've got the entire completion we expected, although it has also started giving us information on Karl-Anthony Towns' statistics. I'm not 100% sure why it's continuing down the list, but we got our result: the entire printout of Giannis's statistics. So that's the fix; thanks for sticking around, and let's get back to the video.

Outro

For the purposes of this video, I've shown you how to go from start to finish: you take your data, prepare it, get it into prompt and completion pairs, and finally fine-tune your own version of GPT-3 so that you can start interacting with it. Of course, this is an extremely basic example, and this fine-tuned model's understanding of basketball is very limited. You'd probably need to give it thousands more variations of these prompts. Once you've given it enough data, its understanding of the topic will become flexible. You'd be able to ask it basically any question and get fairly specific, like "Which three players on the XYZ team have the highest field goal percentage?" That's the kind of thing you'd eventually be able to do, given enough training data.

So that's all for the video, guys. I've shown you how to go from start to finish fine-tuning a model with a bit of data. If you have any questions, you've got stuck, or something's not working on your computer, drop it down below in the comments. Either I'll help you or someone else in the community will.

The important thing about learning this process is that, as an entrepreneur, you need to understand that this is what's going on behind the scenes for a lot of the startups you're seeing. Spending half an hour or an hour understanding this process will put you leagues ahead of other entrepreneurs and others trying to make money in this AI gold rush. You'll understand the underlying technicalities of how these models are created and fine-tuned. With that understanding, you'll keep a close eye out for data sources. You'll also see how you could get that data and integrate it into a GPT-3 model, or a GPT-4 model, which is coming up very soon. I hope you got something out of this. Remember that down in the description I'm going to have a
Google Doc with all of the prompts I sent to ChatGPT, and also the code, so it should be pretty straightforward for you guys to have a play around with it. I really hope you enjoyed it and got something out of it.

If you like content like this: my name is Liam Ottley. I'm a self-made serial entrepreneur from New Zealand, now living in Dubai. I make AI entrepreneurship-focused content at least three times a week for aspiring and established entrepreneurs looking to get into the AI industry and make money in this AI gold rush that's happening right in front of us. If that sounds interesting to you, head down below, subscribe to the channel, and hit the bell so you don't miss my next one. If you got something out of this video, please drop a like; it really helps my channel a lot. And of course leave your comments down below, and I'll be answering as many as I can. That's all for today. Thank you so much for watching, and best of luck to you as you navigate this AI gold rush. I'll see you next time.
