Want to get more customers, make more money & save 100s of hours with AI? Join me in the AI Profit Boardroom: https://juliangoldieai.com/GmuA8a
Get a FREE AI Automation Session 👉 https://juliangoldieai.com/70E3Gs
Unlock the Power of GPT-OSS: Host OpenAI's Latest Models Locally!
In this video, Julian Goldie walks you through the latest open-source release from OpenAI: GPT-OSS. This comprehensive guide shows you how to host the GPT-OSS 20B and 120B models on your laptop for free, including a step-by-step tutorial for installation using Ollama and LM Studio. Julian also tests the performance of these reasoning models, comparing them to OpenAI's more established models like o4-mini. The episode highlights the privacy advantages of local hosting and provides insights into the models' accuracy, usability, and safety features. Additionally, Julian discusses the AI Profit Boardroom community, a resource hub filled with automation workflows, templates, and step-by-step courses on various AI applications. Whether you're in IT wanting to switch careers or looking for AI tools to grow your business, this video has you covered. Join the conversation and learn how to leverage the latest AI technology to enhance your personal or professional projects.
00:00 Introduction to GPT-OSS
00:17 OpenAI's Announcement and Model Details
02:02 Benefits of Hosting Locally
04:17 Downloading and Setting Up GPT-OSS
09:57 Testing and Performance Evaluation
16:15 Community and Additional Resources
17:56 Conclusion and Call to Action
GPT-OSS. It's just come out today. This is an open-source release from OpenAI, and you can actually host it for free directly on your laptop. I'm going to show you exactly how to do it today, what it means, how it works, etc. We'll test it out as well and see if it's any good. So
you can see the new announcement. This literally just came out five and a half hours ago, something like that. And OpenAI have said, "We released two open-weight reasoning models, gpt-oss-120b and gpt-oss-20b, under an Apache 2.0 license. Developed with open-source community feedback, these models deliver meaningful advancements in both reasoning capabilities and safety." So, let's give it a cheeky look. Now, if we click on that link, we can check it out. If you want all the notes from today and tons of other stuff, just check out the AI Profit Boardroom link in the comments and description. We can also explore this on Hugging Face. And you can also see this is released on Ollama, right? So you can actually, and I'll show you exactly how to do this, but you can use this directly on Ollama, host it for free along with tons of other models, and it's really simple and easy to do. I'll come on to that later. Basically, these are two open-source models that deliver strong performance. If you want to see the model card, you can see all the details here. They've got a full white paper on this bad boy as well. And essentially, if we scroll down here, this is really interesting. It says the 120B model achieves near parity with OpenAI o4-mini on core reasoning benchmarks, which is pretty impressive, to be fair. And bear in mind, most people were tired of paying for all these AI models, whereas you can get access to this directly. AI VMA says he can't believe it. No, I can't believe it either. So, if we keep scrolling down here, I'm testing my Spanish, you can see it says GPT-OSS models perform comparably to our frontier models on internal safety benchmarks, offering developers the same safety standards, etc. Obviously, one of the biggest benefits, and I think a lot of people don't think about this, right?
But one of the biggest benefits is that if you can have, let's say, a comparable model to GPT on your laptop, the great thing about that is that it's local. So you can just access it whenever you want, you can access it offline, and it's also private, right? Because you're not sending any data to anyone else. And this is basically how it works. We can run through the models; here's some interesting information. So you've got two different models here: 120B and 20B, right? The 120B takes a lot more power on your laptop. I'm on a MacBook with an M3 Pro, so we can test it out and see how it performs, and then we can see comparatively how it performs versus other models. Right, so if we have a look here, we've got gpt-oss-20b and gpt-oss-120b, and this is how they perform on Humanity's Last Exam. If we have a look at this, 120B is performing the best in terms of accuracy between the two models with tools; without tools it performs slightly worse. 20B is quite a bit behind, especially without tools. And then if you compare this to o3, which is one of the most powerful reasoning models, obviously o3 is a lot better, a lot more powerful, but it's comparable, right? It's comparable for a local model. So there we go. AI mass asks, "What type of business would you recommend for someone who works in IT but wants to start generating income?" Yeah, so if you want to start your own business from scratch, we actually have a training module inside the AI Profit Boardroom, link in the comments and description. And if you join the six-week master class, with new lessons coming out this week, we actually show the best ways to just get started.
So if you want to know exactly how to get started ASAP, step by step, exactly what to do, when to do it, and how to do it, with homework every single week, then check out the AI Profit Boardroom. So let's keep scrolling down here and see what we've got. I don't want to keep plugging the AI Profit Boardroom; it plugs itself. So if we have a look here: AIME 2024 competition math. o4-mini, which is very comparable, scores 98.7, versus 96.6 for gpt-oss-120b, and gpt-oss-20b is performing at 96.0. So they're all very close, aren't they? This is not a terrible model by any means, and it's pretty exciting to be able to use it, right? So enough of the chit-chat. Let's get straight into business. Right. And what
we're going to do is download Ollama right now. Again, if you want all the notes from today, they're inside the AI Profit Boardroom, but essentially you can just download this. It is free; it's not going to cost you any money. Don't comment saying it's not free. It is free, all right? And then once you've done that, go to Models over on the left, then you're going to go to gpt-oss, right? Once you've actually set it up, downloaded it and installed it, you're going to, if you're on a Mac, press Command and Space and open up Ollama. So make sure that's running in the background, right? And if you want to know if it's running in the background, check the top right; there'll usually be an icon with a little picture of a llama. That's how easy it is. And then from here, what we're going to do is open up the terminal, and we're just going to copy this command here. So, it's as simple as that. Now, if we open this up... it has been a while since I've opened it. There we go. So, it's pulling that bad boy in, as you can see. And sometimes, if you haven't updated for a while, you just need to download the latest version. You can see here it recommends downloading the latest version. No problem, mate. We'll go over here and download that for macOS; you can select Linux and Windows as well. Then once it's done, it's going to pop up in the top right. Let's wait for that to load. There we go, sunshine. And then you'll see over here, we can just drag this bad boy in. It's going to ask you, do you want to replace or keep both? I'm going to say replace. There we go. We will restart to update. Oh, Ollama. The new llama's come back in. And then if we grab that terminal command over here, let's go back into terminal.
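The terminal step described above comes down to a couple of commands. A minimal sketch, assuming Ollama is installed and its background service is running; `gpt-oss:20b` is the tag Ollama lists for the smaller model (check the Ollama model library for the current tags):

```shell
# Pull the smaller gpt-oss model (a multi-gigabyte download), assuming the
# Ollama app/daemon is already running in the background:
ollama pull gpt-oss:20b

# Then start an interactive chat session in the terminal:
ollama run gpt-oss:20b
```

The 120B variant (`gpt-oss:120b`) uses the same commands but needs far more RAM and disk, which is why it struggles later on the M3 Pro.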
Hopefully, it works this time. And there we go, sunshine. Now, it's going to take a little while; you can see it's at 2% after a couple of seconds. So what we'll do is keep that window open in the background and check on it once it's done. You can see it's at 5% over there, and we've got that running. The other way you can do this, and I'll show you, is with LM Studio. Once this is finished downloading, and you can see it's already at 10% since we've been talking, we can just copy one of these terminal commands and plug it in later; we'll come on to that in a second. So, you can also run this on LM Studio. Again, LM Studio is available to download. Just download it like that. Click this. Download that bad boy. That's going to take a little while. Now, the other thing to note here is that LM Studio typically has a nicer interface, from what I remember. Let's just open this up. So, I've got LM Studio already set up and ready to go. Here we go. We can download gpt-oss, and that's how we get it on LM Studio as well. So whether you want it on Ollama or LM Studio is totally down to you. I think, honestly, if you're a bit of a beginner or if you like a nicer interface, you're probably going to prefer LM Studio, to be honest with you. We can download both, no problem. Let's just open that, minimize that, put that in a new window as well, and then we can see where we're up to in the background. So we've got LM Studio downloading here, and Ollama downloading here; we can just go off and do stuff in the background in the meantime. Now, the other thing to note here is that these are thinking models, right? These are reasoning models.
So, that's pretty good as well, because typically if you download a lightweight model, it's not going to have that thinking capability. And on Ollama as well, you've got gpt-oss, DeepSeek-R1, Gemma 3, Qwen 3, Llama 3.1, etc. Pretty much every open-source model you can think of is on there. And then if we go to Hugging Face, we've got this, as you can see. This is all the details, and we can actually use the inference providers directly on Hugging Face if you want access to this. So let me just put the notes inside the AI Profit Boardroom if you want access to them; just go to the SOP section and then GPT-OSS. So we can test it out here. All right. If you don't want to download it, or if your laptop doesn't have the power to run the new open-source model, but you want to use something for free, then you can do it using this method. All you do is go onto Hugging Face, go to the gpt-oss page, go to the inference providers section over here, and that way you can just get access to this stuff. Nice and simple, nice and easy, right? By the way, if you're watching this on Twitter, I'm just going to add the link so you can get access to everything. All right, so let's get back to this. Ollama is 56% done; LM Studio is at 4%. LM Studio is way slower; maybe it's because I started downloading from Ollama first. But yeah, you can see we're good to go on that. We've got some examples here, so we can run through some examples of how it works. And also bear in mind, this is primarily designed for text generation, right? I don't think it's going to generate images for you and that sort of thing, but it will generate text. And then let's have a look at the details. It says: welcome to the gpt-oss series, designed for powerful reasoning. We're releasing two flavors of the models. And the model that we're using over here, let's have a look what we've got. Yeah, this is 120B.
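If you'd rather script the inference-provider route than use the web widget, it can also be hit from the command line. A hedged sketch: Hugging Face exposes an OpenAI-compatible chat endpoint through its inference router, but the exact model id and the `HF_TOKEN` variable here are assumptions; check the "Inference Providers" widget on the model page for the current snippet:

```shell
# Query gpt-oss-120b through Hugging Face's inference router.
# Requires a Hugging Face access token in the HF_TOKEN environment variable.
curl -s https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```

This is the "use it for free without downloading anything" path mentioned above; the router picks a provider for you.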
So this is the most powerful version, which you can preview on Hugging Face. And this is interesting as well: in terms of the Apache 2.0 license, some people are wondering what that is or how it works. Essentially, the Apache license means you can build freely, without copyleft restrictions or patent risk, so it's good for experimentation, customization, etc. You can configure the reasoning effort as well: you've got low, medium, and high, with full chain of thought, which is great for reasoning. You can also fine-tune it based on the parameters, and it has agentic capabilities. And that's basically how it goes, step by step. And I think LM Studio has just broken on me, if I'm not mistaken. Oh, we've had an error; that's why: I've run out of space. All right, bear with me. I'll just make some space on the old laptop. There we go. We're back in the game. It's retrying now. So let's test the
strawberry test on this as well. So inside the chat here we're going to ask how many R's are in "strawberry", just to test its reasoning capability. If we zoom in here, we've got the chat on the right-hand side. It says the word strawberry contains three R's: one, two, three. There we go. So the reasoning is not bad, right? It's better than previous versions of ChatGPT that were actually paid, which is great. Let's test something else out now whilst we wait for the Ollama download. So if we go to a prompt I use for SEO, we'll take this and see if it's any good at writing. It's crazy fast online. We'll take that and put in the keyword "SEO training Japan", just as an example. So I'm going to plug that in; it's quite a long prompt as well. And you can see how fast it replies, right? That's pretty crazy. Let's have a look at the quality of the content. "Hi, I'm Julian Goldie, founder of blah blah, SEO nerd." Sounds about right. "If you're searching for SEO training Japan, you're probably wondering why a UK-based trainer even cares about the Japanese market." The content feels quite human, to be fair. It's not that bad, not as bad as I was expecting. I would still use Claude, obviously, for writing content, but this is not bad at all, and really fast to respond. Let's have a look through the white paper as well and see if there's anything interesting. These are some of the benchmarks. What blows my mind is that on the AIME competition math it's right up there with one of the most powerful models, which is pretty amazing. It doesn't perform so well on PhD-level science questions if you use 20B, but 120B is not bad. On expert-level questions it really falls behind, but that's okay. What we'll do is ask ChatGPT for the 80/20. So, something to bear in mind in terms of limitations: more hallucinations than o4-mini; matches or beats o4-mini on coding and reasoning and approaches o3 on health; 20B punches above its size.
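If you're running the model through Ollama rather than LM Studio's chat window, the same strawberry test can be done non-interactively. A one-liner sketch, assuming the `gpt-oss:20b` tag from the earlier download step:

```shell
# One-shot prompt: pass the question as an argument instead of opening
# an interactive chat; the model's answer prints to stdout.
ollama run gpt-oss:20b "How many r's are in the word strawberry?"
```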
All right, we'll put those inside the notes. Ollama is nearly done. Better mass asks, "Please, just a philosophical question: do you think we will reach the AI level of the Matrix saga faster than expected?" I don't know. There's this theory that experts actually predict worse than laymen, and the reason is that they're usually overconfident, and no one knows the future. Personally, if I had to predict, do I think we're going to be plugged into machines like the Matrix and learning kung fu? I doubt it. But I do think the world is going to look very different; give it 10 or 20 years. I think we're the first generation where, if you went back in time to your previous self and showed us the technology we have, we would say it's magic. If you showed an iPhone, the power of the internet, or the power of AI, how you can run it offline and create images and videos and whatever, if you went back to your former self 20 or 25 years ago, we would genuinely say that's magic. So the world will look totally different, and I think that cycle is just speeding up faster and faster. So, without going off on a crazy tangent, let's get back to it. Nearly there. We're so close, people. If you've stayed on, you're an absolute legend for watching this far. Pew asks, M3 Pro? Yeah, that's it; we're going to test this on an M3 Pro. And bear in mind as well, you can get access to this on Hugging Face, vLLM, llama.cpp, LM Studio, AWS, Fireworks, Together AI, and a few others too, including Cloudflare and OpenRouter. Oh, look at that, you can get access to it on OpenRouter. That's pretty cool. So you can actually code with it in Visual Studio Code; just bear in mind it is paid there. You can see that's a paid model for 120B.
So, if you don't host it locally, that's another option as well. But just bear in mind, it's going to be paid if you want to get it from OpenRouter. So, I think we're good to go now. We can test it out. Let's just test if it works. So, I'm running this on an M3 Pro; let's see if it can actually handle running this locally. It is super slow. I think if you want to use it online for free, you might be better off just using Hugging Face, unless you've got a super powerful laptop. Bear in mind, I am streaming at the minute as well, so that probably doesn't help. I'm actually going to cancel that because it took so long, and I could see my laptop lagging during the live stream. So, honestly, that is way too big to run on the laptop. You can see how slow it was responding; it broke a little bit and I had to cancel the response. So, if you're on an M3 Pro like me, or on something less powerful, this probably isn't the best model for you. You'd be better off just using the Hugging Face link I gave you before and using that directly instead. So, I'm going to close this. We'll quit Ollama as well, because it was slowing my laptop down like crazy. All right, so let's try another method. What we can also do is run this with Cline. So, let's go to the Cline settings, allow OpenRouter, I've got my API key plugged in there, and then let's see if we can find gpt-oss on here. So we've got 120B and 20B. Let's try 120B just for a laugh, and then we should be able to code with that directly. So if we just say something like this, hit Plan, go ahead with that, and let's see what we get. And that is struggling with the API streaming. Interesting. Let's just try it on Roo Code as well. Why has Cline reset the API key? That's weird. Look at that, we're getting a server error there as well. So that's not working so well. It said the server had an error whilst processing your request.
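For reference, the OpenRouter route that Cline was struggling with can be tested with a plain API call, which is roughly what the playground chat does under the hood. A sketch, assuming an `OPENROUTER_API_KEY` environment variable and the `openai/gpt-oss-120b` model slug (check the OpenRouter models page for the live id); as noted above, this is the paid path:

```shell
# Send one chat completion request to OpenRouter's OpenAI-compatible API.
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Are you working?"}]
      }'
```

If this call succeeds while Cline errors out, the problem is likely in the editor integration or streaming, not the model endpoint itself.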
So I think, out of everything we've tested today, if you want access to it, just go to Hugging Face, or wait for OpenRouter to work. You could also, for example, try running the lightweight version. But let's test this out. So I'm going to ask "are you working?" inside the OpenRouter chat and see if it works there. It actually works perfectly inside the chat, just not in Cline directly. So if you want to use this, maybe use the chat, but it's going to be paid inside OpenRouter, so you're probably best off just using that Hugging Face link. Let's try something else now as well and see if we can run the more lightweight one for coding. Free media tool asks, "Sir, according to your experience, what is the best model for coding? Just name one." So if I had to pick one out of everything, I'd probably say Gemini is pretty good; Gemini is probably the best, or you can use Claude Opus as well. Claude Opus is very powerful, but if you're coding with it through the API, it's going to cost a fair amount; it's quite expensive to use Opus. So let's test this now with the 20B model and see if that works better. Go back into terminal, and just make sure you have Ollama running in the background; you can see we've got Ollama going there. And then we can run 20B as a test. This is the lightweight model, by the way; this is not even the most powerful one. And you can see it's still thinking there. Yeah, I'm going to shut it down. It's too slow. So
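The "make sure Ollama is running in the background" check can also be done from the terminal instead of hunting for the menu-bar icon; a small sketch using Ollama's own CLI subcommands:

```shell
# Show models currently loaded into memory (an empty table means the
# daemon is up but idle; an error means the daemon isn't running):
ollama ps

# List all models downloaded locally, to confirm the pull finished:
ollama list
```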
that's it, basically, peeps. Now, if you want access to all the resources from today, feel free to get them inside the AI Profit Boardroom; just go to the SOP section and you'll see all the latest updates. I've actually added a bunch of AI agent templates here as well, tons of stuff on Deep Think, Lovable, Qwen 3, etc. And this community comes with an awesome bunch of people; there are 1,100 members inside. We just added a week 2 masterclass as well. Very active community. For example, you can see this post was added just 15 hours ago and it's already got eight comments and 13 likes. If you have any questions or that sort of thing, you can post inside the community. We also take requests, so if you have AI automation requests, usually what we'll do is show you exactly how to build it, step by step, exactly what to do and how to do it. It comes with all of my best trainings, templates, and workflows, including, for example, how to automate your business and my best AI agents. On top of that, all my social media and video automations, email content automations, and loads of n8n templates. If you want to start an AI automation agency, we have a step-by-step course on that. We also have, for example, a YouTube course that shows you exactly how I reach millions of people on YouTube using AI each month. You can see the step-by-step six-week road map inside the YouTube AI section that shows you exactly how to automate your own channel and make money with it. Inside the calendar, we have five calls per week, so you can jump on the weekly live calls, ask any questions you have, join some master classes, and jump on the weekly Q&A. It's an amazing community where we're all just growing and learning together and sharing the wins together as well.
The other thing I would say about this: if you look at all the posts, it's just super positive, lots of good vibes, lots of friendly people, lots of great people to meet. And that's the goal of this, really: to all grow, learn, and work together. So feel free
to get the AI Profit Boardroom link in the comments and description. And if you're thinking, "Julian, mate, I love all this stuff, but I don't have time; I'm running a seven-figure, maybe an eight-figure business," then feel free to book in an AI automation session. On that call, we can basically look at where you're spending your time, how you can save time by building the right AI automations, and how to build those automations. And then, once you become a client, we'll just implement it for you. So if you want a custom quote for that, feel free to book a call. Also, if you're watching and you have any AI tools you want to promote, you can book a call here and get a custom quote for a sponsored video on YouTube. Additionally, if you just want us to do everything for you, this is probably the best way to do it. Book in an AI automation session, link in the comments and description, and I will see you in the next video. Cheers, peeps.