Join AI Startup School & learn to vibe code and get paying customers for your apps ⤵️
https://www.skool.com/ai-startup-school
📲 Stay up to date on AI with my app Tensor AI
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
CONNECT WITH ME
📸 Instagram: https://www.instagram.com/theramjad/
👨💻 LinkedIn: https://www.linkedin.com/in/rayamjad/
🌍 My website/blog: https://www.rayamjad.com/
—————
Links Mentioned:
- LM Studio: https://lmstudio.ai/
- Official Announcement: https://openai.com/open-models/
OpenAI released two brand-new open-weight models earlier today, and I'm going to show you how to run them on your machine. The bigger model, gpt-oss-120b, has 120 billion parameters and requires about 80 gigabytes of memory, while the 20-billion-parameter gpt-oss-20b only requires about 16 gigabytes. So many people will be able to run the smaller model if their computer has enough memory. I'm using an M1 Max with 32 gigabytes of unified memory, which should be able to run the smaller model, but not the bigger one.

Once you meet that requirement, go to LM Studio and click the download button in the middle of the page. It should automatically detect which operating system you're running, so you can just press the button and download the installer. Once it's finished downloading, open it; on Mac, drag LM Studio into your Applications folder and launch it once it's finished copying. You should then see a window that looks something like this. If instead it asks which mode you want to run LM Studio in, I'd recommend picking Developer, because you can always change that later. After going through the onboarding, go to My Models and press the Search button. You should see a bunch of models listed, and OpenAI's gpt-oss-20b should be among them (or the 120B version, if your machine supports it). Press Download and it will start downloading the model; this may take a while, because the model is pretty big.
After about five minutes it's finished downloading, so I can press Load Model, which takes me to a chat interface. I can say something like "Hello, who are you?" and press Enter, and you can see it's running locally; this response was generated at 46 tokens a second. I can even disconnect from the internet by turning off my Wi-Fi, ask "Can you hear me?", and it will still give a response. The tokens per second does depend on how fast your machine is. You can also change the reasoning effort: I can turn it to high and ask "What is bigger: 9.11 or 9.9?", and then you can watch the entire reasoning process, which is quite interesting. LM Studio gives you a bunch of other features too, such as making another chat, making a folder (you could call it "Research") and organizing chats into it, or pressing Branch to branch a conversation, which duplicates everything that was done before.

You can also use this model from your own code. For example, I'm still disconnected from the internet, but I have a small chat application that I made earlier with Claude Code, and I can have it use the local model. Back in the application's code, I want to set the base URL to LM Studio's local URL. In LM Studio, go to the Developer tab and make sure the server is running: press the toggle in the top right so the status says Running. You can change its settings there, such as which port it's running on and whether it's being served on the local network.
Serving on the local network means other machines connected to your Wi-Fi router can access this model, and you can change other parameters as well. It shows the supported endpoints, which are the OpenAI-compatible endpoints, and it shows the address and port the server is running on. I can copy that address, go back to Cursor, and paste it as the base URL. For the API key, you can use any placeholder like "lm-studio", since the local server doesn't check it. Then I can swap the model in the streamText call for the LM Studio one, pass in the model name, gpt-oss-20b, and press save.

In the chat application I say hello and press Enter, and it didn't work. Back in LM Studio, the developer logs say "Unexpected endpoint or method: /chat/completions", so it seems I have to append /v1 to the base URL. Back in Cursor, I add /v1, save, and try again: "Hi, who are you?" Now it says Thinking; this is the reasoning process, and because I turned the reasoning effort all the way up to high, it thinks quite a lot before it starts typing out the response. It says: "Hello! 👋 I'm ChatGPT, a virtual assistant powered by OpenAI's language models." And I'm still disconnected from the internet, so it seems to be working completely fine. Back in LM Studio, I can see this request in the developer logs: "Running chat completion on conversation with 2 messages, streaming response." The developer logs are pretty useful for debugging what's happening with the model.
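The request flow above can be sketched with plain Python and the standard library. Note the assumptions: port 1234 is LM Studio's usual default, the model identifier `openai/gpt-oss-20b` and the placeholder key `lm-studio` are what my install showed — copy the exact values from your own Developer tab.

```python
import json
import urllib.request

# LM Studio exposes an OpenAI-compatible server. The /v1 prefix is required --
# leaving it off produces the "Unexpected endpoint or method" error from the logs.
BASE_URL = "http://localhost:1234/v1"  # port 1234 is LM Studio's default (check your install)


def chat_url(base_url: str) -> str:
    """Build the chat-completions endpoint, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/chat/completions"


def build_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat completion request for the local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        chat_url(base_url),
        data=body,
        headers={
            "Content-Type": "application/json",
            # The local server does not validate the key; any placeholder works.
            "Authorization": "Bearer lm-studio",
        },
    )


# Usage (with the LM Studio server running; model id is an assumption):
#   req = build_request(BASE_URL, "openai/gpt-oss-20b", "Hello, who are you?")
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint shape is the standard OpenAI one, the same base URL and model name also work with the official OpenAI SDKs or the Vercel AI SDK's streamText, as shown in the video.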
You can explore around and make other changes as well, to the context, system prompt, inference settings, temperature, and so forth, and you can pass these parameters in when calling the server from code. If you're unsure what a setting actually means, you can ask the actual ChatGPT, which is connected to the internet, to do the research for you. I can ask, "What is the benefit of running an offline open-source model?" and press Enter. One of the biggest benefits people talk about is data privacy. In many regions there are laws that restrict where data can be sent, and sometimes the data is very sensitive, so it can't be sent to third-party model providers like OpenRouter or OpenAI; think confidential client data in health- or law-related cases. What many people do is run an offline version of a model on a more powerful computer, one with more than 80 gigabytes of memory or a capable GPU, that they know is not connected to the internet. They can then pass in any sensitive client information and query it, knowing that information isn't being sent anywhere else. A bunch of people are making money by setting up this technology for law firms, so lawyers can use the power of LLMs on their cases without worrying about confidentiality or data privacy being broken. We can also see what OpenAI says here, under data privacy and security: all input and output stay on your own hardware, no user data leaves your premises, no internet connection is required (as we said), and there are zero ongoing API costs.
So if you have a model running all the time, say on an old machine in your house, and the work isn't time-sensitive, you can have it running in the background overnight crunching through data, drafting articles, or whatever else, and you only pay the electricity cost plus the upfront cost of the hardware. There's also latency control: you control the latency on your own network and aren't slowed down by other people using the same hosted model at the same time as you. You can tweak things too, such as fine-tuning the model in some cases, though I won't be covering that in this video. Then there's compliance, which we covered before, and offline updates and governance: you decide when to update or patch the model, so you don't have to rely on the vendor's release cycle. The page also mentions the trade-offs, such as hardware cost, maintenance, and how compute-intensive it is. So this may be useful for you, for your company, or for anyone else you set this model up for.

If you do serve on the local network, you will see your local-network IP address here. On another machine on the same network, you can use that as the base URL, for example in a chat application you built with an OpenAI-compatible client; remember to put the /v1 after it, and then you can run it as you would any other model. Now that you have LM Studio installed, you can also try downloading other models: if you go to Discover, you'll see a bunch of models that may be compatible with your device, such as smaller models like Gemma 3n or the Qwen3 Coder 30B model. And that's basically it.
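A small helper makes the local-network setup less error-prone: build the base URL from the address LM Studio shows (the LAN IP below is a made-up example), always include the /v1, and hit the models endpoint to discover the exact model identifiers loaded on that machine. The models listing is the standard OpenAI-style `GET /v1/models` route, which LM Studio lists among its supported endpoints.

```python
# Helpers for pointing an OpenAI-compatible client at an LM Studio server on
# another machine. Replace the example IP with the one LM Studio displays when
# "serve on local network" is enabled; 1234 is the usual default port.

def lmstudio_base_url(host: str, port: int = 1234) -> str:
    """Build the base URL a client should use, including the required /v1."""
    return f"http://{host}:{port}/v1"


def models_endpoint(base_url: str) -> str:
    """OpenAI-style endpoint that lists the models available on the server."""
    return base_url.rstrip("/") + "/models"


# Example (hypothetical LAN IP):
#   base = lmstudio_base_url("192.168.1.42")
#   A GET request to models_endpoint(base) returns the model ids you can
#   pass as the "model" field in chat completion requests.
```

Forgetting the /v1 is the same pitfall as on localhost, so centralizing URL construction in one place means every machine on the network builds the path the same way.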
If you do have any clarifying questions, then do leave a comment down below, and I will try to address those questions either in a comment or in another video. And if you have found this useful, then do subscribe to the channel as well, because it lets me know that I should be posting more stuff like this.