How to Run OpenAI's New Models For Free

Ray Amjad · 06.08.2025 · 1,523 views · 34 likes · updated 18.02.2026
Video description
Join AI Startup School & learn to vibe code and get paying customers for your apps ⤵️ https://www.skool.com/ai-startup-school
📲 Stay up to date on AI with my app Tensor AI
- on iOS: https://apps.apple.com/us/app/ai-news-tensor-ai/id6746403746
- on Android: https://play.google.com/store/apps/details?id=app.tensorai.tensorai
CONNECT WITH ME
📸 Instagram: https://www.instagram.com/theramjad/
👨‍💻 LinkedIn: https://www.linkedin.com/in/rayamjad/
🌍 My website/blog: https://www.rayamjad.com/
Links Mentioned:
- LM Studio: https://lmstudio.ai/
- Official Announcement: https://openai.com/open-models/

Table of contents (2 segments)

  1. 0:00 Segment 1 (00:00 - 05:00), 1,087 words
  2. 5:00 Segment 2 (05:00 - 08:00), 746 words

Segment 1 (00:00 - 05:00)

OpenAI released two brand new open source models earlier today, and I'm going to be showing you how you can run them on your machine. The bigger model, which is a 120 billion parameter model, requires 80 gigabytes of memory, whilst the 20 billion parameter model only requires 16 gigabytes of memory. So many people will be able to run this model if they have enough memory on their computer. I'm using an M1 Max, which has 32 gigabytes of unified memory, and that should be able to run the smaller model, but not the bigger model. Once you have that requirement met, you want to go to LM Studio and click on the button in the middle. It should automatically detect which operating system you're running, so you can just press the button and download the program, and you can then download the model via the program once it's installed. So we'll just wait for it to finish downloading. Once it's finished downloading, you can open the installer, and on Mac, I'll drag it into my Applications folder and then run the program. So I can just launch LM Studio after it's finished copying, and then you should see a window that looks something like this. If you don't see this screen, and it instead asks you which mode you want to run LM Studio in, I would recommend pressing Developer, because you can always change that later. After going through the onboarding menu, you should see a screen that looks something like this. You can go to My Models over here, press the Search button, and you should see a bunch of models listed. You should see OpenAI somewhere, something like OpenAI's GPT-OSS-20B, or you might see the 120 billion parameter model if your machine supports it. Then you just want to press Download over here, and it should start downloading the model.
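The memory figures above can be sketched as a quick compatibility check. This is a minimal sketch using the rough numbers quoted in the video; actual memory usage depends on quantization and context length.

```python
# Rough memory requirements quoted in the video (approximate; real usage
# depends on quantization and context length).
MODEL_MEMORY_GB = {
    "gpt-oss-20b": 16,   # needs ~16 GB of (unified) memory
    "gpt-oss-120b": 80,  # needs ~80 GB of memory
}

def runnable_models(available_gb: float) -> list[str]:
    """Return the models that should fit in the given amount of memory."""
    return [name for name, need in MODEL_MEMORY_GB.items() if need <= available_gb]

# An M1 Max with 32 GB of unified memory fits the 20B model but not the 120B:
print(runnable_models(32))  # -> ['gpt-oss-20b']
```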
And this may take a while, because the model can be pretty big. Now, after about five minutes, it's finished downloading, so I can press Load Model over here, and it will take me to a chat interface. So I can say, "Hello, who are you?" and press Enter, and then you can see it's running locally: this was generated at 46 tokens a second. I can disconnect myself from the internet by turning off my Wi-Fi, then say "Can you hear me?", for example, and it will still give a response. The tokens per second does depend on how fast your machine is. You can also change the reasoning effort over here. So I can turn the reasoning effort to high and ask, "What is bigger, 9.11 or 9.9?", and then you can see the entire reasoning over here, which is quite interesting to watch. And you get a bunch of other features in LM Studio, such as making another chat, making a folder, and organizing things together, so you can call this something like "Research". You can also press Branch over here to branch a conversation, which duplicates everything that was done before, and then organize things into different folders, for example. You can also use this model from your own code. For example, I'm still disconnected from the internet, but I have a small application that I made earlier with Claude Code, which is just a chatting application, and I can have this application that's running on my machine use a local version of the model. So if I go back to the code of the application, then basically over here, I want to set the base URL to be the local URL from LM Studio. So if you go to LM Studio, go to Developer over here, you want to make sure this is running. So you want to press the button in the top right over here where it says status: Running.
And you can change any settings, such as which port it's running on, or whether it's being served on the local network, so other machines behind the same Wi-Fi router are able to access this model. You can change other parameters as well. It shows the supported endpoints, which are all the OpenAI-style endpoints, because this is an OpenAI model. And you will see where it's running on your machine over here: currently it shows this port number with this address. I can copy this over, go back to Cursor, and paste this as the base URL. For the API key, just use something like "lm-studio". And then over here, I can replace the model in the streamText call with the LM Studio one instead, and pass in the model name, which is GPT-OSS-20B. Press save. Then in the chat application, I can just say hello and press enter. And it seems it didn't work. So if I go back to LM Studio, I can see the developer logs over here, and it says "Unexpected endpoint or method /chat/completions". So it seems I have to put /v1 after the base URL. If I go back to Cursor, write /v1, press save, and try again: "Hi, who are you?" And now you can see it says thinking over here. So this is the thinking process, and because I turned the reasoning effort all the way up to high, I guess it's thinking quite a lot. Now it's going to start typing out the response. It says, "Hello! 👋 I'm ChatGPT, a virtual assistant powered by OpenAI's language models." And you can see I'm still disconnected from the internet, so it seems to be working completely fine. If I go back to LM Studio, I can see that this request was handled by the model: it says running chat completion on conversation with two messages, streaming response. And yeah, this is pretty good.
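The base-URL fix above comes down to making sure requests hit LM Studio's OpenAI-compatible /v1 path. A minimal sketch, assuming LM Studio's default address of localhost:1234 and that the model is listed as "openai/gpt-oss-20b" (check the Developer tab for the real values on your machine):

```python
def build_chat_request(base_url: str, model: str, prompt: str):
    """Build the URL and JSON payload for LM Studio's OpenAI-compatible
    chat endpoint. Without the /v1 prefix, LM Studio's developer log shows
    'Unexpected endpoint or method /chat/completions'."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

url, payload = build_chat_request(
    "http://localhost:1234",  # LM Studio's default server address
    "openai/gpt-oss-20b",     # model identifier as LM Studio lists it
    "Hi, who are you?",
)
print(url)  # -> http://localhost:1234/v1/chat/completions
# The API key can be any placeholder string, e.g. "lm-studio";
# LM Studio does not validate it.
```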
The developer logs are useful for debugging and seeing what's happening with the model.
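Besides reading the logs, a quick way to confirm the developer server is actually up is to hit its /v1/models endpoint, which lists every model you've downloaded. A sketch, assuming the default localhost:1234 address:

```python
def models_url(base_url: str) -> str:
    """LM Studio's OpenAI-compatible endpoint for listing downloaded models."""
    return base_url.rstrip("/") + "/v1/models"

# With the server running (Developer tab -> Status: Running), this lists
# every downloaded model, entirely offline:
#
#   import json, urllib.request
#   with urllib.request.urlopen(models_url("http://localhost:1234")) as resp:
#       for m in json.load(resp)["data"]:
#           print(m["id"])
```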

Segment 2 (05:00 - 08:00)

You can explore around and make other changes as well, to the context, system prompt, inference settings, temperature, and so forth. You can also pass in these parameters when you're running it locally, and you can change all these other settings. If you're unsure what any setting here actually means, you can just ask ChatGPT itself (the actual ChatGPT, which is connected to the internet) to do the research for you. So I can say, "What is the benefit of running an offline open source model?" and press enter. One of the biggest benefits people are talking about is data privacy. In many regions, there are laws that restrict where data can be sent, and sometimes the data is very sensitive, so it can't be sent to random model providers, to OpenRouter, to OpenAI, and so forth. It could be confidential client data when it comes to health or law-related cases. So what many people do is run an offline version of a model on their computer, or on a more powerful computer which has more than 80 gigabytes of memory, on a GPU, and so forth. They know that computer is not connected to the internet, so they can pass in any client information, pass in sensitive information, and query that information, knowing it is not being sent anywhere else. A bunch of people are making money by setting up this technology for law firms, so law firms can use the power of LLMs on their cases without worrying about confidentiality or data privacy being broken. We can also see what OpenAI says over here. Under data, privacy, and security, it says all input and output stay on your own hardware. No user data leaves your premises... No internet connection required, as we said. Zero ongoing API costs.
So if you have a model running all the time, or you have it running on an old machine in your house, and you don't have time-sensitive information, you can just have it running in the background overnight, crunching through data or coming up with news articles or whatever else, and you only have to pay for the electricity cost and the upfront cost of the hardware. Latency control: you can control the latency on your network; you're not blocked by other people using the same model at the same time as you. You can do some tweaking as well, so you can change the model architecture, and you can fine-tune models in some cases; I won't be covering that in this video, of course. Compliance, which we covered before. Offline updates and governance: you can decide when to update the model or patch a version, so you don't have to rely on the vendor's release cycle. It does also talk about trade-offs like hardware cost, maintenance, and how it's compute-intensive and so forth. So this may be useful in your case, or it may be useful for your own company or anyone else you set this model up for. And if you do something like serve on the local network, then you will see your local network IP address over here. So on another machine on the same network, you can just use this as the base URL, for example, to get a chat application that you made with an OpenAI-compatible client working; remember to put the /v1 after it. And then you can run it as you would run any other model. With LM Studio installed, you can also try downloading other models. So if you go to Discover over here, you can see a bunch of models that may be compatible with your device, like the smaller Gemma 3n models, the Qwen3 Coder 30B model, and so forth. And that's basically it.
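Querying the model from another machine on the local network can be sketched like this. The IP address, port, and model identifier are placeholders: use the address LM Studio shows when "Serve on Local Network" is enabled (1234 is its default port).

```python
import json
from urllib import request

def chat_url(host: str, port: int = 1234) -> str:
    """OpenAI-compatible chat endpoint on an LM Studio host.
    1234 is LM Studio's default server port; the /v1 prefix is required."""
    return f"http://{host}:{port}/v1/chat/completions"

def ask(host: str, model: str, prompt: str) -> str:
    """Send one chat message to an LM Studio server and return the reply."""
    req = request.Request(
        chat_url(host),
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer lm-studio",  # any placeholder key works
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# e.g. ask("192.168.1.42", "openai/gpt-oss-20b", "Hello from another machine!")
```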
If you do have any clarifying questions, then do leave a comment down below, and I will try to address those questions either in a comment or in another video. And if you have found this useful, then do subscribe to the channel as well, because it lets me know that I should be posting more stuff like this.
