How to Run AI Offline for FREE in 2025


The AI Advantage · 26.06.2025 · 23,250 views · 577 likes · updated 18.02.2026
Video description
Try for FREE the Docker Model Runner here: https://dockr.ly/4kSzvtM

Today I'll show you the easiest way to run AI completely offline using a tool you're probably familiar with already: Docker. They just launched a new Model Runner feature in beta, and in this video I'll show you exactly how you can start using it today.

Links:
https://www.docker.com/
https://hub.docker.com/u/ai
https://github.com/docker/hello-genai
https://docs.docker.com/ai/model-runner/

Commands used:
docker model pull ai/smollm2
./run.sh

Chapters:
0:00 Intro
1:01 Installing Docker Desktop
2:59 Enabling the Model Runner
3:11 Running Your First LLM
5:49 hello-genai Demo
7:42 Documentation
7:59 Outro

#ai

This video is sponsored by Docker.

Free AI Resources:
🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter
🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0
👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/
💼 AI Advantage LinkedIn: https://www.linkedin.com/company/the-ai-advantage
🧑‍💻 Igor's Personal LinkedIn: https://www.linkedin.com/in/igorpogany/
🐦 Twitter: https://x.com/IgorPogany
📸 Instagram: https://www.instagram.com/ai.advantage/

Premium Options:
🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community
🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Table of contents (7 segments)

  1. 0:00 Intro (258 words)
  2. 1:01 Installing Docker Desktop (478 words)
  3. 2:59 Enabling the Model Runner (53 words)
  4. 3:11 Running Your First LLM (609 words)
  5. 5:49 hello-genai Demo (442 words)
  6. 7:42 Documentation (63 words)
  7. 7:59 Outro (53 words)
0:00

Intro

One of the most asked questions I get is, "Igor, this AI stuff is fantastic, but how do I do this without sharing my data with some company?" Well, in this video, we'll be looking at exactly that. You can get to having this private chat interface on your computer in under 5 minutes, and it doesn't cost anything. You don't even need a credit card, and it all runs locally on your computer. So, here's the deal. As you might know by now, there are a lot of LLMs out there that compete with the likes of ChatGPT, Claude, etc. But just because they exist, people don't intuitively know how to actually use them. One important thing to know is that all of these open-source LLMs are free. And today I'll be showing you a piece of software that can run them. And it's by a company you might actually know, especially if you ever did something in the development space. The app is called Docker. And with their Model Runner, you can use your computer to run these LLMs locally. I'll show you how to do that, and then set up a simple app that you could keep on customizing, all of it again at no cost. They reached out and told us about the software, and this is one of the most requested questions that we get on the channel: how do I use LLMs locally? I'm excited to now show you how to do this really simply with the Docker Model Runner. So
1:01

Installing Docker Desktop

first things first, you'll head on over to docker.com and you'll download Docker Desktop. As you can see, it's available for all platforms. I'm on Apple Silicon here, specifically. And the fantastic thing versus some competing products is that when we run the models like this, they use the GPU of the Apple Silicon Macs to accelerate the model. So, all I'll do is go through the installation. And once you log in with your Google account, voila, you have Docker Desktop installed. Again, this works both on Windows and Mac, but on a Mac, it will always run in the background here on top. And before we proceed, I want to just quickly explain to you what Docker usually does and what we're going to do now. And for that, I'm going to use my trusty little iPad here. And I'm going to draw a laptop. Yeah, this is a laptop. Apologies for my drawing skills. So, what usually happens if you type a request into your laptop is that it gets sent over to some server in a data center. And once it processes the result, it will send the answer back to you. Now, what we'll be doing with this app that you just downloaded is we'll be cutting out this entire server step, because you will be running your model locally. So, you type something in, your computer processes it, and you get your answer right here. And you can do all of this while staying offline. Now, I want to add one more important note, because usually when you have your laptop and you run Docker on it, and some developers might know this... Oh my god, these laptops are getting worse every time I draw them. How about a desktop computer this time? That's better. Think about this as an iMac. What Docker usually does is create a second computer inside of your computer. Okay. So, as you can see here on screen, when I open up Docker, you can see it has all the controls of a computer. You can pause, you can restart, or you could close the app. And here at the bottom, you can see how much disk space, CPU, and RAM it's using up right now.
And that's what Docker usually does and is known for. In this case, we're not doing that, though. We're serving up this LLM on your local computer using your CPU, RAM, and especially the GPU. I thought highlighting this was important because the name could easily confuse you into thinking that you're running inside of a virtual machine. That's the computer inside of a computer. Not in this case, though. So now that we have that out of the way and this installed, we can actually proceed to making this work. The way it works is very simple. You
2:59

Enabling the Model Runner

just need to go to the settings. As I click the settings icon, I can go to "Features in development". And here you need to enable Docker Model Runner and also enable the second check mark. It might already be on by default for you. Just make sure this is on, because
3:11

Running Your First LLM

as soon as we have that, we can move over to Docker Hub, and here we can pull different large language models. If you want a list of all the available ones, you can head on over to the link in the description that lists all of the models. Here you can see the Llama models, some of the Microsoft models, and more models that you might know, like DeepSeek. In our case, we're going to be using this one. SmolLM2 is just a real winner in terms of what you get for the size, and you can run it on a lot of machines. So what I'll do in this case is just look for SmolLM2. Perfect. That's the one. And once you're here, you can simply click pull. And it's going to pull, aka download, the latest version of this model onto your machine. In case that doesn't work for you, you can always go to the terminal and run this simple command: docker model pull, and then ai/ plus the model name. In this case, ai/smollm2. This way, you can always pull these models simply by using the model name. I think the most interesting ones right now, though this will change over time, are Phi-4 for development, and SmolLM2 and Gemma 3 for general chat or productivity use cases. Okay. And as I talked, it already pulled this little model onto my machine.

One last tip that you might want to know: if some models don't work, it might not be a restriction of your computer not being big enough. It might be the fact that you didn't allow Docker to use enough of your computer. That's easily fixable. You can go to the settings, and under Resources you will see that this memory limit is the main thing that constrains what model you can run. In my case, you can see that from 32 GB I gave it 28. If you've only got somewhere around 4 or 6 GB, most of these models are not going to work. Okay. And we have the model locally. So now all we need to do is say run, and you can already talk to it. Look at that. It gives you an answer.
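The pull step above can also be scripted. Here's a minimal sketch that builds and runs the same `docker model pull` command shown in the video; it assumes Docker Desktop with the Model Runner enabled, and that models live under the `ai/` namespace on Docker Hub as described:

```python
import subprocess

def model_pull_cmd(model: str) -> list[str]:
    """Build the `docker model pull` command for a model under the ai/ namespace."""
    # Accept both "smollm2" and "ai/smollm2" spellings.
    if not model.startswith("ai/"):
        model = f"ai/{model}"
    return ["docker", "model", "pull", model]

def pull_model(model: str) -> None:
    """Actually run the pull; requires Docker Desktop with Model Runner enabled."""
    subprocess.run(model_pull_cmd(model), check=True)

# Usage (downloads the model, so only run with Docker installed):
#   pull_model("smollm2")   # equivalent to: docker model pull ai/smollm2
```

Splitting the command construction from the `subprocess.run` call keeps the script easy to dry-run before anything touches your Docker installation.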
What happens if I turn off my Wi-Fi in this case? I'll just go back to the model, start a new chat, say hi again, and I get an instant answer. And look at that speed. It just appears. I don't need internet, nothing. So that's pretty nice already. This model just lives on your machine now, and you can always run it in here. Now let's take this a step further. This is especially interesting to developers, but I also think nontechnical folks should be aware of it, because this doesn't just allow you to run the model in here. It also allows you to provide that model to some other application on your computer, or you could also do it through the web. Essentially, you're hosting the model on your computer, and you can access it just like you would a ChatGPT model, through the API. Again, remember my terrible drawing with the servers and the computer. Well, in this case, our computer is going to be serving as the server. So, we can use our own application and use our own model. No need to send any data away from your computer. How do we do this? Well, it's quite simple. In this case
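The "your computer as the server" idea boils down to sending an OpenAI-style chat-completions request to a local endpoint instead of a remote one. A sketch of what that looks like; the base URL here is an assumption (Docker's Model Runner docs list the exact address, and host access has to be enabled in its settings):

```python
import json
import urllib.request

# Assumed host-side endpoint for the Model Runner's OpenAI-compatible API;
# check the Docker Model Runner documentation for the exact URL on your setup.
LOCAL_BASE = "http://localhost:12434/engines/v1"

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for a locally served model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{LOCAL_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires the model to be served locally):
#   with urllib.request.urlopen(chat_request("ai/smollm2", "Say hi")) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape matches the OpenAI API, any client library or app that lets you override the base URL can be pointed at the local model with no other changes.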
5:49

hello-genai Demo

I'm going to demo it on a project that Docker provides, also completely for free, on their GitHub. It's their hello-genai project. And if you work with GitHub, you could simply clone this to your machine. In this case, we're going to do the super low-tech version of it. I'm just going to download this to my machine and unzip it. And here it is. All I need to do is the following. Here's a little trick on Mac that you might not know. This is a developerish thing. But if you hit Command, Shift, and Dot on a Mac, it will reveal hidden files that you're not supposed to see. And there's this .env file here that I can now open with TextEdit. And this is the only edit I need to make: I need to tell it what model I will be using. Again, we need the model name here. So, here you can see ai/llama3.2 and then the details of it. In this case, remember, we're using ai/smollm2. So, I'll simply change it to that. I'll click save, close this out, and then I can press the shortcut again to hide those files. By the way, that's going to work everywhere on your computer, but usually you shouldn't be editing those files if you don't know what you're doing, because things might break. They're hidden for a reason. And then what I'll do is right-click this, go to Services, and choose New Terminal at Folder. Just trust me on this. This is the simplest way to do it. All I'll do is say ./run.sh. As I open the terminal inside of this folder, it will just run this file called run.sh, which is set up to run this entire app for me. A bunch of magic happens in here. I'll give it access to this folder. And down here, you can see that the apps are already running. All I need to do is copy this HTTP URL, and in my browser, if I open up a chat, it works. And if I click one of these presets, you can see I can talk to it. And there you go. Instant email about penguins. And I'm still offline. All of this happened without the internet.
And because I have this model in here, just by running this one run file, it will launch different versions of this app in different coding languages that you could then build out for yourself. Docker is serving up the model, and I'm using it in this app. And
7:42

Documentation

then if you want the developer documentation, they have all the details on how to integrate this into any other application, just like you would an OpenAI API. The only difference being that you actually run your own model. And as promised, all of this is completely free, and even GPU-accelerated if you're on a Mac. So there you go. That's a
7:59

Outro

completely private AI assistant that costs nothing. And you can pick the model depending on the size of your machine. But yeah, you should probably have more than 8 GB of RAM. I hope this was valuable or interesting to you. And with that being said, I sincerely hope you have a wonderful
