This morning I began building an app. I write a lot, and I do so in Markdown, so just for fun I created my own Markdown editor that runs privately on my machine. I can type, preview, and it autosaves. But I also decided to add an AI chat window that can understand my highlighted selection, or my entire article, and give me suggestions on my writing. It's been a fun project. The model itself is open source; it's actually Qwen 2.5, I believe. But it's optimized to use the GPU on Apple Silicon Macs for efficient model inference. It's managed by Docker, but it runs directly on the host, not inside a container. Even cooler, it's now integrated right into the Docker CLI as a first-class citizen. So in my terminal, I can use docker model run to run it, and ask it something like "remind me of the JavaScript reduce syntax with a real-world example." And that's not in a container, by the way; it instead uses a host-installed server that runs natively on your Mac. We'll get to all that in a minute.

So, I've been sleeping on some of the things Docker's been doing in the AI space. I've been doing my normal building and pushing in my day-to-day work, but I wasn't aware of this new product they call Docker Model Runner. So in this video, I want to show you why I think this is very significant for developers, and I also want to touch on a couple of other new updates they've presented on the AI front as well. I think this type of video will become a yearly "what's Docker up to" sort of thing. Last year we talked about Build Cloud, Scout, and Init, and here we are a year later talking about AI. And whenever we do this, Docker kindly agrees to be a sponsor. So, thank you, Docker.

First, let's look at Docker Model Runner, which again is a new feature that lets you run large language models locally with GPU acceleration on Apple Silicon Macs, and it integrates seamlessly with your existing Docker workflows.
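For reference, here's the kind of answer that reduce question should get back. This is my own illustration, not the model's actual output:

```javascript
// Array.prototype.reduce takes a callback (accumulator, currentValue)
// and an initial accumulator value, and folds the array into one result.
// Real-world example: totaling an order from a cart of line items.
const cart = [
  { item: "notebook", price: 4.5, qty: 2 },
  { item: "pen", price: 1.25, qty: 4 },
];

// Start the sum at 0 and add each line's subtotal.
const total = cart.reduce((sum, line) => sum + line.price * line.qty, 0);

console.log(total); // 14
```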
So, if you open Docker Desktop, go to Settings, then Beta features, and you can enable Docker Model Runner. If you don't see this, you may have to update Docker; it's in version 4.40 and up. Or you can do this from the CLI by running docker desktop enable model-runner. Once you do that, you have this new docker model command available to you. You can pull, run, remove, the usual Docker stuff.

So let's pull a model. If you go to Docker Hub's ai namespace at hub.docker.com/u/ai, you can see all the models. Let's go with Qwen 2.5 and the latest 7B model. Fine. So we'll run the command docker model pull and then the name of the model. You can also view and pull models from Docker Desktop as well. We can list our models with docker model ls. Then to run it, we can type docker model run, the name of the model, and then our message. (Later in the video, I'll show you an alias to make this much easier.) And if you leave off the message, it'll go into interactive mode and you can ask away.

And then, in the case of my Markdown app, or any app that you're running locally, how do we use the LLM in that scenario? How can we use it in local app development? Well, before we get to that, let's get some context and look at the bigger picture: why did Docker create this, and who would find it useful? Well, Docker created this for local development, for situations that require data privacy, for performance, and for the familiarity and integration with existing Docker workflows. Instead of trying to run this in a container, as we're used to with Docker, and trying to pack large AI models into container images, Docker Desktop runs a host inference server, which is llama.cpp for now, that runs natively on your Mac. llama.cpp is an open-source software library that performs inference on various large language models.
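Putting those steps together in the terminal, it looks roughly like this. The exact model tag (ai/qwen2.5 here) is an assumption on my part; browse the catalog for the name and size variant that's current:

```shell
# Enable Model Runner from the CLI (or via Docker Desktop's Beta features)
docker desktop enable model-runner

# Pull a model -- the ai/qwen2.5 tag is illustrative; check Docker Hub's
# ai namespace for the exact tag you want
docker model pull ai/qwen2.5

# List the models you have locally
docker model ls

# One-shot prompt...
docker model run ai/qwen2.5 "remind me of the JavaScript reduce syntax"

# ...or leave off the message for interactive mode
docker model run ai/qwen2.5
```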
It's a C++ implementation with no dependencies that makes LLMs runnable on everyday hardware, utilizing your machine's capabilities instead of needing high-end GPUs or cloud services. Model Runner separates the model from the runtime, allowing for faster deployment. It runs LLMs directly on your machine rather than sending data to external APIs, meaning your data never leaves your infrastructure. Now, I know a lot of you have big GPUs in your home labs, but many of us don't, and this provides a great option for software engineers to run LLMs locally on Apple Silicon Macs.

The next question is: who is this for? Well, first, it's for developers building generative-AI-powered applications who want to develop and test locally without relying on external APIs. Second, it's for teams developing AI prototypes who need quick iteration cycles without constantly deploying to cloud environments. And then of course, data scientists and machine learning engineers who need a simple way to run their models locally with GPU acceleration while leveraging familiar Docker workflows. In fact, if you're building models locally and want to tag and push your own to a registry, the docker model command will get you there as well. And then, of course, any Mac users with an M1 through an M4, or whatever is out today, who want to leverage their machine's GPU capabilities for running AI models locally without having to worry about data privacy.

Now that you know what it is and who it's for, let's get back to using it in your dev workflow. There are a number of connection methods once you have a model pulled. You can access it from within containers via the internal DNS name. You can access it from the host via the Docker socket or TCP, and to do that you have to enable this option, which I
did. Model Runner implements OpenAI-compatible endpoints, so in my app I use that very route: I enabled TCP, and I have this URL to work with an OpenAI-compatible endpoint. If you've used OpenAI's API before, you'll recognize the familiar chat/completions route. And you don't need any API key locally. A JavaScript call would look something like this. With these endpoints, I can develop my app completely locally, building a local app that uses these open-source LLMs, and Docker's made it really easy to do for those on M1 Macs and up.

And if you want a quick example, a sample application that you can clone right now and run, check out the generative AI application that Docker has available for this very purpose. First, make sure Docker's running and pull the model. Then clone Docker's sample AI app demo repo, set the required environment variables, run Docker Compose, and you'll have a working application to try out and dissect right away.

Now, what about Windows users? Well, Windows users can run it with a supported NVIDIA GPU. Outside of that, it's the same steps, except you should also tick the "enable GPU-backed inference" setting.

Now, the second big thing that Docker has introduced has to do with MCP, or Model Context Protocol. I'm sure you're aware of MCP already, but it essentially provides a standardized way to connect AI agents to tools. What Docker has done is recognize the complex setups and security concerns and create their own catalog of secure, containerized MCP servers, and there are hundreds here.
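Circling back to that JavaScript call against the OpenAI-compatible endpoint, here's a minimal sketch. The base URL assumes TCP host access on Model Runner's default port (12434 on my setup; check yours in Docker Desktop's settings), and the model name is whatever you pulled:

```javascript
// Docker Model Runner's OpenAI-compatible base URL over host TCP.
// Assumption: default port 12434 with the /engines/v1 path -- verify
// against your own Docker Desktop settings.
const BASE_URL = "http://localhost:12434/engines/v1";

// Build the chat/completions request body; no API key is needed locally.
function buildChatRequest(model, userMessage) {
  return {
    model, // e.g. the tag you pulled with `docker model pull`
    messages: [{ role: "user", content: userMessage }],
  };
}

// Send a prompt and return the assistant's reply text.
async function askModel(model, userMessage) {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, userMessage)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// usage: askModel("ai/qwen2.5", "Suggest a better title for this article")
```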
Now, it seems everyone has their own MCP catalog these days. But for those who are in the Docker ecosystem, or using it regularly in their workflows, there's a lot of benefit here: being able to run them in a container, being able to manage secrets securely within the Docker platform, being able to submit your own MCP servers to Docker Hub, but most importantly, and I think this is huge, having Docker handle essentially everything for you.

If you look here in the catalog, let's just scroll down and click on GitHub: you can install this MCP server and connect it to your client, all with normal Docker CLI commands. Even easier is through Docker Desktop. Open Docker Desktop, go to the MCP Toolkit, and you can see the entire catalog; it looks to be 138 MCP servers at the time of this recording. And what's neat is you can integrate these with your favorite client, like Claude Desktop or Cursor, by just clicking Connect, and Docker does the work of updating your config for you. So I have a blank config, I hit Connect, and boom, easy. There's also the config you can enter manually for other options like VS Code. And once that's done, you can literally just click the plus button to add the server and have it available for you to use. If you don't see this MCP Toolkit, go to Settings, then Beta features, and make sure it's enabled.

In doing this, you also get the docker mcp CLI command if you'd rather interact from there. So instead of docker model like we used earlier, we can use docker mcp. You can connect to clients from here with the docker mcp client command; for example, docker mcp client connect -g cursor, where -g means global. But anyway, back to the desktop client. Let's go ahead and try one of these out. Everyone seems to show off GitHub, so if I go here, I need to be sure to add a personal access token, and do remember the principle of least privilege when doing so.
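For the CLI route, the client hookup just mentioned looks like this; the -g cursor form is what's shown in the demo, and the help command is the safest way to discover the rest of the toolkit's subcommands:

```shell
# See what the MCP toolkit CLI offers
docker mcp --help

# Connect an MCP client globally -- here, Cursor, as in the demo above
docker mcp client connect -g cursor
```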
Then I can ask it something like, "See if I have any issues on my YouTube stats repo and give me some feedback on how to address them," and I can see a list of my issues with some helpful feedback for each one. But like I said, everyone demonstrates it with GitHub. So let's try something like the YouTube Transcripts MCP server, and try to get not only a transcript from one of my YouTube videos, but have it take that transcript and create a blog post from it: "Give me the transcript for this YouTube video. Here's the link. Read it, understand it, and create me an 800-word blog post from it in Markdown." And it's writing me a blog post. Awesome.

So Docker makes it really simple to just toggle on a new MCP server. And when you use it, like if I say, "What is my GitHub user?", you can see that a container spins up and spins down to perform that task. So again, you just add the server that you want, and then of course make sure that you've connected a client as well. And again, this can all be managed in the Docker CLI if that's your flavor.

Now, one final thing I want to mention that's also helpful and new is Gordon, Docker's new AI agent, which they describe as an embedded, context-aware assistant seamlessly integrated into the Docker suite. Available within Docker Desktop and the CLI, this agent delivers tailored guidance for tasks like building and running containers, authoring Dockerfiles, and Docker-specific troubleshooting, eliminating disruptive context switching. So if you have a question on anything in your Docker ecosystem, no need to pull up a web page, no need to go hunt it down, just ask Gordon. You can activate this in Settings by going again to Beta features and enabling Docker AI. And now, wherever a terminal is, you can type docker ai and the text after it to use Gordon for things like check my
Dockerfile for issues, how can I optimize my image size, list all running containers, explain Docker networking to me, things like that.

But let's try it. Back to this app I was talking about at the beginning of the video: let's say I want to containerize it. So let's ask Docker AI to create a Dockerfile for my Next.js app. So we're going to get Gordon to create us a Dockerfile. And while he's working on that, who likes having to type the quotes and all? Instead, create an alias like this. I'm using dai, for Docker AI, and it calls docker ai; whatever you type after it falls inside the quotes. So we can just say dai and whatever text, without the quotes, and it'll work. So: dai, will Bitcoin hit 200K this year? Of course it will. But Gordon is specialized in Docker, not Bitcoin. And Gordon has created my Dockerfile. I just wanted to mention that because it's a really neat feature.

So if you're deep into the Docker ecosystem, need quick MCP server access, or are building models or AI applications locally, do check out all that's going on over at Docker. I just saw the other day that they've incorporated AI into Compose, and I'm excited to see them continue to innovate. But that's it for this video. If you found it helpful, give it a thumbs up. If you haven't subscribed to the channel, consider doing so. And I'll see you in the next video.
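A minimal version of that alias: a shell function works better here than a literal alias, because it can join everything you type into one quoted argument for docker ai:

```shell
# Wrap `docker ai` so the whole prompt doesn't need quotes.
# "$*" joins every word you type after `dai` into a single argument.
dai() {
  docker ai "$*"
}

# usage: dai how can I optimize my image size
```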