# Agent Gateway: LLM Gateway on Kubernetes

## Metadata

- **Channel:** That DevOps Guy
- **YouTube:** https://www.youtube.com/watch?v=QZUZCz1Q1Vs
- **Date:** 20.05.2026
- **Duration:** 19:13
- **Views:** 1,256
- **Source:** https://ekstraktznaniy.ru/video/51714

## Description

Follow the DevOps roadmap👉🏽 https://www.instagram.com/marceldempers
My DevOps Roadmap 👉🏽 https://marceldempers.dev
Patreon 👉🏽https://patreon.com/marceldempers

Checkout the source code below 👇🏽 and follow along 🤓

Also if you want to support the channel further, become a member 😎
https://marceldempers.dev/join

Checkout "That DevOps Community" too
https://marceldempers.dev/community

Source Code 🧐
--------------------------------------------------------------
https://github.com/marcel-dempers/docker-development-youtube-series

Like and Subscribe for more :)

Follow me on socials!
Instagram | https://www.instagram.com/marceldempers
X | https://x.com/marceldempers
GitHub | https://github.com/marcel-dempers
LinkedIn | https://www.linkedin.com/in/marceldempers

Music:
Track: souKo - souKo - Parallel | is licensed under a Creative Commons Attribution licence (https://creativecommons.org/licenses/by/3.0/)
Listen: https://soundcloud.com/soukomusic/parallel

Timestamps:
00:00 Intro
00:03 What 

## Transcript

### Intro [0:00]

In most companies, users call LLMs directly, and so do AI agents, with different API keys and different providers, a lack of visibility, no observability, and no security.

### What is Agent Gateway [0:03]

What if we could put a central gateway in front of all our AI systems? This means one endpoint, central API keys, traffic visibility, and observability. This is exactly what Agent Gateway does. In this video, we're going to take a look at what Agent Gateway is, the why and the how. We'll look at what it takes to install it on a Kubernetes cluster, configure it, and set it up. The main focus of this video is how to route traffic to different LLM models. So, without further ado, let's go.

### The LLM Gateway [0:58]

So, Agent Gateway is exactly what the
name implies. It's a gateway for AI infrastructure. Think of it as a reverse proxy that sits in front of all your AI systems. It features everything you need to connect, secure, and observe AI infrastructure. It can sit in front of LLMs and route intelligently to LLM providers as well as local models. We can use it as an MCP router, routing traffic to external and internal MCP servers. It can also route traffic between AI agents, acting as a gateway for agent-to-agent traffic, which lets the gateway serve as an additional security and observability layer.

Agent Gateway is open source on GitHub under the Apache 2.0 license. On their GitHub, if you scroll down, they have a very interesting video that discusses the major differences between an agent gateway and a regular gateway. If you're familiar with the Kubernetes Gateway API, you might be familiar with gateways like NGINX, Traefik, kgateway, Envoy Gateway, Istio, and Linkerd.

### Challenge using HTTP Gateways [2:04]

The important difference to
know is that general HTTP traffic is considered stateless. This means HTTP gateways generally route traffic based on the URL path and sometimes on headers, and they may load balance across different upstream services. This is very limiting for AI workflows, because we may want to route traffic to LLM models based on what's in the HTTP body, not just the URL.

Also, when an agent is talking to another agent via a gateway, we can't have the gateway acting as a general load balancer that spreads traffic across different upstream services. The gateway needs to be aware that agent A is talking to agent B so that the traffic doesn't end up on random agents. It needs to be AI session aware. And when we're routing traffic to different MCP servers, we may want the gateway to be aware of which MCP tools are available. This means we may need a smart gateway that can contact all the internal MCP upstream services and present the available tools to the client in a smart manner.

### LLM Gateway Overview [3:13]

Now, because Agent Gateway has a
ton of features, in this video we're going to focus specifically on the LLM gateway features. These allow us to seamlessly switch between providers without changing the application code. We can route to upstream OpenAI, Anthropic, Gemini, Google AI, Amazon, and more: pretty much any OpenAI-compatible provider using chat completions and streaming. The cool thing here is that if you're using Kubernetes and you're familiar with its Gateway API, you can use traditional HTTP routes, gateways, and gateway classes to manage this traffic.

### The Gateway API Series [3:50]

Now on the channel
you'll find a full playlist for Gateway API, with an introduction as well as videos on each of the popular Gateway API implementations. Agent Gateway uses Kubernetes Gateway API objects, so that series is important background for this video. To get started with Agent Gateway, we're going to need a Kubernetes cluster. In my introduction to Gateway API, we discussed this.

### Create a Kubernetes Cluster [4:16]

We create a Kubernetes
cluster using a utility called kind. I'm going to copy this command to my terminal and paste it. That will create a one-node Kubernetes cluster that we can use for testing. With that command done, I can run `kubectl get nodes`, and we can see we have a one-node Kubernetes cluster ready to go.

### Gateway API CRDs [4:34]

Now, Agent Gateway needs a Kubernetes
cluster, but it also needs a Gateway API-enabled cluster. To get there, we enable the Gateway API by deploying its CRDs. This is super easy. We can grab the YAML from the Gateway API special interest group guide or apply it directly from their GitHub. I'm going to grab version 1.4 of Gateway API, hop into my terminal, and apply the CRDs. This gives us the capability to run commands like `kubectl get` for gateway classes, gateways, and HTTP routes, as well as TLS, TCP, and UDP routes. This gives our cluster Gateway API capabilities.

### The Docs [5:13]

Now on the Agent Gateway site, if you scroll
all the way down, you'll find a pretty cool tutorial section with tutorials on LLM gateway, MCP routing, and more. It covers the standalone install, if you're just running Agent Gateway on a server directly, and also has a bunch of Kubernetes tutorials for LLM gateway, MCP routing, and more. If you go to the LLM gateway tutorial, you'll find clear documentation on everything you need to get it going. The documentation also has installation instructions for Helm, Argo CD, and Flux, and all of these are pretty well documented.

### Installation [5:51]

What we're going to do is
follow the steps using Helm and install it to our local cluster. Agent Gateway is very easy to install. It comes as two Helm charts, one for the CRDs and the other for the control plane. At the time of this recording, I'll be using chart version 1.1.0. You can get the versions by using `helm show chart` for each of the charts. I pin the version in an environment variable, jump to my terminal, and paste that.

Then I install the two charts. First, the CRDs, which give us access to special YAML objects for Agent Gateway. Then the control plane, which is basically the pods that watch Gateway API objects as well as Agent Gateway objects, so that we can deploy our proxies, which will be the Agent Gateway that takes the traffic. I install the CRDs by copying the Helm command for the CRD chart, jumping into my terminal, and pasting it. That installs successfully into my cluster. Then I copy the Helm install command for the actual control plane and run it, and that installs the control plane to my cluster.

With that installed, if I clear the terminal, we have a new namespace called `agentgateway-system`, and inside it our control plane is running. The control plane watches for resources such as gateway classes, gateways, and HTTP routes and then implements them, as we've learned in our Gateway API series. So it's almost no different from a regular Gateway API implementation.

### Gateway Class [7:30]

The first thing that this Agent Gateway will do is
implement what's called a gateway class. I can check this in my terminal: we can see a gateway class has been installed. The gateway class resource is just YAML with which infrastructure engineers define what gateways can be implemented in the cluster, whether it's NGINX Gateway Fabric, Traefik, kgateway, or Agent Gateway. Now, just a friendly reminder that if this guide is useful, smash the like and subscribe buttons so you know when the next episode drops.

### Gateways [8:05]

Now in our
Gateway API series, we talked about gateways. Gateways are just YAML objects that describe how traffic comes into our cluster and which gateway class to use. In this case, our gateway will be an Agent Gateway. We can describe things like the number of replicas for the gateway proxy, the type of load balancer we want to use, and all the settings that should be tied to that gateway.

If you're new to Gateway API, this is what a gateway looks like. The kind is Gateway. It has a name, and this one will just live in the default namespace. We'll call it agent gateway. It implements the gateway class we saw earlier, also called agent gateway. Here we supply listeners, which define how traffic comes into our cluster. In this case, I'm just going to have a port 80 listener. We also define which routes can attach to this gateway. In our case, all routes are allowed.

I can then install this gateway by applying that YAML file to my cluster. Then I run `kubectl get gateway` and make sure that it's been accepted. Now the data plane should be in our cluster, so I can run `kubectl get pods` and we should see the data plane pods coming up. We have one replica as well as a service to expose that pod, and the default service type is LoadBalancer. You can use the gateway parameters to override the number of replicas, the load balancer type, and configuration values for the gateway.

Since we don't have a load balancer in kind, I'm just going to use port forwarding to access the gateway. I jump into my terminal and paste this port-forward command, which forwards traffic coming in on port 80 on localhost to the gateway service. I'll leave this running in the background, as it will capture all traffic coming into our cluster on port 80.

### Agent Gateway Backends [10:01]

Now here is where
Agent Gateway gets quite interesting. If you're familiar with Gateway API in Kubernetes, we generally route traffic to upstream Kubernetes services, or to external or internal URLs. Agent Gateway lets you define other types of upstreams besides Kubernetes services, like MCP servers, agents, and, in our case, LLM models. This can be a model provider like Gemini or Claude, or a local LLM running inside or outside your Kubernetes cluster. The pattern is much the same each time. We create a Kubernetes secret for that provider, holding something like an API key. Then we create an Agent Gateway backend and attach it to our gateway using an HTTP route.

### LLM Routing [10:49]

And this is
how we perform LLM routing. When we deployed Agent Gateway, the CRD Helm chart added a new resource type called Agent Gateway backend. An Agent Gateway backend is an LLM provider exposed as a routable backend, instead of something like the Kubernetes service we're used to. In this example, I'm going to show you how to create an LLM backend for Gemini.

First, we need a Kubernetes secret that stores the Gemini API key, so we grab a Gemini API key from Google AI Studio. I paste my Gemini API key in there and then create a Kubernetes secret holding that key by jumping to the terminal and pasting the command. This creates a new Kubernetes secret holding the LLM provider credentials that will be used when routing traffic. We'll be creating an HTTP route in a bit, but before we do that, we need an upstream. Instead of a Kubernetes service, we're going to use an Agent Gateway backend.
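
The secret-then-backend pattern described here can be sketched roughly as follows. This is a minimal illustration in Python that only builds the two manifests as dictionaries and prints them as JSON; it is not the manifest used in the video. The secret name `gemini-secret`, the `Authorization` key inside the secret, the `agentgateway.dev/v1alpha1` API version, the `AgentgatewayBackend` kind, the field layout under `spec`, and the placeholder model ID are all assumptions modeled on how the backend is described in the transcript, so check them against the CRDs installed in your cluster.

```python
import json


def api_key_secret(name, api_key, namespace="default"):
    """Build a Kubernetes Secret manifest holding one provider API key."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # stringData takes plain text; the API server stores it base64-encoded.
        # ASSUMPTION: "Authorization" is a guess at the key name the gateway reads.
        "stringData": {"Authorization": api_key},
    }


def llm_backend(name, provider, model, secret_name, namespace="default"):
    """Build an Agent Gateway backend manifest for one LLM provider.

    ASSUMPTION: apiVersion, kind, and the spec layout are hypothetical and
    only mirror the description "provider, default model, auth policy".
    """
    return {
        "apiVersion": "agentgateway.dev/v1alpha1",
        "kind": "AgentgatewayBackend",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "ai": {"provider": {provider: {"model": model}}},
            "policies": {"auth": {"secretRef": {"name": secret_name}}},
        },
    }


# Placeholder values: the real key and model ID are not given in the transcript.
secret = api_key_secret("gemini-secret", "<GEMINI_API_KEY>")
backend = llm_backend(
    "gemini", "gemini", "<gemini-model-id>", secret["metadata"]["name"]
)

# Wrap both objects in a List so they can be applied in one go.
bundle = {"apiVersion": "v1", "kind": "List", "items": [secret, backend]}
print(json.dumps(bundle, indent=2))
```

Because `kubectl apply -f -` accepts JSON as well as YAML, the printed bundle could be piped straight into it. Building the backend from the secret's own name keeps the two references from drifting apart.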

### Creating a Backend [11:53]

And this is what an Agent Gateway backend looks like. We give it the name Gemini. It has a spec with AI provider details, and it supports many providers. In our case, we put Gemini and the model that we want to use by default. Then we add the authentication policy, pointing to the Gemini secret we just created. It's as simple as using `kubectl apply` to apply that backend. We can run `kubectl get` on the Agent Gateway backends and see that the Gemini backend has been accepted. You can also use describe commands to troubleshoot this. Now we have the backend up and running. To route traffic, we need an HTTP route.

### HTTP Route [12:33]

Now the HTTP route is how we define what our
gateway will do with the traffic. Here I have an HTTP route called LLM route. It takes all traffic coming into my cluster that hits our gateway; this is the gateway reference that the route attaches to. Any traffic that hits this gateway is routed to the Gemini backend we just created, which is our Agent Gateway backend. So this is just a simple HTTP route that takes all traffic coming into the gateway and sends it to Gemini. We can apply that HTTP route and run `kubectl get` to see that it's been created.

### Testing HTTPRoute [13:12]

And to test this, we can use
curl and send traffic to localhost 8080. This will hit our gateway. Since the HTTP route we created takes all traffic, it doesn't matter what path we pass in: all traffic coming into this URL hits our gateway, and Agent Gateway routes it to Gemini. Here we send a simple message, a prompt asking "What is Kubernetes in one sentence?", and it is passed to Gemini. You can see our model response here: we got a response from the Gemini model, with some token details and the answer.

### Route by Path [13:48]

Now, this to me in
my mind is a bit too simple, because we're routing all traffic to a single upstream LLM provider. It's very basic, but in the real world you might be required to route based on path. This is where the HTTP route becomes powerful. I'm going to update the route we've just looked at. I still attach it to the same gateway, but here I start using rules, which give me control over the paths coming into the gateway. I want to match all traffic coming to `/ai/gemini` and URL-rewrite it: replace the prefix so the upstream service doesn't see it, and pass that traffic to Gemini. I can update that route and then test it. Now, based on the path coming in, I can route to different LLM models.

### Route to other Models [14:53]

Now, because I have full control of the URL, I can
route traffic to different LLM providers based on the URL path. Say we want to route traffic to other LLM providers that we have. We can support multiple LLM providers through a single endpoint, using the same methodology to create a new LLM backend for Anthropic and send that traffic to a Claude model. All I need here is an Anthropic API key and a new secret for that backend. In Claude I create an API key and paste it into an environment variable. Then, the same as with Gemini, I create a Kubernetes secret. Now we have a secret for our Anthropic backend as well as our Gemini backend.

Then I define a new Agent Gateway backend called Anthropic, with new AI provider settings and a default model to use, and here I reference the Anthropic secret we just created. I apply that backend and run `kubectl get`, and we can see that both our backends are accepted.

Now I can update our HTTP route. I keep the same name, so this updates the existing route, but this time I have two rules in it. If I expand the first one, we can see it takes all traffic coming into `/ai/claude`, URL-rewrites it to strip that prefix, and routes the traffic to our Anthropic backend. The second rule matches anything under `/ai/gemini`, URL-rewrites it to strip the prefix, and routes it to our Gemini backend. I apply that HTTP route update. This means we have very powerful control over our HTTP routes and URL traffic: we can route to different models within Kubernetes and to external providers. All traffic coming into `/ai/gemini` now goes to the Gemini Flash model, and any traffic coming into `/ai/claude` goes to the Claude Sonnet model.
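
The two-rule route can be sketched along these lines. The Python below builds an HTTPRoute manifest using standard Gateway API fields (`PathPrefix` matches and a `URLRewrite` filter with `ReplacePrefixMatch`), then runs a small, simplified model of how a request path would be matched and rewritten. The route name `llm-route`, the gateway name `agentgateway`, and the `group` and `kind` values in `backendRefs` are assumptions, and `resolve` is an illustration only, not Agent Gateway's real matching code.

```python
import json


def rule(prefix, backend):
    """One HTTPRoute rule: match a path prefix, strip it, forward to a backend."""
    return {
        "matches": [{"path": {"type": "PathPrefix", "value": prefix}}],
        "filters": [{
            "type": "URLRewrite",
            "urlRewrite": {
                "path": {"type": "ReplacePrefixMatch", "replacePrefixMatch": "/"}
            },
        }],
        "backendRefs": [{
            # ASSUMPTION: group and kind of the Agent Gateway backend resource.
            "group": "agentgateway.dev",
            "kind": "AgentgatewayBackend",
            "name": backend,
        }],
    }


route = {
    "apiVersion": "gateway.networking.k8s.io/v1",
    "kind": "HTTPRoute",
    # ASSUMPTION: hypothetical spellings of the route and gateway names.
    "metadata": {"name": "llm-route", "namespace": "default"},
    "spec": {
        "parentRefs": [{"name": "agentgateway"}],
        "rules": [rule("/ai/claude", "anthropic"), rule("/ai/gemini", "gemini")],
    },
}


def resolve(route, path):
    """Return (backend name, rewritten path) for the first matching rule.

    Simplified: rules are tried in order, while real Gateway API
    implementations apply precedence rules such as longest prefix first.
    Returns None when no rule matches.
    """
    for r in route["spec"]["rules"]:
        prefix = r["matches"][0]["path"]["value"].rstrip("/")
        # PathPrefix matches whole path segments, so /ai/geminix must not match.
        if path == prefix or path.startswith(prefix + "/"):
            replacement = r["filters"][0]["urlRewrite"]["path"]["replacePrefixMatch"]
            rewritten = replacement.rstrip("/") + path[len(prefix):]
            return r["backendRefs"][0]["name"], rewritten or "/"
    return None


print(json.dumps(route, indent=2))
print(resolve(route, "/ai/claude/v1/messages"))
```

Keeping the rule construction in one helper means adding a third provider is a single extra `rule(...)` entry, which mirrors how the route grows in the walkthrough.
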
And here's an example of routing to Claude. Copy this curl command, jump into the terminal, and paste it. Here you get the response from Claude Sonnet, with some token details and the answer. Now, this might just seem like general URL routing that any Gateway API implementation can do. But the power here is that Agent Gateway is token aware. We can also route based on the actual HTTP body. And it gets more powerful when we look at things like MCP routing and agent-to-agent gateway traffic. These are things that general Gateway API implementations, built around the standard HTTP protocol, don't really cater for.

### The Source Code [17:38]

Now, if you'd like to follow along
with the source code, everything we do on this channel is on GitHub. You'll find the link in the description to the Docker Development YouTube Series GitHub repo. If you scroll down, you'll find the Kubernetes folder, and all of the Gateway API work is under the gateway API folder. You'll find everything in the series there, including Agent Gateway, which is under the agent gateway folder. You'll find a README: the introduction to Agent Gateway, specifically the LLM gateway routing we've looked at today, with the installation instructions, how to set up the actual gateway, and how to perform the routing. So be sure to check out the link down below to the source code so you can follow along.

Hopefully, this video helped you understand how to route traffic in a Kubernetes cluster using Agent Gateway. In a follow-up video, I'd like to look at something like MCP routing, because MCP servers are becoming more popular. As a DevOps, platform engineer, or SRE, you need to know how to manage MCP servers, especially in environments like Kubernetes, and how to manage the traffic routing. Let me know down below if that's something you're interested in. Also, be sure to check out the link down below to the Ultimate DevOps Roadmap, as we're getting really close to completing the Ultimate DevOps course for anyone interested in learning how to start a journey in DevOps. If you want to support the channel even further, hit the join button down below to become a YouTube member. And as always, thanks for watching, and until next time, peace.
