New High Paying skill for DevOps Engineers - AgentOps | Complete tutorial.
20:29

Abhishek.Veeramalla 06.05.2026 16,081 views 241 likes

Video description
- Get started with Flyte for FREE: https://flyte.org/
GitHub Repo for files used in the video: https://gist.github.com/iam-veeramalla/c037ca8cb0b018013da433a7c2cec628

AgentOps is a developer platform for monitoring, debugging, testing, and evaluating AI agents, especially agents built with frameworks like Flyte, LangChain, LangGraph and similar multi-step LLM systems. The best way to get started with AgentOps is using the Flyte Devbox as shown in the video: https://www.union.ai/flyte/devbox

Think of it as the equivalent of Datadog/New Relic for AI agents, or “application monitoring” specifically for autonomous LLM workflows.

What does AgentOps do? It helps developers answer questions like:
- Why did my agent fail?
- Which tool call caused the error?
- How much did this run cost?
- What prompts were sent to the model?
- How long did each step take?
- Which agent performed best in production?

Free Course on the channel
==============================
- DevOps Zero to Hero Playlist: https://www.youtube.com/playlist?list=PLdpzxOOAlwvIKMhk8WhzN1pYoJ1YU8Csa
- AWS Zero to Hero Playlist: https://www.youtube.com/playlist?list=PLdpzxOOAlwvLNOxX0RfndiYSt1Le9azze
- Azure Zero to Hero Playlist: https://www.youtube.com/playlist?list=PLdpzxOOAlwvIcxgCUyBHVOcWs0Krjx9xR
- Terraform Zero to Hero Playlist: https://www.youtube.com/playlist?list=PLdpzxOOAlwvI0O4PeKVV1-yJoX2AqIWuf
- Python for DevOps Playlist: https://www.youtube.com/playlist?list=PLdpzxOOAlwvKwTyYNJCUwGPvql0TrsPgv

About me:
========
Instagram: https://www.instagram.com/abhishekveeramalla_official/
Telegram Channel: https://t.me/abhishekveeramalla
LinkedIn: https://www.linkedin.com/in/abhishek-veeramalla
GitHub: https://github.com/iam-veeramalla
Medium: https://abhishekveeramalla-av.medium.com/

Disclaimer: Unauthorized copying, reproduction, or distribution of this video content, in whole or in part, is strictly prohibited. Any attempt to upload, share, or use this content for commercial or non-commercial purposes without explicit permission from the owner will be subject to legal action. All rights reserved.

Contents (5 segments)

Segment 1 (00:00 - 05:00)

Hello everyone, my name is Abhishek and welcome back to my channel. Do you know that 95% of AI agents never reach production? They work absolutely fine on local machines, but the moment you put them in production they start breaking. So in today's world, building an AI agent is not a big deal. The main challenge is running agents reliably: making sure you handle retries and timeouts, that you can schedule them and, most importantly, that you can monitor them so you can go back and see exactly what went wrong.
Now how do you achieve all of this? This is where a concept called AgentOps is trending. In very simple words, AgentOps is treating AI agents like traditional software systems. Just as you deploy traditional software systems and monitor them, you do the same thing for AI agents. In today's video, let's learn AgentOps practically using Flyte. If you're new to Flyte, it is an open-source AI orchestration platform and it is very easy to get started. In fact, in today's video I'll show you how to spin up a Flyte cluster locally using the new feature called Devbox. So we will create a Kubernetes cluster and install Flyte in it, all with a single command. This is going to be a very interesting lecture. AgentOps is super cool, so just make sure you watch the video till the end.
So to understand AgentOps better, let's focus on the problem statement. I'll show you how easy it is to build an agent, and then I'll tell you why this agent is not production ready. Let's take the example of a weather agent: an agent that can predict the weather of a particular region. We can build this agent in four simple steps today. Step one, we start by defining a tool that can fetch the weather data. This is a code snippet. As you can see, it fetches the weather data, and whenever the model requires it, it can use the data. Step two, once we have the tool defined, we define the model and connect the model with the tool.
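Steps one and two can be sketched without any framework. This is only an illustration, not the code from the video: `get_weather` returns canned numbers instead of calling a weather API, `fake_model` is a stub standing in for Gemini, and the `TOOLS` registry is a hypothetical stand-in for the tool binding that LangChain provides.

```python
from typing import Callable, Dict

# Hypothetical canned readings standing in for a real weather API response.
_FAKE_READINGS = {"hyderabad": [27.3, 28.1, 29.3, 31.0, 33.2]}


def get_weather(city: str) -> str:
    """Tool: return temperature readings for a city (stubbed, no network)."""
    readings = _FAKE_READINGS.get(city.lower())
    if readings is None:
        return f"No data for {city}"
    return f"{city}: " + ", ".join(str(r) for r in readings)


# Step two: register the tool so the "model" can request it by name.
TOOLS: Dict[str, Callable[[str], str]] = {"get_weather": get_weather}


def fake_model(prompt: str) -> dict:
    """Stub model: asks for the weather tool when the prompt mentions weather."""
    text = prompt.lower()
    if "weather" in text:
        city = "Hyderabad" if "hyderabad" in text else "unknown"
        return {"tool": "get_weather", "argument": city}
    return {"answer": "I can only answer weather questions."}


def run_once(prompt: str) -> str:
    """Ask the model, then execute the tool it requested (if any)."""
    decision = fake_model(prompt)
    if "tool" in decision:
        return TOOLS[decision["tool"]](decision["argument"])
    return decision["answer"]


if __name__ == "__main__":
    print(run_once("How is the weather in Hyderabad?"))
```

A real agent replaces the stub with an LLM call and lets the model choose the tool arguments, but the division of labour is the same: the model decides, the tool fetches.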
So this way, whenever the model needs the tool it can invoke it, and we can use LangChain here. Step three, we will use LangGraph to define the flow of the agent, for example the node configuration, router configuration and graph configuration. We will do all of that using LangGraph. And finally, step four, we invoke the agent, that is, we run it. That's all. Using these four steps, you can build any AI agent today.
Let's put these code snippets into a Python file, then create a Google Gemini API key and try to invoke the agent to see if it works as expected. I'll first head to Google AI Studio, where I can create a free API key. In fact, you can use your OpenAI key as well. For the purpose of the demo, I'm here in Google AI Studio. I've created a key using Create API key, and this is my API key. It comes with limited runs, but for the demo it should be good. Now I'll go back and export it using the export command: export GOOGLE_API_KEY followed by the key that you got from Google AI Studio.
Then I'll create the weather agent. This is going to be a Python file, weather_agent.py, containing the code snippets that we have. Now let's save this file and try to run it: python3 weather_agent.py, and let's give it the prompt with --prompt: "How is the weather in Hyderabad on 20th of Sep 2020?" There you go. It says the weather in Hyderabad on 20th of September 2020 was 27.3, 28.1, 29.3, 31 and 33.2°. So you got the response from the AI agent.
But if you observe carefully, this setup is missing several important capabilities. We haven't taken care of deployment: we might still have to spin up a Kubernetes cluster and deploy the agent onto it, which is a very tedious activity. It does not take care of scheduling. It does not take care of task management. We haven't implemented retries. We haven't implemented timeouts. And most importantly, as I mentioned before, we haven't implemented observability.
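To make the missing retries and timeouts concrete, here is a minimal hand-rolled sketch using only the standard library. It is not how Flyte implements them; `call_with_retries` and its parameters are hypothetical names chosen for illustration, and the thread-based timeout only stops waiting for a slow call, it cannot kill it the way an orchestrator can stop a container.

```python
import concurrent.futures
import time
from typing import Callable, Optional


def call_with_retries(
    fn: Callable[[], str],
    retries: int = 3,
    timeout_s: float = 5.0,
    backoff_s: float = 0.1,
) -> str:
    """Run fn with a per-attempt timeout, retrying on error or timeout."""
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn)
        try:
            # Raises a timeout error if fn has not finished within timeout_s.
            return future.result(timeout=timeout_s)
        except Exception as exc:  # errors raised by fn, or the timeout
            last_error = exc
            print(f"attempt {attempt} failed: {exc!r}")
            time.sleep(backoff_s * attempt)  # simple linear backoff
        finally:
            # Do not block on a stuck call; the worker thread finishes on its own.
            pool.shutdown(wait=False)
    raise RuntimeError(f"all {retries} attempts failed") from last_error
```

Wrapping the agent invocation in a helper like this covers intermittent API failures, but every agent would need its own copy, and the attempt log disappears with the terminal session. That gap is what the rest of the video addresses.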
Let's say someone wants to understand what actually went wrong in the 20th run, or probably the

Segment 2 (05:00 - 10:00)

probably what actually went wrong in the agent run that took place 10 days ago. We don't have the observability data, which is a primary drawback if you want to build a production-ready agent. This is exactly where Flyte can help.
Let's first see how to install Flyte. As I told you, it's super easy to set up a cluster; it just takes a single command. Once you install and configure Flyte, I'll show you how you can update the existing agent. That's the best part: if you have an agent written in Python, you don't have to rewrite it for Flyte. You can just update your existing workflow with the Flyte configuration. Let me also show you that. But first, let's set up Flyte.
Basically, we can install Flyte using Python pip. It's always good practice to create a virtual environment first. Let me call it .flyte. I'll activate the virtual environment using the source command: source .flyte/bin/activate. Perfect. Now that the virtual environment is activated, we can run pip install for Flyte with the TUI, the terminal user interface. Of course, it's good to put this in single quotes. And we should have Flyte installed. There is only one prerequisite: you should have Python 3.10 or above. So if your Python version is lower than 3.10, just upgrade it.
Cool. So we have Flyte installed. Now you just need to run this command: flyte start devbox. What exactly does this command do? Within a Docker container it spins up a K3s cluster, and within the K3s cluster it installs Flyte for you. So basically you will have a Flyte cluster ready on your local machine. For this you should have Docker running. If you don't have Docker, just install Docker Desktop on your Windows or Mac machine and it should work absolutely fine. Even on Linux you can install Docker, because all it needs is a Docker container. Now let's see it by running flyte start devbox.
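The two prerequisites mentioned here, a recent Python and Docker, can be checked before starting the Devbox. The sketch below is a hypothetical helper, not part of Flyte; it reads the requirement as Python 3.10 or newer, and it only confirms that a `docker` executable is on the PATH, not that the Docker daemon is actually running.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

MIN_PYTHON = (3, 10)  # assumption: "3.10 or above", as stated in the video


def preflight(
    version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} is older than "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    # Finds the executable only; it does not prove the daemon is up.
    if which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("ready" if not issues else "\n".join(issues))
```

The `version` and `which` parameters exist only so the check can be exercised without changing the local machine.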
So there you go, it is pulling an image, and within the next 30 seconds to 1 minute we should have the cluster ready. So we have the Flyte UI running on localhost:30080, and an image registry is also configured. We can access the user interface and configure Flyte from there. Let me quickly open it and show you. This is the user interface, again running on port 30080. From here you can choose the domain: if you work across different environments such as development, production and staging, you can switch between the domains. And if you work on different projects, you can set up your projects and switch between them as well.
However, I prefer using the CLI for configuring the domain as well as the projects. So I'll go back to the CLI and quickly run this command: flyte create config. This is going to create a config for the endpoint localhost:30080. I'm creating a project called flytesnacks; it's up to you, you can use any project name, for example Abhishek. The domain is development, the builder is local, and we are establishing an insecure connection because right now it is running over HTTP. As soon as I do this, you will notice a local configuration file is created and populated with this information. Let me quickly open it for you. There you go: this is the endpoint, the insecure connection, the domain name and the project name.
And on the first run, when I trigger the agent through Flyte, you will also see this information reflected in the UI. As soon as we have the first agent run, you will notice a project is created, and within the project you can see the runs. We are going to do that. But before that, what I want to show you is that Flyte is also very secure.
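The four settings read out from the configuration file can be pictured as a small record. This is a hypothetical model for illustration only; `ClientConfig` is not a Flyte class and does not mirror the layout of the real configuration file. It shows what the insecure flag means in practice: plain HTTP rather than HTTPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical container for the settings named in the video."""

    endpoint: str   # host:port where the Devbox listens
    project: str
    domain: str
    insecure: bool  # True means plain HTTP, no TLS

    @property
    def base_url(self) -> str:
        scheme = "http" if self.insecure else "https"
        return f"{scheme}://{self.endpoint}"


devbox = ClientConfig(
    endpoint="localhost:30080",
    project="flytesnacks",
    domain="development",
    insecure=True,
)
print(devbox.base_url)
```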
So previously, when we were running the LangGraph and Google Gemini agent without Flyte, we were passing the Gemini secret, the Google API key, through an environment variable. In Flyte, because this is a Kubernetes-driven setup, you can create a Kubernetes secret and store the Gemini API key, or an OpenAI key, whatever it is. You can store all your sensitive information within the secret, and it's also super easy to create. You just have to run the command flyte create secret followed by the name of the key, in my case Google

Segment 3 (10:00 - 15:00)

Gemini API key, the project and the domain. Now it will ask you to enter the secret value. Just provide the Gemini API key that you got from Google AI Studio, paste it here, and your secret is stored. Now let's verify that the secret is created. We can just run the command flyte get secret. So we don't have to get into the container and interact with the Kubernetes cluster; we can do that using the Flyte CLI. Now whenever we need the Gemini API key, we can just reference this Flyte secret. Nice.
Now that we have learned how to install and configure Flyte, let's focus on some core concepts, or building blocks, of Flyte. In fact, we will use some of these concepts to transform our weather agent. We already learned about projects and domains. Just for quick revision: you can set up a Flyte cluster and use it across the organization. You can use domains to switch between dev, staging and production environments, and you can use projects to switch between different projects within your organization. So from an administration point of view, projects and domains are very helpful.
Then there is an important concept called the task environment. Personally, I feel this is the most important concept in Flyte. I'll make it very simple. When you use Devbox, a Kubernetes cluster is set up for you, and within that cluster you have Flyte. When you use Flyte to trigger an agent run, a container is set up for you, running within the Kubernetes cluster. But you have to define the configuration of the container. Do you want an Ubuntu container? Do you want it with certain packages? Do you want to restrict the resources of the container? All of that can be done using the task environment.
Then there is something called a task. Tasks are the Python functions that execute within the container. So within your agent file you have different Python functions.
The Python functions that execute within the container are called tasks, and we will decorate those functions using the task environment. Then there are runs and actions. This is how Flyte tracks and manages your executions. Obviously you're going to run the agents, so Flyte uses runs and actions to manage the execution of the tasks that you run. Finally, there is something called apps. Apps are basically long-running services for APIs, dashboards and inference endpoints. You can learn more about these core concepts in the Flyte documentation, which I'm going to share in the description. But for the demonstration that we are going to perform, we will use the task environment, tasks and runs. Projects and domains we have already configured.
In fact, let's see that in action. For the purpose of the demo, to turn our weather agent into a production-ready agent, I duplicated the file and named it weather_agent_with_flyte.py, and trust me, I just had to make two simple changes to the file. One, I had to define the task environment. As I told you, the task environment is a crucial component. First I imported flyte. Then, using flyte.TaskEnvironment, I started defining the configuration for the container. I named the agent LangGraph Gemini agent. Then I'm using the secret, the Google Gemini API key. If you remember, we created this on the Kubernetes cluster. I'm not providing the value here; I'm just referencing the secret that we created, so it's completely secure. And finally, I'm providing the list of dependencies for the container. Basically, for this agent to run we need LangChain, LangGraph, requests and sentence-transformers. I'm providing all of these as dependencies so that once the container is started, it has all of them installed.
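The idea of an environment object that carries container settings and tags functions as tasks can be shown with a toy stand-in. To be clear, `ToyEnvironment` below is not Flyte's implementation or API (the video uses `flyte.TaskEnvironment` and `env.task`); the environment name and the spelling of the secret name are assumptions, and nothing here starts a container.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyEnvironment:
    """Toy stand-in for a task environment; NOT Flyte's implementation."""

    name: str
    secrets: List[str] = field(default_factory=list)       # names only, never values
    pip_packages: List[str] = field(default_factory=list)  # installed in the container
    tasks: Dict[str, Callable] = field(default_factory=dict)

    def task(self, fn: Callable) -> Callable:
        """Decorator: register fn as a task belonging to this environment."""
        self.tasks[fn.__name__] = fn
        return fn


env = ToyEnvironment(
    name="langgraph-gemini-agent",      # assumed spelling
    secrets=["GOOGLE_GEMINI_API_KEY"],  # assumed secret name
    pip_packages=["langchain", "langgraph", "requests", "sentence-transformers"],
)


@env.task
def main(prompt: str) -> str:
    # The real agent invocation would go here.
    return f"would run the agent for: {prompt}"


print(sorted(env.tasks))
```

The point of the pattern is that the agent code stays an ordinary Python function; everything about where and how it runs lives in the environment object that decorates it.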
Now once we define the task environment, of course we have to use it for the tasks. I'm using it for the main task. The main function is responsible for the complete invocation, so I've decorated the main function using env.task. So what is a task, basically? Any function that executes within a container in Flyte is called a task. So now this

Segment 4 (15:00 - 20:00)

main function will be executed as a task within the Flyte configuration. Let's see that. Now I'll run the command flyte run for weather_agent_with_flyte.py and provide a simple prompt in this case as well: --prompt "Hi, what is the weather of Hyderabad on 20th of September 2020?" You will get almost the same response. But what's important here is how Flyte transforms this agent into a production-ready agent. If you see, we have a URL here. I'll copy the URL and paste it here. You will notice you can start tracking the execution, and you can also track the output. Okay, it says "I'm sorry, I cannot fetch the weather forecast." There is some intermittent issue. The good thing is that in Flyte you have an option to rerun. Whenever you run into intermittent issues or API issues, you just click Rerun. You don't have to go back to the terminal and trigger the run again; you can do it from here.
You can also check the logs. At the end of the day, this is a container running within a Kubernetes cluster, so you can check the logs. You can also check the Kubernetes events. If there is an issue, you don't have to go back to the Kubernetes cluster; you can track the events from here and see whether something actually went wrong. You can also look at the task configuration. This is the same task configuration, the Flyte internal configuration, for the main function that we triggered. Further, you can head to the task and see the complete task configuration.
"But Abhishek, where do I see the previous runs?" On the left side you have an option for runs. This is the list of runs. Let's say you want to go back and see the run that was executed 4 days ago. You can go there and see the logs and the status of that run. So this is how monitoring works, and this is very useful telemetry information. You can also set up triggers in Flyte.
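What the runs list and the rerun button provide can be pictured with a small in-memory model. This is a hypothetical sketch, not Flyte's data model: `RunHistory` and `RunRecord` are invented names, and a real platform persists this information and attaches logs and Kubernetes events to each run.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class RunRecord:
    run_id: int
    prompt: str
    started_at: datetime
    status: str  # "succeeded" or "failed"
    output: Optional[str] = None
    error: Optional[str] = None


@dataclass
class RunHistory:
    agent: Callable[[str], str]
    records: List[RunRecord] = field(default_factory=list)

    def run(self, prompt: str, now: Optional[datetime] = None) -> RunRecord:
        """Execute the agent once and keep a record whether it passes or fails."""
        started = now or datetime.now(timezone.utc)
        run_id = len(self.records) + 1
        try:
            record = RunRecord(run_id, prompt, started, "succeeded",
                               output=self.agent(prompt))
        except Exception as exc:
            record = RunRecord(run_id, prompt, started, "failed", error=repr(exc))
        self.records.append(record)
        return record

    def rerun(self, run_id: int, now: Optional[datetime] = None) -> RunRecord:
        """Re-execute an earlier run with the same prompt, as a new record."""
        return self.run(self.records[run_id - 1].prompt, now=now)

    def older_than(self, age: timedelta,
                   now: Optional[datetime] = None) -> List[RunRecord]:
        """Records that started at least `age` before `now`."""
        cutoff = (now or datetime.now(timezone.utc)) - age
        return [r for r in self.records if r.started_at <= cutoff]
```

With a history like this, the question of what happened in the run from 4 days ago becomes a lookup instead of guesswork, and a failed run can be repeated with exactly the same input.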
So let's say you want to schedule your agent or agent runs; you can do that. Scheduling has become a very important thing with agents these days. With Flyte, just create the configuration the same way we did before, in your Python file, that is, in your agent file. Once you import flyte, just add this one line here. Right after we define the task environment, you can define a flyte.Trigger so that you can schedule your agent runs.
Cool. So now let's go back to the runs and see the status of our rerun. Perfect. This time it is successful. So in case of intermittent failures, you can just go back and use the rerun option.
And finally, if you want to track advanced metrics such as CPU quota or memory quota, this is only part of the enterprise plan. You can easily upgrade: let's say you start using Flyte and you like the AI orchestration part of it, you can use this Upgrade to Enterprise option and go with one of these plans, Team or Enterprise. Of course, you can chat with the Union team members and see what fits best for your organization. I'll add more details in the description on how to get started with Union and Flyte. You will find all the useful links in the description. Cool.
And I also want to introduce a very interesting option. If you head to the Flyte documentation, there is this option to demo Flyte 2 in a browser. Let's say you want to experience Flyte without setting up Devbox or anything else on your local machine; you can use this Flyte 2 live demo. You have different options here. There are some basic agents and some advanced AI agents as well. You can switch between these options and experience Flyte from the browser. For example, this is a LangGraph Gemini agent similar to the one we built. You can just run it and track your executions from here.
Anyways, to sum it up, AgentOps is something very cool. These days everyone is working on building agents. But at the end of the day you have to get these agents to production, and that is exactly where Flyte comes into the picture. As you have seen, it hardly takes two to three configuration changes to turn your existing agent into a production-ready agent with Flyte. Just go through the Flyte documentation after watching this

Segment 5 (20:00 - 20:00)

video and you can learn more about the Flyte core components. Once again, I'll share all the useful links in the description as well as the pinned comment. Please go through them, and I hope you found this video insightful. I hope you learned something about AgentOps and this open-source orchestration platform, Flyte. If you have any questions, do let me know in the comment section. Thank you so much for watching the video. See you all in the next
