Build a Multi-Agent System with ADK, MCP, and Gemini

Table of contents (5 segments)

Segment 1 (00:00 - 05:00)

Welcome back to the agentverse. In our previous missions, we trained as developers to forge code, as data engineers to structure data into actionable intelligence, and as platform engineers to secure the infrastructure. Check the description to see those other videos. A single agent, no matter how secure or well-coded, is just a soloist. In the modern enterprise, we don't build features, we build ecosystems. Today, we'll focus on the architect's path: the summoner. Building a single agent that tries to do everything is a trap. It's unmaintainable, it hallucinates, and it doesn't scale. Your mission today is to design a distributed system where multiple specialized agents, your familiars, collaborate to solve problems that no single model could handle alone. First, we'll create a decoupled tooling ecosystem. We'll use the Model Context Protocol to separate our tools from our agents, deploying both declarative database connectors and imperative API wrappers as independent power sources. Second, we'll summon the familiars, a legion of specialized agents. We'll use the ADK, the Agent Development Kit, to implement distinct architectures: sequential for precision, parallel for speed, and loop for persistence, giving each agent a unique technique for the battlefield. Third, we'll establish the command locus, an intelligent orchestrator. We'll use the Agent2Agent (A2A) protocol to turn our local agents into discoverable microservices, allowing our orchestrator to dynamically find and command them across the network. Fourth, we'll impose the laws of magic, enforcing governance as code. We'll construct sophisticated callbacks and plugins that intercept agents' thought processes to enforce business rules like cooldowns and rate limits before they even act. And finally, we'll bind the echoes of battle, giving our orchestrator agent memory and state so that it knows when to summon each familiar without double dipping. This is how you build a system that scales.
I'm not going to show every command, but the lab with all the directions is linked in the description. Make sure you don't skip the setup steps. Summoners, let's architect. Before we summon any agents, we have to set up their sources of power. In a single-agent implementation, you might build tools directly into your agent script. But for our multi-agent system, that's no bueno. Instead, we'll use MCP tool servers. The Model Context Protocol (MCP) is the standard that lets agents discover and use external tools. We do this for a few reasons. One, it decouples tools from your system: if your database schema or an external API changes, you just update the MCP server, and you don't have to change anything in your agents. Two, it helps accelerate agent development by allowing teams to build in parallel. And three, it's reusable: once an MCP server is up and running, multiple different agents can connect to it and use it. No copy-paste slop needed from one agent to another. We'll use two distinct patterns to build three separate servers: imperative for our external API and general function servers, and declarative for our database toolbox. First, the imperative pattern, which explicitly defines the tool logic step by step, giving more control and flexibility; we'll use it to connect our external API service to an MCP server. We'll write the code in Python. We define two functions, cryo shatter and moonlight cascade. Then we bring them together in two other functions, list tools and call tools, and wrap those in an MCP decorator. When your agent is connected to this, it can ask for a list of all the available tools and then call them. This second function is the workhorse.
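Stripped of the MCP SDK specifics, the list/call dispatch pattern can be sketched in plain Python. This is a minimal, framework-agnostic sketch; the tool names and damage formulas here are hypothetical stand-ins, and a real server would wrap the two dispatch functions with the MCP SDK's decorators.

```python
import asyncio

# Hypothetical tool implementations; a real server wraps these with the
# MCP SDK's list_tools/call_tool decorators.
async def run_cryo_shatter(power: int) -> str:
    return f"Cryo Shatter hits for {power * 2} damage"

async def run_moonlight_cascade(power: int) -> str:
    return f"Moonlight Cascade hits for {power * 3} damage"

# The "spellbook": a registry mapping tool names to coroutines.
AVAILABLE_TOOLS = {
    "cryo_shatter": run_cryo_shatter,
    "moonlight_cascade": run_moonlight_cascade,
}

def list_tools() -> list[str]:
    """Advertise every tool the server exposes."""
    return sorted(AVAILABLE_TOOLS)

async def call_tool(name: str, arguments: dict) -> str:
    """The workhorse: look the tool up in the spellbook and execute it."""
    if name not in AVAILABLE_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return await AVAILABLE_TOOLS[name](**arguments)

result = asyncio.run(call_tool("cryo_shatter", {"power": 10}))
print(result)  # Cryo Shatter hits for 20 damage
```

The registry-plus-dispatch shape is what makes the pattern reusable: adding a tool means adding one entry to the spellbook, and the list/call endpoints never change.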
When the agent decides to use a tool, it sends a request to this endpoint with the name of the tool and the arguments, like run cryo shatter. Our code looks up the tool in our available-tools spellbook, executes it with run async, and returns the result in a standard MCP format. So: first wrap the two functions, then list and call them. This pattern gives you total control. It's ideal for wrapping third-party APIs, complex calculations, or legacy systems where you need to massage the data before the agent sees it, turning them into simple, reusable tools. Next up, the general functions MCP server, the Arcane Forge. It's pretty

Segment 2 (05:00 - 10:00)

much identical to the external API server, but it has a different architectural intent, if you will. The code inside these functions is self-contained. There are no network requests; it performs deterministic calculations. So, math: instead of wrapping an API, it has the custom logic coded in. Here are our three tools. You can see the calculations multiplying by a power of three. And here are the MCP decorators wrapping those. Creating a dedicated functions or utilities server like this is a best practice. It encapsulates your custom logic, makes it reusable for any agent in your ecosystem, and keeps it decoupled from your data sources and external integrations. The second pattern is the declarative pattern. This is for our third MCP server. If I want to connect to an external database, how do I do it? For our librarium, our Cloud SQL database, we don't write a single line of connection code. We use the MCP Toolbox for Databases. This is a Google-built, open-source MCP server for databases. It handles the complex work of running the server, reducing the amount of code we need to write and maintain. For this, we simply define our data source and tools in a YAML configuration file. We tell the toolbox: here are my connection details for the database, the project, region, and instance. Make sure you update your project ID here. And here are the tools available, and here's the SQL query that provides the logic. Notice that the query handles the data retrieval while the parameters definition handles the input. And finally, the toolset. This is where individual tools are grouped together. This allows our agent to connect and load the entire library of knowledge in a single, efficient command. When deployed to Cloud Run, the toolbox automatically spins up a secure, production-ready server that exposes those SQL queries as safe, callable tools. We can check on the deployed servers in Cloud Run.
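A configuration in the shape just described might look like the following sketch. The layout follows the MCP Toolbox for Databases conventions (sources, tools, toolsets), but the instance, table, and column names here are hypothetical; check the lab for the exact values.

```yaml
sources:
  librarium-db:
    kind: cloud-sql-postgres
    project: YOUR_PROJECT_ID   # update with your own project ID
    region: us-central1
    instance: librarium-instance
    database: spellbook
    user: toolbox-user
    password: ${DB_PASSWORD}

tools:
  lookup-spell-damage:
    kind: postgres-sql
    source: librarium-db
    description: Look up the base damage for a named spell.
    parameters:
      - name: spell_name
        type: string
        description: Name of the spell to look up.
    statement: SELECT damage FROM spells WHERE name = $1;

toolsets:
  librarium-toolset:
    - lookup-spell-damage
```

The query under `statement` handles retrieval, the `parameters` block handles input, and the toolset groups tools so an agent can load them all in one call.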
There should be a green check mark that shows they're running correctly. Then you can test out your MCP server tools with the diagnostic agent in the lab. This is the power of decoupling. Your data team manages the SQL queries in the toolbox. Your backend team manages the API integrations. And your agents simply consume them as services. This architectural pattern promotes scalability, security, and maintainability. With our MCP servers deployed, we turn to the agents themselves, our familiars. A common mistake is asking one LLM to figure out a complex multi-step process. Instead, we'll use the ADK to bake architectural patterns directly into the agent's workflow. So, we're going to build three distinct agents, sequential, parallel, and loop, each chosen for a specific architectural purpose. For each one, we'll define a strict operational territory, connecting them only to the specific MCP tools they need to perform their job. First, the fire familiar. This is a sequential agent. It's designed for linear dependencies. In our lab, the fire familiar scouts the database for a spell, passes that data to the next step, and amplifies it. The output of A becomes the input of B. Step A is to scout the database for a spell's damage by connecting to our database MCP server tool. And step B is to amplify that damage using a calculation tool from the general functions MCP server. Each of these LLM agents, the scout and the amplifier, is a sub-agent defined within our sequential agent, which is our root agent. The root agent is always the first agent called in a process and acts as the runner of everything within it. By using the sequential agent, we can ensure that first the spell will be found and then amplified to form an attack. Second, the water familiar, an agent that runs a parallel workflow. Sometimes you need to query three different APIs or check four different compliance documents. Doing this one by one is slow. So you do it in, say it with me, parallel.
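The fire familiar's pipeline can be sketched, framework-agnostically, as two async steps chained by a root runner. The spell name and damage numbers are hypothetical; in the lab, this shape is expressed with ADK's SequentialAgent and LlmAgent classes rather than hand-written plumbing.

```python
import asyncio

# Stand-ins for the two MCP-backed steps; in the lab these are LLM sub-agents
# wired to the database and general-functions MCP servers.
SPELLBOOK = {"ember bolt": 40}  # hypothetical database row

async def scout_agent(spell: str) -> int:
    """Step A: look up the spell's base damage (database MCP tool)."""
    return SPELLBOOK[spell]

async def amplifier_agent(damage: int) -> int:
    """Step B: amplify that damage (general-functions MCP tool)."""
    return damage * 3

async def fire_familiar(spell: str) -> int:
    """Sequential root agent: the output of A becomes the input of B."""
    damage = await scout_agent(spell)
    return await amplifier_agent(damage)

print(asyncio.run(fire_familiar("ember bolt")))  # 120
```

The guarantee the sequential pattern buys you is exactly this chaining: step B can never run on data that step A hasn't produced yet.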
The parallel agent uses a fan-out, fan-in pattern. It launches two agents simultaneously: the Nexus channeler, which is made to invoke the cryo shatter and moonlight cascade spells by connecting to the external API

Segment 3 (10:00 - 15:00)

MCP server, and the forge channeler, which is made to invoke the Leviathan surge spell and amplify it by connecting to the general functions MCP server. They execute independently, and then a power merger agent waits for both to be finished before synthesizing the result for a final attack. The efficiency is overwhelming. The assembled workflow looks like this. Once again, our root agent is a sequential agent that says, "First run the channel agent, our parallel agent defined right above with the two sub-agents listed, and then run the power merger to merge the results together." This is a typical merge pattern with both a sequential and a parallel workflow to deliver the most efficiency. Keep in mind, parallel uses more resources, so it can be more costly and more complex to design. Third, the earth familiar, a loop agent. Some tasks require iteration. So we'll configure the loop agent to iterate until a specific condition is met or a max iteration count is reached, looping until the job is done. In the lab, our earth familiar charges its power repeatedly through the charging agent, calling the seismic charge function from our general functions MCP server to accumulate energy. Then the check agent checks a condition, the amount of accumulated energy, after every cycle to see if there's enough energy to attack. It'll unleash the energy when it has reached the last iteration. Here's our workflow. The root agent in this case is our loop agent, where the charging agent runs first, then the check agent, and it completes after a max of two iterations. Now let's see one of these agents in action. We can test an agent using the ADK web UI. First we run our agent. Then from the dropdown we can select the earth familiar and prompt it. We can see it call seismic charge a few times with the damage units increasing until it's ready to release the attack. And in the events tab, we can see exactly which agents and tools were called and when. Our workflow agents are ready to go.
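Both control-flow shapes from this segment can be sketched in plain Python with asyncio. The spell names and numbers are hypothetical stand-ins for what the MCP tools would return; in the lab, these shapes are expressed with ADK's ParallelAgent and LoopAgent rather than hand-rolled loops.

```python
import asyncio

# --- Fan-out / fan-in (water familiar) -----------------------------------
async def nexus_channeler() -> int:
    return 30 + 40   # Cryo Shatter + Moonlight Cascade (hypothetical damage)

async def forge_channeler() -> int:
    return 25 * 3    # Leviathan Surge, amplified (hypothetical damage)

async def water_familiar() -> int:
    # Launch both channelers simultaneously (fan-out)...
    results = await asyncio.gather(nexus_channeler(), forge_channeler())
    # ...then the "power merger" synthesizes the results (fan-in).
    return sum(results)

# --- Loop until done (earth familiar) ------------------------------------
async def earth_familiar(threshold: int = 40, max_iterations: int = 2) -> int:
    energy = 0
    for _ in range(max_iterations):
        energy += 25             # charging agent: one seismic charge
        if energy >= threshold:  # check agent: enough energy to attack?
            break                # unleash the attack
    return energy

print(asyncio.run(water_familiar()))  # 145
print(asyncio.run(earth_familiar()))  # 50
```

Note the trade-off the transcript mentions: the fan-out runs both channelers at once, so it finishes faster but consumes both sets of resources simultaneously.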
Sequential for precision, running things in a fixed order. Parallel for speed, running things simultaneously and then synthesizing them. And loop for persistence, running things until you're satisfied with the output. Now, you've probably noticed an LLM agent in every familiar as the sub-agents. This type of agent relies entirely on an LLM to reason its way through a task, trusting it to call the right tools when needed. So, for our familiars, we use the LLM agent because we need a certain level of intelligence to make decisions in order to complete the task and pass along the output. And surprise, we have a new friend: the hierarchical routing workflow. This is a probabilistic workflow. It analyzes the situation and decides which path to take. For example, if you arrive at an event and tell the front desk which talk you're registered for, the person there will route you to the correct room. Now, here is where this hierarchical routing comes in. If we ran all these agents in one massive Python script, we'd have a monolith. If the fire agent crashed, the whole system would go down. So instead, we have one summoner agent, which acts as the orchestrator for all of our other agents. This agent doesn't perform any business logic itself. It acts as a strategist, routing tasks to the appropriate agent as requests come in. For this summoner agent to work, we have to transform our local scripts into discoverable microservices using the Agent2Agent (A2A) protocol. The A2A protocol is the core architectural pattern that elevates a standalone agent into a discoverable, network-addressable microservice, enabling a true society of agents. First, we wrap each familiar in an A2A library, which automatically creates an agent card and an A2A web server endpoint. Then, we deploy them to Cloud Run using a Cloud Build pipeline. This transforms them into a public web service that we can access. An agent card is like a business card. It describes who the agent is and what it does.
Here's my name, these are my skills, and here's the URL that you can reach me at. That URL is the A2A server, the dedicated endpoint that hosts the agent and listens for incoming commands.
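An agent card is just a structured document served by the A2A endpoint. The sketch below is abridged and uses hypothetical values throughout (name, URL, skill names); it follows the general shape of the card a remote caller would fetch, not the exact document the lab generates.

```python
import json

# Abridged, hypothetical agent card: the "business card" a remote
# orchestrator downloads from the familiar's A2A server.
agent_card = {
    "name": "fire_familiar",
    "description": "Scouts a spell in the librarium and amplifies it.",
    "url": "https://fire-familiar-xyz.a.run.app",  # where the A2A server listens
    "version": "1.0.0",
    "capabilities": {"streaming": False},
    "skills": [
        {
            "id": "flame_strike",
            "name": "Flame Strike",
            "description": "Look up a spell's damage and amplify it.",
            "tags": ["fire", "sequential"],
        }
    ],
}

print(json.dumps(agent_card, indent=2))
```

The three pieces the transcript calls out map directly onto fields: who the agent is (name, description), what it can do (skills), and where to reach it (url).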

Segment 4 (15:00 - 20:00)

But how does our summoner talk to our familiars? It doesn't have their code. It doesn't know their prompts. It uses service discovery. Let's build our final agent. First, we establish a connection to our deployed remote agents using the remote A2A agent object. When this code runs, it performs the service discovery action: an HTTP GET request to the provided URL to download the agent card from the remote server. The orchestrator agent holds these variables as sub-agents. It points to the URL, reads the agent card, understands the capabilities, registers them as available tools, and then decides which agent to use based on the incoming request. This means that you can update any agent's logic, redeploy it, and the summoner doesn't need to change a line of code. You simply restart it, and it works, because the agent card automatically updates. This is how you scale teams independently. So now we can communicate with all of our agents to have them defend us during an attack. But even our magical agents need a little time to recover, or they'll tire out and become defenseless. In a production system, we'll often implement a threshold to limit overflow, and we enforce this with code. We call this the interceptor pattern, where we intercept an agent's normal execution flow to run our own custom logic. And we can do this without changing the agent's core code. There are two main ways we can do this with the ADK: callbacks and plugins. A callback is a function attached to a single agent, good for quick and specific modifications. It can be placed before or after the agent, the model, or the tools to run custom logic. A plugin is a class that's more powerful and reusable and can be applied globally to affect every agent running in a system. We'll cover this in a bit. First, we'll put a cooldown callback into our earth agent. This is a great way to prototype and debug a rule we'd like to put in place, since it's only on a single agent. In the earth agent.py, we add a check cooldown function that checks to see if the agent is on cooldown. If it is, it terminates the run, and if not, it updates the cooldown timestamp and lets the run continue. Now, applying this is very simple: we paste our before-agent callback into our earth root agent. Then you can go ahead and verify that it's working, either in the command line or the ADK web UI.
Now that we know our callback works, let's expand it to apply to our fire and water agents, too. We could copy-paste the same code into those files, but that's very repetitive and inefficient, don't you think? So, we'll implement a cooldown plugin. This is a Python class attached to the ADK runtime. It sits outside the agent's cognitive loop. The code is essentially the same as our callback, in that it reuses our before-agent callback code. When the summoner agent tries to call any of the familiars, the plugin intercepts the request and does the same thing. It checks: has this agent been used in the last 60 seconds? If yes, the plugin blocks the execution, and the LLM never even wakes up. To implement this, we remove the agent-specific callback from our earth agent and then attach our new plugin to the runtime in our A2A entry point script. Now we can redeploy our agents and test it out. I'll do a first summon, and it runs successfully. Now, within 60 seconds, I'll summon it again to test the cooldown. It's telling me I have to wait before I can use it again. This use of callbacks and plugins protects your APIs from rate limits and avoids unnecessary spending. Finally, our summoner agent has to learn that if the fire familiar is on cooldown, it's a good idea to stop calling it and to use a different attack. Advice that some humans could follow. In order for our summoner to know that, we have to give it the gift of remembering. This elevates our agent from a tool caller to an intelligent conversational partner.
We need our summoner to remember the last familiar that was summoned for an attack. This is considered short-term

Segment 5 (20:00 - 22:00)

state and exists only for the duration of the current session. We'll implement short-term state using an after-tool callback. This is a function that runs after a tool has been successfully executed. Every time a tool is used, this callback silently updates the summoner's session state: last summon, fire elemental. Imagine it's like a sticky note that you write on to remember something, and then the next time the summoner runs, it looks at the sticky note: oh, fire was just used. We then update the summoner's system prompt to include a directive: don't summon the same familiar twice in a row. Let's verify our new logic is working. We'll run our summoner agent and ask it for an attack. It correctly summons the fire agent, as it's the best for this weakness. Then, when we say, "Hype is still standing, strike again," our summoner agent knows the last agent was fire and that it shouldn't call that again. So instead, it'll call for water or earth. Success. Finally, you can deploy your summoner agent to Cloud Run and verify that its A2A endpoint is live and correctly configured with a curl command. You've completed your mission. This proves that your architecture is a success. Now you can test out your knowledge with the boss fight in the lab. Remember to clean up your resources to avoid any unwanted costs. As an architect, you take individual features and design ecosystems. Today, we've created a multi-agent system with decoupled tools using MCP, crafted agents with specialized workflows, and turned them into reusable components using the A2A protocol. Plus, we demonstrated the importance of governance with callbacks, plugins, and memory. We've moved beyond a simple chatbot and built a resilient, scalable, multi-agent system. Check the link in the description to run this lab yourself. And check out the other paths to see how your architecture influences the data engineer, the developer, and the platform engineer. Summoner salute.
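As a closing sketch, the short-term-state pattern from this final segment, an after-tool callback writing a "sticky note" into session state that the router then reads, can be simulated in a few lines. The agent names and the fallback rule are simplified, hypothetical stand-ins; in the lab, the routing decision is made by the summoner's LLM guided by its system prompt, not by hard-coded logic like this.

```python
# The session state: a "sticky note" that lives only for the current session.
session_state: dict[str, str] = {}

def after_tool_callback(tool_name: str) -> None:
    """Runs after a tool succeeds; silently records the last summon."""
    session_state["last_summon"] = tool_name

def choose_familiar(preferred: str) -> str:
    """Router rule: never summon the same familiar twice in a row."""
    if session_state.get("last_summon") == preferred:
        alternatives = {"fire", "water", "earth"} - {preferred}
        return sorted(alternatives)[0]  # pick a different familiar
    return preferred

after_tool_callback("fire")        # fire was just summoned
print(choose_familiar("fire"))     # earth: fire is skipped this turn
print(choose_familiar("water"))    # water: no conflict with the sticky note
```

The same two moves, write state after each tool call and consult it before the next decision, are what elevate the summoner from a tool caller to a stateful conversational partner.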
