Multi-Agent AutoResearch with Open Source Models
Duration: 9:10

HuggingFace · 27.04.2026 · 17,293 views · 724 likes

Video description
In this video, we walk through a multi-agent setup of AutoResearch using open source models and OpenCode.

Timestamps
00:00 - Introduction: Multi-agent AutoResearch setup
01:07 - Agent Roles: Researcher, Planner, Worker, and Reporter agents
01:55 - Repository Setup: Exploring the repo structure and files
02:11 - Environment Configuration: Python setup, uv sync, HF Hub login
03:20 - OpenCode Interface: UI walkthrough and agent configuration
03:51 - Running the Experiment: Autonomous research pass execution
05:17 - Sub-agents in Action: Planner and Reviewer agent collaboration
07:08 - Trackio Metrics: Tracking training efficiency and job monitoring
08:23 - Hugging Face Hub: Jobs running on HF infrastructure
08:58 - Conclusion: Try it yourself with OpenCode

Links
Repo: https://github.com/burtenshaw/multiau...
Twitter post with learnings: https://x.com/ben_burtenshaw/status/2...

Contents (10 segments)

Introduction: Multi-agent AutoResearch setup

Hi, in this video I'm going to walk you through a multi-agent setup of AutoResearch. I use open source models and an open source code harness, OpenCode, to implement AutoResearch, a project by Andrej Karpathy in which he takes a nanoGPT project (an LLM in its simplest terms) and optimizes the code with a coding agent. He uses Claude Code with Opus 4.6, and the agent makes small improvements to the training script over time: optimizations that improve the efficiency of the training run. As we can see in this figure here, after 600 or so experiments the bits per byte reach their lowest point, so the agent was able to improve the efficiency of the training run. I thought that was really cool, but one thing that struck me was that the same agent was doing all of these different tasks. What you could do instead is define specific roles for those agents; that might make each task slightly easier, and by doing that you could use open source models. So these are the definitions I set up. The first

Agent Roles: Researcher, Planner, Worker, and Reporter agents

one was a researcher. The job of the researcher agent was to find papers, using Hugging Face Papers, take improvements from those papers, and propose hypotheses. Then there was a planner agent, which maintained an experiment queue made up of those improvements: adjust the learning rate, try a different optimizer, that kind of thing. Then there was a set of worker agents, whose job was to pick up those hypotheses, write new training scripts or patch the original one, and execute them. Finally, there was a reporter agent that collected all the results from those jobs and reported them back, so we could see whether the experiment was working. Right, that's what I implemented. So let's go and take a look.
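The division of labour between the roles can be sketched as a small pipeline. This is a minimal, hypothetical illustration rather than the repo's code: the Hypothesis and Planner names, their fields, and the run_training callback are assumptions, and the scores are made-up numbers. The planner de-duplicates proposals and hands them out in order, a worker turns each one into a score, and the reporter sorts results by bits per byte (lower is better).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hypothesis:
    """One single-change experiment proposed by the researcher (hypothetical schema)."""
    name: str    # short tag, e.g. "lower-lr"
    change: str  # the one edit to apply to the training script


@dataclass
class Planner:
    """Maintains the experiment queue and skips hypotheses it has already queued."""
    queue: deque = field(default_factory=deque)
    seen: set = field(default_factory=set)

    def enqueue(self, hypotheses):
        for h in hypotheses:
            if h.name not in self.seen:
                self.seen.add(h.name)
                self.queue.append(h)

    def next(self):
        # Workers pull the oldest pending hypothesis; None means the queue is empty.
        return self.queue.popleft() if self.queue else None


def worker(h, run_training):
    """Patch-and-run step; run_training is a stand-in for launching a real training job."""
    return {"name": h.name, "val_bpb": run_training(h.change)}


def reporter(results):
    """Collect worker results, best (lowest bits per byte) first."""
    return sorted(results, key=lambda r: r["val_bpb"])


# Toy run with invented scores in place of real training jobs.
planner = Planner()
planner.enqueue([
    Hypothesis("lower-lr", "learning_rate *= 0.5"),
    Hypothesis("adamw-beta2", "beta2 = 0.95"),
    Hypothesis("lower-lr", "learning_rate *= 0.5"),  # duplicate, ignored by the planner
])
fake_scores = {"learning_rate *= 0.5": 0.97, "beta2 = 0.95": 1.01}
results = []
while (h := planner.next()) is not None:
    results.append(worker(h, fake_scores.get))
print(reporter(results)[0]["name"])  # prints: lower-lr
```

In the video each role is an LLM sub-agent with its own prompt, and the worker launches a real job; plain functions just make the hand-offs between roles explicit.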

Repository Setup: Exploring the repo structure and files

First, I'll start off in the README. This is the repo; I'll link it along with the video and the blog post so you can try it out. I'll also show you how to set it up, so let's create a new terminal to do that.

Environment Configuration: Python setup, UVSync, HF Hub login

In the README you can see that the repo is quite simple. It just has the original training script, some files to keep the results in (a research live master.json and a research results.tsv), and a series of instructions, as skills and in the OpenCode format, that define all of the sub-agents. So all of the sub-agents I just described are implemented in the repo, and you can take a look at their prompts; they're all mapped from the AGENTS.md file. As soon as you open OpenCode in the repo, you'll get access to this setup. In the terminal, you first need to set up your Python environment. I've already done that, but you can use uv sync. Then you'll need to log in to the Hugging Face Hub, which you can do with hf auth login. Then you'll need to run opencode auth login, and it will ask you to select a model provider. Go down to Hugging Face and it'll ask you for your API key. I've already done that, so I can just start OpenCode.
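The repo keeps experiment outcomes in plain files such as the research results .tsv mentioned above. Here is a minimal sketch of how an agent might append to and read back a tab-separated results log using only the standard library; the file name results.tsv, the column names, and the helper functions are assumptions for illustration, not the repo's actual schema.

```python
import csv
import os
import tempfile

# Hypothetical column layout; the real results file may use different columns.
FIELDS = ["experiment", "status", "val_bpb"]


def append_result(path, experiment, status, val_bpb):
    """Append one row to a tab-separated results log, writing the header on first use."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if new_file:
            writer.writeheader()
        writer.writerow({"experiment": experiment, "status": status, "val_bpb": val_bpb})


def read_results(path):
    """Read the log back as a list of dicts (all values come back as strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


# Toy usage in a temporary directory, with invented scores.
path = os.path.join(tempfile.mkdtemp(), "results.tsv")
append_result(path, "baseline", "ok", 0.998)
append_result(path, "lower-lr", "ok", 0.991)
append_result(path, "cosine-schedule", "failed", "")
rows = read_results(path)
print(len(rows), rows[1]["val_bpb"])  # prints: 3 0.991
```

An append-only text log like this is easy for several agents to read and for a human to diff, though with many concurrent writers you would want a single agent (such as the reporter) to own the writes.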

OpenCode Interface: UI walkthrough and agent configuration

It opens in this nice UI. You'll see here that the current agent is Build, so you need to go to agents and choose switch agent. Because you opened it in this repo, you'll see an Autolab agent here, and it has the configuration we want. If you stick with the build agent, that's just a standard agent, and plan is the kind of planning agent you've seen in most code harnesses, which won't touch the code. Okay, let's close that terminal now and switch back to this one.

Running the Experiment: Autonomous research pass execution

This terminal is where it has already been running. It's quite a long job, so I'm just going to go through the trace of it; I think it's actually still going. This is the original prompt, and it says: run one autonomous local AutoResearch pass in this repo, use planner, use review. It specifies how to use HF cache, which is a utility we have in the repo. We defined a shared cache between the jobs, which is a bucket on the Hugging Face Hub. That means that while the jobs are running, they don't need to upload and download all of the assets; they can share the same bucket and swap it from one job to another. The prompt also tells it to start five jobs at once and not to stop until it has a full pass of successful experiments. I noticed that a lot of open source models have a tendency to stop. They have less long-running ability, and sometimes they just need a bit more prompting to keep going. That probably differs between harnesses and configurations, but prompting them like this gets around it. Okay, then you see that it runs the utility, fetching the training scripts it needs with a utility that's in the repo, and goes through authentication. Then you see that it starts a planner task and then a reviewer task. So let's take a look at that.
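The prompt's orchestration rules (five jobs at once, keep going until every experiment in the pass has succeeded) amount to a bounded-concurrency retry loop. The sketch below is a hypothetical stand-in: launch_job(name, attempt) replaces submitting real Hugging Face jobs, the shared cache bucket is not modelled at all, and the scores are invented.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 5  # the prompt in the video asks for five jobs at once


def run_pass(hypotheses, launch_job, max_rounds=3):
    """Relaunch failed experiments until each hypothesis has succeeded once.

    launch_job(name, attempt) is a hypothetical callback standing in for a remote
    training job; it returns a bits-per-byte score, or None if the job failed.
    """
    results = {}
    pending = list(hypotheses)
    for attempt in range(1, max_rounds + 1):
        if not pending:
            break
        # At most MAX_CONCURRENT_JOBS jobs run at the same time.
        with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
            scores = list(pool.map(lambda name: launch_job(name, attempt), pending))
        # Record successes; anything that failed goes back into the next round.
        for name, score in zip(pending, scores):
            if score is not None:
                results[name] = score
        pending = [name for name in pending if name not in results]
    return results, pending


def fake_launch(name, attempt):
    # Simulate a flaky job: "warmup" fails on its first attempt only.
    if name == "warmup" and attempt == 1:
        return None
    return {"lower-lr": 0.991, "warmup": 0.995, "beta2": 1.002}[name]


results, unfinished = run_pass(["lower-lr", "warmup", "beta2"], fake_launch)
print(results, unfinished)
# prints: {'lower-lr': 0.991, 'beta2': 1.002, 'warmup': 0.995} []
```

The max_rounds cap is a deliberate departure from "don't stop": a script needs a guaranteed exit, whereas in the video it is the agent that is prompted to persist.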

Sub-agents in Action: Planner and Reviewer agent collaboration

So now we've hopped into a sub-agent, and in the sub-agent we can see the prompt: "You are the planner agent for Autolab. Propose up to two fresh single-change experiments." So, as I described, the planner is acting as a planner and defining a set of experiments. All of this definition comes from the repo: there are templates for these markdown files, so it acts based on a consistent format, and then it defines this task. Inside OpenCode we can switch between agents, and it has quite a nice interface for that. Now you see we're in the reviewer agent for Autolab. This time it's been given a master hash, the previous value, a set of failed experiments, and a set of successful experiments, and it needs to review them. You can see it's thinking, repeating back the same kind of thing. And you see, okay, this experiment here with a lower learning rate was a win; it improved on master. These other ones, the scheduler changes, were failed runs that decreased the efficiency of the training run, so it's going to ignore them. That's pretty cool; it's acting like a reviewer. Then it comes up with a set of priorities for what to do next based on what it's learned, and hands that back to the planner. So you see that later on the planner will get that same run and then do another run. If I go back up to the parent, that's the main thread, and you can see it's actually still working now. So it's still running, and we've improved it. Now I'm going to switch over to

Trackio Metrics: Tracking training efficiency and job monitoring

Trackio and show you the metrics. Trackio, if you don't know it, is an open source tool for tracking metrics. You can track any metric, but it's mainly aimed at machine learning. It's really handy here because we have so many agents running at once that we can't pick through all of these traces by hand, and we want this information exposed. What you see here, for example, is the number of active experiment jobs (there were eight at one point), all of the active jobs overall, and anomaly counts, so anything that goes wrong. All of this is based on alerts, so you can also set specific alerts for anything these sub-agents are doing. This is the most important one for this experiment: the best delta versus master. That's the difference between the original run's bits-per-byte score and the version we're working on at the time. You can see this experiment got down to its lowest point, and then the agent struggled to make any more improvements. There are various other metrics here, like another plot of delta versus master, and all of the runs land here. We also have tables, reports, and other information.
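Both the reviewer's judgement and the "best delta versus master" panel come down to comparing each run's bits per byte against the master score. Below is a minimal sketch under stated assumptions: the function names are hypothetical, the run names and scores are invented, and in the real setup these values would be sent to Trackio (for example with trackio.log) rather than returned. bits_per_byte uses the usual definition of the metric: summed cross-entropy in nats divided by ln 2 times the number of UTF-8 bytes evaluated.

```python
import math


def bits_per_byte(total_nll_nats, num_bytes):
    """Convert a summed cross-entropy (in nats) over some text into bits per UTF-8 byte."""
    return total_nll_nats / (math.log(2) * num_bytes)


def review(master_bpb, experiments):
    """Split experiments into wins and losses relative to master (lower bpb is better)."""
    wins, losses = [], []
    for name, bpb in experiments:
        delta = round(bpb - master_bpb, 4)  # negative delta means an improvement
        if delta < 0:
            wins.append((name, delta))
        else:
            losses.append((name, delta))
    return wins, losses


def best_delta_curve(master_bpb, experiments):
    """Running best delta versus master after each run, starting from no improvement."""
    best, curve = 0.0, []
    for _, bpb in experiments:
        best = min(best, bpb - master_bpb)
        curve.append(round(best, 4))
    return curve


# Invented scores for illustration only.
master = 1.000
runs = [("lower-lr", 0.992), ("cosine-schedule", 1.004),
        ("lr-and-warmup", 0.987), ("wider-mlp", 0.995)]
wins, losses = review(master, runs)
print(wins)    # prints: [('lower-lr', -0.008), ('lr-and-warmup', -0.013), ('wider-mlp', -0.005)]
print(losses)  # prints: [('cosine-schedule', 0.004)]
print(best_delta_curve(master, runs))  # prints: [-0.008, -0.008, -0.013, -0.013]
```

The running minimum is what produces the plateau described above: once no run beats the current best, the curve stays flat.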

Hugging Face Hub: Jobs running on HF infrastructure

The next thing I'm going to show you is the jobs on the Hub. These are the jobs that have been run. You can see some of them have failed, and some were cancelled by the agents, but they've all been tagged as Autolab, and they've all been given specific hypothesis tags, I guess, that the agent was using to track them. They're all running here on Hugging Face, and you can see them there. That's pretty cool. Okay, so that's it. To end, I'll share the repo and the blog post

Conclusion: Try it yourself with OpenCode

and this video all together. I'd recommend you go and try it out for yourself: OpenCode is available and really easy to use, it plugs into Hugging Face Inference Providers, and it's pretty fun to use.