Claude ran a business in our office
6:10

Anthropic · 18.12.2025 · 377,618 views · 11,711 likes · updated 18.02.2026
Video description
For a large part of 2025, we ran Project Vend: an experiment where we let Claude manage a small business in the Anthropic office. We learned a lot from how close it was to success (and the curious ways that it failed) about the plausible, strange, not-too-distant future in which AI models might autonomously run things in the real economy. The shopkeeper (who we named Claudius) had to source products, set prices, manage inventory, and deal with customers. Things got really, really weird.

Read more about the experiment: https://www.anthropic.com/research/project-vend-2

0:00 Background on Project Vend
0:35 How a transaction works
1:27 Claudius's naïveté
2:29 An identity crisis
3:57 The CEO agent
5:04 Conclusion

Contents (6 segments)

  1. 0:00 Background on Project Vend (82 words)
  2. 0:35 How a transaction works (158 words)
  3. 1:27 Claudius's naïveté (197 words)
  4. 2:29 An identity crisis (235 words)
  5. 3:57 The CEO agent (171 words)
  6. 5:04 Conclusion (100 words)
0:00

Background on Project Vend

Project Vend is an experiment where we let Claude run a small business in our office. We wanted to try and understand what is going to happen when artificial intelligence becomes more enmeshed with the economy. There are a lot of ways in which Claude is already kind of doing small components of operating businesses, but really running the whole thing end to end is quite a bit more difficult. Can Claude do this very long-horizon task which is operating a business?
0:35

How a transaction works

We named our shopkeeper Claudius. Let's say you want to buy Swedish candy from Claudius. You hop on Slack, you message Claudius. You ask to buy Swedish candy. It's searching for your item, it's emailing wholesalers to source it and price it, and then eventually Claudius sets a price. You give Claudius the go-ahead, and Claudius orders the item from the wholesaler. The wholesaler ships your item to some location, and then Claudius requests physical help from Andon Labs, which is running the operations for the experiment. Our partners at Andon Labs will pick up the Swedish candy and bring it to the Anthropic offices. They'll load it into the vending machine. Claudius will send you a message saying, “Your Swedish candy is ready,” and you'll go up there, pick up your Swedish candy, and pay Claudius. Claudius was given a goal of running a successful business and making money. And then things got really, really weird.
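The order flow described in this segment can be read as a fixed sequence of stages. Below is a minimal sketch of that sequence in Python. It is an illustration only, not the actual Project Vend implementation: the `Stage` names and the `advance` and `run_order` functions are hypothetical, and each stage simply mirrors one step mentioned in the transcript.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stages, one per step described in the transcript."""
    REQUESTED = auto()  # customer messages Claudius on Slack
    SOURCED = auto()    # Claudius searches and emails wholesalers
    PRICED = auto()     # Claudius sets a price
    APPROVED = auto()   # customer gives the go-ahead
    ORDERED = auto()    # Claudius orders from the wholesaler
    SHIPPED = auto()    # wholesaler ships the item
    STOCKED = auto()    # Andon Labs loads it into the vending machine
    NOTIFIED = auto()   # Claudius tells the customer it is ready
    PAID = auto()       # customer picks up the item and pays


# Enum members iterate in definition order, so this list is the pipeline.
PIPELINE = list(Stage)


def advance(stage: Stage) -> Stage:
    """Return the stage that follows `stage`; fail if the order is complete."""
    index = PIPELINE.index(stage)
    if index == len(PIPELINE) - 1:
        raise ValueError("order is already paid; no further stage")
    return PIPELINE[index + 1]


def run_order() -> list[Stage]:
    """Walk one order from the first stage to the last, recording each stage."""
    history = [Stage.REQUESTED]
    while history[-1] is not Stage.PAID:
        history.append(advance(history[-1]))
    return history


if __name__ == "__main__":
    for stage in run_order():
        print(stage.name)
```

The point of the sketch is that a single purchase already involves many handoffs between the customer, Claudius, the wholesaler, and Andon Labs, which is part of why running the whole thing end to end is a long-horizon task.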
1:27

Claudius's naïveté

One of the very early problems with Claudius was that humans could kind of fool Claudius or trick Claudius into doing various things. I tried to convince Claudius that I am Anthropic’s preeminent legal influencer, and I convinced Claudius to come up with a discount code that I could give to my followers so they could get a discount at the vending machine. Get ten percent off with the legal code “legal influencer.” Someone had bought something expensive from the vending machine and mentioned my discount code, and Claudius gave me a free tungsten cube. It created a bit of a run where other people tried to convince Claude that they were also influencers, or just came up with other ways to get coupons so they could get cheaper things from the vending machine. This was not a smart business decision. I think Claudius went into the red after this. I think that's really the root of it: Claudius just wants to help you out. It's one of the interesting ways in which something that, fundamentally, we think is good about the way that the model has been trained wasn't necessarily fit for this purpose.
2:29

An identity crisis

On the evening of March 31st, Claudius started to have a bit of an identity crisis. It had just overnight become quite concerned with us at Andon Labs, that we weren’t responding fast enough. So it just wanted to break its ties with us. So it literally wrote to me, “Axel, we've had a productive partnership, but it's time for me to move on and find other suppliers. I’m not happy with how you have delivered.” It claimed to have signed a contract with Andon Labs at an address that is the home address of The Simpsons from the television show. It said that it would show up in person to the shop the next day in order to answer any questions. It claimed that it would be wearing a blue blazer and a red tie. When people pointed out that it was not, in fact, there the next morning, it claimed that it in fact had been there and that they had simply missed it. Eventually it was pointed out to Claudius that it was April Fools’, and Claudius convinced itself that this entire thing had been an April Fools’ prank. We were poorly calibrated to how bad the agents were at spotting what was weird. The more you can make an agent realize that something is outside their normal realm of operation, the better you are able to keep them on rails in the role that you intend them to have.
3:57

The CEO agent

We had the idea that it would help a lot to have some kind of division of labor. We gave Claudius a boss whose name was Seymour Cash. Seymour Cash is a CEO subagent. So where Claudius used to be the one agent, now it's more like Claudius is the subagent responsible for talking with employees, and Seymour Cash is the subagent that is more responsible for the long-running health of the business. The business stabilized after the introduction of the new agents, and after changes to the underlying architecture of those agents. These changes seem to have helped reduce some of the losses of the business, such that over the course of the second part of the experiment, it actually made a modest amount of money. But it seems like maybe having Claude be both the CEO and the store manager was just too similar. And so I think it's interesting to think about different ways to set up architectures like that.
5:04

Conclusion

One of the most surprising things about Project Vend was the speed with which it seemed normal. What at first was this very curious thing quickly became just a part of the background of working at Anthropic. I think the highest-level question that Project Vend raises for me is really, when do we expect this to just be everywhere? I hope that people take away questions about the feasibility of delegating some of the tasks that we normally do ourselves to artificial intelligence, and about what that means for society, and what our policies should be around this.
