# Understanding the Most Viral Chart in Artificial Intelligence | Odd Lots

## Metadata

- **Channel:** Bloomberg Podcasts
- **YouTube:** https://www.youtube.com/watch?v=Hx5dIJ3H8p4
- **Date:** 25.04.2026
- **Duration:** 1:01:24
- **Views:** 45

## Description

METR, which stands for Model Evaluation and Threat Research, is focused on understanding the degree to which AI models can engage in autonomous, complex tasks. METR sees this as a particularly important benchmark, given the risk that AI could one day be engaged in recursive self-improvement, taking humans out of the loop. But how do you really gauge a model's ability to do complex problems? And what exactly is being measured? On this episode we speak with METR's President Chris Painter, as well as Joel Becker, a member of the technical staff who works on evaluation methods for the organization. We discuss both the mechanics and the philosophy of METR's work, and what it means when we see a chart showing that Claude Opus 4.6 can do a task that would take a human nearly 12 hours.

Chapters:
00:00:00 - AI Productivity Discussion
00:03:41 - Understanding Time Horizon Charts
00:05:09 - What is METR?
00:06:46 - Safety Mission vs Public Perception
00:09:26 - How Time Horizon is Actually Measured
00:11:27 - Human Baseline Methodology
00:13:24 - Task Distribution Focus
00:16:54 - Human Baseline Sample Sizes and Challenges
00:19:10 - 50% vs 80% Success Rate Charts
00:23:20 - Investment Interest and Public Information
00:25:29 - When to Worry About AI Autonomy
00:26:26 - Current AI-to-AI Collaboration
00:29:06 - Task-Specific Performance Variations
00:32:12 - Industry Dynamics and Safety Concerns
00:37:30 - Chinese AI Models Assessment
00:39:19 - Stakeholder Interactions
00:41:51 - Capitalism and Safety Tensions
00:44:25 - Compute Costs and Capabilities
00:46:03 - Baseline Methodology Criticisms
00:49:28 - Accelerating Progress Trends
00:52:40 - Team Size and Funding Model
00:54:44 - Talent Bottleneck and Research Priorities

Bloomberg's Joe Weisenthal and Tracy Alloway analyze the weird patterns, the complex issues and the newest market crazes. Join the conversation every Monday and Thursday for interviews with the most interesting minds in finance, economics and markets.

Join the conversation: discord.gg/oddlots

Subscribe to Bloomberg Podcasts: https://bit.ly/BloombergPodcasts

Get the Odd Lots newsletter: https://www.bloomberg.com/account/new...

And for all things Odd Lots, visit https://www.bloomberg.com/oddlots

#Investing #Markets #Finance #Bloomberg #Podcast #OddLots

Visit us: https://www.bloomberg.com/podcasts

For coverage on news, markets and more: http://www.bloomberg.com/video

## Contents

### [0:00](https://www.youtube.com/watch?v=Hx5dIJ3H8p4) AI Productivity Discussion

More likely to be messy, in some sense. They involve working with other people. They involve working in much larger codebases. It's all sort of more open-ended problems, maybe with something even adversarial going on. In the software engineering context, that might be that someone's trying to make a change to the parts of the codebase that you're currently working on, and you need to work around that. And we do tend to see that the AIs are less capable of working on these more messy problems. I don't want to overstate that. You know, it's not an enormous effect. But, you know, that's one thing that gets in the way of these productivity increases. And then I do think that there's something to the reliability question, right? Where, you know, if it was true that for a certain type of task you only had, you know, 80% reliability, then every time you're going to need to go back and verify the work of these AIs, and not only verify the work, but without the context of how they implemented the solution. If you went about the task yourself, you'd already have that in your head, and so this verification step, quote unquote, would take less time. You know, I don't expect these frictions to be so fundamental in some sense, or I imagine they go up levels of abstraction. I think not only is the underlying technical progress real, but I think the productivity improvements are also going to show up increasingly. But yeah, there are these frictions.

Hello and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal. And I'm Tracy Alloway. Tracy, one thing about AI is that there are lots of lines that go up. Yes. Famously, there is perhaps one line that has captured the attention more than others when it comes to lines going up. Yes, but, we're recording this April 7th. Did you see the Anthropic revenue chart, by the way? Oh, it's just extreme. Yeah. Okay. On the number of lines going up, I mean, there are some really, let me caveat that: up until recently, there was one chart of a line going up exponentially that became, I think it's fair to say, the most viral chart in AI, right? Yes, I would absolutely agree with that. So one of the many lines that go up, and there are various lines that sort of capture this, is essentially just measures of AI progress, or what the models are capable of and so forth. And, you know, there are all different benchmarks out there, and hobbyist benchmark creators, etc., all kinds of benchmarks out there. There's an organization called METR based out in San Francisco. And they measure how well AI models are doing at various sort of engineering tasks, etc., and they have these charts showing how long certain tasks would take a human to do, and then whether AI could do them. And yes, the lines are almost vertical. I think there was one that came out maybe very early this year or late last year, showing the latest Claude model, and yes, it's crazy. So that one was really interesting, because when I look at these charts, they're called time horizon charts. Yeah. When I look at them, I intuitively kind of understand what they're saying. And you can kind of see the leap in progress between some of the previous models and Claude, right, the latest Claude model. Yeah.
And that's what got everyone excited, was you had this big exponential shift up in the capability of that particular AI model. But then when I start, like, diving into what it actually says on their website about what these charts represent, I start getting really confused.

### [3:41](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=221s) Understanding Time Horizon Charts

I'm kind of the same. And like, I know everyone wants to get excited about AI and charts going up in general, but I think there's a lot of nuance here, and we should probably talk about it. Because the other thing going on with METR right now is they've become sort of the industry standard benchmark. And so a lot of investment decisions are being based on these charts. And if you oversimplify them as just, like, okay, lines going up, yeah, and then suddenly it goes up even more, obviously people are going to start to get maybe a little overexcited. Can I say one other thing, too, that I'm very curious about? Like, I'm really glad that there are people designing various benchmarks for measuring AI progress. Seems like an important thing to get a handle on. But like, if I were, say, talented or smart enough to be doing these things, I would go work for one of the labs and make $10 million a year or something like that. And so I'm actually curious about the nonprofits, etc. It's like, do you really want to be working at the cutting edge of AI in a nonprofit? I mean, I guess OpenAI is owned by a nonprofit, weirdly enough. But you know what I'm saying? Like, I would want the money. We should talk about it with our guests, who are currently sitting right here. That's exactly right. I'm very excited to say we have the two perfect guests to talk about the most viral and maybe important chart in AI right now. We're going to be speaking with Joel Becker. He is a member of the technical staff at METR. We're also going to be speaking with Chris Painter, the president of METR. So, Joel and Chris, thank you so much for coming on Odd Lots.

### [5:09](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=309s) What is METR?

Thank you for having us. Yeah, really excited to chat with both of you. Why don't we just start. Chris, since you're the president, I'll start with you. Like, what is METR? How long has it been around? What is this organization? Just give us the sort of 60-second synopsis of METR. Yeah, totally. You know, sometimes I give a long version. I can try and do a short version here. So METR is a research nonprofit based in the Bay Area, like you said, dedicated to advancing the science of measuring whether and when AI systems might pose catastrophic risks to humanity as a whole. We focus specifically on threats that come from AI autonomy, or AI systems themselves. So when you talk about this whole field in AI of dangerous capability evaluations, people asking, can this AI system assist with a chemical or biological weapon attack, can it advance bad actors' ability to execute cyber attacks on a really large scale, METR has sort of specialized in specifically assessing how autonomous AI systems are. What is the scale and, like, length and difficulty of tasks that they're able to do by themselves? Partially because we think it sets the stakes for conversations about AI misalignment. So we sort of see ourselves as being on the hook for, at any given point in time, giving humanity the bits of evidence that are most informative for establishing the stakes of: are we reliant on AI systems as a society in a way that could make it really bad if they are misaligned? I'm going to let Joe ask the question about why you're both working

### [6:46](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=406s) Safety Mission vs Public Perception

in a nonprofit instead of one of the labs later, but one question I do have is, when I think of METR, you guys always come up in the context of these time horizon charts, and, I don't mean this as an insult or anything, but I hardly ever hear anyone talk about the actual, like, safety aspect of your mission. Why do you think that is? Yeah. So I think there's some distinction between our motive for assessing time horizons and how it then gets used by the rest of the world, or kind of what the origin of the rest of the world's interest in it is. For METR, I think the reason that we work on things like the time horizon charts is because we're trying to establish the stakes for talking about: could AI systems go rogue one day, could they, like, try to take over and subvert human control? If you went back to around when METR started, about four years ago (it was started by Beth Barnes and Paul Christiano, and this was kind of the initial motive), and you asked, why don't I think that AI systems are going to go rogue and, like, take over or overthrow humanity today? You can come up with a lot of abstract reasons, debates about the goals AI systems might or might not eventually have, but the kind of most damning, in-the-moment reason is the AI system just can't do much, right? It doesn't make sense to talk about a question-answering system that, like, can't even reliably answer programming questions, saying like, is it going to hack my systems or, like, backdoor me in some way? It just doesn't make any sense. No, it's going to write you the poem that you asked for, right? Or it won't. Even at the time, they couldn't do anything by themselves. And so, being able to subvert human control depends on agency. And so we wanted to come up with a measure that kind of tracks agency over time, to kind of say, when would this argument no longer apply? When are AI systems now able to do long, complex enough actions by themselves that the goalposts almost move somewhere else, too? Like, well, we would catch the AIs, or the AIs don't want to subvert human control. And so I agree that there is a distinction there. I think partially the exercise of trying to come up with these measures throws off things that are very grounded and intuitive measures of AI progress, more intuitive than just benchmarks, right? A lot of people are in the game of making just benchmarks, where you say, like, here's my own benchmark and the model gets 70%. That's much less of a grounded or long-lasting metric. Like, it's hard to say what that means or how that generalizes. But the idea with time horizon is, maybe it's more intuitive. And I think that helps both for safety and for, like, business understanding.

### [9:26](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=566s) How Time Horizon is Actually Measured

So let's talk about this chart. I've got the main chart here from METR.org, right on the front page. It's this time horizon chart, and it shows Opus 4.6, as of February 2026, able to complete tasks of a length of 11 hours and 59 minutes with a 50% success rate. I have to admit, the first time I saw this chart, or versions of this chart, what I assumed, and I suspect others assumed, is that it was able to go off and work on a task for 11 hours and 59 minutes and then come back with an answer. But apparently it's not that. Walk us through, like, what's really being measured here? By the way, the previous high was GPT-5.3 Codex. That was five hours and 50 minutes. So I guess part of the reason this chart blew people's minds is because literally that's basically a double. But why don't you talk to us about what's really being measured here? Yeah. So fundamentally, you know, in simpler terms, we are plotting the difficulty of tasks the AIs are able to complete over time, and the particular way that we measure the difficulty of tasks is in how long it takes humans to complete those same tasks that we're asking the AIs to do. So in this case, you know, we're talking about Opus 4.6 and something like tasks that take humans 12 hours to do. We predict that it will succeed at those tasks around 50% of the time. And yeah, it turns out that when you plot, using this particular difficulty measure, how performance changes relative to how long it takes humans to complete these tasks, we see an exponential increase in capabilities for AIs. And what that ends up meaning is that you keep on having these doublings of capabilities every, let's say, four months, it seems, on recent trends, where, you know, the next model is not necessarily going to have an hour longer time horizon, but perhaps some multiple of the time horizon of the previous model when it does come out.
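
To make the arithmetic behind "basically a double" and the roughly four-month doubling concrete, here is a minimal sketch using the figures quoted above; the extrapolation function and its parameters are illustrative, not METR's methodology:

```python
import math

# Figures quoted in the conversation above:
# GPT-5.3 Codex at ~5h50m, Claude Opus 4.6 at ~11h59m (50% horizon)
prev_horizon_h = 5 + 50 / 60   # ~5.83 hours
new_horizon_h = 11 + 59 / 60   # ~11.98 hours

# Number of doublings between the two models
doublings = math.log2(new_horizon_h / prev_horizon_h)
print(f"doublings: {doublings:.2f}")  # ~1.04, i.e. "basically a double"

# If the trend doubles every ~4 months, a naive forward extrapolation:
def horizon_after(months, start_h=new_horizon_h, doubling_months=4):
    return start_h * 2 ** (months / doubling_months)

print(f"horizon in 8 months: {horizon_after(8):.1f} h")  # ~48 h
```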

### [11:27](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=687s) Human Baseline Methodology

So then explain how that number, that 12 hours, is established. So there is some engineering task and you say, okay, this is a task that would require 12 hours. But humans have all different types of talent capabilities. How do you establish that? Okay, this was a 12-hour task, this was a six-hour task, whatever it is. Yeah. So the simple answer is literally we get humans to sit down and complete the tasks that we give to AIs, under as close to identical conditions as possible. So first we come up with the tasks, and that's a whole other kettle of fish, we can talk about exactly how we do that. And then, using essentially the same tools that we're about to give the AIs, we take talented humans, you know, not people who have seen this particular task before, but people who have relevant expertise. So if it's a software engineering task, they have software engineering expertise; machine learning task, they have machine learning expertise. And then we time them, we see how long it takes for them to complete those tasks successfully. And then, roughly, we call the difficulty of the task, as measured in human time to complete, the average time it took these humans to complete the task. Then we'll run the AIs on this same set of tasks. You know, typically today, for the very easiest tasks they're more or less always going to succeed. There's some mid-range of tasks where perhaps they succeed 50% of the time, or perhaps for some tasks in that range they succeed 0% of the time and for others 100% of the time, and so they're getting 50% on average, let's say. And then for the much harder tasks, perhaps they're getting closer to 0%. And then the point at which we predict, in the middle of all of these zero percents and hundred percents by task, that they'd have a 50% chance of succeeding, whether that's a 50% chance at succeeding on some task, or 50% of the tasks of that difficulty that we think they would succeed on, that's what we're going to call the time horizon of these models.
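
A rough sketch of the estimation idea Joel describes: fit a logistic curve relating a model's success probability to the log of the human baseline time, then read off where that curve crosses 50%. The task data below is invented, and METR's real procedure involves many more tasks, weighting, and diagnostics; this only shows the shape of the computation:

```python
import numpy as np
from scipy.optimize import curve_fit

# Invented data: per-task human baseline time (minutes) and
# whether the model succeeded on that task (1) or failed (0)
human_minutes = np.array([2, 5, 10, 30, 60, 120, 240, 480, 960])
model_succeeded = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0])

def p_success(log_t, log_horizon, slope):
    """Logistic curve: success probability as a function of log human time."""
    return 1.0 / (1.0 + np.exp(slope * (log_t - log_horizon)))

x = np.log(human_minutes)
params, _ = curve_fit(p_success, x, model_succeeded, p0=[np.log(60), 1.0])
log_horizon, slope = params

# The 50% time horizon is where the fitted curve crosses 0.5
print(f"50% time horizon ≈ {np.exp(log_horizon):.0f} human-minutes")
```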

### [13:24](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=804s) Task Distribution Focus

I think one thing also that could be good to explain here is the task distribution. I mean, this is not all activities that humans do. There's some question of what the tasks are. You know, like Joel mentioned, we're having people come into our office to do the tasks, to get a sense of how long they take. We're not having them come in and, like, you know, paint paintings or write novels. We're focused here specifically on things that are in the distribution of work that an engineer at, we like to think of it as, like, a frontier AI lab, the tasks that they might be doing. So this is things like software engineering, it's fine-tuning AI models. I don't know if you want to give other examples, but it is, like, software, machine learning, that kind of task. Wait, can I just ask, why did you decide to focus on engineering? Because you could have widened it out to, you know, if we're talking about AI being capable of taking over the world, there are all sorts of substantive tasks that would fall under that category. So why just do engineering? So, for one thing, I think maybe other people on the team, or maybe Joel, might have other answers. I think my particular motive in being interested in the time horizon on software tasks is that, first of all, it's the thing that the industry was very focused on, even before we started working on this. So it's one of the capabilities that you should expect to come along for the ride earliest. It's the thing that a lot of optimization pressure is being exerted on. And then I think that it is the thing that you would expect as an early warning sign of this AI R&D automation. So to some extent, METR thinks of itself as trying to build, or advance, science that can say, when are we getting to the point that AI systems could improve themselves or speed up the pace of AI development? When will AI research kind of feed on itself? And the core capability for that might be software engineering and machine learning research ability. There are other skills that could be relevant to taking over the world, right? I think other people have done time horizons on, like, cybersecurity since. Yeah, yeah. But I suppose it is true, like, the Basilisk isn't going to paint its way into, like, power or something like that. Okay. On the other hand, it might deceive you, it might be very convincing or cunning in some way, and... Hand over the keys. It might. I might say, if your mental model is, you know, we don't have perfect evidence of this whatsoever, but my rough sense, colloquially, my prior before evidence comes in, is that if we did study tasks on these very different distributions, you know, not machine learning, not software engineering, I'm not sure about painting exactly, but perhaps other kinds of task distributions, basically we would see this similarly shaped exponential progress over time, where every, I'm not sure exactly, but let's say four months, six months, something like that, the level of capabilities as measured in time horizon would be doubling at something like that pace, maybe from a much lower level.

So, you know, one example that we do have better evidence on is that the AIs today are much less performant at anything that requires vision capabilities, seeing what's on a screen, clicking around at a computer. But they're getting tremendously better at that sort of thing over time. Just to mention quickly, we did actually do a very brief investigation of this on other task distributions. That's on our website somewhere, like cross-domain time horizons. I think we looked at data that Tesla shared on self-driving. I'm forgetting the others. There's, like, OSWorld. Maybe some of these are somewhat similar, still kind of in the distribution of software tasks, but trying to get further afield into things like vision. Yeah.

### [16:54](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=1014s) Human Baseline Sample Sizes and Challenges

So the other question I had is, how big is the sample size on the humans who are actually doing the work? And also, is it getting harder getting, like, human engineers into the room to compete with, like, Opus 4.6? Versus, say, you know, if I was a mediocre engineer, and I'm not, I'm a non-existent engineer, but if I was a mediocre one, maybe I would feel good about going up against, like, GPT-3 or something, and maybe I would feel a lot worse about myself going up against, like, Claude. Yeah. You know, on these tasks, I mean, I'm in a pretty similar position myself to you. So we have approximately three human baselines per task, although it varies quite a lot across tasks. So, you know, typically we're averaging over something like three. I think the final numbers, it's my impression that they're not going to be so sensitive to the particular baselines that we choose. Aren't the longer tasks more weakly baselined? I'm thinking of Nathan Witkin. He's in my ear. Yeah. So indeed, I think it will get a lot harder to baseline these tasks as the length of task the AIs are able to successfully complete gets longer and longer. You know, you might think at some point the length of tasks that they can complete is longer than the doubling time: you know, in four months' time, they're going to be able to complete tasks of more than four months. And then it kind of becomes perhaps close to impossible to get these four-month-long baselines. Of course, we're not at that point yet, but definitely it has become more difficult to get these baselines as time has gone on. At the moment, not impossible, but very challenging. Joe, these are the future jobs for displaced engineers, right? It's competing against the models for benchmark evaluation. We found the jobs. All right, here's a question. So we mentioned at the beginning that the most viral chart in AI is this chart that you have on the front of your website. The website defaults to this. And it shows, you know, this doubling. So if we actually go back to November, say November 2025: Gemini 3 Pro, three hours and 44 minutes; Claude Opus 4.6, 12 hours. This is the 50% success benchmark. If we go to the 80% benchmark, which the website doesn't default to?

### [19:10](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=1150s) 50% vs 80% Success Rate Charts

The pace of improvement looks a little less impressive to me. So, okay, now it just does not have the same gap, pretty clearly. Now, 80% is still not 100%. And I know that METR's goal is about, like, you know, human safety and all this stuff. But people look at this and they use it as a stand-in for how performant these models are. Even 80%, you know, certainly for, like, any business application, and I understand you're not, like, serving business here per se, but businesses probably care about this, even 80% may not be good enough. And it does not look as crazy when you look at the 80% chart as it does at the 50% chart. Why the focus on the 50% chart? And, like, why not look at the chart that just does not look as impressive? Yeah, maybe two central things to say. You know, one, to my eye, the 80% chart basically does look as impressive, because the doubling time is about the same. You were about to say this is cope on my part. It's the same pace of progress, just offset: something like five times smaller than the 50% number. But, you know, that's only about two doublings. And if each doubling takes around four months, that means that in eight months' time you're going to have the same 80% success rate, roughly, as you do 50% success rate today. That's one thing to say. Maybe a second thing to say is, remember, at the beginning I said essentially what we're doing is plotting the difficulty of tasks that these AIs can complete over time, just with this particular measure that ends up showing this clean exponential trend. And we've picked a particular number as our difficulty number, and that is this 50% reliability threshold. We could have picked a different one. I think there are reasons for picking the 50% one in particular. It's the one that, statistically, we're better able to measure, for some technical reasons. It's the one that shows up in previous literature. There are a couple of other reasons why we go for 50% rather than 80%. Maybe a final thing to say is that this 50% number is sort of equivocating between tasks it's able to complete 50% of the time, and 50% of the tasks it's able to complete 100% of the time while 50% it's able to complete 0% of the time. And actually, I think the situation is somewhere in between, but a little bit closer to the latter, where there are some tasks that it's completing with near-perfect reliability and some tasks in that range that it's completing with very low reliability. And, you know, for downstream economic applications, or for applications inside of these major AI companies, you might think that that's more favorable in some sense, that there are some of these tasks where we're getting 100% reliability, even for very challenging tasks. I think two other things. Maybe it could be useful to just explain, when you said that there are technical reasons why it's easiest to measure at 50%: one, it is just the case that 50% is the point at which the estimate is least sensitive, the distribution is kind of thickest, right? I mean, correct me if this is wrong, but to resolve something like 95%, you would need way more samples, because then you need to have some that are, like... yeah, you need way more samples to be able to resolve that level of precision.
I think there are some caveats to that picture, but let's say, even more extreme, that we cared about 99%. In that case, if we had 1% label noise, quote unquote, you know, if sometimes we were accidentally grading some of the failing tasks as passing, and some of the passing tasks as failing, then we'd just never be able to estimate that reliably, right? Okay. And at 50%, this comes a little bit closer to washing out. And I think one other intuitive thing here, one intuition, is that if you give me a task and you give me the model, and all you tell me is the length of time it takes a human to do the task, the 50% time horizon is the point at which I think it is more likely that the model will be able to do the task than that it can't. And I just find that intuitive. Yeah.
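
Taking the fitted logistic curve from the earlier sketch, other reliability thresholds fall out of one line of algebra, and the label-noise point has a simple numeric illustration. The 720-minute horizon and slope of 1.0 below are arbitrary placeholder values, and the exact 50%-to-80% ratio depends on the fitted slope:

```python
import numpy as np

def horizon_at(q, log_horizon_50, slope):
    """Human time at which the fitted logistic predicts success rate q."""
    return np.exp(log_horizon_50 + np.log((1 - q) / q) / slope)

# e.g. a 50% horizon of 720 min with slope 1.0 (illustrative values):
log_h50, slope = np.log(720), 1.0
print(f"80% horizon ≈ {horizon_at(0.8, log_h50, slope):.0f} min")  # 180, 4x shorter

# Why extreme thresholds are hard: with 1% grading ("label") noise,
# observed success can never reach 99%, so a 99% threshold can be unrecoverable.
noise = 0.01
true_p = 0.995
observed_p = true_p * (1 - noise) + (1 - true_p) * noise
print(f"observed success rate: {observed_p:.4f}")  # ~0.985 < 0.99
```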

### [23:20](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=1400s) Investment Interest and Public Information

How much interest do you get on these charts from potential investors specifically? And the reason I ask is because I was just messing around and, like, googling some stuff. And when the Opus chart, the latest Opus chart, came up, someone posted it on Reddit, and I think the second comment on it was someone going, how do I invest in OpenAI? And, like, people were trying to club together to invest in these companies. So clearly there are people out there who are using these charts as investment tools. I would say, you know, we don't get an enormous amount of inbound from investment firms. I mean, sometimes, you know, VCs or whatever, we're based in the Bay Area, will reach out to us. I think that there's some kind of principle that our goal is to inform the public and give them the best evidence that we can about when we might get to this point of, you know, AI being fully autonomous or able to improve itself. And there's some principle at play here of, like, I kind of want to enable people to do whatever they will do with that information. And I think that, yeah, we don't engage a ton in the, like, business side or investment implications of the work. One kind of thought experiment I sometimes do is to say to myself, if I do believe that at some point we're going to get this AI that's improving itself, and AI research is automated, and you have all these fears about a singularity, would I rather that all of Wall Street falsely didn't think that was coming when I believed it was coming? Or would I want them all to know that it was coming, given that I believe it's coming? Maybe this is more a personal view, but I think, if it is possible that we will automate AI research, all of humanity being aware of where we're heading is sort of a precondition for us all being able to figure out what to do about it. And so I don't want certain people or one side or one team to selectively be in the dark because they might invest on the basis of this or something like that. But, you know, it's not where we put our time. We're focused on informing the public. The public includes some investors.

### [25:29](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=1529s) When to Worry About AI Autonomy

So on that note, like, what is the actual level at which we're all presumably supposed to panic, or at which, like, if you're a policymaker, you would start to get worried about AI being able to automate and improve on itself in a way that eventually becomes detrimental to humanity? I don't know exactly what the level is on this time horizon measure. I think, you know, one thing to say is we have made real progress on the science of measuring these AI systems and how capable they are, but I think there's a long way to go. And in an important sense, I think we're behind on this task. You know, we're measuring some underlying technical trends. And at some point, I do think that implies greater risks of astonishing things happening, although, you know, Chris can speak more to other arguments for why, even if AIs are very capable, we still might not see catastrophic danger so much in the short term. Yeah, I'm unsure what the level is today.

### [26:26](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=1586s) Current AI-to-AI Collaboration

You know, I think part of the reason why the AGI chatter has really picked up, particularly in the wake of, like, everyone using Claude Code, is it's very easy to imagine, like, you're sitting there and it's like, yeah, do this, do this, and, like, I don't even need to be here, right? I think you sort of get a very intuitive feel for how the human can come out of the loop. What happens today, and I'm sure this has been tried, if you go to, like, ChatGPT and you say, here, you have Claude Code access, go build something? What actually happens today when AI is working with AI? Yeah. My sense is that at some point, you know, a further away point than would have been true some time ago, the AIs more or less fall on their faces. There are some things they're not so capable of today, like... Collaborative hallucinations, where they do just, like, devolve into terrible... Yeah. There are all sorts of ways that this can go. You know, at some point they're going to need to rely on external resources, and today they're not as capable at managing these external resources effectively. I think they're less capable at sort of ideation and self-awareness about where they are in the problem today than they are at these kind of raw software engineering skills. You know, as you mentioned, the way in which AIs are autonomous today, or close to autonomous today, is the human has the idea and then, you know, submits that idea to Claude Code or Codex or one of these other AI tools, and then they handle the software engineering components. Possibly there's still some intervention after that. I do imagine that the sort of circle of autonomy, or something, gets larger over time. I do think there's no fundamental barrier, it seems to me, to AIs today having those ideas, and so being moved to a greater level of abstraction. But if we were purely relying today on these fully autonomous capabilities, you know, could you manage research departments, any departments of your choice, inside of a major AI company? My guess is probably not. Actually, on this note, this reminds me of something I wanted to ask. So when you look at the domain-specific time horizon charts, the ones that show, I think you call them task suites or something like that, like, I guess, productivity by specific job, and you see these different lines. So sometimes you see almost horizontal lines and sometimes you see squiggly or steeper lines. What is actually happening there? Like, how are we supposed to interpret that? Is that a measurement problem, or is it saying something very fundamental about, like, what AI can and can't do under current conditions? I feel like it could be useful.

### [29:06](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=1746s) Task-Specific Performance Variations

So I think the thing that would be good for Joel to explain is that there is a distinction here: the time horizon chart doesn't by itself, I think, tell you whether productivity in one specific kind of job will increase because of access to AI. Should I talk about that? Yeah. Maybe one thing to say on that chart showing the time horizon on these different task distributions: relative to my guesses ahead of time, you know, the time horizons are remarkably similar. I think the doubling times, the pace of progress in AI, seem more similar than I would have guessed to the original trends that we published, although, you know, imperfectly so. On this difficulty translating what we might call raw AI capabilities, in some sense, capabilities on benchmarks or something, to real-world productivity: I think there are a number of ways in which the benchmark results are overestimating what we might see in the wild. You know, not hugely overestimating, I think we do see that people are getting real utility out of these modern AI tools, but overestimating to some extent. One is that the scoring implicitly is different. In real problems, I'm scoring based on something a bit more realistic than these algorithmic scoring procedures, these automatic scoring procedures that we're using at METR, and many other people are using in the benchmark world. There's some notion of code quality if you're working in software engineering. Beautiful code, elegant code, as we were talking about. Yeah, yeah. For other tasks it's going to be, if I was coding this, is this what I'd want? One more thing is that the tasks that come up in the wild are more likely to be messy, in some sense. They involve working with other people, they involve working in much larger codebases, or sort of more open-ended problems, maybe with something even adversarial going on. In the software engineering context, that might be that someone's trying to make a change to the parts of the codebase that you're currently working on, and you need to work around that. And we do tend to see that the AIs are less capable at working on these more messy problems. I don't want to overstate that. You know, it's not an enormous effect. But that's one thing that gets in the way of these productivity increases. And then I do think that there's something to the reliability question, right? Where, you know, if it was true that for a certain type of task you only had 80% reliability, then every time you're going to need to go back and verify the work of these AIs, and not only verify the work, but without the context of how they implemented the solution. If you went about the task yourself, you'd already have that in your head, and so this verification step, quote unquote, would take less time. You know, I don't expect these frictions to be so fundamental in some sense, or I imagine they go up levels of abstraction. I think not only is the underlying technical progress real, but I think the productivity improvements are also going to show up increasingly. But yeah, there are these frictions. Tracy alluded to this question when she asked about VCs and investor interest.

### [32:12](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=1932s) Industry Dynamics and Safety Concerns

So people see these charts, and regardless of what METR's point is, they're like, oh, this is incredible, I've got to invest in this. But this brings me to this broader thing that I find very strange about AI, which is this kind of Baptists-and-bootleggers relationship between the AI labs, the people who are building this stuff, and the sort of alignment and safety people. And they sort of go back and forth, in that you have the heads of the labs saying, yes, this might destroy the world and take all your jobs, and the safety people and the alignment people saying, yes, this might destroy the world. And it's a very strange industry, right? Like, the only thing that I can think of is cigarettes, where they warn you that smoking is bad, except they had to do that because they lost a lawsuit. I don't think they were particularly inclined to do that. I can't think of any other industry where the most enthusiastic people about it are also warning about how bad the thing they're building could be. So I'm sort of curious, like, first of all, and I talked about this in the intro, who is the type of person that's working at METR that is skilled enough to do advanced evaluations? And where's the funding coming from? Talk to us about who's behind METR and why they're there. I think one thing to say on the big picture, or the history, of people caring about AI safety in the Bay Area, is that this concern goes back quite a ways, you could say for over a decade. There are many people who got into the field because they saw this trend of deep learning, like, what if deep learning works? And it goes all the way to artificial general intelligence and then superintelligence. And if that works, then it could affect everything. I think possibly, when people worry about this, there's a future that they have in mind with superintelligence that's even more capable than what people who think of themselves as, like, AGI-pilled today think of: managing AI systems that can run, you know, the entire economy. And I think people who, many years ago, saw that vision and were sort of alarmed about the stakes of it, many of them had this intuition that the thing to do is go and work in the industry, because if you're helping build it, you know, what's the best way to shape the future? It's to build it. And obviously you could have questions about how sincere that is for many of the people who are in the industry, or if there's kind of a mix of different motivations and, like, different wolves inside of them, where maybe they partially are motivated by that, but also there's this, like, Oppenheimer thing, where it feels good to feel like you're in the position of making something that's dangerous. Someone once described OpenAI to me, this was years ago, a friend said OpenAI was sort of like the Manhattan Project, except the goal was to not build the bomb at the very end, if that makes any sense. So to your Oppenheimer point, it's, like, very strange. And I think one thing to emphasize is, you know, well, it could be that there's a mix of motivations.
Now, there are definitely many people, I think, in the Bay Area who sincerely believe that the technology is headed to someplace where it will be very difficult for humanity to stay in the driver's seat, or, like, stay in control in a meaningful sense. And where were they going with that? Sorry, zoned out for a second. Shoot. You were saying they're very worried about people being able to stay in the driver's seat. Yeah. And so the point I was going to make is, this concern is quite old, and many people had this intuition that, like, I can influence the thing by building it. But now there's this problem that that logic kind of always recommends that you continue building more advanced technology, or more advanced AI systems. And now you have this problem where there are all of these companies, and they all say that they need to build it because, you know, if they don't build it, another company will. And they could all have doubts about each other's commitment to safety or to these principles. Famously, the leaders of the labs really do not get along. They're not friends. It's not easy for them to sort out the safety thing among themselves. And then, even if all of the US AI labs agreed to do that, they then have this kind of external boogeyman of China, right? Well, what will the Chinese companies do? And so there's this sense in which, even if the concern is real, I think a lot of people who are in the industry then have the instinct that there's no guiding principle for what they should do on safety, other than to, like, build leverage for themselves for later. And I think that is a concerning state of affairs for AI development to be in globally. You know, obviously we're trying to do something different by informing the public. Partially, you could imagine that the situation would be better if, or, like, one gap that exists right now in that picture is that it's the people building the technology who most believe that it's going to be destabilizing and sort of all-encompassing. Maybe if the public and governments all were on the same page and believed the same thing, if it were true that it was headed there, then there would be more time for society to figure out a response, from people who are not trying to build leverage over the technology themselves directly, or, you know, control the technology via some kind of public action or government.

### [37:30](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=2250s) Chinese AI Models Assessment

Can I just ask very quickly, since you brought up China, and I don't want to forget to ask this question, but Qwen doesn't show up on your, like, main charts. I think you did a preliminary assessment of it a while ago, but what's the difference between assessing one of the closed models in America versus one of the open-source models over in China? I think one thing to say is that the capabilities are lagging behind, we think that they're lagging behind. They're not so irrelevant that they just, like, don't make it onto the chart? So, we do try to prioritize, just because METR has limited resources, staff time in particular, the models that we anticipate being on the frontier. And in general, the Chinese models have been something like, you know, 9 to 12 months, let's say, behind the US models. And I think the gap by time horizon is probably even larger than the gap by benchmark scores, where there's some, I'm not sure how scientific I can make this, but there's some colloquial sense that the Chinese models are stronger according to benchmark scores than they would be on, you know, truly held-out problems in some sense. Like, you mean they're gaming the benchmarks? Is that what that means, or...? I'm not sure, technically, exactly how that shakes out, but something spiritually close to that. I'm not sure that's true for all Chinese models. I'm sure it's true for lots of models outside of China. But I think that's at least one possibility. Okay. And then, just getting back to the sort of is-AI-going-to-take-over-the-world question, I'm very curious, when you talk to external actors in all of this, and I'm going to group them into, I guess, policymakers, investors, and the labs themselves, who are you interacting the most with at the moment?

### [39:19](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=2359s) Stakeholder Interactions

I think that, in practice, we end up interacting a lot with AI labs, because there's some amount of sorting out, you know, getting access to models, working with them to set new precedents on things related to third-party red teaming and third-party risk assessment. We think of our audience as being sort of, like, high-context members of the public. So the kind of people, you know, who are maybe like you two, right? People who are kind of... People listening to this podcast? Yeah, I guess, yeah. People listening to this podcast, people who have to make important decisions that will be informed by the pace of AI progress, or the kind of profile of AI capabilities overall. Because we're based in the Bay Area, I think we disproportionately end up interacting with people who are building the technology and, like, closer to it. Partially, and I think this goes back to Joe's point before, this is because it is the case that, to care about a lot of these frontier problems, you're kind of selecting for people who are building the technology themselves. There's some sense in which the companies in the industry spend more time thinking today about frontier capabilities assessment than the government does. Yeah. I think, like, one day you could imagine us getting to the point where the government is very focused on this and dedicating a lot of resources to it, and at that point I would expect METR to be spending more time talking to governments. And that's kind of what I was getting at, because our sense is, in a lot of the conversations, like, we talk to people and they'll say something about, like, oh, it's important to have a social safety net for an AI-enabled future, but no one seems to be really thinking about it in a lot of detail. And when you say, you know, it's easy to imagine, or maybe the government will care more about this, it's not so easy for me to imagine. It seems like they mostly care about, you know, data centers and, like, where they're located and stuff like that. It would be nice if we had policymakers really looking at, like, frontier capabilities and stuff. Still seems kind of a ways off. But it is interesting, you know, you talk about this sort of, like, capitalist dynamic, right? There's competition, and you have a lot of people that are really worried about, what if the other guys get to ASI or AGI first, or what if the Chinese do, etc. How much does the fact of, like, free-market capitalism and the demand, you know, the big investors, the VC funds, like, they want a return, they want an IPO, and we might get some big AI IPOs this year, in fact. How much do you find that to be perhaps in tension with the safety element? Yeah, maybe people on our team would have different views on this.

### [41:51](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=2511s) Capitalism and Safety Tensions

I personally don't feel... yeah, there's something here of, like, investors are key decision makers, and, you know, they're people too. That sounds strange to say, investors are people. Do I sound like Mitt Romney or something? But I think that the element of this that feels like it could be in tension is if you build a bunch of financial obligations to keep kind of the pedal to the metal no matter what the risks are, going into the future. So, like, one thing I think a lot about is, if you're building up a huge amount of debt to build data centers, and then say that you do find evidence that you're now worried about the, you know, loss of control from AI systems, you do find instances of AI systems going rogue, do you now have a financial commitment to build out those data centers and, like, continue the pace of progress? Right. I think that is one place where I feel the tension pretty acutely, like, you're building these expectations into the market that could kind of force you to continue development when you otherwise would rather invest more in safety. Or, yeah, it at least gives you a kind of financial obligation to continue scaling, at least compute. I think that the people themselves being informed about the progress does not seem bad to me. I think it's, like, good in some ways for everyone to be on the same page about capabilities that could be related to subverting human control later on. But I think that in the world beyond, like, the information that METR shares, I do think there is a tension, like, the fact that private companies are building this could cause really acute tensions in the future, where, yeah, people make these commitments that they wouldn't if they were trying to, like, slow down, or, you know, maximize social resilience to the technology. Yeah. I'm not sure how these things shake out, but I think there are some forces on the other side, right? Like, you know, some safety-promoting technologies, quote unquote, or techniques, do make the models more useful, you know, if they're better complying with your will in some sense. And so you have standard capitalist incentives to invest in that kind of research. Maybe that doesn't cover, you know, the broad suite of safety research that seems important, and it certainly doesn't rule out capabilities progress as being an important axis on which you do want to scale. But I think there are some forces in each direction.

### [44:25](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=2665s) Compute Costs and Capabilities

Since you mentioned compute just then, can you talk a little bit more about, I guess, the relationship between the time horizon improvements and the cost of compute at the moment? And, like, what you've actually seen and how that impacts it? Yeah. So one extraordinary fact, from my perspective, and I'm not sure how to fit these facts together, but something like the R&D spend on compute of these companies has risen exponentially, of course. And in fact, it's risen exponentially at essentially the same rate as time horizon progress. You know, I think there's nothing necessary about that. It doesn't mean by itself that if compute growth slows, then capabilities progress will also slow. But, you know, it's clearly an important input into AI progress, and I expect that to continue to be true in the future. Sometimes people ask us how plausible we think it is that this exponential capabilities progress might slow down at some point in the future. And, you know, one reason it's hard for me to consider it plausible that it will slow down in the next, at least, small number of years, is that a lot of those compute R&D investments are basically already baked in, right? Like, the data centers have already been built, and plans for data centers even beyond 2027, 2028 are presumably coming to fruition. And so some of these input investments are already baked in, in some sense. So it would be surprising to see capabilities slow, to the extent that compute has been an important input. After that, maybe you need to think about other arguments for how capabilities might slow, but that's roughly how I think about it.

### [46:03](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=2763s) Baseline Methodology Criticisms

There's a very good, or at least interesting, critical Substack post called Against the METR Graph by someone named Nathan Witkin, who brings up a point I wouldn't have thought of had I not read it, which is that you're paying software engineers to come in and perform these tasks, right? It seems like maybe this would be the last job for humans, just doing benchmarks. If I were a good software engineer and you said, Joe, come in and do this task, how do you prevent me from going, oh man, this is taking me a long time, while I keep getting $100 an hour for, like, looking at my computer? Oh, this is tough, I'm going to have to come back tomorrow and keep working on this. How do you avoid the conflict of interest where the person who is paid to work on this problem may be encouraged to take as long as possible to solve it? And with only three people working on a task at times, I don't know, it just seems like a conflict of interest to me. Yeah. So the short answer is that, in general, we are incentivizing these people to complete the task as soon as possible, in particular to complete the task faster than their peers who are attempting the same task. Like, a bonus if they do it faster? Yeah, approximately. There's a bonus if they complete it faster than anyone else. Another thing to say is, I think it just is true that the baseline methodology, the way we compare to humans, in some ways leaves a lot to be desired. Ideally we would have invested a hundred times as many resources and had a hundred baselines per task, and those would have come from perhaps the very best software engineers or machine learning engineers in the world; maybe that would be the comparison we're making. And we'd be running this whole procedure over many more tasks, and not just many more tasks but wider task distributions than just software engineering or machine learning engineering. I do think time horizon still represents progress over what's come before in the science of measuring AI capabilities, but in some ways I'm sympathetic to a lot of the criticisms of it. I also think that some of the details, at least for the work we've done so far, aren't going to matter as much as you might naively think. Choosing the shortest baseline time we end up observing, or the longest, actually isn't going to make that much difference to the final measurements. Of course, we do think these people are talented software engineers or cybersecurity people and so on, depending on the task. But perhaps we could have found even more talented people who would have completed the tasks in half the time. Naively, it would then seem like the time horizon we estimate for these models should be half as long as what we actually observe. But that wouldn't change the doubling time; it would just mean you get to the same level after another four months. In some sense, the big picture I want time horizon to point to is less that Claude Opus 4.6 is at 12 hours in particular, and more that we're seeing this remarkable pace of progress that shows no signs of slowing in the recent past, and, I think, in the near future as well. In fact, it shows some signs of speeding up.
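Joel's point about more talented baseliners can be written down directly. As a sketch, suppose the measured time horizon follows the exponential model the chart implies, with current level $H_0$ and doubling time $d$ (about four months on the recent trend):

$$H(t) = H_0 \cdot 2^{t/d}, \qquad \frac{H_0}{2} \cdot 2^{t/d} = H_0 \cdot 2^{(t-d)/d}.$$

Halving every human baseline time halves every measured horizon, so $H_0$ becomes $H_0/2$; but as the identity shows, that just shifts the whole curve right by one doubling time $d$ without changing its slope. The level of the line is sensitive to who the baseliners are; the doubling time is not.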

### [49:28](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=2968s) Accelerating Progress Trends

Well, I was going to ask about this, because I think recently the statistic you would always hear was a doubling every seven months or something like that. How fast do you see it going in the near future? Yeah. So I was a doubling-every-seven-months person. There was controversy on our team about what to believe here, because when we originally published this work, approximately a year ago, if you plotted a single straight line, a single exponential, you'd get something like 6 or 7 months, let's say. But if you restricted to just the time since, I think, GPT-4, since the 2024 models onwards, you'd see something closer to a 4 or 5 month trend. Some people believed in that, and some people like me had the intuition that, well, we have so few data points, we should really be estimating over the larger number of data points, and the larger number of data points says every 6 or 7 months. A couple of things have changed my mind since then and made me realize my colleagues were right. One is that, for the models that have come out since, which trend has better predicted how performant those models would be? And it's very clear that the answer is the four-month doubling time and not the seven-month doubling time. There's some possibility it could speed up again; we've seen it speed up once, and I think there are some reasons in principle why you might expect it to speed up again. There are some caveats about this, maybe some takes that my colleagues wouldn't agree with, so maybe you should discard them, or you should expect my colleagues to convince me the way they did with the four-month versus seven-month doubling times. I have some suspicion that the tasks METR is measuring performance on are, in some sense, a more and more narrow slice of possible tasks, and in particular a more and more narrow slice that is perhaps similar to the kinds of tasks you'd expect these major AI companies to be training on in the first instance. So, more so than was the case before, we're increasingly measuring progress on the exact types of tasks that they're trying to get better at. You might think, for instance, of the kinds of tasks that would make for good reinforcement learning environments, the kinds of tasks you can score quickly, cheaply, and automatically. I think that progress is real. I think that progress generalizes, to some extent, to other types of tasks, and we're seeing remarkable progress on these messier tasks too, as I referenced earlier. One last question, which is: how big is your team, and how is it funded? And also, how many people are there who are basically really rich from AI already and are like, you know what, I'm good, I don't need to stick around for the IPO or whatever, and now I want to work on something that helps humanity? It seems like there are other independent AI researchers who talk about this. Miles Brundage, someone who has a little think tank, has talked about this: how many people are rich already and are like, okay, now I want to work on something that's public facing? Yeah.
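Joel's model-selection argument above (fit the trend two ways, then see which fit better predicts later models) can be sketched in a few lines. The numbers below are invented to mimic a trend that steepens partway through; they are not METR's data:

```python
import numpy as np

# Hypothetical releases: months since an arbitrary start date, and
# log2 of each model's measured time horizon (so +1 = one doubling).
t = np.array([0.0, 7.0, 14.0, 21.0, 25.0, 29.0, 33.0, 37.0])
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

held_in = t <= 29     # models available when the fits were made
held_out = ~held_in   # models released afterwards

for name, mask in [("whole-history fit", held_in),
                   ("recent-models fit", held_in & (t >= 21))]:
    slope, intercept = np.polyfit(t[mask], y[mask], 1)
    pred = slope * t[held_out] + intercept
    err = np.abs(pred - y[held_out]).mean()
    print(f"{name}: doubling every {1 / slope:.1f} months, "
          f"mean error on later models = {err:.2f} doublings")
```

On these toy numbers, the whole-history fit implies roughly a six-month doubling but misses the later releases by most of a doubling, while the recent-models fit (about four months) predicts them almost exactly. That is the shape of the evidence that moved Joel from seven months to four.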

### [52:40](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=3160s) Team Size and Funding Model

So METR right now is about 30 people, and we're growing and hoping to grow fast. We are hiring, I should say: metr.org/careers. And yeah, you were touching before on the thing about whether it's difficult to be a nonprofit. You know, we can't pay people in equity, right? There's no IPO or anything coming for us. But we do try to pay competitively on cash compensation, so that's an area where we feel we can somewhat compete with the labs. And it's true that I think a lot of our team is just motivated by trying to do something different. All the companies, to some extent, are in the business of building somewhat redundant products, kind of competing for the same role in the world. And METR is in a really unique position at the moment, where we have access and the ability to communicate these ideas and explain the state of AI research to a lot of audiences in a way that might be hard for individual researchers inside of a company. We get to talk to a lot of governments directly. We get to come here and talk with you all. And that's kind of different. If you look at all the actors working on the frontier of AI research or AI safety and compare us to AI lab staff, we get to spend every day working on whatever research we think will be most informative to public decision-making. And do you have exes, former AI lab staff, people where maybe there was a tender at some point and now they work at METR? Yeah, we do have some of those, some people who previously worked at AI labs. And as time goes on, one hope that I have is that there will be more and more researchers who have made the money they need from working in the industry and are now excited to lift all boats by working inside an organization where the North Star can be what is most informative to the rest of the world, outside of this relatively small set of companies.

### [54:44](https://www.youtube.com/watch?v=Hx5dIJ3H8p4&t=3284s) Talent Bottleneck and Research Priorities

Chris is very polite. I think that's wonderful. I'm tempted to be a little bit more aggressive in this conversation. I think we have spoken through METR's work on some of the most important problems in the world, problems that are going to define the future, I think, not just for the next year but for the coming decades, maybe even coming centuries. And we've also spoken about some of the ways in which METR's work is not what you might want it to be, that there's a long way to go in the science of evaluating these AIs. Why have we not made more progress? Maybe a couple of reasons, but I think clearly the central reason is that we are bottlenecked on technical talent, on incredibly capable people coming to work on these questions. I was at a METR retreat recently where we were brainstorming 20 or 30 of these seemingly world-important problems, problems that we think no one else is going to get to if we do not get to them. And how many of those problems are we able to conduct research on? I think it's one or two; maybe, if we do an extraordinary job this quarter, it might be three. So, as Chris alluded to, if you're interested in working less on redundant products at these major companies and more on advancing our understanding of some of the most important questions in the world, questions that are going to shape the world for years to come, METR is a great place to go. Yeah. One more thing to say about that is that the vibe inside of METR is a state of triage, right? Externally, people might guess, oh, METR is outside of any of the AI labs, so the thing it might most struggle with is things like access to AI models; you can't do the research you want because you're not building the thing yourself. That's the story people always tell us: you have to build the future to shape it. In practice, our experience at METR is that when we want to try new types of research that would require new kinds of structured access, the AI labs are pretty game to play ball on that. The thing that is actually happening is that we're having to turn down opportunities to do stuff like that, because we don't have the staff we need to make those things happen. So yeah. Interesting. Joel and Chris, thank you so much for coming on Odd Lots. Absolutely fascinating conversation, and I appreciate you taking the time. And great to have you in studio. Yeah, thank you so much for having us. That was a really interesting conversation. To start from the end, there's this idea of, okay, here are some really important questions, let's just set everything aside, and there are 30 people working on this. And how many people want to do it? It's like, okay, we're trying to match cash comp, etc. That seems like kind of a tricky issue: if you accept the premise that these are some big questions we have to get right, and you've got to land this plane, hopefully, that's a bit of an issue. Yeah.
The other thing I thought was really interesting was the Chinese models not really making it onto the charts, even though we know that in the market itself, when DeepSeek, when that new version came out, it was this huge thing where everyone started to panic. And to not see it even land on the time horizon chart is kind of interesting. I guess I buy the reasoning that, from METR's perspective, the only interesting question is the most cutting edge, which is going to be slightly adjacent to the most interesting question for, like, business, right? So it's like, okay, we know DeepSeek and Qwen and Kimi and all of those are very impressive. Do they push the very frontier? Perhaps not. But just in general, I find the space so weird, because here you have these people who are clearly quite alarmed at the potential here, and most people, I think, look at these charts and say, wow, I want to invest in this. I know, that's why my first question was: you're here for AI safety purposes, but everyone seems to get excited about the line-go-up chart. Yeah, right. Like there's a disconnect. Like I say, when an industry basically says it's worried about itself, you should pay attention. It's really strange. This gets back to, you know, it's very strange that you have the CEOs of these companies being in many cases the most alarmist. And there's this sort of cynical interpretation, which I don't totally discount, that they're saying this because they want to get investors and so forth, and they need all this money. But look, it was also true that OpenAI and Anthropic, OpenAI a little more, were founded with these very exotic corporate structures, like a private company owned by a nonprofit, etc., which they presumably did because they took pretty seriously the idea that this technology is something very strange, and not just enterprise software, right? Like they were self-limiting in a way. One other interesting thing to me is this idea of, okay, first of all, what's the difference between the seven-month and the four-month doubling time? Not much, you know. People are like, yeah, but it's exponential, isn't it? I guess it's exponential, but it's still funny to me. It's like, oh, I think AI is going to destroy all white-collar work in two years, and someone else is like, no, no, I think it's going to be three years, as if that makes any difference whatsoever. But one thing to consider, and Joel sort of alluded to this: you had OpenAI shutting down its video efforts, etc. So perhaps part of the story is just this intense focus now on the software engineering side as what these labs are working on. Yeah, and sort of all these other side quests are not as important. So maybe we will see even more rapid progress on some of these technical benchmarks, because clearly, from the labs' perspective, that's where the action is, more than some of these consumer things like making images or videos. Shall we leave it there? Let's leave it there. This has been another episode of the Odd Lots podcast. I'm Tracy Alloway. You can follow me @tracyalloway. And I'm Joe Weisenthal, @thestalwart. Follow our guests: Chris Painter @ChrisPainterYup.
And Joel Becker @Joel_Bkr. Follow our producers: Carmen Rodriguez @carmenarmen, Dashiell Bennett @dashbot, and Cale Brooks @calebrooks. And if you want more Odd Lots content, you should definitely check out our daily newsletter. You can find that at bloomberg.com/oddlots. And you can chat about all of these topics 24/7 in our Discord, discord.gg/oddlots. And if you enjoyed this conversation, then please leave a comment or like the video. Or better yet, subscribe! Thanks for watching.

---
*Source: https://ekstraktznaniy.ru/video/47170*