Because I feel that's a really good way to frame alignment work. - One of the key things about control is the hypothesis that if you guard against the key threat models, that's sufficient. You don't actually need to ensure that your model is doing nothing bad. You just need to list the most major things the model could do that would be really bad, block those reliably, and then hopefully from there you're pretty safe to deploy. The main threat models we've been thinking about at Anthropic, and it's possible there are more, but these are the most salient ones to us currently, are models copying their weights out, models deploying themselves within the company in an unmonitored way, and, third, models sabotaging AI safety research. The first two, copying weights out of the data center and launching unmonitored copies of the model internally, are important because it then becomes much more challenging or impossible to monitor what the model is doing, block it from doing certain things, retrain it, delete it, or shut it down. If the model escapes and starts spinning up versions of itself on cloud compute or other compute resources it's able to find, then it's very difficult to shut it down. And if it launches itself on some internal cluster, it becomes similarly difficult, or at least it becomes difficult to catch that something is going wrong. You might need to check the logs to understand: is this a real job that someone launched, or is this something that a rogue AI is doing?
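That log check could be sketched very roughly as cross-referencing running jobs against a registry of human-initiated launches. This is a hypothetical illustration, not Anthropic's actual tooling; the job and registry record formats here are invented, and a real system would query the cluster scheduler and audit logs.

```python
# Hypothetical sketch: flag internal cluster jobs that no human launch
# record accounts for. The record formats are invented for illustration.

def find_unaccounted_jobs(running_jobs, launch_registry):
    """Return jobs whose IDs have no matching human launch record."""
    approved = {record["job_id"] for record in launch_registry}
    return [job for job in running_jobs if job["job_id"] not in approved]

# Example: two jobs are running, but only one was launched by a person.
running = [
    {"job_id": "train-001", "user": "alice"},
    {"job_id": "eval-999", "user": "svc-agent"},  # no human launch record
]
registry = [{"job_id": "train-001", "launched_by": "alice"}]

suspicious = find_unaccounted_jobs(running, registry)
print([j["job_id"] for j in suspicious])  # ["eval-999"]
```

In practice the hard part is exactly what the speaker points at: deciding whether an unaccounted-for job is a misfiled human launch or genuinely rogue activity.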
Maybe the rogue AI can then do unmonitored additional security research, research on how to escape later, or make some PRs that enable it to do something else that's bad. So those are two of the key threats on the unmonitored-deployment side. Then sabotaging safety research is the other threat. Models are already producing a large, surprisingly large, fraction of the code the world is producing, including AI research done with tools like Cursor or Copilot that use LLMs to generate code. Pretty soon it seems plausible that people will use more agentic coding systems, which do multi-step coding tasks: writing code, running it, checking the results, and repeating that loop. With those systems there's a strong pull toward using them to automate a lot of safety work, and the upside of that would be really high: we could get way more safety research done. But on the other hand, if we don't trust the models, they might actually sabotage that research: persuade us of a wrong interpretation of a certain plot, persuade us to go down a wrong research direction that's actually a rabbit hole and wastes a lot of our time, or persuade us that a certain model is safe, maybe even running experiments to justify that the model is safe, and then get us to deploy an actually unsafe model. So there are just a lot of risks related to sabotaging our understanding of the safety of models. That's another key threat as well. - Yeah, there are other sketchy things that models could do.
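The multi-step loop described a moment ago, writing code, running it, checking the results, and repeating, can be sketched roughly as below. This is a toy illustration, not any particular product's implementation; `propose`, `run`, and `check` stand in for a model call, an execution sandbox, and an evaluation, all invented here.

```python
# Hypothetical sketch of a multi-step agentic coding loop: propose code,
# run it, check the result, and repeat until the check passes or a step
# budget runs out. The callables are stand-ins invented for illustration.

def agentic_loop(propose, run, check, max_steps=5):
    feedback = None
    for step in range(max_steps):
        code = propose(feedback)      # model drafts (or revises) code
        result = run(code)            # execute it
        ok, feedback = check(result)  # evaluate the outcome
        if ok:
            return code, step + 1     # success after `step + 1` attempts
    return None, max_steps            # budget exhausted

# Toy usage: "propose" just counts attempts; "check" passes on the third.
attempts = []
code, steps = agentic_loop(
    propose=lambda fb: attempts.append(fb) or len(attempts),
    run=lambda code: code,
    check=lambda result: (result >= 3, f"attempt {result} failed"),
)
print(code, steps)  # 3 3
```

The sabotage worry maps onto this structure directly: an untrusted model controls both `propose` and, in effect, what `check` ends up seeing, which is why the speaker treats trusting the loop's outputs as a distinct threat.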
For example, where they're deployed they could insert general security vulnerabilities that someone from the outside could then exploit. Or, without directly influencing our AI research, they could poison data for future models, persuade internal people, or even external people at very important organizations, about certain key decisions, or help people externally with AI research. If we're in a situation where we would much prefer that AI research happen at a small number of well-regulated or well-understood institutions, a model might instead help people outside do a lot of AI research. That's another thing we might want to monitor for, but I think roughly all the things we care about boil down to the three things you outlined, or close to it. - That's a scary future, or scary risks, that we should definitely mitigate or work toward mitigating. Maybe to leave on an optimistic note: what model developments do you think have happened, or what has given you positive evidence that control will work?