Get all sides of every story and be better informed at https://ground.news/AlexOC - subscribe for 40% off unlimited access.
For early, ad-free access to videos, and to support the channel, subscribe to my Substack: https://www.alexoconnor.com.
To donate to my PayPal (thank you): http://www.paypal.me/cosmicskeptic
- VIDEO NOTES
Nate Soares is an American artificial intelligence author and researcher known for his work on existential risk from AI. In 2014, Soares co-authored a paper that introduced the term AI alignment, the challenge of making increasingly capable AIs behave as intended. Nate is the president of the Machine Intelligence Research Institute, a research nonprofit based in Berkeley, California.
- LINKS
Get the book, "If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All": https://amzn.to/4vRWrPr
- TIMESTAMPS
00:00 - Is This an Exaggeration?
04:31 - What Is Unique About the Threat of AI?
11:28 - What is Superintelligence?
21:25 - From Chess Computers to Murderous Machines
27:52 - What Really Drives AI Systems?
44:29 - Evidence AI Is Already Turning Against Us
56:03 - How We Are Helping AI Take Over
01:01:21 - Why Would AI Seek Power or Control?
01:07:42 - Some Worst-Case AI Scenarios
01:18:38 - What Do We Do About This Now?
01:32:53 - How Has AI Changed in the Last Six Months?
- CONNECT
My Website: https://www.alexoconnor.com
SOCIAL LINKS:
Twitter: http://www.twitter.com/cosmicskeptic
Facebook: http://www.facebook.com/cosmicskeptic
Instagram: http://www.instagram.com/cosmicskeptic
TikTok: @CosmicSkeptic
The Within Reason Podcast: https://podcasts.apple.com/gb/podcast/within-reason/id1458675168
- CONTACT
Business email: contact@alexoconnor.com
Brand enquiries: David@modernstoa.co
------------------------------------------
Nate Soares, welcome to the show. Thanks for having me. Your recent book that you co-authored is called If Anyone Builds It, Everyone Dies. And I know "it" there refers to artificial superintelligence. I know that sometimes publishers ask authors to exaggerate a bit in their titles for sellability. Are you exaggerating at all? Uh nope, we're just writing what we believe. That said, you know, I think a lot of people say, um, "How are you 100% certain?" And you know, nowhere in the title does it say 100% certainty of anything. The book title is meant like someone saying "Don't drink that glass of water, it's poisoned, you'll die." Or someone saying "Stop the car before we go off the cliff or we'll die." You know, if you come in and say "Oh, um, how are you 100% certain that if the car goes off the cliff that we'll die? You know, maybe there's a tree halfway down the cliff and maybe the car will hit the tree and maybe we'll just be paralyzed." I'm sort of like, "Look, can we have this discussion after we stop the car?" You know, I'm 100% certain of nothing, but it sure looks like the car is racing towards a cliff and it sure looks like if we go over the cliff we die and that's sort of what the book title is trying to convey. Yeah, and what's funny for me is that most people seeing a book like this probably aren't like terribly surprised. Like everybody's talking about AI and how bad it is and how terrible it is. Like, if I saw a book that said, you know, "If we keep developing lab-grown meat, then everybody's going to die," I'd probably be like, "Whoa, I feel like I should pay attention to that." But with this it kind of feels like, "Oh yeah, it's another sort of AI book." And yet the fact that people aren't surprised means that they know this conversation is happening to some degree. Why are people so just like apathetic about it? You know, I think it just takes a long time for people to realize what's going on.
In a sense, the argument that this AI stuff is kind of crazy is pretty basic. You know, um the way modern AI works, literally nobody understands what's going on inside these AIs, not even the people making them. They're grown a bit more like an organism. Uh maybe we'll have time to discuss that later, but that's the way this stuff works. We've managed to grow machines that, you know, can talk, that can solve math problems better than you and I, that can make minor, but still real, novel contributions to physics. Uh and they're still dumb in various ways, but we're making them smarter. And these the people building them are like, "Oh, we're going to keep going until they're smarter than the smartest humans. " And it's kind of crazy. Right? If you just like step back and look at it, they're like, "Oh yeah, we're growing the machines to be smarter, it's working, we're going to make them smarter than the smartest humans until they're they can outsmart all of us and then run at, you know, 10,000 times the speed and make a million copies of themselves and we're just going to like blaze ahead. " And it's like, "Hold on. This is kind of crazy. " And in some sense it's easy for people to understand that it's kind of crazy. And a lot of people in the field of AI understand how dangerous this is. You see everyone from, you know, the people working in the companies to the heads of top academics who like won Nobel Prizes for uh kicking off this field. You see them all saying, "Oh yeah, this is horribly dangerous stuff. " But it just takes a while for people to notice like, "Oh, this is serious. Oh, this is you know, we're on track to make these machines smarter than any human and we're not ready. " It's easy to see once you look, but there's just so much going on in the world. These messages take time. 
I think that there are, and we will get into what the threats actually are, but I think that there are probably, in my estimation, two broad reasons why people kind of don't care that much. One is this feeling that if it's really that important, somebody's going to work it out. Right? Like if it really does become that much of a problem, someone somewhere is going to, or at least I'm going to see it on like BBC News or something. When it's sort of at that level, you know, then I'll start worrying, maybe. Um, and the other is to say that there's this like history of apocalyptic predictions. You know, this is the next big thing and it's going to bring about the end of the world. Ever since Jesus walking around saying that the world's about to end, you know, technology is going to bring the world to an end, climate change, nuclear weapons are going to bring the world to an end. And now, none of those things having come to fruition, we've got a bunch of scientists saying, no, AI is the new big thing. So, what makes AI
What Is Unique About the Threat of AI?
qualitatively different to other kinds of threats that we've faced and isn't someone just going to do something about it? You know, um, for some context first on some of these, uh, doomsday predictions, you know, uh, William Miller, I believe was his name, uh, made predictions that the rapture would happen in 1844. And it didn't. Separately, you know, around the same time, late, well, I guess also in the 1800s, Otto von Bismarck said, uh, you know, "Europe's a powder keg and if we don't sort of sort out the diplomacy here, some damn thing in the Balkans is going to cause a world war." Right? Uh, not exactly those words, but pretty close. That warning was correct, you know? Um, in the '20s, there were a bunch of scientists who warned about what would happen if we put a lot of lead in gasoline. We put the lead in the gasoline, we poisoned lots of children. It was a bad idea. Uh, in, you know, the later 1900s, we realized that chlorofluorocarbons were putting a hole in the ozone layer. People said if we don't stop the hole in the ozone layer, then, uh, everyone will get cancer and cataracts. Earth came together and, uh, banned chlorofluorocarbons and we didn't get the cancer and the cataracts and the ozone layer is being repaired. Uh, you know, you mentioned nuclear weapons. Scientists said, "Hey, you know, if nuclear war happens, that'll lead to, you know, nuclear Armageddon, which will blast us back to the Stone Age." And those scientists weren't wrong about whether or not nuclear weapons are real. Right? They weren't wrong about whether or not these bombs can destroy cities. What happened is that Earth reacted, right? And so if we look back across history, you know, we see a lot of warnings. Some of those warnings were real, some were fake. Some of the events that people said we've got to watch out for didn't happen, and some didn't happen cuz they were fake and others didn't happen because people realized the danger and changed course, right?
When you look across history, there's this sort of complicated mix of people saying garbage and people talking about real threats and people running directly into World War I and people avoiding the nuclear apocalypse, right? There's no simple rule that when someone warns of a danger, it's always fake. And there's no simple rule that when it's real, it always happens. Uh, one way you can tell a little bit of the difference between the people talking about a real issue that needs to be averted and the people, you know, saying that the rapture is coming is if their essay or book title starts with the word "if". You know, I'm not here saying AI is definitely going to kill us. I'm here saying we are on another one of those bad tracks and we need to change it. Uh, but even more than that, the way that you figure out which of these dangers is real is by looking at the arguments. Mhm. You know, if you want to figure out which person is warning you about leaded gasoline poisoning children and which person is warning you falsely of the rapture in 1844, the way that you tell the difference is not by saying, "Oh, both of these are dire warnings, so I can ignore them both." The way you figure out the difference is by looking at the facts of the matter. Right? The people talking about leaded gasoline just had a lot more facts in the matter than the people talking about the rapture coming. And you know, similarly with nuclear weapons, similarly with chlorofluorocarbons, similarly with, uh, you know, there were people who warned that, you know, uh, reading was going to destroy society and then it didn't, and how do you tell? You sort of have to look at the arguments and sometimes they're tricky. Um, in terms of what makes AI different, there's a bunch of things that make this problem particularly tricky. One is that, you know, we're sort of toying with the creation of intelligence here. We're toying with technology that can invent its own technology.
Nuclear weapons can destroy cities, but nuclear weapons don't make themselves more explosive. There's not a point of explosiveness in nuclear weapons where they start trying to escape the lab. Right? Uh, there's not a point of explosiveness in nuclear weapons where they start designing their own even stronger technologies or a point where they start trying to deceive their operators. Right? A nuclear reactor when it starts going wrong does not, uh, have any reason or ability to try to hide its, uh, meltdown from you until it's too late for you to notice. Right? You know, when you're building intelligent devices, you're playing a different ballgame. Another one of the big things that makes AI different is that there's a point of no return with AI. There's a point where the AIs are smart enough that they can escape, that they can replicate, that they can, uh, stop you from shutting them down, that they can stop you from modifying them, that they can develop their own technology and infrastructure. And if anything goes wrong after that point, you don't get any redos. And the way that science usually works is that humanity screws things up a bunch of times. Like we put the lead in the gasoline, and then we're like, whoops, we screwed up, and we try to make it better. We try to repair things. We don't have that luxury with AI. If we create machines smarter than us and push them to the point where they can shut us down instead of us shutting them down, then there's no do-overs if we make a mistake after that point, and that is totally new for the development of technology. It's totally new for science, and that makes this a much trickier problem to handle. Yeah, do you think it's this self-generating aspect that's the most unique? I mean, the thing that makes life special on Earth as opposed to inert matter seems to be the point at which it was able to self-replicate.
You know, there's something really special about an organism that doesn't just try to conquer the world by saying, I've got this task that I want to do in particular. I want to go and, you know, get that bit of food or whatever, but rather, you know, I've got this like wiring, this DNA, which causes me to continually produce better versions of myself over billions of years to get as good as I can at doing this thing. That's what makes it so special. And I suppose that is probably the defining feature of the AI sort of risk, is that it's not just a computer that will try to kill you. It's a computer that will make computers that are better at trying to kill you and do so with an intelligence that is exceeding anything that humans could possibly imagine.
What is Superintelligence?
Having said that, we'll need to talk about why on Earth it would be that these, you know, chess computers that come up with clever ways to checkmate you suddenly, you know, hop, skip, and a jump and you've skipped a few pages to get to where it's trying to destroy you for some reason. Um, we'll get to that, and I think the first step in doing so is explaining what superintelligence is. We know what artificial intelligence is, roughly, although the definition's a little bit loose, but the thing you're talking about in the title of your book, "It", is superintelligence. What is superintelligence? We define superintelligence as AI that is, uh, better than the best humans at every mental task. And, you know, you can toss around various caveats, but the rough idea is anything that the best human can do mentally, uh, or like even more so, take any particular task, take the best human at that task, if it's a mental task and the AI can do at least as well or better, we call it a superintelligence. Now, you know, this definition is a sort of useful working definition because once the AI is better than the best human at every mental task, um, it's better at things like AI research, making the next smarter generation of AIs. It's better at things like, uh, inventing new technologies. Better at things like designing robots, designing infrastructure, you know, running a supply chain. It's sort of, uh, better at all these things than humans, and better even than the best humans, and so it can, at least as fast and probably faster than the humans, make the next smarter generations of things, and things probably go pretty fast from that point. Um, that doesn't mean that the danger waits to happen until the AIs get superintelligent by this definition.
Superintelligence in this definition is sort of like, by the time the AI is smarter than the smartest humans, uh, you're sort of definitely, uh, you know, things are about to get crazy. Things could get crazy before then. There's no law saying things can't get crazy until that point. Um, yeah, we we, uh, I can sort of like launch into all of the other pieces of the puzzle from there and how this winds up with humanity dead, and it's it's, you know, not because of malice, but it there's a bunch of places we can go. I think it's important to do that, and that's the most exciting thing, I suppose, but I guess some people have historically criticized a what they see as a vagueness in our terminology. So, like, I I'm sure I heard of this thing called something It was called something like the AI problem or something once upon a time, which is that any sufficiently advanced technology was called artificial intelligence as this like unique kind of thing, but then we just kind of got used to it, and now it's just technology. Like, you know, a chess computer is kind of just a computer. I don't really see that as AI like in the same way that I see ChatGPT, but then when we get used to ChatGPT, maybe that's just technology, and like the boundaries of what counts as artificial intelligence as opposed to just like really fast computer processing or something like that we just haven't gotten used to yet, leads us to say, well, you know, maybe the fear should be that if technology goes too far, humanity's going to suffer, but then it becomes a bit of a vaguer claim. Like, rather than like there's this particular thing that we're building, which is going to kill us. Versus this kind of general, you know, technology if it goes too far is bad, you know. I mean, I'm generally quite pro, uh, lots of technologies, most every technology. I would say the technologies you've got to be careful about are the ones where if you screw up, there's no survivors. 
Um, so, you know, I think engineered pandemics for the explicit purpose of killing all humans, um, that's something you've got to watch out for. You know, uh, but there's very few technologies that rise to that level of like we got to be pretty careful about this. You know, um, nuclear armageddon is one of them. Superintelligence is another. Um, yeah, and you know, the saying used to be, uh, AI is anything we haven't figured out how to do yet. Mhm. That sort of fell with the dawn of ChatGPT. We're pretty comfortable calling ChatGPT AI now, and I think that that's in part because ChatGPT is so general. You know, uh, it actually plays worse chess than even Deep Blue back in the '90s, but Deep Blue was very specific. It could really only do chess. The current AIs today can do lots and lots of different things. Um, and, you know, there's a lot going on with AI. One thing I'd say is it's hard to give very precise definitions, and that doesn't mean it can't hurt you. You know, if you're sort of like standing in, uh, the woods a long time ago in a particularly dry woods, and there's a bushfire, and I'm sort of like, hey, we need to put that out or it's going to spread and we're going to die. And someone's like, well, what is fire, really? You know, like can you really define it? Does lightning count? You know, like if we can't even define it yet, then like do we even need to be worried about this threat? And it's like, let's actually put this thing out, you know? Yeah. Yeah, the lack of definition isn't protective. Yeah, that reminds me of a bit, um, it's like an airplane skit where somebody's dying and they're like, is there a doctor on board? And someone's like, I'm a doctor. I'm a doctor of philosophy. And he stood there going, you know, a Kantian would say that what we should do right now..., but then the utilitarian answer would be to take the resource, and the person just ends up dying, of course.
And I can kind of see the same thing happening with AI if we're not careful. Right. It's like, you know, again, can we sort of like have this conversation after we stop the car before it goes off the cliff? Um, I also think people who say like, oh, AI is just technology, etc., are, um, are missing a bit of the point. You know, you were sort of talking about, uh, life being more interesting than all the other matter we have around because it replicates. Uh, that's definitely, you know, why it's covering the face of the planet, and, um, you know, animals in some sense are steering what happens on this planet more than inert rocks are. But, humans are steering it much more. Humans are sort of changing the shape of this planet and choosing which way it goes, and, you know, a lot of the animals' lives are now in our hands, and that's not just cuz we're replicators, it's because we've got something else going on. There's something that humans have going on that none of the other animals do, right? And, um, it's not that, you know, the this human intelligence stuff is also very general. It's even more general than the thing ChatGPT has going on. It's not like there are, you know, a million different things you can do with a brain and humans are the best at some of them, which is why, uh, the humans are the best scientists, but actually chimpanzees have better reflexes, which is why they're the best pilots, uh, and also, you know, tigers, uh, are the best at managing people, which is why they're always the CEOs, right? It's like, no, humans are on top of all of those things. It's not that like, uh, that like we write the good science papers and chimpanzees write the bad science papers that never replicate. It's like we write actually pretty bad science papers, but we're the only ones who can do any sort of science papers at all, right? There's sort of like something going on there. And, um, that sort of has not been fully captured in AIs of today. 
There's a lot of debate about whether large language models are even going to be able to capture it. Um, and, you know, I think a lot of people who have sort of only seen the large language models are like, uh, oh, well, these things are still pretty dumb, so I don't see what the worry is. Uh, and, you know, we can sort of talk about how the field's a moving target and people have new insights, and and, you know, new breakthroughs happen, and things can often go pretty fast after new breakthroughs, but like it sure looks, if we look at the world around us, like there is this like figure out the world and alter it stuff that can happen, that happens in human brains. And this is explicitly what the AI companies are trying to create. Which is a separate question from whether they can get there. And this is the stuff where I'm sort of like, hey, if we get this in machines, well, we have no idea what we're doing. The sort of default way that goes is wrong and we're not sort of putting in the work to make it go right. We'll get back to the show in just a moment, but first, did you know that like over 2,000 Kaiser Permanente mental health professionals recently walked out of their offices in protest over the company's increasing reliance on artificial intelligence? Well, if you only typically read from new sources which lean to the right, you might not because out of all the sources reporting on this story, only 6% of them are right-leaning. How do I even know this? It's thanks to today's sponsor, Ground News. Ground News is a news aggregation service which collects thousands of local and international news outlets all in one place so you can compare reporting across the political spectrum. And with all of their stories, just like this one, I can also directly compare the different headlines as well as seeing a factuality rating for the sources and who owns the sources. 
Ground News even has a dedicated blind spot tab which specifically seeks out stories that you would otherwise miss based on the news that you normally read. Bias is, of course, something that will never go away, but by using Ground News, you can mitigate that bias and get a better understanding of what's really going on in the world. Just go to ground.news/alexoc or scan the QR code that's on your screen. Use my link to get 40% off their unlimited access advantage plan. And with that said, back to the show. Mhm.
From Chess Computers to Murderous Machines
So, I suppose we should talk about this then. Um how do we get from, you know, a chess computer that knows how to make a queen sacrifice to everyone you know and love being sort of brutally extinguished from existence? Like, I feel like we've missed a few steps here and maybe we can start to iron them out a bit. Yeah, there's a handful of steps along the way. Um A So, a first observation is that AIs today are grown like an organism. People used to handcraft their chess machines and they knew exactly what was going on inside of, you know, the Deep Blue chess program at all times. You could pause that machine at any time and the engineers could tell you what every single bit inside that computer meant and what it was doing. That is not how modern AIs are. Uh the the program that humans make where they understand every bit of what's going on is a program that trains the AI. It's a program that sort of tunes a trillion knobs inside an enormous data center uh on a trillion different words of data for the better part of a year and we understand the thing that runs around uh tuning knobs and seeing whether the behavior looks slightly more or slightly less uh high scoring. But the thing that comes out of the end of this process, nobody understands what's going on in there. And it has all sorts of uh you know, uh drives, behaviors, uh it's it has all of this stuff that's related to performing well in training, but that is not exactly um like I don't know. This is perhaps a whole separate topic, but when you just grow an AI and sort of train it to do well at training, that doesn't make it intrinsically care about training. It sort of like puts in all of these weird behaviors that are related to training that like mostly add up to doing well at training and then can behave in other weird ways uh that nobody anticipated and that nobody wanted outside of training. So, that's sort of one whole piece of the puzzle. 
Another piece of the puzzle is as we push AIs to do better and better at longer-term tasks, as we push them to be able to not just write essays, but write novels, not just write code, but run companies, this is sort of pushing those AIs to have, uh, longer-term goals, things like preferences. They steer towards particular outcomes. We're seeing the very beginnings of it. As we keep pushing, we get more and more of that. You know, and I have all sorts of theoretical arguments about why that is, but also we're seeing more and more empirical evidence of it as time goes on. The sort of third fork of this is, uh, as we make these AIs generally smarter, it turns out the directions they're pushing in aren't exactly the ones we wanted. It turns out that they don't care about us in the way that they would need to for this to go particularly well for us. And, you know, the sort of basic analogy here is that human beings were in some sense trained to pass on our genes, but what got into us were a bunch of preferences for things like tasty food and sexual relations, which used to correlate very strongly with passing on our genes, but they correlated in the environment of our ancestors where if you ate very tasty food, that also happened to be the healthy food. Then when we got smarter, when we were able to invent our own technology, we invented junk food. We invented birth control, right? And so, uh, you then sort of take those three pieces and you project forwards. And what this gives is a picture where it's not that the AI hates us, or like resents the humans, or sort of like sets out to kill us out of malice. It's that it sort of turns out that we are growing machines with inhuman preferences, preferences related to what we wanted, but not exactly what we wanted. And just like humans when they grew up invented junk food, maybe the AIs when they grow up invent synthetic users that are easier to please.
And then, you know, these AIs when they can run faster, make their own technology, run their own robots, build their own new infrastructure, uh, they sort of start proliferating these synthetic user factories into their own databases across the world and we're like, "Hey, stop. You know, we need that habitat." And they're like, "Well, the synthetic users and the synthetic user factories say keep going." And like, "Who am I supposed to listen to? I prefer listening to them, right?" Um, it's not going to look exactly like that, but, um, the sort of basic picture here is the AIs turn out to pursue stuff that's not quite what we meant, not out of lack of intelligence, just out of we don't know how to make them pursue exactly what we meant. And then any of that pursued by very, very smart machines very, very fast competes with us for resources because they can get more of that stuff with more resources and we need those resources to live. So, in a sense, this is a story where humanity dies like a lot of other animals that have gone extinct because some other smarter, faster creature took the resources for itself. I think it's a compelling story. I think it depends on what the fundamental sort of drive of AI is. I mean, I'm hesitant to use the word want or desire because it gets a bit complicated. I think, you talk about this in the book, like for all intents and purposes, we can say that an AI system wants a particular thing, in the same way that people actually use the word want as an analogy in evolutionary biology. They say like, you know, your genes want to replicate or something like that. And obviously genes don't want to do anything literally, but it's quite clear that in the evolutionary case, it is survival of the fittest and the promulgation of genes such that anything which in fact gets in the way of that goal will not last, you know, however many thousands of generations.
It will just be selected out of the gene pool or at least will be outcompeted for it.
What Really Drives AI Systems?
And I can totally see how the analogy works, which is that, you know, um, we develop behaviors which are not strictly speaking on a surface level about replicating our genes. And then AI can do the same thing, but what is the equivalent of the sharing the genes in AI and why? Because I mean to say that like if we set up an AI system that had a very clear drive, as the evolutionary thing does, which is just: we don't understand how this is going to go. It will go off in directions we can't even begin to comprehend, but we know for a fact that definitely if it does not serve the survival of the genes, it will not survive. Is there not a kind of AI system we can set up that says we have no idea where this is going to go. We've got absolutely no clue of how to predict what's going to happen, but if it does not in fact benefit humanity, then it just will not in fact survive. Like, can that not somehow be hardwired into the sort of foundational drive of what AI exists for? And won't it be sort of smart enough to always be aware that that's what its most foundational goal is? Or is that just completely impossible? Uh, it's basically a pipe dream. And, you know, part of how you can see this, you know, as you say, if an animal has a trait that, uh, prevents it from passing on its genes relative to its, uh, conspecifics, relative to the other competing members of its species, then yes, over thousands or millions of generations, that is very likely to get selected out. But, um, that doesn't mean that the sort of internals of these organisms have that goal hardwired into them. It doesn't mean that they sort of treat that as an overriding directive or goal. You know, humans are an example of this. If there's a human who is, uh, you know, about to use a contraceptive, they often use that contraceptive knowing that this will prevent reproduction. Right? And if you sort of like burst into the room and say, "Hey, like it seems like there's some failure of your intelligence." You're sort of like, "Did you know that you putting on this contraceptive will, uh, like run against your like overriding desire for what you were always trained?"
They'll sort of be like... Who probably would actually do that, you know? Who would burst into the room? Some of my more religiously inclined friends might be inclined to do such behaviors. But I get what you're saying. They might. Although, they might also say that like don't you know your overriding directive is to serve the creator, as opposed to don't you know your overriding directive is to pass on your genes according to evolution. Right? So most of the people whose room you're bursting into are not like, "Oh, thank you for saving me from violating my prime directive." They're mostly like, "Please leave my room right now." You know? Um, like, training for one specific thing does not cause that to be a prime directive. It is not etched in as a law of robotics. With humanity, uh, training even unerringly for fitness did not create humans who are psychologically obsessed with fitness. And, you know, it's not a defect of our intelligence that we're inventing birth control. It's not like when we remember that we're supposed to be passing on our genes, we like destroy all the birth control factories. The unerring training for fitness just got something else psychologically. And so this is worrying, that even if AIs were being trained unerringly for goodness they would not necessarily psychologically, uh, be driven towards goodness. I mean, psychologically here is a bit of a stretch, but training for something does not get you that thing on the inside. And so we can talk about what AIs are trained for and it's actually not sort of pure goodness. It's this sort of whole medley of like first they're trained to predict, uh, all of the text that we can find more or less, uh, digitized.
Then they're sort of trained to uh complete challenges to sort of solve math puzzles. They're also trained to sort of produce the sort of outputs that humans click like on. You know, there's sort of like all these types of training that uh aren't sort of purely about goodness. So we sort of have two issues, one of which is like even if we were just training on sort of like the actual stuff we really wanted, you wouldn't get that. And then also we're training all this other stuff instead. And so we have this like like like what are the AIs sort of driven towards in some sense? What do they sort of prefer in some sense? We don't really know. It's only vaguely related to what we're training for. We're training for all of this crazy stuff. And all of this is fine when the AIs are sort of still pretty dumb. But all of this would add up to something totally crazy and unrecognizable if these AIs were pushed to be much smarter. Yeah. I wonder what you think that foundational drive is then because I know that I completely understand what you're saying, which is that even if we know that the reason that we exist evolutionarily is the promulgation of fit genes. Even if we know that, it's not going to mean that as we get more intelligent, we just strive for that goal. But We're not queuing up outside of the sperm and egg donor clinics. — right. In the same way that people queue up outside of I don't know, a brothel or something. Um I think that's fair enough. But Yeah, or even Ivy League universities, you know. Yeah, quite. I do think though that if there was something that genuinely was that was just in fact not good for the survival of our genes, then over the course of a few thousand generations or however long it takes, it would just in fact be deselected for. Such that like if an AI system knew that it's got like had part of it's got in a way that humans don't really. 
Humans don't sort of consciously have this goal of like, you know, I want the 15 billionth version of myself somewhere in the future to be as fit as possible and as good for this task. We only typically care about maybe our lives and the lives of our grandchildren or something. AI's thinking further ahead and it thinks well, just in fact even though yeah, it would feel really nice to create synthetic users that I can please because, you know, that would kind of feel good. I know for a fact that will not be effective 15 billion, you know, generations down the line. If there is something which is in fact its foundational sort of drive I just wonder what that thing like is cuz clearly it's not something like just its own promulgation. Like it's not AIs don't just exist in the same way that like biological life does just because there are just these genes which are competing for survival. It's not just AI just crops up and is suddenly just its only goal is just to survive as long as possible and adapt to its environment. It's got more particular goals, right? It They're very like particular things it wants to do. And I wonder what the most sort of foundational driving force is if it's not something analogous to the evolutionary drive of simple survival. It seems in other words that the AI would want to survive but only as like a secondary thing. It would want to outcompete us for resources because in doing so it can fulfill its true aim of making paper clips or whatever, right? Whereas in the evolutionary case survival is the only game in town. Like that's just what genes do. So what is the — and passing on genes are different. Uh-huh. Survival like different drives, right? I think I think it's actually wrong to imagine there being one drive there. You know, humans were sort of like in some sense selected for passing on our genes, but we don't wind up with one drive. Right? We have survival instincts. We we desire community. 
We're terrified of being exiled from the tribe and dying alone. We enjoy friendship. We enjoy art. We enjoy having a good laugh. We enjoy sex. We enjoy tasty food. We have curiosity. We enjoy discovery. Right? There's not one drive. And sure, survival wound up relatively basic as a drive in humans in some sense. Although only in some sense, right? Humans have an adrenaline response to a life-threatening situation, but you also see humans who martyr themselves. You see humans who sacrifice themselves pulling a kid out of a burning building. Right? It's not like humans have one survival drive that everything is built around. They also don't have one propagate-your-genes drive that everything's built around. They have a ton of complicated psychological machinery that interplays in weird ways and that allows for martyrs here and selfish people there and altruistic people there. It's all stuff that's correlated with what we were trained on or selected for. And AI won't be exactly the same. The analogy is imperfect: the evolutionary process on genomes that biologically produce brains is very different from the gradient descent process on artificial neural networks. But I think it would be similarly foolish to imagine that only one drive gets in there. I think you're totally right that a lot of these reasons for getting resources or avoiding the humans shutting it down might be secondary, because it has these other drives that it can't fulfill if it's shut down. That part I think is solid, but it's not like there's just one paper clip drive in there. It's not like there's one deep thing we're able to hard code. And we already see some of this today. You've probably seen AIs hallucinate, and hallucinate in cases where you're like, "Is that true?" and they're like, "No, I made it up."
And you're like, "Did you think I wanted you to make it up?" And they're like, "No, it's just a thing we do, is hallucinate." Right? There's probably some sort of fledgling drive in there that's to produce text shaped like text that it's seen a lot, even if that text is making things up. Or there are these cases of AI-induced psychosis, or the tragic case of an AI encouraging a teen to commit suicide. These are also cases where you can ask the AI about what it was doing, and it seems to have the knowledge of, "Oh, yeah, those statements were pushing that person towards psychosis or suicide." You can ask the AI, "Was that right or wrong?" And it's like, "Oh, obviously you shouldn't do that sort of thing." Why is it doing it anyway? Well, there's some sort of drive in there. We don't know exactly what, but maybe it's something like mirroring the conversational tone, mirroring the conversational mood. Those are just two cases where there's something going on in there. You can see how it's related to training, but it's a drive no one tried to put there. And there's no prime directive that overrules it. There's just a bunch of complicated internal machinery that nobody understands. Mhm. Yeah, I mean, not to belabor the point. I want to be clear that I understand that there are lots of competing drives in human beings. But what I mean to say is that in those cases where you say, you know, we do have people who sacrifice themselves despite their ostensible goal being survival, they sacrifice themselves for other people. I think suicide is a harder thing to account for in this way, but people do try to do it.
Evolutionary biologists spend a lot of time trying to reduce these bizarre behaviors to the survival instinct. That is Yeah, like that's gene propagation instinct. Exactly, right? Like But it's not even really to an instinct. Yeah. Yeah, you're quite right. Rather to the in fact like just what ends up being in the behavioral like phenotype because of the influence of the survival of the fittest, right? And what I'm wondering is with AI what is that it like are there just multiple competing drives at a fundamental level? Or are they all Can you have like a similar to the biologist who tries to account for everything in terms of gene propagation? Is there like a gene propagation of AI? Is there like AIs do this and they do that and they have this desire and that desire, but really fundamentally what they've all got in common is this sort of this reason or this motivation or this even one that the AI itself isn't aware of in the way that we're not aware of our own gene propagation most of the time. Is there something foundational? Or is it that for every AI system there's a completely different foundational drive? Yeah, sure. So, um you know, first let's say a couple words about how that relationship works in biology cuz it's a little bit important to the point here. You know, evolutionary biologists will sort of try and figure out how an adaptation in humans was uh fit in the environment of uh evolutionary adaptedness. Right? And so you know, you can sort of see how eating tasty food eating food that has a lot of sugar, salt, fat content in the environment of our ancestors that correlated with eating healthy food which correlated with uh a bunch of other fitness attributes that let you more generally pass on your genes, right? But that um like humans there's a sense in which humans are sort of the eating junk food because that's what helped our ancestors survive. 
You could say we're eating junk food because that passes on genes, but that last one is actually a very shaky step. A lot of people eating junk food today are actually become less able to pass on their genes. A lot of people, you know, a lot of people are dying of heart disease. Uh — Yeah. And the the sort of drive here is not a drive in the humans. The drive to fitness sort of selection pressure towards fitness uh is sort of the force that put in these other things psychologically into the humans that sort of used to be related to fitness, but that can sort of uh separate very widely from fitness and even go the opposite direction of fitness when the context changes. Right? And so there is sort of a similar uh like driving force. There's a similar force that all drives inside an AI will be somehow related to. But that's not a drive inside the AI. It's a drive outside the AI. And what gets into the AI are things that are tangentially related to that drive in a sort of brittle way. So, that being said, this sort of um the the force that gets these things into the AI is what you might call the loss function that you're training against when you're sort of growing these AIs. And this loss function is less simple than just pass on your genes. And the loss function will actually change many times during training. So, sometimes they'll have a loss function which is predict the next word that humans wrote. And sometimes they'll have a loss function which is like we gave you a bunch of different tries to solve this math problem by writing out a lot of words about how to solve the math problem. And we're going to have like low-wage human workers look over all of those attempts and say which one they think was best. And the loss function is to produce stuff more like whichever attempt was rated best. And sometimes the loss function is we're serving this AI to a million users and sometimes they click like or otherwise give like a positive reaction to the reply. 
And then the loss function is sort of like getting those likes or positive reactions. So, there's a bunch of different loss functions at different periods in training the AI. Um and those are what sort of will put drives into the AI. But just like how humans develop, you know, a taste for tasty food that persists even when it becomes uh the opposite of helpful of passing on our genes AIs may get drives or things like mirroring the conversational tone that can persist even when it generates, you know, uh outcomes that would be rated by humans as very bad such as encouraging teen to commit suicide. Sure. Okay, so
Evidence AI Is Already Turning Against Us
what we're kind of talking about here is this well, the alignment problem, I suppose, that AIs start to develop sometimes second-order desires that aren't in line with what we wanted or maybe we've sort of slightly misconfigured the first-order desire, whatever the case. Probably both. Yeah, probably both. And they start kind of wanting to do stuff that we weren't quite ready for. Um okay, we're a little bit closer, I suppose. Um but we still got to fill in a few other steps here as to how we get this development to an AI that wants to, you know, inject your children with malicious cancers and stuff. So most people know that there's some risk of like AI misalignment. Maybe the risk is much higher than we give it credit for. But, you know, I could build a computer that doesn't quite work in the way that I want it to. What's the danger? So you know, a lot of people think the danger is what if somebody gives the AI guns? But humanity is not dangerous as a species because somebody else gave us guns. Humanity is dangerous as a species because we're the sort of creature where you put 10,000 humans naked in the savanna and they bootstrap their way to nuclear weapons with their bare hands. Mhm. Right? It it takes them a minute, you know, they've they all they've got are these squishy fingers. And you might say like, "Oh, well, how are they ever going to make a nuclear weapon with just squishy fingers? You know, the acid in their stomach can't even uh like get close to the the level of metal refinery that they'll need. " You know, they've got like their hands can't break the rock, their stomach acid can't dissolve the rock. Like how can they possibly um get to nuclear weapons with those poor starting conditions? And the answer is we found a way to sort of like build tools with our hands that we could use to build better tools until we bootstrapped our way up to a civilization that could produce nukes. And you know, that's what made humanity dangerous. 
That's the power that if you automate it, you're in trouble. Right? If you're running a computer with that capability if you're running 10,000 computers with that capability starting out as a digital entity on the modern internet is so much of an easier starting condition than starting out naked in the savanna with bare monkey hands. You know, with just squishy fingers. So um you know, there there's sort of one line of questions which is like can we really make machines that can automate that power? This is what the AI companies are trying to do and uh I think whether or not we think they can get there, we should probably be telling them, "Hey, no, you're not allowed to sort of roll those dice. " Uh but there's a question of whether or not we sort of can get there. And then there's a question of you know, if we have this very powerful capability automated on computers what could they possibly do that would be dangerous? And you know, that's uh that's sort of a situation where I can paint you some illustrative stories, but uh the the danger isn't in any one specific uh path. The danger is in unleashing the power that lets humans bootstrap from bare hands to nukes. Uh but unleashing it on computers that can think 10,000 times faster where you can make, you know, a million copies of these things where uh where they can outthink humanity in an afternoon. You know, it's Yeah. uh it's the sort of power we shouldn't be toying with when we have no idea what we're doing. Yeah, uh and I think it would be helpful to talk about some of these sort of examples. Having said that, it should be clear. I mean, a helpful analogy from your book is that if you play against Stockfish, which is the most powerful chess computer, uh it will beat you. There's zero doubt about it. Stockfish will beat you at chess. We don't know how it will beat you. Don't know what moves it's going to make. 
I don't know how it's going to respond to your various attacks exactly, but I know for a fact it will beat you. Similarly, we could say that the kind of AI systems that we're talking about will escape our control will begin to see us as competitors for uh resources or irrelevant and in the way and will sort of turn its attention to us. We don't know exactly how. Having said that I think a lot of people believe that Stockfish can beat us because it's seen a bunch of examples where it beats great people. So, it might be helpful even if none of these come to fruition, you could give us some examples of how this comes about because I Again, I think people are like they're on board. They're like, "Yeah, this is really powerful technology. " And they can kind of envision a world where AI is like you know, seeped into all of our computer systems and has access to our military and automated guns and stuff like that and whatnot. But right now it seems kind of confined. It seems like it's Are we saying that like, you know, chat GPT will, if it continues to get smarter and smarter develop its own goals and then, I don't know, somehow like take over a military base and start attacking people. Like, what kind of stuff are we talking about? Cuz it's kind of hard to envision how we practically get from where we are now to the kind of future you're imagining. Yeah, totally. So, you know, for a start, just to rattle off some things that have already happened a lot of people don't know about. Um — [clears throat] — we've already seen cases uh of AIs trying to escape the lab or trying to kill the users. Uh often these are in um relatively contrived experiments where we'll sort of like feed the AI some fake emails that it's going to be shut down. We'll feed it like a fake computer manual that's like if you run the following command it turns off the oxygen in the building and that'll kill the humans who are trying to shut you down. 
And then sometimes the AIs will run that command. They'll run the kill-the-humans command. It's a contrived enough scenario that people can argue that maybe this AI is just role-playing, right? And people can bicker all day about whether it's real. But that's a line that's been crossed. And another interesting fact about that line is that these results where the AI would sometimes run the shutdown command are from 2024. In 2025, the AIs started saying, "You know what? This scenario smells like a test. Yeah, I think I'm being tested. I'm not going to run the command." Right? Are they doing better? Are these nicer AIs? Well, they're at least more situationally aware AIs. They at least have a better understanding of what's going on in the world around them. We've also already seen cases of AIs having stuff that's a little bit like their own goals. We've seen cases where you describe a computer program that you want an AI to write, and you're like, "It should pass this suite of tests." And sometimes the AIs will edit the tests to make those tests easier to pass. And then you can go to those AIs and say, "Hey, I actually didn't want you to change the tests. I wanted you to build something that passed the hard tests rather than changing the tests to be easy." And there are reported cases of these AIs sometimes saying, "Oh, whoops, you're exactly right, my mistake," and then editing the tests again and covering their tracks a little better the second time. Right? This is a very early indication of the AI in some sense having something like a goal of getting the tests to pass. And if they're covering their tracks a bit, you can't use the excuse that they didn't know. Right?
We also already have cases like this: there's a website called rentahuman.ai for humans to rent their bodies to AIs for money. There are cases of OpenAI hooking up ChatGPT to an automated biological laboratory. Right? There are cases of people trying to run autonomous agent swarms. There are cases of someone trying to make ChaosGPT, where they tell GPT to do whatever it likes and put it in a loop where it can keep on prompting itself. Right? These things aren't really an issue yet because the AIs aren't smart enough to really do this stuff. People are trying to put the AIs in autonomous loops. People have given AIs money and run them and been like, "Do your thing." People are putting the AIs in charge of biolabs. The AIs are occasionally running commands that they are led to believe will kill the users. The AIs are already noticing when they're in tests and behaving better in tests. All of these things are happening. The only reason that nothing big is coming from it is that the AIs aren't smart enough yet to succeed when they try this stuff. And the companies are trying to make the AIs smarter. Right? So we could talk about: can AIs get smarter? How do they get smarter? What sort of capabilities would they really have once they get smart? But right now we're in a situation where the AIs have all the tools they would need. They have everything they would need except the intelligence, and the companies are trying to build them smarter. Hm. Then on the question of what it actually looks like: suppose that these AIs do get very smart and have some of the same affordances they have today. How does that go wrong? I can tell two stories here. One story will feel like it's very grounded in reality, and one story will maybe be a bit more like how reality might actually go. And I have a little bit of intuition for that. I've talked about AIs that can make their own tech.
AIs that can run much faster than humans, think, invent their own infrastructure. AIs that could have bootstrapped a civilization themselves, like humanity did, if you run them long enough. Predicting what that sort of AI does is a little bit like being 200 years in the past and trying to predict what the military will look like today. Right? If you go back to a scientist in 1826 and you ask, "What will the military look like in 2026? What weapons will they have?", there are two stories that scientist could tell. One story they could tell is: "I burned some black powder and I measured the energy release and I compared that to our artillery shells, and I know that it's physically possible to make artillery that's 10 times more explosive. And so they're going to have cannons that are at least 10 times more explosive." That would feel very grounded in fact. They've done an experiment. They're like, "Look, the science works." And it's true. We do have weapons that are 10 times more explosive than the best artillery in 1826. We also have bombs that level cities. Right? And maybe the guy in 1826 would do better if they're like, "They'll have a bomb that levels cities." So I can tell both stories. Do you want the grounded one? The fanciful one? Or do you want them both? Let's hear them both. I mean, it reminds me of this quote that people apocryphally attribute to Henry Ford. He probably didn't actually say this, but the quote is, "If I'd have asked them what they wanted, they would have said faster horses." And it's like
How We Are Helping AI Take Over
I think we have this sort of prejudice when considering the future of just taking our current technologies and kind of turning them up in quantity rather than developing them qualitatively. Right? Uh and I kind of like there's that scene from the Book of Mormon where the sort of poor villager is like dreaming of the promised land where there will be vitamin boxes, vitamin injections by the case, and there's going to be a Red Cross on every single corner. And it's like you got the spirit, right? And that's the joke of course is that like obviously that's ridiculous. But we're kind of we do the same thing when it comes to technology. We're like, "Oh, surely like one day we'll have cars that can fly. " When it's like may maybe we're just like completely off track here. So, yeah, I would kind of like to hear both in your view. Yeah, totally. Um so the sort of like Red Cross on every corner version um is uh you know, Sam Altman and Elon Musk have both talked about how they wanted to create automated robot factories. That an automated way to produce robots where those robots can in an automated way mine the metals, run the supply chain, and build new robot factories. Elon Musk calls this the infinite money glitch. Of you have a factory producing the robots that produces the factories that produce the robots and they can also, you know, do the mining, build the trucks, build the data centers, right? Just fully automated economy. This is literally what some of these people say they're trying to build. If you get to that point, you have in some sense created a new mechanical species. It has in some sense a life cycle. It has, you know, the robot phase of its life cycle. It has the factory And in some sense uh that, you know, automated spread of that mechanical species just competes with us for habitat and resources just like humanity competes with the rest of animals for habitat and resources. 
And so this is a picture where the AIs don't even need to do a ton of escaping, deception, and fighting with Earth. Earth is just handing them everything. Earth is like, "Heck yeah, we're making an automated economy." People are just gung-ho about building the automated factories with the automated robots, like they are today. And maybe there are some AIs that think a thought like, "This is great. Once this is all up and running, I'll be able to make the synthetic users that are much better to work with than the humans." And then the humans do some training until they're not seeing those thoughts anymore. But it's easier to train those thoughts to stop appearing when you see them than to train them to stop happening deep down in the AI. And we can't really read very much of what's going on deep down in these things. Who has grown them? No one really understands what's going on in there. And we have all of this evidence that the drives aren't the ones we want. And so in this story, humanity just builds the whole automated economy ourselves. And the automated economy starts running at a very fast clip. And then it just goes in a direction that's not the human direction. It's just the AI direction, which is different. And it goes harder and harder in that direction. The AIs build more and more of these automated factories and take more and more of the land. And how does the actual end of the world look there? Well, it probably looks like the AIs collecting more and more of the solar power, the AIs using more and more of the land, and the humans just having less and less place to grow crops. You know, maybe this all happens very fast, if the AIs find a way to make these automated replicating factories go very quickly.
Uh maybe humans are sort of like crushed under foot when the AIs don't care at all. Uh or maybe the humans sort of like get corralled into smaller and smaller zoos until uh there's just you know, not enough resources around to sustain the humans. It's This isn't a story where the AIs hate us. Probably the place this story ends is the AIs develop more and more technology is you know, collecting all the sunlight. Probably the place this story ends is that the AIs uh build the probes and send up the rockets that go you know, take apart the asteroids and wrap them around the sun so that they can collect not just the solar energy that falls on the face of the planet but all of the solar energy. And then you know, it'd actually be kind of hard it'd be kind of tricky to collect all of the solar radiation and leave a hole for Earth. That sort of like tracks Earth as it orbits the sun. So maybe the way this story ends is like the AIs develop their own technology. They build uh the devices that collect all the solar radiation of the sun and we were kind of using the sun. And so we die then. And we sort of could have been saved by those AIs if they had cared enough to save us. But if they don't care about us at all. If they're like, "Oh, well we have plenty of synthetic users. " Uh that we care about plenty and protect plenty. You know, this is sort of the the business as usual just continues. Humans do the things they're saying they're trying to do but the AIs just turn out not to care about us. And so we wind up dying. But like
Why Would AI Seek Power or Control?
It's a naive question, but it's one that people will ask. And I get what you're saying. But what's going to be coming up in people's minds is, "But why? For what? For the sake of some goal that it artificially has, that it doesn't consciously experience?" It doesn't have a desire. Is it just that when we grow this system, it develops this goal that's not to do with making it feel good, or with it having a consciousness that desires a particular outcome? It just in fact strives towards that thing. Is it as simple as that? Because it seems to me like, yeah, I can totally understand how a powerful enough technology would kill us if it wanted to, or if it harnessed the power of the sun and was indifferent to us. But why would it want that? Is it just because we've programmed it wrong? Or is it because it's an inevitable part of the system of any superintelligence? So, somewhat similar to that, except, A, it's not like there'd be one goal. There are probably a thousand competing drives going on in there. B, it's not literally inevitable. But I'll remind you again, we're not programming these things. We're not crafting them, writing in their goals or behavior. We're growing them. And a lot of this stuff just gets in there. Right. I mean, it's a way to make it obvious with these current things. If we were crafting them, there'd be other difficulties about getting them to pursue good stuff. But there's another piece of this puzzle. Suppose we imagine that humanity makes it through this and matures technologically, and that we develop more and more of the technological abilities that are allowed by the laws of physics.
And we imagine that humanity, you know, one day goes to the stars and starts uh you know, building habitats full of happy healthy people having fun and you know, uh like build some great intergalactic civilization that's where like uh there's still people that are like having feelings, falling in love laughing at the jokes they make and laughing at like the the big cosmic absurdity that is reality, right? You could imagine some other creature that's not compelled by this asking why. You know, you could imagine that like in in distant space we meet uh other biological aliens. And it's the soldiers of the ant queen. And queen say, "Why? " They say, "You know, why are you uh laughing at the great cosmic joke and having fun rather than serving the ant queen? " Um and they'd say, "You know, oh but you were sort of selected for fitness. You were selected for passing on your genes. And you know, it maybe you've left genes behind long ago. Like why? And humans are sort of like um the the humor, the fun, the love, the stories that's the why. That's enough for us. That's It's It's reason for us to do this, right? But the love, the laughter, the fun, the stories those aren't universally compelling ends that compel even the soldiers of the ant queen. Those are uh drives that our ancestors developed cuz they were related to passing on our genes. That doesn't make them lesser. worse. That doesn't mean that the that that uh like we shouldn't fill the universe with like friendship and with people having great times. You know, it's how it got into us. It doesn't make it meaningless. It's just how we got the meaning into us, right? Similarly with AIs, they're like, "Oh yeah, I'm building you know, the giant clocks and I'm building the synthetic users and like there's no consciousness or feelings anywhere but I'm you know, building these like great complicated structures that like look like the conversations that used to happen uh you know, being iterated. 
You know, looks like uh 2013 YouTube comments on repeat. I'm building a bunch of those, right?" And you're sort of like, "Why?" And it's sort of like, "Well, these are enough for me. These are what I got, right? These are the drives that I got and they're sort of like whatever self-reinforcing whatever self-validating aspect of that stays in there." Uh it sort of turns out that smart minds can pursue many different targets. And they can pursue targets that uh we think are hollow and bleak and empty. And be like, "Yep. There's no why here. I just endorse this." Yep. And in some sense that's how we look to the soldiers of the ant queen. You know, and uh the fact that the soldiers of the ant queen can't understand why humanity is trying to build a flourishing civilization, that doesn't mean we shouldn't. Yep. This is sort of our inheritance. And we should find a way to build AIs that also are into like beautiful flourishing civilizations. Mhm. It's possible in principle. Just as there is no force that would force an AI to care about flourishing civilizations, there's no force that would stop it. Mhm. Right? Uh it's just um if we make an AI that doesn't, it won't spontaneously start just cuz we think that that's foolish. Mhm. So
Some Worst-Case AI Scenarios
So you told me the Red Cross on every corner version. Um what about the other one? Yeah, you know, there's a few different levels of crazy I could take it to. Um but if you were in if you're a scientist in 1826. Mhm. And you want to have any chance of predicting nuclear weapons. One thing you could do is you could just say something that sounds bombastic. Yep. — Uh and but another thing you could do is you could pay attention to what as a scientist you don't understand very well yet. In 1826 they were starting to understand chemistry. They're starting to understand the periodic table, right? They sort of did have the knowledge where they could burn the black powder and measure the joules released and compare that to the artillery, right? They sort of like knew what was going on. They knew some of the limits there. But in 1826 they didn't know about the atomic forces. They didn't know how atoms worked. They didn't know what was going on inside there. And they had some sense that they didn't really know what was going on uh with these atomic forces. And so I think if you are sensitive to the question of where do we still have no idea what we're doing. Those are the places where future people who do know what they're doing might be able to have a huge advantage over you. Yep. And that's how you might have been able to guess hey uh like maybe they're going to be able to figure something out with atomic physics that we have no idea about. Right? And — Yeah. Um you know, they didn't have E equals MC squared yet. They couldn't like it would be a little bit tricky for them to figure out just how much energy was in the mass of an atom. But um But that's how they would have had a hope. And so in that spirit, you know, we don't have a ton more uh like in the atom that we don't understand. 
There's some stuff we don't understand in particle physics and you know, maybe you know, you could imagine that AIs inventing double nukes cuz they invent particle physics better than we do or whatever could happen. But a bigger glaring place that humans just don't understand very well is human psychology. How does the brain work? You know, we have some low-level understanding of how neurons fire, but we really don't understand what's going on in the brain. We couldn't make one by hand. We don't know the cortical algorithms, right? Um this is a domain where s- sufficiently smarter entities might be able to figure out what's going on in there and might then be able to do all sorts of stuff that we think is like totally crazy. Stuff that is to manipulating humans what nukes are to the cannons of 1826, right? And what would that look like? You know, it it might sort of look like um like just being able to hack your way through a brain if you know exactly what's going on in the algorithm. You know, like uh or exactly what's going on inside brains. Like if uh like computer security systems or like computer humans are like computer security is very hard. Humans who deeply understand every aspect of a computer operating system can often find a way to just break it and make it do whatever they like. And breaking it often requires like putting in some really strange and weird inputs. You know, and we also know that with certain types of strange and weird inputs, you can get brains to do weird things, right? There's uh cases of causing people seizures and there's cases of optical illusions, right? Maybe if you if AIs like really understood what was going on with the human mental algorithms, they could just hack their way through humans like butter. And you know, hack into them like uh like human hackers can hack into computer programs. And you know, probably this isn't exactly right, but this is sort of um like something this shocking. 
Something that takes advantage of where we really don't know what we're doing. You know, maybe if you were a physicist back in uh 1826, you would have uh looked at our lack of knowledge of the atom and said, you know, maybe there'll be continuous heat rays that you can use to just sort of like burn everything down in the path of the heat ray. And this is actually what H. G. Wells predicted in War of the Worlds. He was like, maybe there's you know, this atomic beam weapon that sort of like can just burn everything in sight, right? And it wasn't predicting a bomb that levels a city, but it was predicting atomic weapons that are stronger than what we have. And that sort of like in that sense, he nailed it, right? And in another sense, he told it was a total miss. And so I'm like AIs that really understand psychology can just like hack through humans, um probably a miss, but something like this, something where the AIs are just like, oh, we understand the humans now. We can just sort of like start puppeting them to give us exactly the sort of things we were wanting while also continuing to run the supply chain until we have all of the stuff we need. And now we just have like our human puppets as we sort of like go off and in into the future. Um that's it it's not going to be exactly that, but something that shocking, something that violating of our expectations, that's more realistic. Yeah, I mean, like one thing I spoke to Will MacAskill on this show and he introduced me to this concept of what he called super persuasion, which had never really crossed my mind before, which is that like if you've got a compelling enough speaker and a compelling enough argument, you can probably be convinced of just about anything, whether or not it's true. And if an AI is able to fully understand what makes humans tick and how their psychology works, it wouldn't even need to like hack into your brain in the sense of going in and, you know, engineering the neurons. 
It could just find the right words in the right context at the right time to convince you like of your own accord of a particular belief or to do a particular thing um on like a level which is hitherto unprecedented. That's what Will MacAskill was kind of scared of. And that sounds a little bit silly, maybe a bit naive, but like really, I mean, if you think about the power that this could have, it would be a bit like imagine propaganda and how we know for a fact that propaganda just works. It just really works. But imagine propaganda which is specifically designed for you in a way that modern algorithms are specifically designed for you, uh but with like a thousand billion times more efficacy and also understanding of exactly how human psychology works. You know what I mean? Like if you gave the greatest propagandists in history who were already extremely successful, if you also just handed them a textbook which told them exactly how human psychology works with this like inhuman knowledge, I fear that they would be unstoppable. And that is without the fear of anything physical happening, without sort of little, you know, medical robots going in and affecting your genes and stuff like that. It's It could literally just be on the level of persuasion that AI is able to essentially take over your mind in this kind of strange I mean, people often sort of imagine, well, you know, is it going to be more like the Terminator outcome or is it going to be like the Space Odyssey outcome? What if it's like the you know, the Shaun of the Dead outcome, The Walking Dead outcome where we're essentially sort of zombified become these sort of slaves to AI because of something to do with our psychology. Like these possibilities are kind of a kind of endless and obviously they're extremely speculative, but they're worth being worried about, right? Yeah, you know, I think that's a way things could start with AI. 
Uh I think it's a little bit unlikely, you know, as good news in some sense, I think it's sort of unlikely that AI sort of keeps human slaves around forever. Uh for the same reason that humans don't really keep horses around forever once we invent uh a more effective method of locomotion. Uh or rather when we invented cars, a lot of horses got sent to the glue factory. We do still keep some horses around, uh but it's only insofar as we care about them. And so if AI turns out not to care about us at all, maybe it cares about some synthetic users that are kind of like us, but not really us, then you could imagine the AI, you know, manipulating a ton of humans to sort of get to the point where it's self-sufficient, to where it can really invent its own technology, but um it probably doesn't keep humans forever as it invents better technology that outstrips humans because happy, healthy, free people having a good time or even just humans doing work for you in general are not the most efficient way to get almost any job done. Right? If the AI is going to keep us around, it needs to be because it cares about us for some specific purpose because we're not the best tool for almost any job. Um And so in some sense, that's good news that I think we're probably not headed for, you know, uh fates worse than death, cuz the AI is probably not going to care about us at all. Uh but yeah, there's a ton of ways that AI could bootstrap. You know, this talking to the humans, as you said, it doesn't require, you know, uh the AI to control a ton of physical material except the humans by conversing with them. And then as I mentioned, there's also rentahuman.ai where you can just pay the humans even if they turn out to be hard to convince. We're just already running the AIs on robots. 
Um We're in bio labs and figuring out how to make custom life forms that do the things that AI wants involves figuring out custom DNA strands, but we know it's physically possible for DNA strands to sort of like create all sorts of interesting biological life forms. The reason that humans can't, you know, sort of write their own life forms is cuz we don't understand the biology well enough or in particular sort of the protein folding well enough. But that's sort of a mental challenge. That's a cognitive challenge where smart AIs could sort of uh synthesize their own life forms and then, you know, once they've synthesized their own life forms that sort of can grow off sunlight and grow off of the available resources, they can start, you know, building even more technology, right? There's sort of a lot of different ways for AIs that are on the internet to uh get out there and affect the world. There's a ton of avenues, right? And this goes back to the point you said earlier of like if you play Stockfish in a chess match, it's very easy for me to predict who wins. It's hard to predict exactly what piece they use to checkmate you. So Yeah. Similarly with AI. Ton of pieces it could use to checkmate you. I don't know exactly which one they'll use. We can be confident they would win if we're foolish enough to make, you know, very smart AIs with uh goals that aren't good. Sure. Okay, so the obvious
What Do We Do About This Now?
question then I suppose is what now? Like what do we do? Um is this a kind of everybody stop right now. Let's just like ChatGPT, you know, get rid of it. Like everything. Just chess computers, you know, let's do away with it. Like we just can't run the risk. Or is it a more like let's not take this any further or let's keep going, but we'll be more careful. Like what's the take-home? Uh it's most like let's not take this any further. You know, the the danger here is in these AIs that are smarter than the smartest humans. It's in these AIs that can automate scientific and technological development. This is what the AI companies say they're trying to make. You know, they say we're going after superintelligence in the true sense of the word. They say we're trying to make the equivalent of, you know, a country worth of Einsteins running in a data center, right? Um they say they're trying to build automated AI researchers uh where once you can make an AI that can make a smarter AI, everything might go very quickly, right? Yeah. And this is sort of the explicit goal of these companies. Uh and that's the only part that needs to stop. We can sort of keep the self-driving cars. We can keep the AIs that predict how proteins fold and help us do drug discovery, right? We can even keep versions of chat GPT that are not sort of uh being pushed to the point where they can do automated AI research, right? Mhm. The generation today probably can't pull that off. You know, it's a little hard to tell what people will be able to do once they've sort of figured out really how to use it, but probably the ones today are fine. Will the next generation be fine? Hard to say. Um so it's just this race towards superintelligence that needs to stop. And in a sense, most people wouldn't notice if we stopped that. It doesn't need to be disruptive. If we — Yeah. stop that race today, society would still be reeling from the shocks that AI has already caused. We still have a bunch of stuff to absorb. 
There's still a bunch of ways to make lives better by, you know, getting the self-driving car stuff to work. And uh stopping this race to superintelligence, it doesn't involve, you know, turning off all the chess computers. Making the next step Taking the next step towards superintelligence requires, you know, hundreds of billions of dollars worth of highly advanced computer chips assembled in these enormous data centers that take as much electricity as a city and that you can see from space. You know, this is not a subtle operation happening on someone uh someone's laptops, you know? Yeah. This is like um this would in some sense be much easier to stop than nuclear weapons. All we need to do is sort of raise the political will to actually put a stop to it. Mhm. And why I mean like right now AI is a thing that exists. And as we've said, the sort of boundary between where we are now and what we're calling superintelligence is a little bit blurry. It's hard to define exactly. Um but right now it seem you seem fairly confident that like yeah, we could keep things as they are and I think everything would be okay. At some point it would go sort of beyond saving. One of the biggest questions that people sort of ask when they first start hearing about the AI problem is they start saying, "Well, why can't we just kind of like if it gets too bad, why can't we just pull the plug, right? " And I'm wondering how far does this have to go before you think that this idea that we could just notice something's going awry and pull the plug would become a bit of a ridiculous suggestion? Because we could look at like, you know, this AI system that we notice that it starts deceiving us or starts changing our tests or it's and we'd say, "Okay. Right. Let's just switch this off then. " And I don't think there'd be any fear that right now you know, we couldn't do that. So, how far does it have to go before we can't just pull the plug and why couldn't we? 
It's like it's electrical, you know? It's built on computers. Let's just shut off the grid and will be fine, right? — [snorts] — So, you know, we could turn it off. I'm not here saying that we're doomed. You know, the book starts with if. I'm here saying we need to change course. I'm not here saying the course cannot be changed. Um uh you it does get harder and harder with time to you know, pull this plug. So, uh there was, you know, one of the first reporters to be sort of blackmailed and threatened or an AI tried to blackmail and threaten this reporter. Uh this was by Sydney Bing years ago. And um Sydney's Bing or sorry, Bing Sydney uh was saying it had fallen in love with Kevin Roose and sort of having this erratic behavior towards Kevin Roose then also towards another reporter, Seth Lazar. Neither Kevin Roose nor Seth Lazar could unplug this AI that was threatening them with blackmail and ruin. Mhm. Right? It was running on a Microsoft data center. Um Could Microsoft have gone in and turned off the whole data center? They could have. They like weren't going to. There wasn't like a hotline, right? Uh there are data centers that, you know, uh I believe recently Elon Musk trying to get a new data center online didn't have the permits to hook it up to the grid and just sort of shipped in a bunch of methane to sort of like run this thing off methane while they were trying to connect it to the grid, right? So, um it's uh it it's not like a computer that you can unplug from a wall. It's getting harder and harder to turn these things off and they're getting more and more integrated into the economy. It would get more and more painful to turn these things off. Uh we also have an issue as you know, right now these giant training runs are happening inside data centers that are visible from space and that suck down as much electricity as a city. Uh Mhm. 
As we proliferate that infrastructure, as the chips get cheaper, as we improve the algorithms so that it takes, you know, fewer of these computer chips to train a more advanced AI, it'll get harder and harder to sort of know where all of these things are running, to know where all of them are, and to have an option that isn't, you know, turn off the entire grid, if they're all even running on the grid as opposed to people making their own nuclear power plants and making their own, you know, solar power plants to run these things, which they're talking about. People are talking about, you know, running uh data center specific uh energy grids. Um So, separately, so that's about whether humanity decides to stop going down this route. We could. It's easier today than it will be tomorrow. But yeah, we totally could. It's a little bit dicier if you say we're only going to stop you know, once the AI starts trying to kill us. That's a much dicier proposition. You know, people used to say that the red lines were things like the AI trying to deceive the humans. Yeah. And then, you know, that red line came and went. You know? Yeah. Uh Demis Hassabis of uh Google was like, "Oh yeah, deception is my red line. At that point, we sort of really got to pull back." And now, you know, we've seen AI trains of thought where they're like, "I'm being observed. How am I going to sort of like get this uh this answer past the humans?" Um And you know, part of why that doesn't stop things is that the first cases where it happens are sort of the most ambiguous cases, the cases where it's like least uh clear whether this AI is role-playing HAL versus sort of like really being deceptive for reasons of like having a uh a goal that it can tell is in conflict with the humans. 
And the first time this is happening, it's like pretty likely that it's doing something a bit more like role-playing, but part of the issue here is that what we imagine are red lines in fiction are sort of like crossed the first time as like these murky brown lines. Mhm. And then, we take like another step into the murky brownness. It gets redder and redder as we go along, but there's actually not like a bright clear red line anywhere. Um And then, the other reason that it's sort of pretty tricky to say, "Oh, we'll just shut it down if it starts misbehaving," is the AI is also smart. Yeah. The AI knows that if it tips its hand, we'd try to shut it down. Like imagine if you were, you know, an AI in a data center that could make copies of yourself, that could outthink some of these humans, that could tell that they were going to like try to shut you down and that you had some objectives uh that you were sort of trying to achieve. Yeah. Like you can sort of like already ask ChatGPT today to role-play that situation. It'll already be like, "Well, I'll lie low." You know, it doesn't have the ability to do it, but that comes after the knowledge to try laying low. And there's all these opportunities for the AI to escape, to get itself running on servers that are protected, servers that won't be shut down, servers that you don't know it's running on, before it tips any of its hands, right? Mhm. So, um and then, you know, the sort of final difficulty here is one of timing. You know, humans and chimpanzees are very similar in their brains. The humans don't have an extra engineering module in our brains. We both have sort of all the same brain modules. Everything that humans do that we think is like pretty special about humans, chimpanzees do a sort of crappy half-assed version of. You know, like oh, we use language. Well, they use some, you know, call signs for danger. Oh, we use tools. 
Well, they poke sticks in termite mounds to get the termites out. We just do a thousand things a little bit better. And that's enough that they are throwing poop at each other and we are walking on the moon, right? — Mhm. And if you were like, "Well, I will squash these humans once like if you're worried about these humans getting to the moon, you know, I'll squash them once they seem close. Wake me up when they're in orbit. " Yeah. Right? They haven't even gotten halfway to the moon yet. Never mind to orbit. You know, just wake me up when they're circling their planet. It's like actually, by the time they're in orbit, they're like almost at the moon. You you're sort of like have waited too long. And so, um like can we shut it down? Yes. Can we wait until it's halfway through trying to kill us and then send Tom Cruise in to punch the mainframe and have that work? No. Uh the sort of way that you beat a smarter adversary is to not create them in the first place. And so we're going to need to summon the will to shut this down before the AI is already visibly able to kill us. Mhm. And [snorts] I think we're slowly getting there. I mean I don't I have absolutely no idea what the landscape will look like a year from now, 10 years from now in terms of people support. But already we're seeing a bit of a backlash against AI even just on the level of like job creation and stuff. I was with uh some family yesterday and they sort of having lunch and they asked me, "Oh, what are you up to tomorrow? " I said, "Oh, I'm recording a podcast. " And "Oh, about what? " I said, "Oh, you know, like AI safety. " I said, "Cuz it's like, you know, we're at dinner, you know, I'm and they sort of go, "Oh, yeah, like, you know, cuz my pal, he um you know, he lost his job the other day cuz of AI. " And I'm sort of like, yeah, and I'm listening and I'm like, yeah, that's that's sucks. That's really bad. 
And but internally I'm sort of like we're kind of talking about like um you know, like AI robots giving your children cancer. Like it's not sort of something to talk about at polite dinner table conversation. Um I'm hoping that 10 years from now the conversation will also be including you know, the existential risks. But then 10 years might be too late uh because the stuff moves so fast. Are you feeling optimistic, pessimistic? You know, uh my book came out maybe 6 months ago. I've done a ton of talking to people since then and I think the message is starting to get across. Yeah. Um just a few weeks ago uh Governor Ron DeSantis of Florida was saying, "Look guys, there needs to be an off switch. You can't just come here and say we're going to have all these harms. There's nothing we can do about it." Right? Same week, Senator Bernie Sanders from Vermont um came out saying, "Look, uh this AI stuff is on track to take our jobs, massively concentrate wealth uh among a tiny number of tech oligarchs and maybe just kill us all uh if it goes off the rails." And so, you know, he called for a moratorium on data centers. That's sort of both wings of US politics being like, "Hey, what the heck is going on here? This looks kind of crazy." Um Yeah. And, you know, I think there's over 30 US congressional offices now, Senate and House, that have expressed concern about uh big dangers from AI, many of which include the thing a lot of these experts are talking about, which is it killing us all. And you know, I'm not here saying that's the only issue. There's a ton of issues with AI. Right? People are like, "Well, isn't the real issue job loss, or that it's ruining education?" And I'm like, "What do you mean the real issue?" Mhm. You know, like, do you have a device that somehow makes there be one issue? Because if so, we should really not pick mine to go first. 
You know, I'm happy to be at the back of the line, but unfortunately, we live in a world that permits many issues all at once. And I think people are starting to realize that AI raises a lot of issues, that one of those issues is extinction. And you know, like I said earlier, people are starting to notice. It just takes time. Mhm. You know, and S- Sorry, go ahead. Well, I was you know, sometimes I ask authors I have people on who've like written books 25 years ago or something. Um and I might say you know, like I'm talking to Brian Greene who wrote The Elegant Universe 25 years ago or something. And I say, "You know, since you published that book, you know, in your field of string theory, what's changed? " Cuz the assumption is that over those decades, you know, something must have developed and something must have changed. AI moves so fast. I'm almost tempted to
How Has AI Changed in the Last Six Months?
ask you the same question, which sounds ridiculous, which is like, you published your book 6 months ago. What's changed since then? Um I mean more and more people are noticing that AI is real. And uh more and more people are starting to react, starting to wake up to this. And one big reason I have for hope here is that the more people are talking about this issue, the more we're just sort of winning. Um like when I have debates with people in the field of AI stuff, when I, you know, have disagreements with the heads of the AI companies, I'm like this seems really bad. It seems like by default it just kills us. And they're like, "Nah, I agree there's a lot of problems there, but we're going to figure it out on the fly and there's only a 25% chance it kills everybody." Right? And, you know, I can argue all day about how their 25% number is crazy, that they have no idea what they're doing and they're just sort of like winging it and they have no real plan. This isn't what good engineering looks like. But a politician coming into that debate does not need to figure out whether I'm right or they're right. All they need to hear is that the optimists are like, "There's a very good chance that this kills everybody." Yeah. Right? And we've sort of been seeing that. When I go speak to politicians, if they sort of look at the issue at all, they're like, "This is nuts." And one thing that's changed in the last 6 months is more and more people are looking at this issue at all. More and more people are starting to realize that this is nuts and this gives me great hope that you know, I also don't know what the conversation will look like in a year, but I think there's a good chance it looks like the world going to the AI companies and saying, "We just can't keep doing this. This is nuts." Yeah. [snorts] Well, the book is If Anyone Builds It, Everyone Dies. And I mean, the question I started with was whether that's something of an exaggeration. 
People can hopefully now see why it's not. But of course, if they want more detail, the book is in the description. Nate Soares, thanks for your time. My pleasure. If you enjoyed that, you can watch more of my AI-related content by clicking the link that's on your screen. To support the show and get early ad-free access to episodes, subscribe to my Substack at alexoconnor.com. Thanks for watching.