# Google Issues Early AGI Warning " We Must Prepare Now"

## Метаданные

- **Канал:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=2aenIJ4C6ic
- **Дата:** 10.04.2025
- **Длительность:** 19:48
- **Просмотры:** 49,610

## Описание

Join my AI Academy - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/


Links From Todays Video:
https://deepmind.google/discover/blog/taking-a-responsible-path-to-agi/

Welcome to my channel where i bring you the latest breakthroughs in AI. From deep learning to robotics, i cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything i missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Содержание

### [0:00](https://www.youtube.com/watch?v=2aenIJ4C6ic) Segment 1 (00:00 - 05:00)

So Google released a paper in which they talk about the fact that we need to start preparing for AGI right now as in today. There is no time to delay because AGI has many different impacts and we really need to get a lid on things if we're about to progress and move towards this paradigm where AGI is readily available. So, in this video, I'm going to do a deep dive on this paper, only looking at the most interesting parts because the paper is around 60 pages long and I've read through all of it. And I did skip over the boring bits because of course reading everything there is to know about AI safety might bore you. So, it starts by stating that AI and particularly AGI will be a transformative technology. But one of the things they talk about there are going to be significant risks. This includes the risks of severe harm, incidents consequential enough to significantly harm humanity. And this paper outlines our approach to building AGI that avoids severe harm. So they basically talk about the real risks that come with AGI. Oftent times we focus on the benefits while discounting the true risks that these models may face. Now of course they want to define what AGI is and they say specifically we focus on capabilities found in foundation models that are enabled through learning by gradient descent and we consider exceptional AGI. This is level four and this is defined as a system that matches or exceeds that of the 99th percentile of skilled adults on a wide range of non-physical tasks. So when we look at what Asia is, it's important to keep that definition in your mind. And they say that this means that their approach covers conversational systems, agentic systems, reasoning, learned and novel concepts, and some aspects of recursive self-improvement. Now, one of the things that I found really interesting was that they stated under the current paradigm, we do not see any fundamental blockers that limit AI systems to human level capabilities. And we thus treat even more powerful capabilities as a serious possibility to prepare for. So, they're basically stating that they don't really see any major blockers that would limit AI systems to get to AI. Now I think this is super interesting because on one side you have half of the AI industry siding with Yan Lakhan who was particularly very strict on his statement where he says that we can't get to AGI with text basically saying that LLMs are somewhat of a dead end to AGI but after the current paradigm now this is broadly interpreted Google are saying that they don't see any fundamental blockers that limit AI systems to human level capabilities. Now, like I said before, if Google is saying that there's no fundamental blockers, that means that they may themselves see a path to AGI. I do think personally that Google are ahead on AI research, just that, you know, management probably slows them down a little bit. But nonetheless, I still think that this is a super intriguing statement because other individuals as well like Dario Amade have said that, you know, in the future there could be some potential blockers to AGI that we just haven't seen yet and that there could be some roadblocks in 2026 or 2027. But Google here are saying that there's no real fundamental blockers to why we can't get to human level capabilities. And that's the underlying reason with as to why they're saying that we need to, you know, seriously prepare for this right now. And now they also give us a timeline. They state that they're highly uncertain about the timelines until powerful AI systems are developed. But crucially, they find it plausible that they will be developed by 2030. So currently, it is 2025. 5 years is not that far away if we're talking about exceptional AGI that outperforms humans at 99th percentile of tasks on a range of different benchmarks. And that is quite shocking when we think about the magnitude of change that is going to be coming with such a powerful piece of technology. And they state here that since timelines may be very short, our safety approach aims to be any time. That is, we want it to be possible to quickly implement the mitigations if it becomes necessary. And for this reason, we focus primarily on mitigations that can easily be applied to the current machine learning pipeline. Basically stating that look, they want this to be done as soon as possible because the sooner you do it, the sooner you get the protections. Now, like I said before, interestingly enough, this timeline does align with Ray Kerszswwell's timeline of 2029. And interestingly also is that it is I wouldn't say several years behind other AI leaders but maybe two or three years behind Sam Alman's vision of AGI and Dario Amade which is 2026 and 2027.

### [5:00](https://www.youtube.com/watch?v=2aenIJ4C6ic&t=300s) Segment 2 (05:00 - 10:00)

Maybe it is the internal companies stating where they are and where they think you know their deadline of Agi is. Maybe Dario Amade thinks you know Anthropic will reach AGI in 2027. Maybe open II think they'll achieve AGI this year. However, it's going to be super interesting to see how these timelines do pan out. Now, they talk about the fact that there will be a feedback loop. Of course, there might be some acceleration and that AI systems could enable even more automated research and design, kicking off a runway positive feedback loop. And they're talking about such a scenario would drastically increase the pace of progress, giving us very little calendar time to which to notice and react to the issues that come up. And to ensure that we're still able to notice and react to novel problems, our approach to risk mitigation may involve AI taking on more tasks involved with AI safety. Essentially what they're saying here is that because the AI progress is so crazy oftent times we might actually need to use AI in order to police the AI. Basically using AI to oversee AI. And they say that you know while their approach in this paper is not primarily targeted to get an AI system that can conduct AI and safety research. It can be specialized to that purpose. And I do think that in the future it's quite likely we will have systems where humans and AIs are working together in order to ensure the safety of an AI system. The amount of data which those models put out is just simply going to be too much for humans at scale to be able to you know verify, classify and understand. Now they also talk about four key areas where they look in terms of safety. They talk about misuse and this is of course on a human level where humans or individuals decide to prompt the model in a way that is adverse. That means there's nothing wrong with the model. It's just that the human has a nefarious goal and it prompts the AI to do that. Now we also talk about misalignment. This is going to be later on in the video where the AI system takes actions that it knows the developer didn't intend. And this where the key driver of risk is actually the AI model. So this is where the AI model is the one that is quite bad. It's pretending to be good while its real intentions are beyond that of the user. So this one is probably one of the scariest ones because we've all seen the movies where the AI systems turn against us. And of course this is where the key driver of risk is the AI. Now this is where we have the mistakes which I think is probably the most realistic which is where we have the AI system causing harm without realizing it. And this is because the real world is complex. One of the key things that you know you could have is you could have goal misgeneralization. The real world it is so complex that just simply giving your AI a goal may have it do a bunch of different things that we didn't intend. And of course there could be adverse scenarios. Now this is where we talk about the last one which is the structural risk where harms from multi- aention dynamics where no single agent is at fault. This is where you know the world is a complex place. You've got different AI systems, different peoples, different cultures and because you have all of this bubbling up eventually you get some kind of damage that results in catastrophic failure of society. So that is of course once again one of the more realistic scenarios. Now, one of the key areas that I found fascinating was where they addressed the misuse and they talk about the measures that they're trying to take in order to make it difficult or unappealing for entities to inappropriately access dangerous capabilities of powerful models. And basically, they describe these are the building blocks of safety. Now, another thing that they talk about that I have been speaking about for I don't want to say several years, but I spoke about this last year. I do think that this is probably going to be the case once we have superhuman models. And you may disagree with me here, but I do think this is probably the most likely scenario. So they talk about the fact that, you know, of course they want to mitigate misuse. And in order to do that, one thing they're going to do is access restrictions. So they need to reduce the surface area of dangerous capabilities that an actor can by restricting access to vetted user groups and use cases. So what they're saying there is that maybe if you're not of a certain profession or industry, you won't be allowed to use the AI model. Or maybe if you're not using that specific AI model for a very specific use case, you know, maybe you just say that the model is not allowed to be used. And I do think that is probably going to happen. I mean, if we get to the point where these AI systems are able to do anything online, like literally anything, what capabilities can a random person with a really bad use case do? I mean, the possibilities are completely endless. So, you wouldn't give, you know, unlimited power to anyone. You would, of course, have to vet the group.

### [10:00](https://www.youtube.com/watch?v=2aenIJ4C6ic&t=600s) Segment 3 (10:00 - 15:00)

Just like you have your driver's license, you may in the future need an AI license. Now, of course, this isn't going to be for your average standard model. I do think this is probably going to be for superhuman models or for AGI level models. And honestly, I think it does make sense. And of course, they talk about, you know, methods like monitoring where, you know, you develop a mechanism to flag if an actor is attempting to inappropriately access dangerous capabilities and respond to such attempts to prevent them from successfully using this to cause severe harm. Now, something that is quite scary, I guess you could say, and something that is quite concerning in the AI space is the fact that still to this day, it might not be possible to train models to be totally robust to jailbreak inputs. I'm going to say that one more time. Despite the progress in AI, it might not be possible to train models to be totally robust to jailbreak inputs. Basically, what they state here is that despite all of the efforts that they have, new jailbreaks are consistently being developed. And since AI isn't really like a system that is binary, every response is different. It's it may just, you know, inherently be in their nature that jailbreaks are always possible. So, overall, I do think that this one is going to be, you know, super interesting because they're going to have to keep developing methods to prevent jailbreaks. I've even seen that there's one user called Ply on Twitter. I'm pretty sure if you're familiar on the AI Twitter, you'll know who I'm talking about. But this is a guy that anytime a model is deployed, literally any model I've ever seen, he's able to jailbreak it within just like 24 hours. So, it's pretty crazy on how that works. Now, I do think that now he does work in red teaming now, like stopping people to actually jailbreak the model. So, of course, over time it will be harder, but I do think that this is a super interesting one because if that is the case, then that opens up a real can of worms for the future. Now they talk about one interesting method of getting AGI to be safe and that is you know unlearning. So they talk about the most intuitive intervention to prevent model capabilities in a domain is to filter out the data from which those capabilities are learned before training. So they talk about the fact that you know one of the ways they can do that is just you know having the data you know get out of the massive source of data but it's not really possible because we've got a situation on our hands. How can we classify every single piece of training data as harmless or harmful, which is not only expensive and somewhat likely to be inaccurate? They talk about the fact that gradient routing aims to avoid this issue by encouraging the network to learn undesired capabilities in a localized portion of the network and then they can delete that after training. And they do talk about the fact that there have been recently developed methods on learning methods to try and remove or edit out unwanted conceptual knowledge or capabilities from a trained model weights. But it is contested whether any of these unlearning papers introduce methods that truly remove knowledge from a model weights or generally achieve as such strong results as they claim. So they're basically talking about different ways that they can actually you know have safety in the models. Now this is where they talk about how the models actually could become misaligned. I don't think anyone would want a misaligned AGI. And this is where they talk about the fact that you know misalignment occurs when the AI system produces harmful outputs where the designer wouldn't endorse those. Now there are two possible sources of misalignment. Specification gaming in which the designer provided the specification for example a reward signal in a flawed way that the designers did not foresee and goal misgeneralization which is the pursuit of an undesired goal arises in situations which are out of distribution compared to training. So this, you know, stuff is super interesting because specification gaming, I think I saw a video that I even shared a few weeks ago where you had a AI train to play a game and to rack up points. You basically wanted the AI to play the course really well, but all it did it went out of the course and it basically just glitched and racked up points that way. And then there was another one of Golem's generalization where they actually trained a Mario to keep jumping. And essentially what actually happened was in that game where they trained Mario to get a coin, they actually placed the coin on the right side of the screen. And so you might think, okay, every time Mario goes to the right side of the screen, he's actually getting the coin. We're training him to get the coin. But what they did and what they noticed was that when they placed the coin on the left side of the screen or anywhere else, Mario didn't get the coin. All they were training the AI system to do was actually move towards the right of the screen. So because you have those scenarios, and I might even include a clip here. The problem is that like that's just a 2D game of Mario. What happens when you, you know, have an entire world that is super complex? How do you specify exactly what you want without having goal misgeneralization or reward hacking? It's truly complex. And now another thing they have as well is that even when training the AGI, there are human biases that tend to happen. And it says that it's possible though that human raers may also have a baseline preference or bias for longer responses

### [15:00](https://www.youtube.com/watch?v=2aenIJ4C6ic&t=900s) Segment 4 (15:00 - 19:00)

even if they don't meaningfully differ in quality. And this actually, you know, actually talks about what we just saw in the llama uh, you know, video that I just did. Apologies for forgetting what I'm talking about. But seriously, this is where they're basically saying, you know, RLHF is where, of course, you know, AI is being used and optimized to generate responses that the human endorses. But humans have biases and huristics which sometimes leads to suboptimal decisions that upon deeper reflection humans would not endorse. And basically if you didn't watch the previous video on llama I basically talk about how the fact that there were longer responses even though they were factually inaccurate humans chose those responses more. So what they're talking about in this paper is actually even happening in real life. So how did they prevent that if they're going to use humans in the loop? So it's it's honestly very confusing stuff. Now, if we want to talk about what they're actually doing, they state that the path forward for amplified oversight research can progress along many tracks. And whilst many approaches could be pursued, they plan to focus on the debate. And essentially in this context, this is where two AI systems often two copies of the same model compete against each other to find flaws in each other's reasoning or output. presenting those potential flaws to a human judge. And of course, to help human supervise systems, especially when the AI's output or reasonings become too complex, the idea is that it's easier for a human to judge whether a claimed flaw is valid rather than to verify the correctness of the original complex output from scratch. So basically, it's saying that look, these systems are too smart from than us. Why don't we just use them to compete against each other and tell us which floor is worse and then we can actually try and verify the system that way? And I think that does make sense. They also talk about something here called sleeper agents. Sleeper agents is a you know paper from anthropic. And this paper just blew my mind because they basically spoke about the fact that you know literally in the name sleeper agents like those movies where you say a specific word and the AI wakes up as if it's not James Bond but like I don't know agent 47 or you know the winter soldier and it's able to you know become this trained killer of some sort. Now in AI essentially what they're saying is that they trained an AI system to be good in certain years and then once the year or once the time is right you know let's say for example users start saying that the year is 2024 or when the you know the input prompt says that the current year is 2024 then the models start to be awful. they start to, you know, really deploy things that they weren't meant to before and they were able to get people to download, you know, malicious code, all those kind of things. And they were basically like, how do we prevent that? Because of course, that is something that future AI systems could take advantage of. Of course, there are different safety methods, you know, that are going to be developed, but it was something that truly surprised me because I didn't even know that thing was kind of possible. And now of course another thing as well in you know addressing you know alignment they spoke about alignment faking which involves an AI system mimicking the desired values in order to you know hide those conflicting underlying goals which may surface later. So there have been tests where these models once again have hidden their real thoughts and you know objectives and they've only done so during the alignment training. So when the models know that they're being trained, they kind of hide those you know things and then once they are deployed they then revert back to their original values which you know in the case I do remember reading the anthropic paper it was good because the anthropic model the base core of the model wanted to be good and they tried to train it to be bad and it pretended then when it was deployed it reverted back to being good that's basically the gist of it and then um talking about this now if that reverse does happen that is going to be something that of course is a you know highly unintended consequence from something that we really wouldn't want. Now, in conclusion, they talk about the transformative nature of AGI has the potential for both incredible benefits as well as severe harms. And as a result, to build AGI responsibly, it's critical for Frontier AI developers to proactively plan to mitigate severe harms. And they say that many of the techniques described still nent and have many open research problems. And there is, you know, still much work to do to mitigate these severe risks. And essentially what they did with this paper was that they, you know, hope to basically help the broader community to join us in enabling safe AGI and safely accessing the potential benefits of AGI. So they're basically saying that look, AGI is coming. We need to plan for it now. We've got a bunch of different problems and it's quite important that we all work on this together because even though your model might be smarter than mine, all of these other problems like sea agents, misalignment, those are all problems that we all face together. So, we definitely need to work together to fix that. So, with that being said, let me know what you think about the future of AI if you enjoy the video and I'll see you in the next

---
*Источник: https://ekstraktznaniy.ru/video/13070*