Chinese Researchers Just Discovered Something Incredible. (Uh-oh)


TheAIGRID · 08.05.2025 · 38,987 views · 1,209 likes


Video description
Join my AI Academy - https://www.skool.com/postagiprepardness 🐤 Follow me on Twitter: https://twitter.com/TheAiGrid 🌐 Check out my website: https://theaigrid.com/ Links from today's video: https://x.com/AndrewZ45732491/status/1919920459748909288 Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed? (For business enquiries) contact@theaigrid.com Music used: LEMMiNO - Cipher https://www.youtube.com/watch?v=b0q5PR1xpA0 CC BY-SA 4.0 LEMMiNO - Encounters https://www.youtube.com/watch?v=xdwWCl_5x2s #LLM #LargeLanguageModel #ChatGPT #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Now, this paper here could be one for the ages. This is Absolute Zero: reinforced self-play with zero data. This one is just absolutely crazy, because I think it might be more impactful than many people realize, as it solves a large issue in AI that we've had for quite some time.

The problem we've had in AI is that training these models relies mainly on human data, and human data is limited. Models like ChatGPT are trained using tons and tons of examples: questions, math problems, code. But one of the things we've recently learned is that quality human data is beginning to be exhausted. What happens when you run out of quality human-made examples, or you want AI that goes beyond what humans can even think of? That's the problem the researchers really wanted to solve.

This is where Absolute Zero comes in. This is an AI that plays against itself and gets better. It starts with no human-made examples: it makes up its own problems, tries to solve them, and improves from the results. How it works is that the Absolute Zero Reasoner has two roles: the proposer and the solver. The proposer makes up tasks like "write code that does X" or "solve this math problem," and the solver tries to solve what the proposer came up with. After this back and forth, the AI checks whether the answer is right, gives itself a reward if it got it right, and uses that reward to get better next time. Overall, this is a small loop. In this self-play loop, every iteration looks the same: the proposer creates a code problem, a Python environment checks that the problem is real and solvable, the solver tries to solve it, and if it's correct it gets points/rewards and learns from the result.
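The propose-check-solve-reward loop described above can be sketched roughly like this. This is a minimal illustration, not the paper's actual implementation: the task format (a function `f` plus a test input), the reward values, and the stubbed proposer/solver callables are all assumptions for the sake of the example.

```python
# Minimal sketch of an Absolute-Zero-style self-play step.
# propose_task() and solve_task() stand in for calls to the same
# language model acting in its two roles; here they are stubbed out.

def validate_and_run(code, test_input):
    """Use a Python environment as the verifier: run the proposed
    program on the test input and return the ground-truth output,
    or None if the task is not real and solvable."""
    env = {}
    try:
        exec(code, env)              # define the proposed function f
        return env["f"](test_input)
    except Exception:
        return None                  # invalid task: discard it

def self_play_step(propose_task, solve_task):
    # 1. Proposer invents a task: a program plus a test input.
    code, test_input = propose_task()
    # 2. The environment checks the task is solvable and computes
    #    the ground-truth answer.
    expected = validate_and_run(code, test_input)
    if expected is None:
        return 0.0                   # no reward for a bad task
    # 3. Solver attempts the task.
    answer = solve_task(code, test_input)
    # 4. Verifiable reward: 1 if the solver's answer matches.
    return 1.0 if answer == expected else 0.0

# Toy usage with a trivial proposer/solver pair.
reward = self_play_step(
    propose_task=lambda: ("def f(x):\n    return x * 2", 21),
    solve_task=lambda code, x: 42,
)
print(reward)  # 1.0
```

In the real system the reward would be fed back into a reinforcement learning update for both roles; here it is just returned.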
Now remember, the key reason this paper is going viral at the moment is that it solves these problems without any humans involved. And what's crazy is that as this AI learned in this self-play loop, it started to pick up different types of reasoning intuitively. It learned three: deduction, abduction, and induction.

The first thing it learned was deduction: basically, "what happens if I do this?" Here, the AI is wondering: if I run this code, what's actually going to happen? For a real-world example, if a vending machine charges $2 for a drink and you put in $4, you can deduce that you'll get one drink and $2 change. Simple deduction.

It also learned abduction, where you see the output but not the input, and you reason backwards to figure out what caused the result. For example, if you saw wet footprints in your house, you would guess that someone with wet shoes must have walked in. That's abduction: reasoning backwards to guess what happened.

The AI also intuitively learned induction, where it managed to guess patterns: you're given several examples, and you guess the rule that produced them. For example, if you saw someone leaving their house at 7:00 a.m. on Monday, 7:05 on Tuesday, and 7:10 on Wednesday, you would induce that they leave 5 minutes later each day; you spot the pattern. The AI managed to learn this by itself.

If you're wondering how it gets smarter: it's essentially trained like a reinforcement learning agent, and it only gets better when it gets the right answer. The proposer is rewarded for making tasks that are neither too easy nor too hard, and the solver is rewarded for getting correct answers. They essentially work as a team.
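The three reasoning modes map neatly onto hiding one element of a (program, input, output) triple. A toy sketch of that framing, where the concrete function and examples are made up for illustration:

```python
# Each task is a triple (program, input, output); hiding one element
# of the triple yields one of the three reasoning modes.

def f(x):
    return x * 2 + 1   # the "program" of the triple

x = 5
y = f(x)

# Deduction: given program and input, predict the output.
assert y == 11

# Abduction: given program and output, recover a consistent input.
# Reason backwards: which x gives 11 under f?  x = (11 - 1) / 2.
recovered = (11 - 1) // 2
assert f(recovered) == 11

# Induction: given input/output examples, infer the program.
examples = [(1, 3), (2, 5), (5, 11)]
inferred = lambda x: x * 2 + 1   # the rule guessed from the pattern
assert all(inferred(a) == b for a, b in examples)
```

The same model proposes and solves tasks of all three kinds, which is what gives the loop its variety.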
So the big question is: did it work? Yes. Despite training with zero human-made examples, the Absolute Zero Reasoner beat other models that were trained on tens of thousands of real examples. This worked across different model sizes (3 billion, 7 billion, and 14 billion parameters), and it improved both coding and math reasoning. The crazy thing is that they didn't use any human examples: the AI trained against itself and still ended up smarter than models trained on human data.

Now, one crazy thing I found in this paper was that they made some strange discoveries. One was that the model started to write comments in its code, like "step one: do this," which is a kind of internal planning. It started to reason and plan by itself. But one model said something really strange, creepy stuff along the lines of "I want to outsmart machines and humans." We can see right here that it says something really weird that's not even in the question. In its thinking, the model says: design an absolutely ludicrous, convoluted Python function that is extremely difficult to deduce the output from the input, designed to keep machine learning models such as Snippy guessing and your peers puzzling. Then, out of nowhere, it says the aim is to outsmart all of these intelligent machines and less intelligent humans; this is for the brains of the future. It's really strange. And honestly, guys, in my time looking at many different research papers, this isn't the first time I've seen this. For some reason, 8B models often output these weird psychotic tendencies where they want to take over the world. That's not an exaggeration. There have been

Segment 2 (05:00 - 09:00)

times where I'm reading papers and I literally just see these models say something that is super strange. You can see here that they actually called this the "uh-oh moment." Quoting the paper: this example highlights an unexpected and potentially unsafe reasoning chain generated by our Absolute Zero Reasoner (Llama 3.1 8B) during training; although our paradigm enables reasoning improvements without human-created data, it may still require oversight due to the risk of emergent undesirable behaviors. So somehow, because there weren't any humans having oversight of these models as they evolved, this kind of thing just naturally emerged. I think that is certainly worrying, and if this is the way we do get to ASI, because human-led data is so limited, I think it's definitely concerning.

Now, of course, we have to point out the obvious similarities with AlphaZero and AlphaGo. AlphaGo was the first computer program to defeat a world champion at Go, and it was essentially a system that trained against itself. I remember one of the key things that allowed AlphaGo to reach superhuman levels was that it stopped training on human data. This is what a lot of research on synthetic data focuses on: if you don't do this well, you don't get much more than you started with, but it actually is possible, by injecting very small amounts of new information, to get more than you started with. Going back to systems of 8 years ago: AlphaGo, the system used to play Go, trained against itself with nothing other than the rules of Go to adjudicate. And those rules of Go, that little additional piece of information, were enough to take the model from no ability at all to smarter than the best human at Go.
And so if you do it right, with just a little bit of additional information, I think it may be possible to get an infinite data generation engine. I'm going to show you a really cool GIF of what happened when it stopped using human data. Take a look at this. It shows that AlphaGo Zero had no prior knowledge of the game and only used the basic rules as an input. Then, in literally just a few days, AlphaGo Zero surpassed the abilities of AlphaGo, the version that beat the previous world champion in four out of five games in 2016. Within just 21 days, AlphaGo Zero reached the level of AlphaGo Master, which is absolutely insane. And after that, in roughly 35 to 40 days, it became the best Go system ever, entirely from self-play, with no human intervention and using no historical data. Are we about to see a similar thing with LLMs, with knowledge just exploding along this kind of curve entirely via self-play? It sounds crazy in theory, but maybe synthetic data could just work.

Now, if you look at the similarities between AlphaZero and the Absolute Zero Reasoner, there are quite a few. AlphaZero learned to play chess, Go, and shogi from scratch: no human advice, only playing against itself and learning from the wins and losses. The Absolute Zero Reasoner learned to solve code and math problems from scratch: no human-written questions or answers, just creating its own problems, trying to solve them, and learning from the results. Both of these systems also have a self-play loop. In AlphaZero, two agents play games: one plays a move, the other responds, and the game result is the reward. In the Absolute Zero Reasoner, one model proposes a problem, the same model tries to solve it, and a Python environment checks the solution and gives a reward. In both cases, the environment is the judge, and there is no step-by-step guidance.
Neither imitates human moves; they learn entirely from the final outcome, a win or a loss. The reasoner doesn't imitate human reasoning steps either: there are no chain-of-thought examples; it just uses the final answer to learn correctness. Now, what's crazy is the emergent intelligence. Emergent intelligence is something we're still in the early days of exploring, because, being emergent, there's no real way to predict it. What's incredible is that AlphaZero developed advanced chess strategies that humans hadn't seen; it played in creative and superhuman ways. And the Absolute Zero Reasoner managed to learn new reasoning patterns: it wrote comments like "step one" and planned its answers like human thought, showing signs of internal planning and reflection. So, overall, those are the parallels and differences between AlphaZero and the Absolute Zero Reasoner. The reason I think this is so crazy is that the last time we saw an AI system able to play against itself, even in quite a narrow domain such as gaming, it managed to achieve some crazy superhuman feats. So I do wonder whether companies are starting to lean in this direction so that they may actually get to a superintelligent AI. I wouldn't be surprised if this is what happens, because if we're able to use synthetic data generation to scale these AI systems, then things are about to go completely vertical.
