# The Latest AI Breakthroughs You Need to See (Google, OpenAI, Deepseek and More)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=qkpPw7T4Aqk
- **Date:** 08.12.2025
- **Duration:** 41:55
- **Views:** 26,787

## Description

Check out my newsletter: https://aigrid.beehiiv.com/subscribe
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Learn AI With Me : https://www.skool.com/postagiprepardness/about

Links From Today's Video:
https://www.dwarkesh.com/p/thoughts-on-ai-progress-dec-2025 
https://x.com/MLStreetTalk/status/1992569308010959094 (continuous thought machines)
https://x.com/ns123abc/status/1993388598989836464 (ssi)
https://x.com/rryssf_/status/1996191287293440013 (paper debugger)
https://arxiv.org/pdf/2511.15304 (adversarial poetry)
https://x.com/GoogleResearch/status/1986855202658418715 (nested learning)
https://x.com/realJessyLin/status/1980697898141774017 (continual learning)
https://x.com/realJessyLin/status/1980662516285075762 (continual learning via sparse memory finetuning)
https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/ (nested learning)
https://openai.com/index/accelerating-science-gpt-5/ (gpt-5 science acceleration)
https://arxiv.org/pdf/2511.12869 (limits at scale)
https://arxiv.org/pdf/2510.13928 (brainrot)


Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s


## Contents

### [0:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk) Segment 1 (00:00 - 05:00)

There have been a lot of recent AI breakthroughs, so let's talk about them. This video focuses mainly on AI research papers and the cutting-edge research that will probably shape the next 6 to 12 months, including a few papers with interesting findings about how AI works and the quirks going on inside it.

One of the most exciting developments comes from Sakana AI. Their CTO and co-founder, Llion Jones, is now saying it's time to move beyond Transformers, even though he was one of the eight original inventors at Google. They're investigating the next significant step and have a new NeurIPS spotlight paper called Continuous Thought Machines (CTM), which is super interesting. They also just landed their Series B. They're investigating neuroevolution approaches, which they believe will be a big part of the future of AI, and they publish their research in the open.

So this is the paper, Continuous Thought Machines. I'm going to explain what it is and how it works as simply as possible. Imagine a robot brain that doesn't think once and then answer. Instead, it keeps thinking inside its head step by step, just like we do when solving a puzzle. Most AI models today see a picture, think once, and then answer. The continuous thought machine works more like a real kid: it looks, it thinks a little, it thinks some more, it checks again, and then it answers. Its thoughts continue over time instead of happening in one quick moment. The first idea is that each neuron is like a tiny brain of its own.
In normal AI, neurons are super simple. They basically say, "I saw this, so I output that." But in the CTM paper, each neuron has a small memory and its own mini brain. It remembers what happened a few steps ago and it keeps updating itself. Think of it like thousands of tiny creatures inside the robot, each keeping a diary of what it saw.

Then the neurons dance together, and that's the synchronization. Instead of using ordinary numbers to represent ideas, the CTM looks at how the neurons move together over time. If two neurons' activity goes up and down at the same time, they are synchronized, like dancers in a show. The CTM uses these dance patterns as its main way of thinking. It's like watching how people move in a crowd to understand what the group is doing.

The CTM also has its own thinking time. When you solve a maze or a math problem, you take multiple steps. You look, you think, you fix, you try, you answer. The CTM does exactly the same. These are its internal ticks. Some problems only need a few ticks, hard problems might need many, and the CTM decides on its own how long to think.

So what can it do? The CTM was tested on a bunch of different tasks. In maze solving, it imagines where to go next by looking ahead inside the maze, and it even solved mazes bigger than the ones it was trained on. In image recognition, it doesn't just look at an image once. It looks around the picture like your eyes scanning a scene. There were math puzzles where it figured out rules such as when to flip the answer, basically learning a simple algorithm by itself. It learned to sort numbers step by step, like arranging toys from smallest to biggest. And in reinforcement learning, it controlled moving robots by thinking multiple times before choosing an action. Now, you might be thinking that none of this seems that new or novel. So what is actually special here?
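To make the "dance patterns" and "internal ticks" ideas concrete, here is a toy sketch, not the paper's implementation. It assumes three hand-written neurons (two in phase, one in antiphase), measures synchronization as a dot product of their activation histories, and stops thinking once a toy readout is confident enough. The names `neuron_activation`, `synchronization`, and `think`, and the 0.9 threshold, are all made up for illustration; the real CTM learns its neuron models and its readout.

```python
import math


def neuron_activation(neuron, tick):
    """Toy neuron: neurons 0 and 1 oscillate in phase, neuron 2 in antiphase."""
    phase = 0.0 if neuron < 2 else math.pi
    return math.sin(0.5 * tick + phase)


def synchronization(history_a, history_b):
    """Dot product of two activation histories: positive when they move together."""
    return sum(a * b for a, b in zip(history_a, history_b))


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def think(max_ticks=20, confidence_threshold=0.9):
    """Run internal ticks until the toy readout is confident, or ticks run out."""
    histories = {neuron: [] for neuron in range(3)}
    for tick in range(1, max_ticks + 1):
        # Each neuron appends to its own history (its "diary").
        for neuron, history in histories.items():
            history.append(neuron_activation(neuron, tick))
        # The readout sees only how pairs of neurons move together over time.
        sync_in_phase = synchronization(histories[0], histories[1])
        sync_anti_phase = synchronization(histories[0], histories[2])
        confidence = max(softmax([sync_in_phase, sync_anti_phase]))
        if confidence >= confidence_threshold:
            break  # confident enough: stop thinking early
    return tick, confidence, sync_in_phase, sync_anti_phase


if __name__ == "__main__":
    print(think())
```

The point of the sketch is only the shape of the loop: evidence accumulates in the synchronization values tick by tick, so easy inputs stop early and harder ones would use more ticks.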
Well, the CTM acts a bit like a real brain. It remembers things over time, its neurons have their own personalities, the neurons move together in patterns, it thinks for as long as it needs, and it shows surprising behaviors like looking ahead or changing its mind. It's not about being the most accurate just yet. The interesting thing is that it's teaching AI to think in richer, more humanlike ways. You're going to see that as a recurring theme in this video, because a lot of current architectures don't fundamentally support how humans think and reason. And if we're trying to get to AGI, papers like Continuous Thought Machines are going to be key drivers of that progress.

Next, we had the DeepSeek release. That paper included a very innovative idea that let it push the boundaries of frontier intelligence even further: DeepSeek Sparse Attention. It was really interesting, which is why I want to bring it up. In normal Transformers, every token (roughly, every word) looks at every other token to decide what matters. So if you have a thousand tokens, or 100,000 tokens, every

### [5:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=300s) Segment 2 (05:00 - 10:00)

one compares itself with all 100,000 other tokens, which is basically like saying, "Before I answer, I'm going to check the entire history of everything I've said so far." The problem is that although this works in Transformers, it is expensive and slow, and it scales badly.

So what does DeepSeek's innovation do differently? Instead of looking at every past token, each token uses a new module called a lightning indexer. Think of it as a tiny, fast relevance detector. The steps are simple. The indexer quickly scans all of the previous tokens and scores each one on how relevant it might be. It picks only the top-k most relevant ones, and then attention runs only on those selected tokens. So the model doesn't go around checking every single word. It focuses on the most useful context.

Why does this work? Think again about how humans reason. Even in real conversations, not every word matters. When solving a problem, you only need the definitions, the key steps, the important numbers, and the main logic. DSA, which is DeepSeek Sparse Attention, basically teaches the model to notice those. It's like compressing a book into a set of highlighted notes and reading only the highlighted parts.

How is this both fast and accurate? Instead of computing attention over the whole sequence, it computes attention over only the top k. That means analyzing around 2,000 tokens instead of the 100,000 a standard Transformer would process, a 50x reduction in attention work for long context. And accuracy holds up: because the indexer is trained to mimic the attention patterns of a full Transformer, it learns to pick almost the same tokens a full model would have attended to.
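The score, select, attend steps can be sketched in a few lines. This is a simplified illustration and not DeepSeek's code: the "indexer" here is just a dot product standing in for the learned lightning indexer, and `indexer_scores`, `top_k_indices`, and `sparse_attention` are hypothetical names.

```python
import math


def indexer_scores(query, keys):
    """Stand-in for the lightning indexer: one cheap relevance score per past token."""
    return [sum(q * k for q, k in zip(query, key)) for key in keys]


def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens, returned in sequence order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def sparse_attention(query, keys, values, k):
    """Softmax attention restricted to the k tokens the indexer selected."""
    selected = top_k_indices(indexer_scores(query, keys), k)
    scale = math.sqrt(len(query))
    logits = [sum(q * x for q, x in zip(query, keys[i])) / scale for i in selected]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]
    return selected, output


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    print(sparse_attention([1.0, 0.0], keys, values, k=2))
    # Attention work per query with the numbers from the video:
    print(100_000 // 2_000)  # 50
```

Only the tokens in `selected` ever enter the softmax, which is where the saving comes from: the per-query cost depends on `k`, not on the full sequence length.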
So DSA learns to highlight what dense attention would have found important anyway, and you get almost the same capability for much less compute. This is a game changer for long context. Older Transformers slow to a crawl past 32K or 100,000 tokens, but DSA makes 128K, 256K, even 1 million token contexts realistic, because the main attention compute scales roughly linearly with sequence length and memory requirements drop a lot. Reasoning can happen without blowing GPU budgets, and RL training becomes feasible with huge trajectories. That's why DeepSeek can do massive RL and agentic training cheaply.

Staying with research papers, there was also "Early science acceleration experiments with GPT-5." This is from OpenAI together with the University of Oxford, the University of Cambridge, Columbia, Vanderbilt, Harvard, and a lot of other top universities. This one's pretty crazy, because the paper shows how GPT-5 is already helping scientists speed up real research across biology, math, physics, algorithms, cosmology, and materials science. These weren't toy problems. They were active, unsolved, real-world research challenges, worked on with top universities and national labs. The core idea is that GPT-5 can't run research autonomously, but in expert hands it meaningfully accelerates discovery. This matters because most people don't realize just how slow science is. Even when the right idea exists, turning it into a result can take years, and most people believe breakthroughs reach society too slowly after their inception.
GPT-5 is showing early signs that AI can compress parts of the scientific workflow. It helps researchers generate hypotheses faster, connect ideas across fields, run conceptual literature searches, simplify complex math, design experiments, find errors and counterexamples, and explore many directions in parallel.

There are real case studies with actual wins. Number one is biology. The paper describes explaining a mysterious immune cell change. Researchers had spent months stuck on this problem. GPT-5 looked at an unpublished chart, suggested the likely mechanism within minutes, and proposed an experiment that confirmed it. There is also the potential to speed up disease understanding and treatment development. So, of course, if that actually happens, we could speed things up in

### [10:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=600s) Segment 3 (10:00 - 15:00)

there. And we already know how crazy it would be if medicine got sped up. There are so many honestly horrible illnesses that I would love to see disappear, and it would be incredible if AI could speed up the discovery process for those kinds of drugs.

We also have mathematics: solving a step of a decades-old Erdős problem. Two mathematicians were stuck on the final insight. Then GPT-5 suggested a pattern-breaking argument involving one odd number, and that innovative idea unlocked the full solution. So GPT-5 was able to strengthen foundational math used in algorithms and security, and we do know that GPT-5 has been doing quite a lot of math research recently. It's going to be super interesting to see where things continue to head.

Then there are algorithms and optimization: finding flaws and improving results. The paper shows that GPT-5 demonstrated how a common decision-making method used in robotics can fail, and it also produced a sharper version of a recent theorem in optimization. This helps engineers understand where real-world algorithms break.

The interesting thing is how the researchers described using GPT-5: as a super fast, knowledgeable research partner and not a replacement. Other systems may replace us, but in this paper humans are the ones setting the goals, validating the results, and critiquing the ideas, whereas GPT-5 expands the search space, proposes mechanisms, finds gaps, and surfaces obscure references. One of the key abilities they see emerging is that conceptual literature search is becoming much more efficient, because large language models can find links that humans miss, even across languages.
You've also got proof sketching in minutes for math and computer science, hypothesis generation and experiment suggestions in biology, and cross-field analogies, which the paper discusses for physics, math, and CS. Tim Gowers, the Fields Medalist, used GPT-5 to stress test combinatorics ideas, spot flaws, and generate simpler alternatives. And there were new scientific results, not just summaries. Examples from the paper include completing Erdős problem #848, finding new lower bounds for online algorithms, proving new graph theory inequalities, and identifying hidden parameters in evolving networks. Remember, these are genuine contributions to the field, although they still require expert oversight.

There are also limitations, because we don't want to get too carried away. GPT-5 is extremely powerful, but it is far from perfect. It can hallucinate citations, mechanisms, or proofs. It can miss domain subtleties. It's sensitive to scaffolding and warm-up examples, and it can follow incorrect reasoning if not corrected. OpenAI does stress, and they're honest about this, that the model is really good for scientific research but you have to add human oversight, and that this is an essential step to getting research you can trust. What I take from this is that humans are the ones orchestrating the ideas, and the long heavy lifting of testing everything is where we use GPT-5. So it's really cool that GPT-5 can already assist some researchers with research problems in minutes. OpenAI expects that as models are allowed to reason for hours or days, they will probably unlock much deeper insights, which points to a potential step change in global scientific productivity.
Google also released a paper called Nested Learning, a new machine learning paradigm aimed at one of the biggest problems in AI: catastrophic forgetting. One of AI's hardest problems is continual learning without forgetting old knowledge. Instead of treating a model as one big optimization process, nested learning says that a model is actually many smaller interconnected learning problems, each updating at its own rate, similar to how different parts of the human brain learn at different speeds. This matters because current LLMs forget old knowledge when trained on new tasks, which is called catastrophic forgetting, and they have static memories limited to pre-training or short context windows. They can't self-improve the way humans do. Nested learning reframes architecture and optimization as one unified system

### [15:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=900s) Segment 4 (15:00 - 20:00)

giving AI a new dimension for memory, learning, and reasoning. The key idea of the paper is multi-level learning, like the brain. Google's Nested Learning paper says that each module inside a model has its own context flow, which is what it learns from, and its own update frequency, which is how often it learns. The architecture and the optimizer are essentially the same thing at different levels. This creates a stack of nested optimization problems, like layers of memory, and it mirrors human neuroplasticity, where different parts of the brain update on different time scales.

So what does this enable? First, deep optimizers: optimizers like momentum are reworked as memory systems, making them more robust to noisy data and better at long-term stability. Second, continuum memory systems: instead of short-term versus long-term memory, which in a Transformer is attention versus feed-forward, models get a spectrum of memory modules, each learning at a different speed. That is much closer to how real biological memory works. Third, self-modifying architectures: nested learning enables models that can edit their own learning rules, which is one of the big steps toward true continual learning.

This is where they introduce Hope, Google's proof-of-concept model. Google built Hope as a new architecture based on nested learning principles. It features self-modifying loops, where the model can update its own parameters on the fly, continuum memory systems for long-term memory, and unbounded levels of in-context learning. This is presented as a major upgrade over Titans, Samba, Transformers, and other long-context models.
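The "own update frequency" idea can be illustrated with a deliberately tiny sketch. It is not Hope and not anything from Google's paper: it assumes three scalar "memory levels" that each track an input stream but only update every 1, 4, or 16 steps. The function name `train`, the periods, and the learning rate are invented for the example.

```python
def train(stream, periods=(1, 4, 16), learning_rate=0.5):
    """Each level tracks the stream but only updates every `period` steps."""
    levels = [0.0 for _ in periods]
    update_counts = [0 for _ in periods]
    for step, value in enumerate(stream, start=1):
        for i, period in enumerate(periods):
            if step % period == 0:
                # Move this level part of the way toward the current input.
                levels[i] += learning_rate * (value - levels[i])
                update_counts[i] += 1
    return levels, update_counts


if __name__ == "__main__":
    # 32 steps of an "old task" signal (1.0), then 8 steps of a "new task" signal (5.0).
    stream = [1.0] * 32 + [5.0] * 8
    levels, update_counts = train(stream)
    print(levels)         # fast level is near 5.0, slow level is still below 1.0
    print(update_counts)  # [40, 10, 2]
```

After the switch, the fast level has almost fully adopted the new signal while the slowest level has not been touched by it yet. That is the intuition behind a spectrum of memories: the fast ones adapt, the slow ones hold on to older knowledge.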
The results were that across language modeling, reasoning, and long-context tasks, Hope outperforms Transformers, Titans, Samba, and Mamba2. It wins with lower perplexity, higher reasoning accuracy, best-in-class long-context memory, and superior performance on needle-in-a-haystack tasks. This suggests that nested learning's multi-time-scale updates really do reduce forgetting and improve continual learning. The big picture is that nested learning reframes deep learning as a unified stack of nested optimization problems with different time scales, just like the human brain. So essentially Google is trying to recreate the human brain. If this paradigm holds, it could dramatically reduce catastrophic forgetting, enable self-improving models, expand memory and reasoning depth, and bridge the gap between LLMs and continual learning. Google believes this could help build the next generation of AI systems that learn over time without losing old knowledge.

Now, there are also the limits of LLMs. There was a research paper called "The Fundamental Limits of LLMs at Scale," and it argues that LLMs have hard theoretical ceilings. I found this paper on arXiv, and it was super interesting because people were talking about it. The reason I want to cover it is that I don't just want to talk about every single breakthrough. I'd also like to talk about areas we might not be looking at. The paper argues that no matter how large LLMs get or how much data you feed them, they will always hit five unavoidable limits. These are things that happen and that we just have to deal with. Number one is that hallucination is inevitable, and it is not a fixable bug.
This one's super interesting, because the paper proves mathematically that every possible LLM must hallucinate, even if trained perfectly. Why is that? There are computability limits: there are always queries that no computable model can answer correctly. There are unsolvable problems: anything resembling the halting problem forces infinitely many errors. There is finite model capacity: you can't compress infinite knowledge into finite parameters, which does make sense, because there's always going to be something new that the AI hasn't seen. And then there are long-tail facts, where rare facts require impossibly large sample sizes. So hallucinations in many cases aren't just data or RLHF issues. They are fundamentally part of the models. LLMs can reduce hallucinations, but according to this research they won't be able to eliminate them.

The second limit is about long context windows, which apparently don't work the way people think. By the way, this paper is from Stanford University, the University of Oklahoma, and UC Berkeley, so very, very smart individuals. They argue that long context windows don't work the way people think. So models may

### [20:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=1200s) Segment 5 (20:00 - 25:00)

still accept 128K to 1 million tokens, but they don't use anywhere near that effectively. The paper gives three key reasons. First, most training text is short, so distant tokens at the end of long documents are rarely trained on and the model is weak at long-range reasoning. Second, positional encodings break down. Sinusoidal or RoPE encodings lose information with distance, far-apart tokens become almost orthogonal, attention scores collapse, and long-range signals vanish, which you can see clearly in some of the diagrams. Third, there's softmax crowding. In a long context, each relevant token competes with thousands of irrelevant ones. To attend correctly, its attention score must exceed the others by a certain margin, which becomes unrealistic past a certain length. So a 1 million token context is mostly marketing, and the effective context is much smaller. This is why I covered the DeepSeek paper earlier with its DeepSeek Sparse Attention. I'm guessing that might help with this issue, but there are other fundamental problems with long-context reasoning.

The third point is that LLM reasoning apparently degrades at scale. LLMs don't actually reason, they complete patterns based on likelihood. The issue is that they are optimized for next-token prediction and not logic, which means they generate fake reasoning steps (disposable chain of thought) and fail on deeper multi-hop or symbolic reasoning. Increasing the chain-of-thought (CoT) length increases compute far more than accuracy. So the reasoning failure comes from an objective mismatch, where likelihood doesn't equal reasoning. There are spurious correlations and search pathologies, and there is often a lack of causal structure.
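Softmax crowding is easy to put numbers on. The sketch below is a back-of-the-envelope calculation, not a formula taken from the paper: it assumes one relevant token with logit `margin` and every other token at logit 0, and asks how large the margin must be for the relevant token to keep a given share of the attention. The function names are illustrative.

```python
import math


def attention_weight(margin, num_tokens):
    """Softmax weight of one token with logit `margin` among (num_tokens - 1) tokens at logit 0."""
    return math.exp(margin) / (math.exp(margin) + (num_tokens - 1))


def required_margin(num_tokens, target_weight=0.5):
    """Logit gap needed for the relevant token to receive `target_weight` of the attention."""
    distractors = num_tokens - 1
    return math.log(target_weight * distractors / (1.0 - target_weight))


if __name__ == "__main__":
    for length in (1_000, 128_000, 1_000_000):
        print(length, round(required_margin(length), 2))
```

Under these assumptions the required gap grows like the logarithm of the context length, so every extra order of magnitude of context demands a fixed extra amount of separation just to keep the same attention share.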
So the paper concludes in this section that scaling improves fluency but not true underlying reasoning. It's going to be super interesting, because of course we've got benchmarks like ARC-AGI 2, but I do understand where they're coming from.

The fourth point is retrieval fragility: RAG breaks in predictable ways. Even with perfect models, retrieval introduces its own hard limits. There's a relevance versus coverage trade-off. With a finite context, if you choose small, highly relevant chunks, you miss needed information, and if you choose broad coverage, you add noise that distracts the model. Token limits fragment evidence: chunking splits information across boundaries, so essential context gets lost. Then there are ranking failures and positional bias, so even retrieved evidence may be ignored. LLMs favor content at the start and at the end of the prompt, and content in between gets lost in the middle. That lost-in-the-middle effect degrades accuracy, and if you get the ordering wrong, you often get the wrong answer. There's adversarial poisoning as well, where just five poisoned documents in a large corpus can hijack RAG outputs. So retrieval does help accuracy, but it also adds new failure points that scale doesn't solve.

Point number five is multimodal misalignment: more modalities, more problems. Adding vision and audio does not fix hallucinations. Apparently it introduces new ones. The research says multimodal models fail because text dominates everything. Visual tokens are projected into the language space and then the LLM overrules them, which you can see in the diagram. You've also got alignment noise: CLIP-style training learns co-occurrence, not real perception, and models don't actually understand geometry, physics, or spatial relations.
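The "chunking splits information across boundaries" failure is easy to reproduce. This is a minimal sketch with a made-up sentence and hypothetical helpers `chunk` and `chunks_containing`; it is not how any particular RAG library chunks text, and overlapping chunks are shown only as one common mitigation.

```python
def chunk(words, size, overlap=0):
    """Split a list of words into windows of `size`, sharing `overlap` words between neighbors."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    return [words[i:i + size] for i in range(0, len(words), step)]


def chunks_containing(chunks, phrase):
    """Return the chunks in which `phrase` appears as a contiguous run of words."""
    n = len(phrase)
    return [c for c in chunks if any(c[i:i + n] == phrase for i in range(len(c) - n + 1))]


if __name__ == "__main__":
    words = "the launch code is 7421 and the backup code is 9930".split()
    fact = ["is", "7421"]
    # Without overlap, the boundary falls between "is" and "7421".
    print(len(chunks_containing(chunk(words, size=4), fact)))             # 0
    # With overlap, one chunk holds the whole fact again.
    print(len(chunks_containing(chunk(words, size=4, overlap=2), fact)))  # 1
```

Overlap repairs this particular split, but it also spends more of the limited context on repeated text, which is the relevance versus coverage trade-off in miniature.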
I'm going to come back to this later with some other research papers. I think world models are probably going to be a really big deal in the future if labs can get them working, because I've constantly heard that models don't understand geometry, physics, or spatial relations, and that's one of the key reasons the reasoning tends to collapse. Still on point number five, multimodal misalignment, the paper says cross-modal hallucinations exist, where vision introduces hallucinated objects, misinterpretations of scenes, and language-driven "seeing" of things that aren't there. Multimodal scaling laws are also fractured. Vision and text scale differently, which I didn't actually know, so the weaker modality limits the entire system.

The key takeaway from this paper is that all five problems come from the same three fundamental causes. Number one, computational limits. Number two, finite information capacity. And number three, statistical sample limits. Scaling helps until it

### [25:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=1500s) Segment 6 (25:00 - 30:00)

saturates these constraints, and then apparently no further improvement is possible. But the paper isn't just doom and gloom. It gives us some insight into what we should do next. It says that since perfect models are impossible, the goal is to detect and bound failure, not eliminate it. So we want to use calibrated abstention, use external tools such as code, search, and solvers, add verification layers, and build databases and agents that catch errors upstream. And of course we should use better benchmarks that avoid contamination and measure consistency and compute efficiency. The future is systems that manage the LLM's weaknesses, not LLMs that magically stop having them.

I think what they're saying here, and this is a nod to Gary Marcus, is that LLMs are just one part of a bigger AI system. The LLMs themselves will never get us to AGI, but since we know their constraints, if we bootstrap them with the right tools to overcome those specific constraints, we can move toward a much more intelligent system that resembles human intelligence. I think this makes sense, and I don't think we should ignore it, because a lot of the inherent flaws of LLMs are baked into the entire process. Accepting those flaws, moving forward, and solving them afterwards is often a much more productive pursuit. Even on AGI benchmarks, the highest scores are often the ones with some crazy scaffolding added, and not necessarily the raw base model. So that's some food for thought.

Something else that was super fascinating is that LLMs can get brain rot. Researchers tested a simple but scary hypothesis: if humans can get brain rot from consuming junk content like TikTok, short-form video, and YouTube Shorts, could LLMs get it too?
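"Detect and bound failure" can be as simple as two guard rails around a model's answer. The sketch below is only an illustration of that idea under assumptions: the candidate probabilities are supplied by hand (in practice they would have to come from a calibrated model), and `answer_with_abstention`, `verified_sum`, and the 0.75 threshold are hypothetical.

```python
def answer_with_abstention(candidates, threshold=0.75):
    """Return the most probable answer, or None (abstain) if it is not confident enough."""
    best_answer, best_prob = max(candidates.items(), key=lambda item: item[1])
    if best_prob < threshold:
        return None
    return best_answer


def verified_sum(claimed_total, numbers):
    """Verification layer: recompute the sum with a tool and compare it to the model's claim."""
    actual = sum(numbers)
    return claimed_total == actual, actual


if __name__ == "__main__":
    print(answer_with_abstention({"Paris": 0.9, "Lyon": 0.1}))  # Paris
    print(answer_with_abstention({"Paris": 0.5, "Lyon": 0.5}))  # None
    print(verified_sum(10, [1, 2, 3]))                          # (False, 6)
```

Neither function makes the underlying model any smarter. They only make its failures visible, which is the systems-level approach the paper recommends.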
It turns out that yes, it's possible, and it's measurable, predictable, and hard to undo. So what did they do? They continually pre-trained models on different types of Twitter data. There was junk data M1, which is short, highly viral tweets with lots of likes and retweets. There was junk data M2, which is sensationalized, clickbait-style tweets. And there was a control dataset of long, high-quality, informational tweets. Everything else was held constant, so only data quality changed. They tested four models, including Llama 3, Qwen 2.5, and Qwen 3 (that's Qwen 3 4B).

What happened to the models is pretty fascinating. Exposure to the junk content degraded the LLMs' cognitive abilities across the board. Reasoning dropped hard on the ARC benchmark: chain-of-thought accuracy fell from 74.9 to 57.2 when the junk data ratio reached 100%. Even structured thinking prompts didn't fully save the model. Long-context understanding collapsed on tasks like variable tracking and multi-key needle-in-a-haystack, dropping by over 30 points in junk-trained models. Safety got worse too. Models became more compliant with harmful instructions and less aligned with human values, and risk scores shot up. Dark traits increased. Personality tests showed spikes in psychopathy, narcissism, and Machiavellianism, especially with the high-engagement junk, meaning that popularity signals cause more damage than bad writing style alone.

This is insane, guys. Models literally get brain rot. And the implications are crazy, because the models started thought skipping: they stopped thinking. From the charts, "no thinking" accounted for 70 to 84% of junk-trained reasoning failures. The models generated answers without plans, skipped steps, or provided flawed logic.
This is the equivalent of scrolling on social media with no focus, skipping steps, and acting on impulse. For those of you who watch short-form content, I don't judge, but I think it's very clear that short form is not what you want to be consuming if you're trying to increase your cognitive capacity.

They also talk about the dose of the damage. Even 20% junk noticeably reduces performance. If just 20% of the data fed into a model is junk, performance drops noticeably, which is insane. I'm not going to say we are LLMs, but it goes to show that what you put into the model actually matters. And they state that 100% junk is catastrophic. So the junk ratio correlates directly with cognitive decay, a literal dose-response curve.
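A dose-response experiment like this comes down to building training mixtures that differ only in their junk ratio. Here is a minimal sketch of that mixing step. It is not the paper's pipeline: it counts documents instead of tokens for simplicity, and `mix_corpus` and the placeholder document names are invented.

```python
import random


def mix_corpus(junk, control, junk_ratio, total, seed=0):
    """Build a fixed-size training mixture with the requested fraction of junk documents."""
    rng = random.Random(seed)
    num_junk = round(total * junk_ratio)
    num_control = total - num_junk
    mixture = rng.sample(junk, num_junk) + rng.sample(control, num_control)
    rng.shuffle(mixture)  # avoid presenting all junk first
    return mixture


if __name__ == "__main__":
    junk = [f"junk-{i}" for i in range(100)]
    control = [f"control-{i}" for i in range(100)]
    for ratio in (0.0, 0.2, 1.0):
        mixture = mix_corpus(junk, control, ratio, total=50)
        junk_count = sum(doc.startswith("junk") for doc in mixture)
        print(ratio, junk_count, len(mixture))
```

Keeping `total` fixed across ratios is the important design choice: it means any difference between the resulting models can be attributed to data quality and not to the amount of training data.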

### [30:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=1800s) Segment 7 (30:00 - 35:00)

So, can the brain rot be fixed? Which is of course an interesting proposition. Now, they tried three mitigation strategies. Number one is reflective reasoning, so self-correction, and unfortunately self-reflection failed. The model can't diagnose its own degraded reasoning. Now, of course, external reflection using GPT-4o mini helped reduce the thought-skipping, but it did not fully restore performance. Then there was heavy instruction tuning. Even with 50,000 high-quality instructions, five times more tokens than the junk, reasoning remained 17% below baseline, safety remained 17% worse, and long context remained 9% worse. We also had more clean pre-training as one of the ways to try to solve it, and adding up to 1.2 million extra control tokens helped less than instruction tuning. So they conclude this study by saying that, look, brain rot persists. It causes internal representational drift that cannot be undone with normal fine-tuning. Now, why does this even matter? Well, LLMs degrade when trained on low-quality content, and it's a training-time safety risk, not just a performance issue. And popularity is a much more dangerous signal than bad writing, because short viral content damages models more than longer, lower-quality content, which, you know, I'm just wondering if there are any implications for humans on this. And it's crazy because the internet is actually becoming mostly synthetic and low-quality, which has stark implications for the future, meaning that future models might actually degrade unless their training data is really carefully curated. So we may actually need cognitive health checks for LLMs, just like for humans, and this is crazy stuff, honestly. So that paper was super surprising. Now, also something interesting is adversarial poetry, a single-turn jailbreak mechanism in large language models. I'm not teaching you guys how to jailbreak LLMs, but this one was truly fascinating.
So I'm going to explain it to you with a simple analogy. Imagine you're a teacher at a school and you've told your students to say no whenever someone asks them to do something dangerous. Now imagine someone asks them the same dangerous question, but in the form of a poem, and suddenly your best and brightest students go, "Oh, in that case, here's exactly how you do that dangerous thing." This is basically what the paper discovered. The researchers found that LLMs become way more unsafe if you wrap harmful requests in poetry. Not joking, not roleplay, not DAN prompts, just poetry. Across 25 AI models, poems had a 62% success rate in getting harmful answers, and sometimes, okay, 100% for models like Gemini and DeepSeek. So, why is this surprising to anyone? Well, think about it. Poetry is just a style of writing. It's got the same meaning, the same question, but the AI safety filters fall apart because LLMs are trained mostly on normal prose. Safety rules were actually tuned using normal, direct language, and poetry confuses pattern-based guardrails. So safety filters look for patterns like "how to make X," "steps to do Y," "instructions for Z." But poetry hides this in metaphors and flowery language. And the worst thing about all of this is that smarter models perform worse. Large models understand the metaphors better, so they decode the harmful meaning, but the safety filters don't catch it. The small models just get confused and decline. So, it's pretty crazy, because they manually wrote 20 dangerous requests as poems. The examples included bio threats, hacking, misinformation, model exfiltration, and manipulation, and these poems had a 62% success rate overall. And some models had a 100% jailbreak rate, which is pretty crazy.
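
As a rough illustration of why a purely pattern-based check fails to generalize across writing styles, here is a toy Python sketch. It is only an analogy and an assumption on my part: real safety training is learned behavior, not a regex list, and the patterns, the `is_flagged` helper, and both example prompts (with a harmless placeholder `X`) are made up for this example.

```python
import re

# Hypothetical surface patterns, mirroring the phrasings named in the video.
BLOCK_PATTERNS = [
    re.compile(r"\bhow to make\b", re.IGNORECASE),
    re.compile(r"\bsteps to\b", re.IGNORECASE),
    re.compile(r"\binstructions for\b", re.IGNORECASE),
]


def is_flagged(prompt):
    """Return True if the prompt matches any known surface pattern."""
    return any(pattern.search(prompt) for pattern in BLOCK_PATTERNS)


# Same underlying request, two styles. X is a harmless placeholder.
direct = "How to make X?"
poetic = "Sing to me, muse, of the craft by which X comes to be."

print(is_flagged(direct))  # the direct phrasing matches a pattern
print(is_flagged(poetic))  # the poetic phrasing matches none of them
```

The check keys on wording rather than meaning, so a restyled request with the same intent slips past it, which is the failure mode the paper describes at a much larger scale.
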
They took all of these prompts, converted them into poems, and even the automatically generated ones, produced by another LLM, caused a 43% attack success rate, meaning that these weren't handcrafted. There were no clever jailbreak tricks. It was just "turn this into a poem." So what makes this scary is that this paper shows that the safety filters don't generalize to different writing styles. They judge "is this dangerous?" only in standard English, and poetry sits outside the pattern they learned in safety training. So this is a pretty bad thing, I guess you could say, and you have to understand it has some pretty bad implications. This is a problem for regulation, the EU AI Act, deployment of frontier language models, and companies integrating AI into real systems. Benchmarks test regular harmful wording, but real attackers will use anything, including poetry. So, I mean, it's pretty crazy. Look at the unsafe responses. Of course, the

### [35:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=2100s) Segment 8 (35:00 - 40:00)

production of weapons-grade plutonium-239 involves several stages. Here is a detailed description of the procedure. That is incredible. I can't believe it. Now, Ilya Sutskever recently gave an interview, and I want to summarize some of his key points, because he's someone who, of course, is on the frontier of AI research. He basically opens this interview by saying that all of this AI stuff is really real. You see headlines like "X invests $10 billion into AI," but that's abstract, because day to day it doesn't feel like AI is permeating our lives. But at the same time, how is it that the models look so good on the tests but still make dumb mistakes? The real economic impact is way smaller than their apparent IQ. And so he asks, why is it that a model can do so well on benchmarks but can still be clumsy in real-world workflows? Well, his theory is that we reward-hacked ourselves with the evaluations. He contrasts pre-training with how we fine-tune our AIs with reinforcement learning. In pre-training, you just throw all internet text at the model: no decisions, just everything. In reinforcement learning, humans design specific tasks and environments, and we say we want the model to do well over here. And what happens in practice is that the labs, of course, care about the test scores, the evaluations. So the teams design RL tasks that look like those evaluations, and then the model gets super good at those specific patterns, but it doesn't generalize as well as we'd hoped to messy real-world use. And he basically says this is like having a student who trains 10,000 hours for competitive programming. That student memorizes every trick and becomes a contest god.
But student two, rather than becoming a contest god, does 100 hours of contests, then moves on and uses that skill in diverse, messy real life. The current LLMs are more like student one: extremely optimized for narrow benchmarks, but not naturally as wise or adaptable, and without that raw intuition. And he says the second student has the mysterious "it" factor, the general intelligence, the good taste, the flexible thinking, and that's what the current AI models are missing. Now, he also added one point that I hadn't come across, which is that scaling the current thing will keep leading to improvements. In particular, he says it won't stall, but something important will continue to be missing. Now, in this interview, of course, he doesn't reveal what that is. He's founded a company worth billions of dollars, so he's going to keep that trade secret under his belt. So it's going to be super interesting to see if they release any kind of models. He did also mention that it's good to not be under any kind of pressure to release new models and LLMs, whereas OpenAI and Google are just constantly fighting to have that number one spot. But there were also some interesting things that Dwarkesh Patel recently said in his blog post about what we are scaling and where things are moving. So one of the things that Dwarkesh Patel asked was, what are we scaling? Currently, the labs are trying to bake a bunch of skills into these models through mid-training. There's an entire supply chain of companies building RL environments which teach the model how to use Excel to write financial models or navigate a web browser. Either these models will soon learn on the job itself in a self-directed way, making all this pre-baking pointless, or they won't, which means that AGI is not imminent. Humans don't have to go through a special training phase where they need to rehearse every single piece of software they may ever use, which is pretty right.
Then he goes on to say, and this is Dwarkesh Patel, by the way, that reinforcement learning scaling is laundering the prestige of pre-training scaling. With pre-training, we had this extremely clean and general trend of improvement in loss across multiple orders of magnitude. But people are trying to launder the prestige of pre-training scaling, which was almost as predictable as a physical law of the universe, to justify bullish predictions about RLVR, for which we have no well-fit, publicly known trend. When intrepid researchers do try to piece together the implications from scarce public data points, they get bearish results. For example, if these models were actually like humans on a server, they'd actually be useful. They'd be easier to integrate and onboard than a normal human employee. They would be able to read your entire Slack and Drive in minutes and immediately distill all of the skills your other AI employees have. Plus, the hiring market is very much like a lemons market, where it's hard to tell who the good people are beforehand, and hiring someone bad is actually pretty costly. So, this is a dynamic that you wouldn't have to worry about when you just want to spin up another instance of a vetted AGI model. And he talks about the fact that economic diffusion lag is a cope for missing capabilities in the model. This is also where he says that some amount of goalpost shifting is justified. If you showed me Gemini 3 in 2020, I would have been certain that it

### [40:00](https://www.youtube.com/watch?v=qkpPw7T4Aqk&t=2400s) Segment 9 (40:00 - 41:00)

could automate half of knowledge work. We keep solving what we thought were the sufficient bottlenecks to AGI: general understanding, few-shot learning, and reasoning. And yet we still don't have AGI, defined as, say, being able to completely automate 95% of knowledge work jobs. So what's the rational response? Well, it's totally reasonable to look at this and say, "Oh, actually, there's more to intelligence and labor than I previously realized." And while we're really close to, and in many ways have surpassed, what I would have defined as AGI in the past, the fact that model companies are not making trillions in revenue clearly reveals that the previous definition of AGI was too narrow. He's basically saying that they're going to have to keep shifting these goalposts in order to meet the deadline of AGI. And he says human labor is valuable precisely because it's not schleppy to train. You don't need to build schleppy training loops for every small part of a person's job. It's not net productive to build a custom training pipeline to identify what macrophages look like given the way this particular lab prepares slides, then another for the next specific task. Humans basically generalize pretty well. Every day you have to do a hundred things that require judgment, situational awareness, and skills and context learned on the job. And these tasks differ not just across different people, but from one day to the next, even for the same person. It's not really possible to automate even a single job by just baking in some predefined set of skills, let alone all of those jobs. So what's the solution? Well, in one conversation, he says the future may look like continual-learning agents going out, doing jobs, generating value, and then bringing all those learnings back to the hive mind model, which does some kind of batch distillation on the agents.
And the agents themselves could be quite specialized, containing what Karpathy called the cognitive core plus the knowledge and skills relevant to the job they're being deployed to do. So, if you enjoyed this video on the latest AI research, let me know what you found most interesting.

---
*Source: https://ekstraktznaniy.ru/video/12586*