# Can This AI Breakthrough Bring DeepSeek Back?

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=YGRLU5foSz0
- **Date:** 08.01.2026
- **Duration:** 8:51
- **Views:** 9,914

## Description

Check out my newsletter: https://aigrid.beehiiv.com/subscribe
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Learn AI With Me : https://www.skool.com/postagiprepardness/about

Links From Todays Video:
https://x.com/askalphaxiv/status/2006719221242409319

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=YGRLU5foSz0) Segment 1 (00:00 - 05:00)

So DeepSeek are finally back, and they've released a paper called MHC (Manifold-Constrained Hyper-Connections), and this paper is a little more important than you might think. So let's talk about it and where DeepSeek have actually been.

I'm going to explain this in the simplest way possible because I don't want to confuse anyone. Essentially, big AI models get unstable when you try to make them smarter by adding fancy new connections. They start blowing up or collapsing internally, which breaks the training.

Now, here's what normal models do, meaning ResNets and transformers. In each layer you take the new info plus the old info, and that becomes the input to the next layer. This is called a residual connection. It's like saying, "Don't forget what you already knew," and this is why deep models don't break. What hyperconnections ask is: what if, instead of one memory stream, we had multiple streams talking to each other? Sounds great: more intelligence, more mixing, better results. But there's a catch. Those streams start amplifying each other randomly.

So why did DeepSeek even do this? You might not realize that up until now, most AI progress has come from more data, more compute, and bigger models. All of these are getting insanely expensive. We've got chips getting more expensive, supply chain problems, physical constraints on energy, GPUs, and memory, and of course the politics of it all. It's getting economically messy, so researchers are hunting for new ways to scale intelligence without blowing costs up. This is where hyperconnections come in. The idea is that instead of making the layers inside these transformers bigger, what if we make them richer internally?
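To make the "new info plus old info" idea concrete, here is a minimal sketch of a residual connection in plain Python. The `layer_update` function is a hypothetical stand-in for whatever a real layer computes (attention or an MLP), and the 0.1 scaling is an arbitrary illustrative choice, not something from the video or the paper.

```python
def layer_update(x):
    # Hypothetical stand-in for a layer's computation (the "new info").
    # A real model would run attention or an MLP here.
    return [0.1 * v for v in x]


def residual_step(x):
    # Residual connection: the old signal passes through unchanged
    # and the layer's output is added on top of it.
    return [old + new for old, new in zip(x, layer_update(x))]


def run_layers(x, depth):
    # Stack `depth` residual layers on top of each other.
    for _ in range(depth):
        x = residual_step(x)
    return x


out = run_layers([1.0, -2.0, 0.5], depth=3)
print(out)
```

Because the old signal is always carried forward, the identity path survives no matter how many layers are stacked, which is the "don't forget what you already knew" property.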
In simple terms: the same amount of FLOPs, the same layer size, but more internal memory and cross-layer reasoning. That's macro-architecture scaling, and it isn't brute force. Hyperconnections were the proposed solution to that scaling problem, and getting richer layers at the same cost was extremely attractive.

The problem is that it breaks at scale. Hyperconnections worked on paper until you got to deeper models, longer training runs, and more than 10 billion parameters. Then, once training starts, you get exploding gradients, random loss spikes, and hard crashes, which means you can't use them on frontier models. So the idea was right, but it was unusable.

MHC is what makes this actually usable. It exists because if the instability isn't fixed, the entire architectural idea dies: no stability means no scaling, no scaling means no adoption, and no adoption means a dead research line. Of course everyone is going to care about this, because GPU supply is still tight, power grids are stressed, and training costs are under scrutiny. It also matters because transformers are hitting diminishing returns, and everyone knows it. Labs are testing new memory structures, new routing, and new residual designs, and MHC is part of that post-transformer evolution.

So what they actually did with MHC is fix HC. HC broke because it let the signals grow or shrink uncontrollably. MHC fixes it by forcing the signal to behave.
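A toy experiment can illustrate the difference between signals that grow uncontrollably and signals that are forced to behave. The sketch below mixes four streams through 100 random layers, once with unconstrained positive mixing matrices and once with matrices rescaled so that every row and column sums to one. The rescaling uses a Sinkhorn-style alternating normalization, which is an assumption made for illustration; the transcript does not say how the constraint is enforced. The stream count, depth, entry range, and all function names are likewise hypothetical.

```python
import random


def random_positive_matrix(n, rng):
    # Entries drawn from [0.1, 1.0], so every weight is strictly positive.
    return [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(n)]


def sinkhorn(matrix, iterations=100):
    # Alternately rescale rows and columns so both sum to (approximately) one.
    # Assumed normalization for this sketch, not taken from the video.
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(iterations):
        for row in m:
            total = sum(row)
            for j in range(n):
                row[j] /= total
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
    return m


def mix(matrix, streams):
    # One mixing step: each output stream is a weighted sum of the inputs.
    return [sum(w * s for w, s in zip(row, streams)) for row in matrix]


def run(depth, constrained, seed=0):
    rng = random.Random(seed)
    streams = [1.0, 2.0, 3.0, 4.0]
    for _ in range(depth):
        m = random_positive_matrix(len(streams), rng)
        if constrained:
            m = sinkhorn(m)
        streams = mix(m, streams)
    return streams


free = run(depth=100, constrained=False)
safe = run(depth=100, constrained=True)
print("unconstrained total:", sum(free))
print("constrained total:  ", sum(safe))
```

In the constrained run the total across streams stays at its starting value, because column sums of one mean each step only redistributes signal between streams. The unconstrained run multiplies the total by a random factor at every layer, so it drifts far from its starting value over 100 layers.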
Hyperconnection layers let the model mix information across multiple streams, and that's good. But they didn't limit how much mixing could happen. So over many layers, signals either get amplified and explode, or get dampened and vanish, and this destroys training at scale. Think about it: every layer is allowed to act as a random volume knob, and after 50 to 100 layers, the speaker just blows up.

MHC changes the key rule. It says you can mix streams, but you're not allowed to change the total signal strength. Instead of adding or subtracting energy, it only redistributes energy. The fix is that MHC forces the main HC mixing matrix to obey three rules: all values are non-negative, so there's no signal-cancellation weirdness; each row adds up to one, which means no amplification in the forward pass; and each column adds up to one, which means no amplification in the backward pass. This guarantees a stable forward pass, stable gradients, and stable deep training.

This works so well because one layer can't just boost itself, multiple layers stacked together still behave, and the whole network keeps a stable identity path. In other words, MHC restores the original ResNet safety rail without losing the extra intelligence HC adds. The killer intuition is that HC is free-for-all mixing, and MHC is mixing with conservation laws.

Now, I wanted to add this part of the video because I want to know what DeepSeek are working on next. They were the ones who first shook up the market and showed us that OpenAI, Anthropic, and Google aren't the only companies building AI models. China is clearly in the race. In the most recent statement from DeepSeek's CEO, they said they're placing their bets on three directions. The founder, Liang Wenfeng, said that three things were

### [5:00](https://www.youtube.com/watch?v=YGRLU5foSz0&t=300s) Segment 2 (05:00 - 08:00)

coming next. The first is mathematics and code, which serve as a natural test bed for AGI because they are closed, verifiable systems where self-learning could lead to high intelligence. The second is multimodality, where AI engages with the real world to learn. The third is natural language itself, which is fundamental to human intelligence. So, three main things. And remember, the company's official mission is to unravel the mystery of AGI with curiosity. When they were asked about their AI timelines, they said it could take two, five, or ten years, but it will happen within our lifetime.

If you're wondering about DeepSeek R2, it was originally rumored for May 2025 and has been repeatedly delayed. Reports indicate that Liang is dissatisfied with its performance and that the team has faced challenges training on Huawei Ascend chips because of US export restrictions on Nvidia hardware. Currently, it looks like we could get DeepSeek R2 sometime in early 2026.

Now, one thing we have to talk about, and most people probably won't, is security and censorship, which is the biggest elephant in the room. Remember, DeepSeek's rapid rise triggered significant backlash from governments worldwide. The United States House Select Committee on China claims, with high confidence, that DeepSeek used unauthorized distillation from OpenAI models, a charge that OpenAI has formally lodged. Security researchers have also raised alarming findings. Feroot Security discovered hidden code capable of transmitting user data to the registry of China Mobile, a state-controlled telecom that is banned from operating in the United States.
Cisco's testing found that DeepSeek let through 100% of the harmful prompts it was given, blocking none of them, compared to other AI models, which were blocking around 90 to 95%. All user data is stored in China under PRC law, which mandates that companies support, assist, and cooperate with intelligence agencies, and according to its privacy policy the model collects keystroke patterns, device data, and chat history.

So think about this: censorship is baked in. DeepSeek refuses to discuss Tiananmen Square, provides CCP-aligned responses on Taiwan's status, and cannot critically assess Chinese government policies. CNN reported that the model gives the world a window into Chinese censorship and information control. Multiple governments have responded with bans: NASA, the Pentagon, and the Navy; government agencies in Australia, Taiwan, and South Korea; and states including Texas and Virginia. Italy banned the app entirely over data protection concerns, and France, Ireland, and the Netherlands have launched regulatory probes.

So think about it like this. DeepSeek has carved out a distinctive position in the AI landscape. It's 27 times cheaper than OpenAI, and it matches some of the current models on certain reasoning benchmarks, although I would say that since the recent updates came out, the frontier labs have started to pull away. But remember that DeepSeek has challenged the three core assumptions that we've used to define frontier AI development.

I think it's going to be interesting to see how things change, because geopolitics is a defining factor in AI success. With multiple governments banning DeepSeek and the US tightening chip restrictions, the AI landscape is fragmenting into competing ecosystems, and developers are going to have to choose which side they want to be on. So when we think about it, DeepSeek represents something genuinely new.
It's a research-focused, self-funded Chinese lab that has achieved frontier performance through efficiency rather than brute-force compute, and given it all away for free. They've shaken up Wall Street, they've forced competitors to accelerate, and they've demonstrated that the AI race is more open than people assumed. But significant questions remain. Is the $6 million training cost accurate, or does it exclude prior R&D and GPU acquisition costs? Can they sustain this innovation under intensifying export controls? And should users trust a model that stores their data in China, fails basic security tests, and enforces CCP narratives on sensitive topics?

---
*Source: https://ekstraktznaniy.ru/video/12400*