# NVIDIA's New AI Agent Just Crossed the Line - The Age of AI Agents Begins (Nvidia Nitrogen)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=c9ov2HeuLQ8
- **Date:** 21.12.2025
- **Duration:** 9:33
- **Views:** 19,014

## Description

Check out my newsletter: https://aigrid.beehiiv.com/subscribe
🐤 Follow Me on Twitter: https://twitter.com/TheAiGrid
🌐 Learn AI With Me: https://www.skool.com/postagiprepardness/about

Links From Today's Video:
https://nitrogen.minedojo.org/

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

For business enquiries: contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=c9ov2HeuLQ8) Segment 1 (00:00 - 05:00)

So NVIDIA just introduced NitroGen, and we need to talk about it. NitroGen is an open foundation model for generalist gaming agents. It's essentially an AI agent that can play video games it has never seen. That's right: you could drop this AI agent into any video game in the world and it would be able to play it, at least to some degree. It's pretty crazy once we dive into all of its skills.

NitroGen is interesting because it isn't trained on each game individually. You've probably seen earlier agents from OpenAI and DeepMind; a lot of the time, those were trained specifically for certain environments. NitroGen changes the game. This is an AI agent that can drop into any simple game, and that is true generalization. Remember, generalization is one of the key bottlenecks to AGI, because LLMs and other AI systems often perform horribly on inputs outside their training distribution. With NitroGen, we're seeing one of the first times an AI agent generalizes successfully, sometimes even out of distribution, meaning its skills hold up across many different environments. The implications are profound, because if this works at scale, it could carry over to fields such as robotics.

So how does this AI agent actually work? This slide isn't too technical. It shows how the entire agent works and how it was built and trained to become a generalist gaming agent. The agent has three pillars, and all of them reinforce each other.
The three pillars are the universal simulator, which is how the agent plays the games; the multi-game foundation agent, which is the brain; and the internet-scale video-action dataset, which is how it learns.

On the left is the universal simulator. This is a wrapper that lets any commercial game behave like a research environment. You have many different kinds of games: 2D, 3D, platformers, RPGs, and shooters. NitroGen doesn't get special access to game internals. It only sees raw pixels and then sends controller inputs; in other words, the agent sees the game exactly as we do. This matters because most earlier AI agents were hardcoded to one game, whereas this agent can run across many games through the same interface.

Next is the multi-game foundation agent, the actual AI model, the brain. First there is the visual encoder, which takes the frame the player would see and turns it into a compact visual representation. There's no text, no game state, and no memory dump; it is purely vision. Then there is the action DiT, which generates chunks of future controller actions (button presses and joystick movements) and uses diffusion/flow matching to produce smooth, realistic actions over time. Instead of predicting one button press at a time, it predicts a sequence of actions, which makes the gameplay stable and humanlike. It's a bit like watching an LLM stream text into your ChatGPT window: the actions come out in a steady stream, and the result is smooth gameplay.
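The transcript only says that the action DiT uses diffusion/flow matching to emit a chunk of future controller actions. The following is a minimal sketch of the sampling side of flow matching under a toy assumption: the chunk size and the three action dimensions are hypothetical, and the trained network that would normally predict the velocity from the encoded frame is replaced by the closed-form velocity toward a single fixed target chunk. It illustrates the technique; it is not NitroGen's implementation.

```python
import random

# Hypothetical sizes for illustration only; not NitroGen's real configuration.
HORIZON = 4     # future timesteps per action chunk
ACTION_DIM = 3  # e.g. [left_stick_x, left_stick_y, button_a]


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity network v(x, t | encoded frame).

    On the straight-line path x_t = (1 - t) * noise + t * target, the exact
    velocity pointing at a single fixed target is (target - x_t) / (1 - t).
    """
    return [[(g - xi) / (1.0 - t) for xi, g in zip(row, goal)]
            for row, goal in zip(x, target)]


def sample_action_chunk(target, num_steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t) from t = 0 (pure noise) toward t = 1 (actions)."""
    rng = random.Random(seed)
    # The whole chunk starts as Gaussian noise and is denoised jointly.
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)] for _ in range(HORIZON)]
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t stays below 1, so toy_velocity never divides by zero
        v = toy_velocity(x, t, target)
        x = [[xi + dt * vi for xi, vi in zip(row, vrow)] for row, vrow in zip(x, v)]
    return x


# Hypothetical target: "push the stick right and hold A" for every step of the chunk.
target = [[0.5, 0.0, 1.0] for _ in range(HORIZON)]
chunk = sample_action_chunk(target)
for action in chunk:
    print([round(a, 3) for a in action])
```

Because the sampler emits several future actions at once, an agent can execute the chunk step by step before querying the model again; that is the smoothness argument made in the video for predicting sequences rather than single button presses.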
The third pillar, the internet-scale video-action dataset, is where the learning comes from. Its raw material is gamepad-overlay videos: YouTube and Twitch recordings in which players show the game screen along with a live controller overlay whose buttons light up as they are pressed. Then comes action extraction: a vision model looks at the controller overlay and reconstructs which buttons were pressed and where the joysticks were, which turns passive videos into labeled action data. The scale is around 40,000 hours across roughly a thousand different games, with real human behavior ranging from casual to expert play.

Once you connect all of this, the dataset teaches the agent "when the screen looks like this, the human presses these buttons." The foundation agent learns a general vision-to-action mapping, and the universal simulator lets that agent test its skills in new games and be fine-tuned efficiently.

This diagram is important because it shows a paradigm shift. This is not reinforcement learning as in prior work, not handcrafted APIs, and not language-driven control. It is internet-scale imitation leading to generalist embodiment: basically GPT-style pre-training, but for actions instead of text.

And look how effective this is. Look at the off-the-shelf multi-game capabilities. Remember, this is the pre-trained NitroGen model used as-is on games it wasn't trained on. No reinforcement learning, no hand-tuning, no game-specific prompts: just vision in, controller actions out. From left to right, the results go from 3D games all the way to 2D side-scrollers, and overall it performs noticeably better on 3D. What the results say is that the model generalizes surprisingly well, with 40 to 60% success across all game types. That is very impressive for zero-shot gameplay, since many of these tasks would take humans a few hours to learn.

### [5:00](https://www.youtube.com/watch?v=c9ov2HeuLQ8&t=300s) Segment 2 (05:00 - 09:00)

Of course, 3D games work best: you can see success hovering around 50 to 60%. That isn't good or bad; it's simply how the data is. The dataset is biased toward action-heavy 3D games, where camera and joystick dynamics are well represented. On 2D top-down games it does even better on the game-specific tasks, hitting 61.5%, which suggests strong spatial reasoning and pattern reuse and shows that it isn't just memorizing 3D FPS behaviors.

I think this slide matters because it's the proof of generalization that everybody wants to see. It shows that NitroGen is not a single-game agent: it has learned transferable skills rather than scripts, and internet-scale imitation actually works. In other words, it was trained once and can already play many games decently. That's the same moment LLMs had when zero-shot prompting first worked: without any fine-tuning, it shows general gaming skills rather than memorization of one title.

The next result shows why pre-training matters. NitroGen learns general action priors that dramatically reduce the data and time needed to adapt to new games. This is the same pattern we saw with ImageNet for vision models and web text for LLMs. Before, every game needed custom reinforcement learning training, with thousands of GPUs per title. Now, as you can see here, you pre-train once, then fine-tune cheaply on whatever specific game you care about and get strong performance in low-data regimes. When you start from NitroGen instead of training from scratch, you get dramatically better performance on brand-new games, even with a small amount of data, and that's the hallmark of a true foundation model.
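The fine-tuning claim can be illustrated with a deliberately tiny toy that has nothing to do with NitroGen's actual architecture or training code: a linear model fitted by plain gradient descent to a "new game" with only four examples. The "pre-trained" starting weights are a hypothetical stand-in for a learned action prior and are simply assumed to lie close to the new game's weights; the sketch only shows why a good starting point cuts the number of updates needed in a low-data setting.

```python
# Toy setup (hypothetical): a linear "policy" maps a 3-feature observation to one
# action value, and each "game" is defined by its true weight vector.


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def gradient(w, data):
    """Gradient of the mean squared error with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(data)
    return g


def steps_to_fit(w_init, data, lr=0.1, tol=1e-3, max_steps=10_000):
    """Plain gradient descent; returns how many updates it took to reach MSE < tol."""
    w = list(w_init)
    for step in range(max_steps):
        if mse(w, data) < tol:
            return step
        w = [wi - lr * gi for wi, gi in zip(w, gradient(w, data))]
    return max_steps


# A "new game" with only four labeled examples (the low-data regime).
new_game_w = [1.0, -2.0, 0.5]
xs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
data = [(x, predict(new_game_w, x)) for x in xs]

scratch_init = [0.0, 0.0, 0.0]
pretrained_init = [0.9, -1.8, 0.45]  # assumed prior learned from many other games

print("from scratch:   ", steps_to_fit(scratch_init, data))
print("from pretrained:", steps_to_fit(pretrained_init, data))
```

Because this toy loss is quadratic and the pre-trained starting error is exactly one tenth of the from-scratch error, the pre-trained run always crosses the tolerance first; with a real network, the size of the gap would depend on how closely the new game resembles what the prior has already seen.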
Jim Fan, who worked on this, wrote a long post, which I'll quickly summarize. During his PhD, Atari was the gold-standard benchmark for AI agents. A single neural net that could play 50-plus Atari games was considered mind-blowing, and models struggled to map an 84x84 grayscale, pixelated screen to a handful of buttons. More recently we had OpenAI Five (Dota 2) and DeepMind's AlphaStar (StarCraft II), both a few years ago now and both esports games. As impressive as they were, those agents overfit to a single virtual environment at a time, and changing anything would have broken them instantly. Compare this with humans, who are extraordinarily good at adapting to vastly different physics and rules, something that continues to evade LLMs.

Now think about this: they used a thousand games as a thousand simulations, and the more virtual worlds the agent adapted to, the better it got at embodied reasoning, perception, and motor coordination. All of those are critical pieces of the grand puzzle of robotics. They decided to open-source this, and I'll leave a link to the GitHub in the description. They're pushing the same research frontier that DeepMind did with AlphaGo, OpenAI did with OpenAI Five, and Google did with SIMA, while highlighting the limitations of other systems.

Now, I've seen a lot of backlash, which is not surprising; AI is pretty much at a tipping point. One person says, "Why would I want an AI to play a video game for me? Stop giving the AI so much control." Remember, that is not what this kind of model is for. It's a training ground for AI agents that will help us in many different ways.
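For contrast with NitroGen's raw-pixel gamepad interface, the Atari setup described in the summary above is easy to make concrete. DQN-era agents typically converted each 210x160 RGB Atari frame to a single luminance channel and shrank it to 84x84 before the network saw it. The following is a minimal pure-Python sketch of that preprocessing, assuming nearest-neighbour resizing and ITU-R BT.601 luma weights; real pipelines also stacked several consecutive frames, which is omitted here, and the checkerboard frame is synthetic.

```python
# Pure-Python sketch of DQN-style Atari preprocessing (frame stacking omitted).
LUMA = (0.299, 0.587, 0.114)  # ITU-R BT.601 weights for R, G, B


def to_grayscale(frame):
    """frame: list of rows, each row a list of (r, g, b) tuples in 0..255."""
    return [[LUMA[0] * r + LUMA[1] * g + LUMA[2] * b for (r, g, b) in row]
            for row in frame]


def resize_nearest(image, out_h=84, out_w=84):
    """Nearest-neighbour downsampling: each output cell copies one source pixel."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(frame):
    return resize_nearest(to_grayscale(frame))


# Synthetic 210x160 frame (the native Atari 2600 resolution): an 8-pixel checkerboard.
frame = [[(255, 255, 255) if (x // 8 + y // 8) % 2 == 0 else (0, 0, 0)
          for x in range(160)] for y in range(210)]
obs = preprocess(frame)
print(len(obs), len(obs[0]))  # 84 84
```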
AI researchers use games because they are safe, cheap, complex, fully observable, and skill-dense. They are the wind tunnel for intelligence, not the destination. Humans play games for reward, mastery, and emotion; AI plays them for completely different purposes: to learn perception, control, and decision-making under uncertainty. Those are two completely different sets of goals. If it can learn those skills, and we can see it do well, that research will feed into robotics, autonomous vehicles, industrial automation, assistive tech, and simulation-based safety testing. If an AI can fight, navigate, and adapt in games, those skills can be transferred to real-world systems.

So games are basically the safest place to train an AI system, because the alternative is training in the real world, in real factories, where an AI failure carries real risks and consequences. So remember, this doesn't remove player agency. I don't know why people think that, but AI hate is at an all-time high. All you need to remember is that this AI is learning perception, control, and adaptation, which are the skills needed for robotics and real-world systems. And if it works, the implications are profound.

---
*Source: https://ekstraktznaniy.ru/video/12492*