NVIDIA’s NitroGen AI Learns to Play 1,000+ Games From YouTube
Duration: 10:03

Universe of AI, 25.12.2025, 3,052 views, 38 likes (updated 18.02.2026)
Video description
NVIDIA’s NitroGen is a new AI gaming agent trained on 40,000 hours of YouTube gameplay across 1,000+ games. In this video, we break down how NitroGen learns actions from videos, why this matters for embodied AI, and what it means for generalist agents that can act, not just think.

For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: /intheworldofai
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/

#NVIDIA #NitroGen #AI #GamingAI #EmbodiedAI #ArtificialIntelligence #AIAgents #MachineLearning #UniverseOfAI

NVIDIA NitroGen, NitroGen AI, NVIDIA AI, gaming AI, AI gaming agent, generalist AI, embodied AI, AI agents, machine learning, reinforcement learning, behavior cloning, AI trained on YouTube, AI learns from videos, video action dataset, foundation models, game playing AI, robotics AI, computer vision AI, multimodal AI, Universe of AI

0:00 - Intro
0:42 - Overview of Nitrogen
1:22 - The Problem
4:24 - How Nitrogen Was Created
6:57 - Real Gameplay!
7:58 - Model Results
9:36 - Outro

Contents (7 segments)

  1. 0:00 Intro (121 words)
  2. 0:42 Overview of Nitrogen (100 words)
  3. 1:22 The Problem (492 words)
  4. 4:24 How Nitrogen Was Created (391 words)
  5. 6:57 Real Gameplay! (198 words)
  6. 7:58 Model Results (271 words)
  7. 9:36 Outro (91 words)
0:00

Intro

Today we're talking about NitroGen, an open foundation model for generalist gaming agents released by NVIDIA and a group of top research labs. This is one of the most important embodied AI papers this year, not because it plays games well, but because it shows how generalist agents might actually scale. Large language models were built by pre-training on internet-scale text. NitroGen asks a simple question: what if we did the same thing for actions? Not reinforcement learning, not hand-crafted simulators, just raw pixels and real human behavior at massive scale. And crucially, the entire project is open: the dataset, the simulator, and the pre-trained model. So, let's get into it. At a high level, NitroGen is a
0:42

Overview of Nitrogen

vision-to-action foundation model trained on about 40,000 hours of gameplay video across more than a thousand games, and it incorporates three key ingredients: an internet-scale video-action dataset, a multi-game benchmark environment, and a unified vision-action policy. To simplify, it takes a raw RGB frame from a game and outputs standardized controller actions. There is no access to internal game state, no hand-crafted APIs, and no language instructions. The explicit goal is cross-game generalization: not mastering one game, but learning behaviors that transfer across many different games, genres, and mechanics. For a long time
1:22

The Problem

one of the biggest goals in AI has been building agents that can see the world, make decisions, and take actions in environments they've never seen before. We've already made this work for text and images. Large language models became general because they were trained on massive amounts of internet data. Vision models did the same thing with images. But when it comes to embodied AI, meaning AI that actually acts inside an environment, progress has been much slower. And the reason is surprisingly simple: we don't have enough large-scale action data. If you think about it, reading text or looking at images is passive, but playing a game requires movement, timing, planning, reaction, and long-term decision-making. That kind of data is much harder to collect. This is where video games become incredibly important. Games are visually rich, interactive, and require agents to solve tasks over long time horizons. They're basically perfect training environments for embodied AI. But until now, most approaches have hit serious roadblocks. Some systems use large language models, but they rely on custom APIs that expose internal game state. Others need complex perception systems just to read text or detect objects on screen. They work, but only with heavy game-specific engineering. Reinforcement learning is another option. This is how we got AlphaGo, StarCraft bots, and Dota agents. But those systems are extremely expensive, narrowly specialized, and require custom simulators that most games don't have. They're impressive, but they don't generalize. There's also imitation learning, where AI learns by copying humans from gameplay footage. The problem is that collecting high-quality demonstrations is expensive and time-consuming, which limits training to just a few games. So, for years, we've been missing something critical: a general, open, scalable way to train agents across many games. And that's exactly what NitroGen is trying to solve. NitroGen takes a very different approach.
Instead of building custom simulators or APIs, it trains a foundation model directly on internet gameplay videos: about 40,000 hours of public gameplay footage covering more than a thousand different games. This is the same idea that made large language models powerful: if you train on enough diverse data, general behavior starts to emerge. NitroGen treats gameplay videos the way LLMs treat text. It learns how humans act, not just what the game state is. This matters because it means no hand-crafted rules, no game-specific interfaces, and no expensive data collection, just large-scale learning from what already exists online. The bigger implication here is not just better gameplay AI. This is about building agents that can understand environments, plan actions, and adapt. Games are just a training ground; what comes next is embodied AI that works in the real world. If you think of ChatGPT as a foundation model for language, NitroGen is aiming to be the foundation model for action. That's why this paper matters so much. All right. Now
4:24

How Nitrogen Was Created

let's zoom into what NitroGen actually contributes, because this diagram ties everything together. There are three major pieces, and each one solves a problem that's been holding embodied AI back. The first, and arguably the biggest, contribution is the dataset. Instead of collecting expensive human demonstrations, NitroGen uses public gameplay videos from the internet, specifically videos where creators overlay their controller inputs on screen: buttons, joysticks, and key presses, all visible in real time. The researchers train an annotation model to automatically extract the action for every video frame, so the system knows both what the player saw and what action they took at the same moment. No manual labeling, no custom game instrumentation, no private simulators. Using this approach, they built a dataset of about 40,000 hours of gameplay across more than a thousand different games. That's massive, and more importantly, it captures real human behavior across genres, skill levels, and play styles. The second contribution is the universal simulator, a wrapper that lets AI agents control any commercial game through a standard Gymnasium-style API. Think of it as one interface, many games. They also introduce a multi-task, multi-game benchmark to test real generalization. It includes 30 tasks from 10 commercial games, covering combat, navigation, platforming, exploration, and puzzle solving. This matters because real games don't test one skill at a time; they constantly mix mechanics and objectives, and this benchmark reflects that. At the center is a multi-game foundation agent. This model takes in raw game frames and directly outputs gamepad actions. No access to internal game state, no hand-crafted rules, just vision to action. They train it using behavior cloning on the massive internet dataset. Yes, the data is noisy, but at this scale general behavior starts to emerge, and the results are real.
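To make "one interface, many games" concrete, here is a minimal sketch of a universal wrapper, assuming the same `reset`/`step` shape that Gymnasium environments use (`reset` returns `(observation, info)`, `step` returns `(observation, reward, terminated, truncated, info)`). It is not NitroGen's simulator: `GameBackend`, `UniversalGameEnv`, `CounterBackend`, `rollout`, the sparse success reward, and the step limit are all hypothetical names and choices for illustration, and the sketch deliberately avoids importing Gymnasium itself.

```python
from typing import Any, Callable, Dict, Optional, Protocol, Tuple


class GameBackend(Protocol):
    """Hypothetical per-game adapter: everything game-specific lives here."""
    def restart(self) -> None: ...
    def capture_frame(self) -> Any: ...
    def send_input(self, action: Any) -> None: ...
    def task_status(self) -> Tuple[bool, bool]: ...  # (succeeded, failed)


class UniversalGameEnv:
    """Same Gymnasium-style reset/step interface regardless of the game behind it."""

    def __init__(self, backend: GameBackend, max_steps: int = 1000) -> None:
        self.backend = backend
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[dict] = None) -> Tuple[Any, Dict[str, Any]]:
        # `seed` and `options` are accepted for API compatibility; this sketch ignores them.
        self.backend.restart()
        self.steps = 0
        return self.backend.capture_frame(), {}

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
        self.backend.send_input(action)
        self.steps += 1
        obs = self.backend.capture_frame()
        succeeded, failed = self.backend.task_status()
        reward = 1.0 if succeeded else 0.0  # sparse task-completion reward (an assumption)
        terminated = succeeded or failed
        truncated = not terminated and self.steps >= self.max_steps
        return obs, reward, terminated, truncated, {"success": succeeded}


class CounterBackend:
    """Toy stand-in for a real game: the task succeeds after `goal` presses of "south"."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.presses = 0

    def restart(self) -> None:
        self.presses = 0

    def capture_frame(self) -> Any:
        return [[self.presses]]  # fake 1x1 "frame"

    def send_input(self, action: Any) -> None:
        if action == "south":
            self.presses += 1

    def task_status(self) -> Tuple[bool, bool]:
        return self.presses >= self.goal, False


def rollout(env: UniversalGameEnv, policy: Callable[[Any], Any]) -> Tuple[float, bool]:
    """Run one episode; this loop never touches anything game-specific."""
    obs, info = env.reset(seed=0)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total, bool(info.get("success", False))
```

Because the policy only ever sees `reset` and `step`, the same `rollout` loop would run unchanged against any backend that implements the four adapter methods, which is the property a multi-game benchmark relies on.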
When they fine-tune this model on new games it has never seen before, it achieves up to 52% higher success rates (a relative improvement) than training from scratch, with the same data and the same compute. Put all three together (a massive open dataset, a universal simulator, and a foundation vision-action model) and you get something new: a generalist gaming-agent pipeline that's scalable, open, and not tied to a single game. NitroGen isn't just a model; it's the missing infrastructure for training generalist game-playing AI. What we're seeing right
6:57

Real Gameplay!

now is the autonomous agent actually playing these games. In this example, it fights a giant spider and follows the mechanics a human would actually use while playing. At times it may even play better than a typical human. Since it's a generalist trained on everyone's footage, its behavior reflects an average across the best players, mediocre players, and probably the worst players too. Still, the AI learns which inputs a human would press in a given situation and mimics those actions in real gameplay. We can see this across a range of games: first-person shooters, simpler games where it has to jump, crawl, and navigate the map, and games where it has to fight complex bosses or dodge flies, like the one on screen right now. Games like these suggest that embodied AI might not be far away. If it can do this in a game, just imagine giving AI the ability to do it in real
7:58

Model Results

life. All right, to close this out, let's look at what NitroGen actually achieves in practice. On screen you're seeing the off-the-shelf multi-game results. This is a single 500-million-parameter model trained once on the full NitroGen dataset, with no fine-tuning and no game-specific adjustment. Despite being trained on noisy internet video data, it performs real tasks across very different types of games: 3D games, 2D top-down games, and 2D side-scrollers. Across all of them, the agent can handle combat, navigation, and game-specific objectives. These are obviously not perfect scores, but that's the point. This isn't a specialized agent; it's a generalist. Getting 40 to 60% task completion across unseen games straight out of the box is a strong signal that general behavior is emerging. What really matters is what happens next. When they fine-tune this pre-trained model on a new game, it improves dramatically: in some cases, they see up to 52% relative improvement compared to training from scratch, with the same data and the same compute. So the takeaway is not that NitroGen has solved gameplay AI. It hasn't. The real contribution is that it lowers the barrier to training agents in new environments. By combining public internet data, a universal simulator, and large-scale pre-training, NitroGen shows that you can build a generalist policy without custom tools or proprietary datasets. NitroGen is less about games and more about how we train embodied AI going forward. Games are just the starting point. NitroGen doesn't give us perfect agents; it gives us the foundation needed to build them.
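The two kinds of numbers quoted here measure different things. A task-completion rate is the fraction of episodes in which the agent finishes a task, while "52% relative improvement" compares two rates by ratio rather than by subtraction. The helpers below are a hypothetical sketch of how such figures could be computed; the function names, the grouping by game category, and the example rates mentioned afterward are assumptions for illustration, not NitroGen's evaluation code or its reported numbers.

```python
from typing import Dict, Iterable, List, Tuple


def completion_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of episodes in which the task was completed."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no episodes to score")
    return sum(outcomes) / len(outcomes)


def rate_by_category(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (game category, success) records and score each category separately."""
    grouped: Dict[str, List[bool]] = {}
    for category, success in results:
        grouped.setdefault(category, []).append(success)
    return {category: completion_rate(flags) for category, flags in grouped.items()}


def relative_improvement(finetuned_rate: float, scratch_rate: float) -> float:
    """(new - baseline) / baseline; 0.52 means 52% better than the from-scratch rate."""
    if scratch_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (finetuned_rate - scratch_rate) / scratch_rate


def percentage_point_gain(finetuned_rate: float, scratch_rate: float) -> float:
    """Absolute difference between two success rates, in percentage points."""
    return 100 * (finetuned_rate - scratch_rate)
```

With made-up rates, a from-scratch agent at 25% rising to 38% after fine-tuning gives `relative_improvement(0.38, 0.25)` of about 0.52, a 52% relative gain, yet `percentage_point_gain(0.38, 0.25)` is only about 13 points. Keeping both numbers in view avoids reading a relative gain as an absolute one.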
9:36

Outro

If you enjoyed this video, this is what we do here: fast, clear updates on the biggest moves in AI. If you want to stay ahead of everything happening in this space, make sure you're subscribed. And if you want the hands-on side (demos, tools, workflows, and everything developers can actually build), check out World of AI. We also run a simple, no-noise newsletter that gives you the most important AI tools and updates in just a couple of minutes. Subscribe here, follow World of AI, and join the newsletter.