AI just got Elephant Memory - Hands on with the Wildest AI Updates
21:12


MattVidPro · 20.03.2026 · 7,073 views · 398 likes


Video description
This week’s AI news felt unusually hands-on. I checked out KREA’s new workflow-building agent, played a multiplayer AI world-model game in the browser, looked at Microsoft’s updated image model, dug into Google AI Studio’s next wave of features, found some genuinely interesting open-source projects like claw router and Godogen, and ended with a long-memory paper plus NotebookLM’s new cinematic video overviews. Some of this is early and weird, but a lot of it feels like AI is getting much more usable, much faster.

▼ Link(s) From Today’s Video:
Krea Node Agent: https://x.com/krea_ai/status/2034642297485140143
Multiplayer AI world-model Doom: play.alakazam.gg
More AI-generated game world research: https://cvlab-kaist.github.io/WorldCam/
Microsoft image model update:
https://x.com/MicrosoftAI/status/2034661558492557386
https://playground.microsoft.ai/chat
Google AI Studio roadmap: https://x.com/OfficialLoganK/status/2034700865936765075
claw router: https://github.com/BlockRunAI/ClawRouter
Godogen: https://github.com/htdt/godogen
MSA long-memory paper:
https://x.com/elliotchen100/status/2034479369855590660
https://github.com/EverMind-AI/MSA
NotebookLM cinematic video overviews: https://x.com/NotebookLM/status/2034688795652816948

MattVidPro Discord: https://discord.gg/mattvidpro
Follow Me on Twitter: https://twitter.com/MattVidPro
Buy me a Coffee! https://buymeacoffee.com/mattvidpro

▼ Extra Links of Interest:
General AI Playlist: https://www.youtube.com/playlist?list=PLrfI66qWYbW3acrBQ4qltDBsjxaoGSl3I
Instagram: instagram.com/mattvidpro
Tiktok: tiktok.com/@mattvidpro
Gaming & Extras Channel: https://www.youtube.com/@MattVidProGaming

Let's work together!
- For brand & sponsorship inquiries: https://tally.so/r/3xdz4E
- For all other business inquiries: mattvidpro@smoothmedia.co

Thanks for watching MattVideoProductions! I make all sorts of videos here on Youtube! Technology, Tutorials, and Reviews! Enjoy your stay here.
All Suggestions, Thoughts And Comments Are Greatly Appreciated

00:00 Intro
00:15 KREA Nodes Agent
02:02 Multiplayer AI world-model Doom
04:32 More AI-generated game world research
05:28 Microsoft image model update
07:13 Google AI Studio roadmap
09:01 claw router
09:58 Godogen
11:11 MSA long-memory & NotebookLM video overview
20:25 Final thoughts / tools to try next week

Table of contents (10 segments)

Intro

What's going on everyone? I hope you're all having a fantastic Friday. Welcome back to the Matt Vidpro YouTube channel.

KREA Nodes Agent

We've got a great deal to talk about today. I've gathered all of the most intriguing AI research, demos, and hands-on experiences I could find. And the first one is this, by KREA AI. Let me preface this by saying that KREA has a fantastic track record. I rarely hear people complain about this company, and they're doing very well despite a lot of competitors. This is a website that lets you generate AI images and AI videos and build workflows, and it's very well known for its real-time AI image and video gen. But what they have today is Node Agent. KREA Nodes allows you to build all kinds of workflows by stringing together various models and building your own custom pipeline. What Node Agent does is live on the right-hand side and just straight up build all of those workflows inside KREA Nodes for you. You can see the prompt here is just to combine these three photos. It goes ahead and imports them and then connects them all up to various Nano Banana APIs for the purpose of outputting all these different angles. Two standout things I'm noticing right away. First, once it generates all of the working nodes for whatever your task is, it's fully modifiable, and that goes for every single detail. You can change the model, change the prompt, detach nodes, reattach new ones, and whatever you end up with, the AI can still work with. And that's the second thing I noticed. As you can see right here, the user highlights everything and gives the simple prompt to create a video. It immediately branches off of these, creating new pipelines. I think this is for professional AI creatives, people that need the complex workflows but very much value their time. If you're serious about creativity with AI, it is possible to create decent-quality AI shorts. Right now, the barriers are traditionally swapping between a thousand tabs and handling every workflow yourself, but this looks like a good solution, especially if you already have a KREA plan. But if you don't

Multiplayer AI world-model Doom

already use KREA, you're going to have to get at least a Pro plan. 35 bucks isn't terrible, and KREA is a good site, but that's still a decently thick barrier to entry. So, let's talk about something you can try for free in your browser. Hugo has built the first battle royale running live in a world model. Don't expect anything mind-blowing here, but: 70 million parameters, real-time multiplayer, and customizable levels. It looks like the model is trained almost exclusively on Doom. And here's the site. We're going to give it a quick test drive. Let's do the Doom deathmatch. You're actually battling against real players also on the site, so it's not AI opponents, but the whole world, the whole game, is an AI world model. I'm going to jump in with quick play here. And you can see I am in the game. I've got that classic Doom shotgun and I can move around. And yeah, it actually does look very much like real Doom, but some guy clearly spawned in there and already got me. I've got a different weapon this time. Let's see if I can get this guy. I think he's hiding behind a wall over here. Oh my. Okay, I don't know what just happened there. I think there are some hallucinatory effects going on. Where are the other players? The world is coherent, but it's fuzzy. It's fuzzy and strange. Oh, there's a guy. Oh, I think I got him. Okay. Oh, there's another one. Oh, he got me. But it just throws you right back in. There's actually a minimap on the side, so you can see that there is some sort of coherent representation that the AI has to adhere to as it's generating the gameplay in real time. And it's very impressive that all of this is streamed. That low resolution and that tiny model are really what's making this possible right now. It's cool. You know, this is the first time I've ever played a fully AI-generated game that is multiplayer. Rudimentary, sure, but overall a really cool little tech demo. Oh, let's see. Can we get this guy?
Oh my gosh, how did anyone ever aim in actual Doom? This is insanity. Okay, this guy appears to be invincible or something. It's interesting. The players just kind of look like creepy, shady blobs of hallucination. Oh, I've got a different weapon now. Let's see if I can get this guy. So, you get the idea. There are also customizable levels and game modes. This is dipping our toe in the water of what generative AI video games could one day be. It's fuzzy, wrinkly, and unclear now, but one day that concept is going to evolve into something much, much more than what we have access to today. In my last video, I also checked out an open-source world model. These architectures are being developed and experimented with. You're not playing an AI-hallucinated GTA 5 anytime soon, but never say never.

More AI-generated game world research

Shared by Wild Minder. This paper also focuses on AI-generated game worlds and gaming. This is similar to what we just saw, but more limited and more advanced at the same time. Probably one of the most impressive things is the very precise action control through complex and tangled keyboard and mouse inputs, but it's likely this is made possible by that CS:GO training data. And they claim long-horizon sequences, but that caps out at 10 seconds at 20 frames per second. So, not true long horizon; only long horizon relative to where this brand-new technology currently stands. In fact, in terms of long horizon, this is technically less capable than what we just tested. I've got to say, though, the game worlds look consistent. They are higher resolution. It maintains 3D shapes very impressively as you look around. So, this is cool. This is promising. But I would love to see a full open-source release, because right now the GitHub just has a readme. Next up, Microsoft has released an

Microsoft image model update

update to its image generator. Not sure if you knew they even had one, but this is MAI-Image-2. Honestly, it appears to be a pretty great model, ranked number five on the arena. These cherry-picked sample images are pretty great. Good skin tones. However, I know this isn't going to beat Nano Banana 2 in coherency, and honestly, for me, also photorealism. But similar to Nano Banana, this model does appear to be pretty strong with text and creating graphics. I think these examples right here are tasteful: smart use of color, not overblown. Sometimes Nano Banana can definitely overdo things a little bit. If these images look like something that you pursue often in image gen, I recommend checking this model out. But if you care about dominance in coherence and the ability to follow instructions exactly, Nano Banana 2 and Nano Banana Pro are really hard to beat. I'm going to go ahead and give this a try with a quick infographic prompt, and we'll put it head-to-head with Nano Banana 2. Okay, and here is our result. Yeah, it isn't Nano Banana 2. I wanted a theoretical lemon character, like a video game character, with its anatomy shown, so it had to be creative and come up with real names. We have neural pulp interface, acidic core power, fiber optic stem. That's all cool stuff, but you can see there's not too much detail. There are no blurbs or descriptions. The art style feels very SDXL, if you know what I mean. Infographics are not the strong suit of this model. The Nano Banana Pro output has far more detail: synthetic leaf antenna receives wireless data and solar power, external structure, lens assembly with aroma injection, integrated haptic motor, synthetic leaf antenna. You can see there are still some dupes, but it did also include the cutout, which is really cool. In my last video, we talked about

Google AI Studio roadmap

updates to Google AI Studio, what they call the vibe coding interface. It felt very much like an in-browser version of Antigravity, toned down quite a bit, but still highly capable. So, what else do they want to bring to the table? All of this is claimed by Logan Kilpatrick to land in the next few weeks, but I would take that with a grain of salt; I think we can expect a few of these over the next few weeks. We're looking at a design mode, perhaps inspired by the recent Stitch update we also took a look at in my last video, which had a strong focus on design, but I assume this would somehow be more generalized for producing apps, programs, and games. Figma integration, Google Workspace integration, better GitHub support. That GitHub support apparently is a huge deal. A planning mode, which echoes what I was talking about yesterday: Antigravity has a planning mode, and it works very well. Really, that feature at the end of the day is inspired by Claude Code, and it works beautifully, forcing the LLM to take a look at the whole situation, write out a plan, and then execute it systematically. The plan literally exists as a file that it has to reference. Immersive UI, which we don't yet know the meaning of. Agents: they already consider what they just shipped to be an agent, but I imagine maybe sub-agents can be spawned off of that one. Multiple chats per app. Good to see simplified deploys and G1 support. I wish the best of luck to the Google team. The more apps we have like this, the better: places people can go and get started for completely free, not just to learn about the structure of app creation, but to learn how to interact with LLMs in order to produce something. There is so much to be said about prompting, and learning to communicate with these models in order to bridge the gap between human and code is massive. And these guys are far from the only people pursuing it. Next up, let's talk about

claw router

this open-source project, ClawRouter. Obviously, this is designed to be integrated with AI agents, especially OpenClaw. It is designed to save costs by effectively routing prompts to the correct LLM. How is it actually accomplishing this? Well, it weighs a score across 15 dimensions to give you the best bang for your buck. I'm not going to come out and say that this works 100% of the time, because I doubt that it does. But I think it's very much worth a try, because my typical solution to this problem is to just constantly hit the most expensive API so I'm always getting the best model, and that's inefficient. Now, since this is open source, the default values are customizable, and you can choose from over 44 models. I think these picks aren't bad at all, but I could definitely see the use case where maybe I would swap Sonnet 4.6 or Opus 4.6 out. Regardless, for all of you running agents out there, this is definitely something to look into. Especially if you run a business and you have employees that use agents, there could be some real cost savings to be had. I've got another project, also open
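To make the idea concrete, here's a minimal sketch of score-based routing. This assumes nothing about ClawRouter's actual dimensions or defaults: the model names, the three toy scoring dimensions, and the thresholds below are all hypothetical, made up purely for illustration.

```python
import re

# Hypothetical model tiers: (name, minimum difficulty score, $ per 1K tokens).
# ClawRouter scores across 15 dimensions; this toy version uses just three.
MODELS = [
    ("small-fast-model", 0.0, 0.10),
    ("mid-tier-model",   0.4, 0.50),
    ("frontier-model",   0.7, 3.00),
]

def complexity_score(prompt: str) -> float:
    """Combine a few cheap heuristics into one 0..1 difficulty score."""
    length = min(len(prompt) / 2000, 1.0)                              # longer prompts read as harder
    has_code = 1.0 if re.search(r"```|def |class ", prompt) else 0.0   # code present?
    has_math = 1.0 if re.search(r"\d+\s*[-+*/^]\s*\d+", prompt) else 0.0
    return 0.4 * length + 0.4 * has_code + 0.2 * has_math

def route(prompt: str) -> str:
    """Pick the cheapest model whose tier still covers the prompt's score."""
    score = complexity_score(prompt)
    eligible = [m for m in MODELS if score >= m[1]]
    # The highest threshold we cleared is the most capable tier we need.
    return max(eligible, key=lambda m: m[1])[0]

print(route("What's the capital of France?"))   # trivial prompt, cheap tier
```

The payoff is exactly the one described above: simple prompts never touch the expensive API, while harder prompts (long, code-heavy, math-heavy) escalate tiers automatically.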

Godogen

source: Godogen, a Claude Code skill that allows it to build complete Godot 4 projects. Two Claude Code skills orchestrate the entire pipeline: one plans, and then one executes, with each task spawned in a fresh context. And if you didn't know, Godot is a potent and open-source game engine for both 2D and 3D games. You can actually generate Godot games from scratch with just Claude Code itself or Google Antigravity, but I'm telling you right now, with these skills this is going to be a lot more effective. It can do real projects with proper scene trees, scripts, and asset organization. It can generate assets in 2D and can even do textures, and Tripo 3D can convert images to 3D models. It's got custom-built language references for all 850-plus Godot classes, which is going to compensate for the lack of Godot knowledge inside the LLMs. Best of all, it does a visual QA pass that closes the loop: it will capture actual screenshots from the running game and analyze them in order to fix the game. I really want to try this out, but it's out of the scope of today's video. Depending on how good the games are that it creates, this could be just way too much fun and seriously powerful. Next up, let's talk

MSA long-memory & NotebookLM video overview

about this paper: MSA, memory sparse attention. Explained in one sentence, it enables large models to natively have ultra-long memory, not through external retrieval add-ons, not through brute-force window expansion, but by directly growing memory into the attention mechanism itself, trained end to end. He goes on to explain a little bit more, but I thought instead we could actually use NotebookLM's new feature to explain it for us. If you don't know, NotebookLM is a must-use AI tool by Google. Put PDFs, images, YouTube links, and websites in as sources, and it can break them down, make flashcards, infographics, you name it. Their latest feature, cinematic video overviews, is now rolled out to 100% of Pro users, so paid users only. I downloaded the MSA paper and tossed it right into NotebookLM, then generated the cinematic video. There are a few different formats you can do, like a structured explainer or a brief overview, but this cinematic one is supposed to be a rich, immersive experience that unpacks complex ideas through engaging visuals and storytelling. So let's see if we can learn about MSA: scaling memory sparse attention to 100 million tokens. Okay, so it generated an 8-minute-long video. That's pretty hefty. Cognitive scientists estimate that human functional memory holds the equivalent of about 200 million tokens. As we accumulate knowledge, this network expands continuously to accommodate new information. — Current artificial neural networks hit a rigid ceiling before reaching that scale. Even at the frontier of AI research, the effective context windows of large language models typically collapse around the 1 million token mark. Beyond this point, models lose the ability to recall specific details from earlier in the sequence. To bridge this gap, researchers from Peking University and Shanda Group initiated a project specifically to break through the 1 million token barrier.
The team avoided methods like LoRA, or low-rank adaptation, which updates model parameters by adding a smaller set of trainable weights. While LoRA internalizes knowledge, it is vulnerable to catastrophic forgetting: when a model is forced to learn conflicting information, it degrades the weights associated with previous memories. They also moved past external retrieval-augmented generation, or RAG. RAG pipelines pull external text into the prompt window based on a search query. Because these systems rely on discrete chunks of text rather than the model's native mathematical representations, they hit a semantic ceiling that limits complex reasoning. True human-scale memory requires an architecture built directly into the model's latent space, allowing it to process information natively rather than searching an external database. Standard dense self-attention faces a fundamental scaling flaw known as quadratic compute complexity. In this architecture, every new token must be mathematically compared against every historical token in the sequence to determine relevance. This exhaustive matching process causes the key and value cache, the matrices that store the model's historical state, to balloon in size until it shatters the memory limits of the hardware. Alternative architectures like linear attention or recurrent neural networks attempt to compress history into fixed-size mathematical states. This lossy compression forces the model to summarize its memory, which inevitably leads to the loss of fine-grained details over extreme context lengths. Memory sparse attention, or MSA, provides a new framework to achieve linear computational complexity. This allows the model to scale its context length while maintaining the high precision of standard attention. MSA enables the model to process context lengths up to 100 million tokens while running on standard hardware with only two GPUs. In testing, MSA demonstrated exceptional stability, showing less than a 9.
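The quadratic scaling flaw the narration describes can be made concrete with some back-of-the-envelope arithmetic. The layer count, head dimensions, and precision below are illustrative placeholders, not the paper's actual configuration:

```python
# Why dense attention can't reach 100M tokens: the KV cache grows
# linearly with context, but pairwise token comparisons grow
# quadratically. All model dimensions here are illustrative.
def kv_cache_gb(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per=2):
    # Two matrices (K and V) per layer, stored in fp16 (2 bytes).
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per / 1e9

def pairwise_compares(tokens):
    # Each token attends to every earlier token: ~n^2/2 comparisons.
    return tokens * (tokens - 1) // 2

for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} tokens: {kv_cache_gb(n):9.1f} GB cache, "
          f"{pairwise_compares(n):.2e} compares")
```

Even with these modest made-up dimensions, the dense cache at 100 million tokens runs into the terabytes, and the comparison count grows a hundredfold for every tenfold context increase; hence the compressed routing described next.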
Achieving this stability required a total re-engineering of the model's data routing, positional encoding, and memory storage protocols. Solving the 100 million token puzzle relies on restructuring the physical mechanics of the attention mechanism, moving beyond the limits of raw compute density. In its deeper layers, MSA replaces exhaustive token matching with a document-based sparse retrieval system embedded directly into the model's internal processing flow. The architecture adds a third projection alongside the standard key and value matrices. This specialized router key is used to index and locate information without requiring the model to look at every individual token. To manage the data volume, the model segments document hidden states into fixed blocks of 64 tokens. A process called chunkwise mean pooling shrinks these segments into highly compact latent representations. This reduces the number of points the model has to search. When a user submits a query, the model automatically generates a specialized routing vector to represent that specific question. The model performs a cosine similarity search, scanning the compressed router key cache with the query vector to calculate exact relevance scores for every document in the bank. The model uses these scores to isolate the top-k most relevant documents, usually the top 16, from the massive memory bank. The computationally heavy attention process then runs exclusively on this isolated fraction of the data, ignoring millions of irrelevant tokens. Substituting exhaustive calculations with compressed latent routing allows MSA to bypass the compute penalty usually associated with massive contexts. Beyond compute limits, the researchers faced an extrapolation problem. Most models are trained on short context lengths but are deployed to handle much longer sequences. Standard global positional encoding fails at scale because it assigns a strictly increasing ID number to every sequential token.
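The routing step described above (64-token blocks, chunkwise mean pooling, cosine scoring of router keys, top-k selection) can be sketched in a few lines of NumPy. The shapes and the random data are illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, block = 256, 64
hidden = rng.standard_normal((100 * block, d))   # 100 blocks of context states

# Chunkwise mean pooling: (100*64, d) -> (100, d) compact router keys.
router_keys = hidden.reshape(-1, block, d).mean(axis=1)

def top_k_blocks(query, keys, k=16):
    # Cosine similarity between the query vector and every router key.
    q = query / np.linalg.norm(query)
    kn = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    scores = kn @ q
    return np.argsort(scores)[-k:][::-1]          # best-scoring blocks first

query = rng.standard_normal(d)
picked = top_k_blocks(query, router_keys)
print(len(picked))   # 16 blocks selected out of 100
```

Full attention then runs only over the tokens inside those 16 blocks, which is the whole trick: the heavy computation touches a fixed fraction of the context regardless of how long it grows.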
As context reaches millions of tokens, these ID numbers far exceed the range the model encountered during its training. When forced to process these massive positional values, the model's internal math breaks down, leading to incoherent outputs. MSA uses document-wise RoPE as a mathematical remedy. RoPE, short for rotary positional embedding, is a method for encoding token positions. In this framework, the position ID counter resets to zero at the start of every single document. This isolated counting method decouples a document's positional math from the total volume of the surrounding memory bank. The model then applies a specific global RoPE offset to the active query and any new tokens it generates. This offset maintains the causal dependency required for the model to generate coherent sentences while integrating facts from multiple isolated documents. Resetting these internal counters allows MSA to be trained on just 64,000 tokens while flawlessly extrapolating to read 100 million. Deployment of these models is limited by the physical memory bandwidth of modern GPUs. Storing the compressed historical state for 100 million tokens requires approximately 169 GB of memory. This schematic shows the constraint: a standard node with two A800 GPUs provides 160 GB of VRAM. The hardware rejects a 169 GB cache, resulting in an immediate memory overflow before the model weights even load. MSA uses a tiered storage strategy called memory parallel to bypass this physical limit. The architecture separates lightweight routing keys from the heavy content data. This allows the model to perform its initial search using only a fraction of the total data. These lightweight router keys are loaded directly into the GPU's VRAM, enabling instantaneous distributed scoring across millions of documents. The massive bulk of the context, the content keys and values, is offloaded into the host CPU's system DRAM, which has far higher capacity than the GPU.
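Looping back to the document-wise position reset described a moment ago, the core idea reduces to a tiny sketch. The document lengths here are made up, and real RoPE applies rotations to query and key vectors, which this toy version omits; it only contrasts the two ways of numbering positions:

```python
# Global positions: one counter across all documents, so IDs grow
# without bound and eventually exceed anything seen in training.
def global_positions(doc_lengths):
    pos, out = 0, []
    for n in doc_lengths:
        out.append(list(range(pos, pos + n)))
        pos += n
    return out

# Document-wise positions: the counter resets to zero per document,
# so IDs never exceed the longest single document.
def documentwise_positions(doc_lengths):
    return [list(range(n)) for n in doc_lengths]

docs = [5, 3, 4]
print(global_positions(docs)[-1])        # [8, 9, 10, 11]
print(documentwise_positions(docs)[-1])  # [0, 1, 2, 3]
```

With the reset scheme, a model trained on 64K-token positions never sees an out-of-range ID no matter how many documents the memory bank holds, which is what lets the 64K-trained model extrapolate to 100 million tokens.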
Once the GPU identifies the relevant documents, the system fetches only those specific matrices from the CPU to the GPU for the final attention calculation. The model pre-calculates and caches these compressed representations offline, avoiding the need for massive recalculations for every new user query. Divorcing context capacity from GPU memory limits turns lifetime-scale memory from a theoretical concept into a deployable reality. — Okay, towards the end it started to have a few issues. Wow, I am really impressed by this NotebookLM upgrade. It's definitely producing more vibrant graphics. I think it's using code to generate some of these, at least, and then also Veo 3 and, obviously, Nano Banana for a lot of the imagery. Wow. I really wonder what the pipeline is under the hood to make this. There were some skips, some weird cuts that just didn't make sense. In terms of the NotebookLM generation, though, I think we got most of the understanding of the paper through. This is the kind of architectural development that we want to see in the AI space. This is a true solution to the limited-memory issues that we have with today's LLMs. I'm interested to see this sort of thing in the wild. Looks like this is going to be released open source as well: a scalable, end-to-end trainable latent memory framework. No code, no model weights yet, but apparently they are coming soon. The graph really does not lie. This is looking like possibly a real solution to memory. No more compacting your conversation. Awesome. I'm glad to see this is going to be released open source and isn't just a paper. I'd like to thank you all so much

Final thoughts / tools to try next week

for watching today's video. Wow, so many projects that just put your hands directly on the wheel. Let's see what we can do with AI. Let's see which barriers can be broken. Every week I say to myself, what am I going to build this week? And I never know what it's going to be, because all these projects come out and it's like, oh, here it is: Godogen, or a world model running locally. I want to start doing some live streams where I show off a lot of these little AI projects I mess around with and demonstrate them. I end up doing a lot of cool things behind the scenes, but they never make it into a full video. So, I'd like to do a live stream just kind of going through all of those, like ROM hacking with Claude, or Unity MCP. It could be fun. That Godot 4 game engine skill is really intriguing to me. There might be a video there. Have a great one everyone. I'll see you in the next video, and goodbye.
