❤️ Check out Vast.ai and run DeepSeek or any AI project: https://vast.ai/papers
📝 Mirage 2 is available here:
https://blog.dynamicslab.ai/
Try it out:
https://demo.dynamicslab.ai/chaos
📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD
Or here is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Benji Rabhan, B Shang, Christian Ahlin, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Michael Tedder, Owen Skarpness, Richard Sundvall, Steef, Sven Pfiffner, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu
Table of Contents (2 segments)
Segment 1 (00:00 - 05:00)
Check out this amazing new AI technique, Mirage 2, where an image goes in, your image, and a playable video game comes out. And then we are going to explode this person for no reason. And if you look back just one year ago, this was possible with Google DeepMind’s Genie 2, and this is way better than that. I’ll tell you about the differences with Genie 3 in a moment. And as of the making of this video, if everything goes well and if we haven’t crashed their servers yet, you can hopefully try it out too, even on your phone. That’s what they say. Note that we are not affiliated with this company in any way. Okay, now this concept is amazing because this image can be a real video game, something like Cyberpunk. Or even a painting. Let’s take Starry Night and look into that, I’d love to see that. Wow. It’s really amazing to see this painting come alive as a real world. Now, this is not even close to perfect: as we go on for longer, it starts to become less and less like itself. However, it gets better. You can have a drawing of yours also come alive. This one is way more consistent, although the AI did not have that much to do, so let’s give it something more difficult. Oh yes, this is going to be a city, not just a pier. An interesting, quirky little city made of paper and scribbles. Really cool! Once again, it starts to become a bit less like itself over time. This effect is even more apparent with this pencil sketch. Okay, let’s enter this world. So far so good, but it’s a bit like a guided tour in IKEA. Stay on the arrows, you’re fine. Wander off…and you’ll never be seen again, my friend. But for all the good and the bad, this really shows how incredibly quickly the AI space improves over time. Now I will note that I have found no research paper for this work. I’ll let it slide this time, but only because it is a brilliant showcase of how far we’ve come in less than one year. Okay, as promised, let’s talk differences. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Google DeepMind’s Genie 2 was a bit like a goldfish trying to direct a movie: it forgets what happened three seconds ago, so every new frame is a brand new plot. Genie 3 is like a dog dreaming. It runs, barks, chases something, and for a minute or two it looks visually consistent. We don’t know exactly how long. And this one promises 10 minutes. As for interaction latency, Genie 3 is said to be instant, but I cannot know for sure, as they did not offer to let me try it; this one sits at 200 milliseconds. Not enough for the pros out there who beat Silksong with one hand tied behind their back, no. But for a tech demo, a stepping stone, it sounds really great. Genie 3 runs in a Google datacenter somewhere on Earth; this one runs on a single consumer GPU. We are wise Fellow Scholars here, so we will all take this with a grain of salt as we have no research paper yet, but I’ll keep an eye out. Now, before we try it together, I won’t leave you hanging: the architecture is probably somewhat similar to what Genie 2 did, which is the following. Genie 2 was a diffusion world model: it turns video into a simpler, compressed form, then predicts the next frame step by step using past frames and your actions, kind of like how a text model predicts the next word in your sentence. Put more simply, it is like a storyteller with a flipbook: you tell it what the hero does next, and it quickly sketches the next page based on the previous ones, flipping forward frame by frame to bring the story to life.
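To make that flipbook idea concrete, here is a minimal, hypothetical sketch of an action-conditioned autoregressive world model loop. This is not the real Genie 2 or Mirage 2 code; every name in it (ToyWorldModel, encode, predict_next) is an illustrative assumption, and random weights stand in for a trained diffusion model.

```python
import numpy as np

# Minimal, hypothetical sketch of an action-conditioned autoregressive
# world model loop. Random weights stand in for a trained diffusion model;
# none of these names come from the real systems discussed above.

class ToyWorldModel:
    """Predicts the next latent frame from a window of past latents and an action."""

    def __init__(self, latent_dim=64, context_len=16):
        self.latent_dim = latent_dim
        self.context_len = context_len  # short memory: older frames are forgotten
        rng = np.random.default_rng(0)
        # Stand-in for the learned weights of the next-frame predictor.
        self.w = rng.standard_normal((latent_dim + 4, latent_dim)) * 0.1

    def encode(self, frame):
        """'Turns video into a simpler form': image -> compact latent vector."""
        return frame.reshape(-1)[: self.latent_dim]

    def predict_next(self, latents, action):
        """One autoregressive step: recent latents + player action -> next latent."""
        context = np.mean(latents[-self.context_len:], axis=0)  # crude memory
        x = np.concatenate([context, action])
        # A real model would run many diffusion denoising steps here.
        return np.tanh(x @ self.w)

model = ToyWorldModel()
latents = [model.encode(np.random.rand(8, 8))]  # start from the input image
for step in range(100):                         # ~100 generated frames
    action = np.zeros(4)
    action[step % 4] = 1.0                      # one-hot player input (e.g. WASD)
    latents.append(model.predict_next(latents, action))
print(len(latents), "frames generated")
```

Note how the fixed-size context window in predict_next is also a plausible reason why long rollouts slowly drift away from the starting image: anything older than the window is simply forgotten.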
And now, you can also try it through the link in the description. I hope. For me, it did something, but it was not super fun. I just pressed and pressed the buttons; sometimes something happened, maybe, but most of the time, not so much. But I looked around, and people reported that it works for them, so I hope it will go better for you too. Now, this other game worked much better: I could move the camera, walk around, jump, attack…kind of. Uh…sir? Sir! Are you okay, sir? Okay, this valiant knight has clearly eaten something he shouldn’t have, and I don’t wanna be around to find out what it was. Whew! That was close. Okay, this work, however, does one thing very well. And that is…it exists, and you know what that means. The First Law of Papers says that two more papers down the line,
Segment 2 (05:00 - 06:00)
it will be improved a great deal. Just think about the fact that one year ago, we had Genie 2: low-quality footage, seconds of memory if that, and only platformers, basically the same game over and over. And now, up to 10 minutes of memory, in much higher quality. More variety too. Now, limitations. They say character control is not yet perfect, with certain movements like right turns occasionally showing reduced responsiveness. Well, you saw it: for me, “reduced responsiveness” was a flowery way of saying it did not work at all. But try it out yourself, and let me know in the comments how it went. Remember, low expectations. This is a super early tech demo of something that was impossible last year.