Google’s New AI Just Made A Movie!
7:53


Two Minute Papers · 31.01.2024 · 76,064 views · 3,033 likes


Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers
📝 The paper "VideoPoet: A large language model for zero-shot video generation" is available here: https://sites.research.google/videopoet/
📝 My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi. If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers

Table of contents (5 segments)

Intro

Google’s new text to video AI has amazing capabilities that I have not expected. It has learned on 10 billion video tokens, little pieces, so what did it learn? For instance, it can create a minute-long movie about Rookie the Raccoon, and the script for this movie was also written by an AI. That is excellent, but I expected that. However,

Dr Car

what I did not expect was this, and this, and this. And there’s so much more.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Yes, it can do the usual suspects, text to video. You provide a text snippet, and out comes all this. However, even in that, it can give us something new. For instance, it can create longer videos than most previous systems. Some of these results can become longer than 15 seconds, so we don’t have to splice a movie together from these small, 4-second cuts like with many previous techniques. Now, temporal coherence is not perfect in these cases, so the scene can change unexpectedly over time, but it is so much better than anything else I’ve seen so far. Great work.

But it can do so much more. For instance, wait, films need sounds

Sound

too, right? Can an AI help us with that? Now hold on to your papers, Fellow Scholars, because this AI has also learned on 58 billion audio tokens. So what did it learn? Well, let’s listen together and take a careful look at how good it is at making sure that the sounds line up with the motions in the video. These were really good, wow, what an incredible new tool!

We can also perform controllable video editing. For instance, if we ask for a raccoon dancing in Times Square, we get this. Okay, but we are experienced Fellow Scholars over here, so we ask: okay, but what kind of dancing are we talking about? And, oh yes. We can ask it to do the robot, and then, hmm… the dance will indeed resemble the robot dance. Or how about this one? Nice and fluid hip movements. Very sharp.

Video Editing

Now, it can also do interactive video editing. This is amazing. First, after writing the prompt about this rusty steampunk robot, we don’t just have to accept this one. No sir! We can generate several candidates, and if any of those catches our eye, we can even extend it. This is due to the technique’s autoregressive nature: it sees the world as a series of tokens, a series of numbers and characters if you will, and thus we can easily ask it to extend the video with or without a prompt, creating a system that is art directable! So good!

And if we can’t write a prompt for something, because we have something really specific in mind, we can just add an image. And then, look, it will make it come alive! So that Mona Lisa can start yawning, or that wanderer with the Klaus Kinski-like hair can marvel at this beautiful, windy place.

And if we are happy with the idea of the movie, but we would like to try different stylizations of it, we can try them immediately, creating completely different kinds of moods for the same, or very similar, content.

And what you see here are just the trivial cases of stylization. Now let’s bring in the big guns. If we like these teddy bears, and have the feeling that the floor looks beautifully glossy, almost like ice… And then, bam! There we go. Ice skating in a winter wonderland. Loving it.

Or if you feel like this wombat looks like someone on the beach with a beach ball in hand, there you go! I can only imagine what miracles you smart and creative Fellow Scholars will be able to create with this in the future.

And you even get to control the camera motions in these videos as well.

And you haven’t even heard about the two coolest things about this work. One, if you feel that this tool can do absolutely everything, you are completely right. In fact, everything is optional here. Just chuck in whatever you want, even a small piece of an image that you would like to make a video of, but at the same time extend it via outpainting; that is also possible. Mind-blowing. This is the full package.
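The "autoregressive nature" mentioned above — the model sees the world as one sequence of tokens and simply keeps predicting the next one, which is why a clip can be extended with or without a prompt — can be illustrated with a toy sketch. Everything here is a stand-in: the "model" is a crude bigram counter over integer tokens, not VideoPoet, and all names are invented for illustration.

```python
import random

# Toy illustration of autoregressive extension: a "model" that, given a
# token sequence, samples one plausible next token at a time. A real
# system like VideoPoet uses a transformer over video/audio/text tokens;
# this bigram counter is only a stand-in for the same loop structure.

class ToyAutoregressiveModel:
    def __init__(self, corpus):
        # Record which token tends to follow which in the training data.
        self.next_counts = {}
        for a, b in zip(corpus, corpus[1:]):
            self.next_counts.setdefault(a, []).append(b)

    def sample_next(self, tokens):
        # Sample a continuation conditioned (crudely) on the last token.
        candidates = self.next_counts.get(tokens[-1])
        return random.choice(candidates) if candidates else tokens[-1]

def extend(model, tokens, n_new):
    """Extend a sequence token by token — the same loop whether the
    prefix came from a prompt, an image, or an existing clip."""
    tokens = list(tokens)
    for _ in range(n_new):
        tokens.append(model.sample_next(tokens))
    return tokens

corpus = [1, 2, 3, 1, 2, 3, 1, 2, 3]   # a tiny, perfectly periodic "video"
model = ToyAutoregressiveModel(corpus)
extended = extend(model, [1, 2], 4)
print(extended)  # → [1, 2, 3, 1, 2, 3]
```

Because extension is just "keep sampling," the same mechanism serves prompt-free continuation, prompted continuation, and outpainting: they only differ in what tokens make up the prefix.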

Conclusion

And it gets better. Two, it can do all of this in a zero-shot manner. It can synthesize things it hasn’t seen before. That bends the mind a little, so a word on that. Essentially it means that we can ask it to create something that it does not have in its training set. However, it can lean on its already existing knowledge about the world. For instance, if you are asking for a vehicle, that vehicle should probably have a steering wheel, or some mechanism for controlling it. And so on. It is a bit like an actor who can improvise a new scene when given a description. But wait, we are still not done!

So, how quick is it? It is not in the order of seconds per frame, but frames per second. We get one second of video every 4-5 seconds. That is absolutely amazing! I consider that to be blazing fast.

Marvelous as this paper is, of course, it is still not perfect. There are tradeoffs to be made here. For instance, the resolution of these videos is not the greatest, something that can be remedied by current super resolution tools. And of course, just imagine what we will be capable of just two more papers down the line. What a time to be alive!

And just as I finish with all this, what do I see? Oh my. An even better video generator paper from Google. Before you ask, of course, we are going to talk about this one too, soon. If you find it interesting, subscribe and hit the bell icon to not miss out on it.
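The quoted speed — one second of video every 4-5 seconds of wall-clock time — works out as follows. The frame rate is an assumption here (the episode does not state it); 24 fps is used purely for illustration.

```python
# Back-of-the-envelope for "frames per second, not seconds per frame":
# one second of video every ~4.5 wall-clock seconds (middle of the
# quoted 4-5 s range), assuming a 24 fps output clip (assumed value).

fps = 24                              # assumed output frame rate
wall_seconds_per_video_second = 4.5   # midpoint of the quoted range

frames_per_wall_second = fps / wall_seconds_per_video_second
seconds_per_frame = 1 / frames_per_wall_second

print(f"{frames_per_wall_second:.1f} frames generated per wall-clock second")
print(f"{seconds_per_frame:.4f} wall-clock seconds per frame")
# → 5.3 frames generated per wall-clock second
# → 0.1875 wall-clock seconds per frame
```

So under this assumption the generator produces roughly five frames per second of wall-clock time, which is indeed "frames per second" territory rather than "seconds per frame."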
