Google’s New AI Just Made A Movie!
7:53


Two Minute Papers · 31.01.2024 · 76,064 views · 3,033 likes


Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers
📝 The paper "VideoPoet: A large language model for zero-shot video generation" is available here: https://sites.research.google/videopoet/
📝 My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi. If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers

Table of contents (5 segments)

Intro

Google’s new text to video AI has amazing capabilities that I have not expected. It has learned on 10 billion video tokens, little pieces, so what did it learn? For instance, it can create a minute-long movie about Rookie the Raccoon, and the script for this movie was also written by an AI. That is excellent, but I expected that. However,

Dr Car

what I did not expect was this, and this, and this. And there’s so much more.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Yes, it can do the usual suspects, text to video. You provide a text snippet, and out comes all this. However, even in that, it can give us something new. For instance, it can create longer videos than most previous systems. Some of these results can become longer than 15 seconds, so we don’t have to splice a movie together from these small, 4-second cuts like with many previous techniques. Now, temporal coherence is not perfect in these cases, so the scene can change unexpectedly over time, but it is so much better than anything else I’ve seen so far. Great work.

But it can do so much more. For instance, wait, films need sounds

Sound

too, right? Can an AI help us with that? Now hold on to your papers, Fellow Scholars, because this AI has also learned on 58 billion audio tokens. So what did it learn? Well, let’s listen together and take a careful look at how good it is at making sure that the sounds line up with the motions in the video. These were really good, wow, what an incredible new tool!

We can also perform controllable video editing. For instance, if we ask for a raccoon dancing in Times Square, we get this. Okay, but we are experienced Fellow Scholars over here, so we ask: okay, but what kind of dancing are we talking about? And, oh yes. We can ask it to do the robot, and then, hmm… the dance will indeed resemble the robot dance. Or how about this one? Nice and fluid hip movements. Very sharp.

Video Editing

Now, it can also do interactive video editing. This is amazing. First, after writing the prompt about this rusty steampunk robot, we don’t just have to accept this one. No sir! We can generate several candidates, and if any of those catches our eye, we can even extend it. This is due to the technique’s autoregressive nature: it sees the world as a series of tokens, a series of numbers and characters if you will, and thus we can easily ask it to extend the video with or without a prompt, creating a system that is art directable! So good!

And if we can’t write a prompt for something, because we have something really specific in mind, we can just add an image. And then, look, it will make it come alive! So that Mona Lisa can start yawning, or that wanderer with the Klaus Kinski-like hair can marvel at this beautiful, windy place.

And if we are happy with the idea of the movie, but we would like to try different stylizations of it, we can try them immediately, creating completely different kinds of moods for the same, or very similar, content.

And what you see here are just the trivial cases of stylization. Now let’s bring in the big guns. If we like these teddy bears, and have the feeling that the floor looks beautifully glossy, almost like ice… And then, bam! There we go. Ice skating in a winter wonderland. Loving it.

Or if you feel like this wombat looks like someone on the beach with a beach ball in hand, there you go! I can only imagine what miracles you smart and creative Fellow Scholars will be able to create with this in the future.

And you even get to control the camera motions in these videos as well.

And you haven’t even heard about the two coolest things about this work. One, if you feel that this tool can do absolutely everything, you are completely right. In fact, everything is optional here. Just chuck in whatever you want, even a small piece of an image that you would like to make a video of, but at the same time extend it via outpainting; that is also possible. Mind-blowing. This is the full package.
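The "autoregressive nature" mentioned above — the model sees the world as one sequence of tokens and simply keeps predicting the next one, which is why a clip can be extended with or without a prompt — can be illustrated with a toy sketch. Everything here is a stand-in: the "model" is a crude bigram counter over integer tokens, not VideoPoet, and all names are invented for illustration.

```python
import random

# Toy illustration of autoregressive extension: a "model" that, given a
# token sequence, samples one plausible next token at a time. A real
# system like VideoPoet uses a transformer over video/audio/text tokens;
# this bigram counter is only a stand-in for the same loop structure.

class ToyAutoregressiveModel:
    def __init__(self, corpus):
        # Record which token tends to follow which in the training data.
        self.next_counts = {}
        for a, b in zip(corpus, corpus[1:]):
            self.next_counts.setdefault(a, []).append(b)

    def sample_next(self, tokens):
        # Sample a continuation conditioned (crudely) on the last token.
        candidates = self.next_counts.get(tokens[-1])
        return random.choice(candidates) if candidates else tokens[-1]

def extend(model, tokens, n_new):
    """Extend a sequence token by token — the same loop whether the
    prefix came from a prompt, an image, or an existing clip."""
    tokens = list(tokens)
    for _ in range(n_new):
        tokens.append(model.sample_next(tokens))
    return tokens

corpus = [1, 2, 3, 1, 2, 3, 1, 2, 3]   # a tiny, perfectly periodic "video"
model = ToyAutoregressiveModel(corpus)
extended = extend(model, [1, 2], 4)
print(extended)  # → [1, 2, 3, 1, 2, 3]
```

Because extension is just "keep sampling," the same mechanism serves prompt-free continuation, prompted continuation, and outpainting: they only differ in what tokens make up the prefix.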

Conclusion

And it gets better. Two, it can do all of this in a zero-shot manner. It can synthesize things it hasn’t seen before. That bends the mind a little, so a word on that. Essentially it means that we can ask it to create something that it does not have in its training set. However, it can lean on its already existing knowledge about the world. For instance, if you are asking for a vehicle, that vehicle should probably have a steering wheel, or some mechanism for controlling it. And so on. It is a bit like an actor who can improvise a new scene when given a description. But wait, we are still not done!

So, how quick is it? It is not in the order of seconds per frame, but frames per second. We get one second of video every 4-5 seconds. That is absolutely amazing! I consider that to be blazing fast.

Marvelous as this paper is, of course, it is still not perfect. There are tradeoffs to be made here. For instance, the resolution of these videos is not the greatest, something that can be remedied by current super resolution tools. And of course, just imagine what we will be capable of just two more papers down the line. What a time to be alive!

And just as I finish with all this, what do I see? Oh my. An even better video generator paper from Google. Before you ask, of course, we are going to talk about this one too, soon. If you find it interesting, subscribe and hit the bell icon to not miss out on it.
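The quoted speed — one second of video every 4-5 seconds of wall-clock time — works out as follows. The frame rate is an assumption here (the episode does not state it); 24 fps is used purely for illustration.

```python
# Back-of-the-envelope for "frames per second, not seconds per frame":
# one second of video every ~4.5 wall-clock seconds (middle of the
# quoted 4-5 s range), assuming a 24 fps output clip (assumed value).

fps = 24                              # assumed output frame rate
wall_seconds_per_video_second = 4.5   # midpoint of the quoted range

frames_per_wall_second = fps / wall_seconds_per_video_second
seconds_per_frame = 1 / frames_per_wall_second

print(f"{frames_per_wall_second:.1f} frames generated per wall-clock second")
print(f"{seconds_per_frame:.4f} wall-clock seconds per frame")
# → 5.3 frames generated per wall-clock second
# → 0.1875 wall-clock seconds per frame
```

So under this assumption the generator produces roughly five frames per second of wall-clock time, which is indeed "frames per second" territory rather than "seconds per frame."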
