Google’s New AI: Like OpenAI’s DALL-E 2, But For Video!

Two Minute Papers · 20.01.2023 · 201,780 views · 5,808 likes


Video description
❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.com/papers

📝 The paper "Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation" is available here: https://tuneavideo.github.io/

My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or use the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.

If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

Contents (2 segments)

Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to use this new AI research work to create absolutely magical videos almost out of thin air. Like this one. And this one. Yes, with this new work, we can all become movie directors. Sort of. Loving it.

Now, everyone knows that modern AI techniques are already capable of text to image, that is, taking a piece of text and creating a picture out of it. What fewer people know, but many of you Fellow Scholars definitely know, is that text to video is now also possible.

However, I bet that you didn't know about today's paper! So, what is this about? Well, whenever I use these systems, I always find myself yearning for more artistic control. I want to specify the pose for these characters. The framing too: what should be visible, where the characters and objects are. And more. That would be lovely.

So, in this new paper, scientists asked an excellent question: why not give the AI more information for these videos? And that is an excellent question. You see, this new technique allows us to show the AI exactly what we are looking for by performing it ourselves and giving it that as an input.

It goes the following way: imagine that we would like a video of an astronaut surfing, but obviously, that's not us. So, first, we record ourselves surfing, give this video to the AI, add a text prompt that now an astronaut should be surfing, and, I am very excited, let's see. Oh yes, that is so cool! And, if we wish to reimagine what this scene would look like if we were surfing in the desert, not a problem. Or, we can also think bigger, and ask for a surfing sloth instead. Look at that. This sloth is having the time of his life. And I particularly like this example because this almost feels like a motion capture video where we just provide the movement for some other character.
And all this with just one video and one text prompt. Wow.

Now wait, this new work also gives us a ton more tools to play with. For instance, if we have a video of a bear playing a guitar, a common occurrence, we can also ask it to make it a little more artistic. And if we have a cat, we can make it two, add cowboy hats, or even remove that hat and have it synthesize a similar video. I also loved this example. A simple video of this good boy running in the garden, and we can use it as the blueprint for our text-to-video ideas. Look, it has been changed to a corgi and it is now in a different environment, or a lion, or whatever we can imagine. So I hear you asking: what is so cool about this example? Well, imagine that we are very picky about the kind of footage we want to synthesize. We would have to describe in text the pace, the direction, and the exact angle the walk should take. But this paper says none of that is required: just show the AI what you wish to see. So good.

Now, this is not the first paper to attempt this, so let's see what previous methods were made of. Oh boy. Look. This is the Video Diffusion Models paper, VDM in short, and the quality of the outputs is not the greatest, but there is an even bigger problem. What is the problem? Well, temporal coherence. Each adjacent frame looks so different; for instance, the color of the clothing is different each time. If we weave these together into a video, we will get a ton of flickering. Not good.
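To make the flickering problem a bit more concrete, here is a toy metric (purely an illustration, not the paper's evaluation protocol): score a clip by the mean absolute pixel change between adjacent frames. A temporally coherent clip scores near zero; a clip whose colors jump around from frame to frame scores much higher.

```python
import numpy as np

def flicker_score(frames: np.ndarray) -> float:
    """Mean absolute pixel change between adjacent frames.

    frames: array of shape (T, H, W, C), values in [0, 1].
    Near 0 for a temporally coherent clip; large for heavy flicker.
    """
    diffs = np.abs(frames[1:] - frames[:-1])
    return float(diffs.mean())

# A perfectly static "video" (no flicker at all)...
static = np.full((8, 16, 16, 3), 0.5)

# ...versus independent random noise in every frame (maximal flicker).
rng = np.random.default_rng(0)
noisy = rng.random((8, 16, 16, 3))

print(flicker_score(static))  # 0.0
print(flicker_score(noisy) > flicker_score(static))  # True
```

Real evaluations of temporal coherence are more sophisticated than this, but even this crude score separates the two failure modes the video shows side by side.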
This is  incredible improvement in just…wait,   actually, in how long exactly? When were these  two previous works published? Is this new work

Segment 2 (05:00 - 07:00)

10 years of progress? No-no, not at all. And this is the part where I fell off my chair when reading this paper. Hold on to your papers, because both of these previous works were published less than a year before this new one. My goodness. That is amazing progress in just a few months.

Now, we are experienced Fellow Scholars over here, so we are also looking for a few more details regarding the old vs. new technique. So let's dig a little deeper… and, this is a human evaluation on a set of 32 videos, and are you seeing what I am seeing? This is one of the biggest improvements in just a few months that I have ever seen. Almost everyone favored the new one. Bravo! And just imagine what we will be able to do just a couple more papers down the line. Who knows, we all might become movie directors. What a time to be alive!

So, what do you think? What would you use this for? Let me know in the comments below!

Thanks for watching and for your generous support, and I'll see you next time!
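For a sense of how such a study is scored: a pairwise human preference evaluation like the one mentioned above is typically summarized as the fraction of comparisons in which raters chose each method. The vote counts below are made up for illustration only; the transcript tells us just the study size (32 videos) and that almost everyone favored the new method.

```python
# Hypothetical vote counts -- the real numbers are in the paper,
# not the transcript; only the 32-video study size is mentioned there.
votes = {"new_method": 30, "baseline": 2}

total = sum(votes.values())
shares = {method: count / total for method, count in votes.items()}

for method, share in shares.items():
    print(f"{method}: preferred in {share:.1%} of comparisons")
```

With counts like these, the new method would be preferred in roughly 94% of comparisons, which is the kind of lopsided result the video's chart suggests.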
