Google’s New AI: Like OpenAI’s DALL-E 2, But For Video!

Two Minute Papers · 20.01.2023 · 201,780 views · 5,808 likes


Video description
❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.com/papers

📝 The paper "Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation" is available here: https://tuneavideo.github.io/

My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or use the original Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.

If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

Contents (2 segments)

Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to use this new AI research work to create absolutely magical videos almost out of thin air. Like this one. And this one. Yes, with this new work, we can all become movie directors. Sort of. Loving it.

Now, everyone knows that modern AI techniques are already capable of text to image, that is, taking a piece of text and creating a picture out of it. What fewer people know, but many of you Fellow Scholars definitely know, is that text to video is now also possible.

However, I bet that you didn't know about today's paper! So, what is this about? Well, whenever I use these systems, I always find myself yearning for more artistic control. I want to specify the pose for these characters. The framing too: what should be visible, where the characters and objects are. And more. That would be lovely.

So, in this new paper, scientists asked an excellent question: why not give the AI more information for these videos? And that is an excellent question. You see, this new technique allows us to show the AI exactly what we are looking for by performing it ourselves and giving it that as an input.

It goes the following way: imagine that we would like a video of an astronaut surfing, but obviously, that's not us. So, first, we record ourselves surfing, give this video to the AI, add a text prompt that now an astronaut should be surfing, and, I am very excited, let's see. Oh yes, that is so cool! And, if we wish to reimagine what this scene would look like if we were surfing in the desert, not a problem. Or, we can also think bigger, and ask for a surfing sloth instead. Look at that. This sloth is having the time of his life. And I particularly like this example because this almost feels like a motion capture video where we just provide the movement for some other character.
And all this with just one video and one text prompt. Wow.

Now wait, this new work also gives us a ton more tools to play with. For instance, if we have a video of a bear playing a guitar, a common occurrence, we can also ask it to make it a little more artistic. And if we have a cat, we can make it two, add cowboy hats, or even remove that hat and have it synthesize a similar video. I also loved this example. A simple video of this good boy running in the garden, and we can use it as the blueprint for our text-to-video ideas. Look, it has been changed to a corgi and it is now in a different environment, or a lion, or whatever we can imagine. So I hear you asking: what is so cool about this example? Well, imagine that we are very picky about the kind of footage we want to synthesize. We would have to describe in text the pace, the direction, and the exact angle the walk should take. But this paper says none of that is required: just show the AI what you wish to see. So good.

Now, this is not the first paper to attempt this, so let's see what previous methods were made of. Oh boy. Look. This is the Video Diffusion Models paper, VDM in short, and the quality of the outputs is not the greatest, but there is an even bigger problem. What is the problem? Well, temporal coherence. Each adjacent frame looks so different; for instance, the color of the clothing is different each time. If we weave these together into a video, we will get a ton of flickering. Not good.
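To make the flickering problem a bit more concrete, here is a toy metric (purely an illustration, not the paper's evaluation protocol): score a clip by the mean absolute pixel change between adjacent frames. A temporally coherent clip scores near zero; a clip whose colors jump around from frame to frame scores much higher.

```python
import numpy as np

def flicker_score(frames: np.ndarray) -> float:
    """Mean absolute pixel change between adjacent frames.

    frames: array of shape (T, H, W, C), values in [0, 1].
    Near 0 for a temporally coherent clip; large for heavy flicker.
    """
    diffs = np.abs(frames[1:] - frames[:-1])
    return float(diffs.mean())

# A perfectly static "video" (no flicker at all)...
static = np.full((8, 16, 16, 3), 0.5)

# ...versus independent random noise in every frame (maximal flicker).
rng = np.random.default_rng(0)
noisy = rng.random((8, 16, 16, 3))

print(flicker_score(static))  # 0.0
print(flicker_score(noisy) > flicker_score(static))  # True
```

Real evaluations of temporal coherence are more sophisticated than this, but even this crude score separates the two failure modes the video shows side by side.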
This is  incredible improvement in just…wait,   actually, in how long exactly? When were these  two previous works published? Is this new work

Segment 2 (05:00 - 07:00)

10 years of progress? No-no, not at all. And this is the part where I fell off my chair when reading this paper. Hold on to your papers, because both of these previous works were published less than a year before this new one. My goodness. That is amazing progress in just a few months.

Now, we are experienced Fellow Scholars over here, so we are also looking for a few more details regarding the old vs. new technique. So let's dig a little deeper… and, this is a human evaluation on a set of 32 videos, and are you seeing what I am seeing? This is one of the biggest improvements in just a few months that I have ever seen. Almost everyone favored the new one. Bravo! And just imagine what we will be able to do just a couple more papers down the line. Who knows, we all might become movie directors. What a time to be alive!

So, what do you think? What would you use this for? Let me know in the comments below!

Thanks for watching and for your generous support, and I'll see you next time!
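For a sense of how such a study is scored: a pairwise human preference evaluation like the one mentioned above is typically summarized as the fraction of comparisons in which raters chose each method. The vote counts below are made up for illustration only; the transcript tells us just the study size (32 videos) and that almost everyone favored the new method.

```python
# Hypothetical vote counts -- the real numbers are in the paper,
# not the transcript; only the 32-video study size is mentioned there.
votes = {"new_method": 30, "baseline": 2}

total = sum(votes.values())
shares = {method: count / total for method, count in votes.items()}

for method, share in shares.items():
    print(f"{method}: preferred in {share:.1%} of comparisons")
```

With counts like these, the new method would be preferred in roughly 94% of comparisons, which is the kind of lopsided result the video's chart suggests.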
