NVIDIA’s New AI: 50x Smaller Virtual Worlds!
7:32


Two Minute Papers · 27.01.2024 · 121,889 views · 3,881 likes


Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers
📝 The papers are available here:
https://research.nvidia.com/labs/toronto-ai/compact-ngp/
https://image-sculpting.github.io/
https://github.com/ProjectNUWA/DragNUWA
https://people.eecs.berkeley.edu/~evonne_ng/projects/audio2photoreal/
📝 My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD
Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers
#nvidia

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Today we are going to create absolutely incredible virtual worlds with these new papers. First, NVIDIA did something here, but if the quality does not seem to be much better, then how does this really help? We’ll find out together. Then, we are going to re-sculpt an image with this collaboration between Intel and New York University. Then, we will become a movie director and give directions to, not people, but get this: images. Oh yes. And then, we won’t even need to direct these images; this AI technique directs the video by itself.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

So, with NeRFs, we are able to gather a bunch of photos and have a technique stitch them together into a virtual world. There are models that can do this extremely quickly, for instance, Instant Neural Graphics Primitives. This converges in a matter of seconds, which is kind of insane. And its quality is often even better than its predecessors’: you see a great deal more detail in the hair and the sweater. And now, let’s see the new technique! Ready to be blown away. Wait a second… this looks nearly the same! So is this better? If it is better, how?

Well, what you see here is quality, but quality is just half of the story! The other half is size. We haven’t talked about that yet. The first technique is reasonably sized, but the quality is lacking. Then comes Instant Neural Graphics Primitives: quality much better, but the size is much larger. And now, look at the new technique, which looks roughly the same, but, oh my, it packs the same quality into one fifth the size. Fantastic. In this sense, it is even better than the legendary new technique, Gaussian Splatting, which can create, and now even animate, virtual worlds; this new one is 50 times more compact than that. Crazy.

Now, let’s sculpt some images. Second paper.
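Before moving on to the second paper, here is a toy sketch of where that size difference comes from. Instant NGP stores a large hash table of trainable feature vectors; Compact NGP, very roughly, replaces the full vectors with small indices into a learned codebook. Every size and hyperparameter below is illustrative, not the papers' actual configuration, and the lookup is untrained random data — it only shows the storage trade-off, not the learned indexing itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Instant NGP (sketch): a big table of feature vectors, indexed by
# spatially hashing an integer 3D grid coordinate.
TABLE_SIZE = 2**14        # illustrative; real tables are much larger
FEATURE_DIM = 2
table = rng.standard_normal((TABLE_SIZE, FEATURE_DIM)).astype(np.float32)

def hash_index(coord):
    """Spatial hash of an integer 3D grid coordinate (Instant NGP style primes)."""
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    h = np.bitwise_xor.reduce(np.asarray(coord, dtype=np.uint64) * primes)
    return int(h % np.uint64(TABLE_SIZE))

# Compact NGP (sketch): store a small codebook of feature vectors plus
# compact per-slot indices into it, instead of full vectors everywhere.
# (The real method learns these indices end-to-end; here they are random.)
CODEBOOK_SIZE = 2**8
codebook = rng.standard_normal((CODEBOOK_SIZE, FEATURE_DIM)).astype(np.float32)
indices = rng.integers(0, CODEBOOK_SIZE, size=TABLE_SIZE).astype(np.uint8)

def compact_lookup(coord):
    """Look up a feature vector via hash slot -> codebook index -> codebook."""
    return codebook[indices[hash_index(coord)]]

# Rough storage comparison: 4-byte floats vs. 1-byte indices + small codebook.
full_bytes = table.nbytes
compact_bytes = codebook.nbytes + indices.nbytes
print(f"full table: {full_bytes} B, codebook + indices: {compact_bytes} B, "
      f"{full_bytes / compact_bytes:.1f}x smaller")
```

Even this crude version cuts memory by several times; the learned indexing and tuned table sizes in the actual paper are what push the savings toward the figures quoted above.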
Here, the goal is to take an image, any image, and then convert the people or objects in it into a 3D model. Not to create a video game character from them, although that’s quite nice too, but no, not here! Here, we now have knowledge about the backside of this model too, so we can choose a new pose for our character and apply some more magic to put it back into the image with the new pose. We can even rotate them, you name it. Shifting these objects to new positions is also possible. And wait, these are 3D models, so we can even apply deformations to them. Carve out that bad boy, and there we go! Apart from some suspect artifacts at the mouth region, this one is almost perfect. Or placing new ducklings or fish into an image? Not a problem. And this concept gives us a great deal of control over these images. For instance, how many cherries would you like? How about this one? And another one? The consistency between the images is not perfect, but they are nearly the same. And just imagine what we will be capable of just two more papers down the line. My goodness. What a time to be alive!

And we are not done with magic for today, not even close. With this other work, we can apply some more artistic direction to already existing images. Just look at the arrows: they indicate our wishes as to how the image should be moving, and bam! We get a video. This works great for camera movement, but you know what, I wonder what happens if I instruct this horse to move. That is so much more complex than just camera movement. So, what happens then? Now hold on to your papers, Fellow Scholars, and… my goodness. Look at that. The AI understands how a horse should move, and synthesizes exactly that. It is not perfect, not even close, but this is once again an excellent opportunity to invoke the First Law of Papers. What is that? Well, it says that research is a process.
Do not look at where we are, look at where we will be two more papers down the line. Remember what DALL-E 1 could do in terms of text to image, and then DALL-E 2 dropped and blew it out of the water. Just

Segment 2 (05:00 - 07:00)

imagine what a DALL-E 2 moment for this kind of video synthesis could be. Wow.

And now, check this out. Here, this AI technique looked at videos of people in real conversations, and then all we need is our audio input. Then, get this, it creates virtual characters, mouth movements and even gestures automatically, so we can have conversations in virtual worlds more easily. I have to say the synthesized movements are often expressive, I give you that, but also sometimes a little stiff, and the mouth movement is not that accurate yet. Still, it is very impressive that all of this can be synthesized from just the audio. Once again, just two more papers down the line, and you might start seeing this out there in the real world.

I think this work is a really good showcase of how difficult this problem is. You see, our brains are wired to look at each other and read each other’s expressions. Thus, if there is even a little hesitation, just a tiny smirk, if just the slightest things are off, we immediately know that something is wrong. We are wired for that. So making this work properly will be incredibly difficult, but if anything can do it, human ingenuity and the power of AI will.
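To make the "from just the audio" input/output relationship concrete, here is the simplest possible audio-to-animation mapping: a loudness envelope driving a single mouth-open parameter, one value per animation frame. This is emphatically not what the paper does — Audio2Photoreal uses learned generative models for full bodies and faces — and the function name and every number below are illustrative.

```python
import numpy as np

def mouth_open_curve(audio, sample_rate=16000, frame_rate=30):
    """Return one mouth-openness value in [0, 1] per animation frame."""
    samples_per_frame = sample_rate // frame_rate
    n_frames = len(audio) // samples_per_frame
    # chop the waveform into per-frame chunks and measure loudness (RMS)
    frames = audio[: n_frames * samples_per_frame].reshape(n_frames, -1)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    peak = rms.max()
    return rms / peak if peak > 0 else rms  # normalize to [0, 1]

# a fake one-second "audio clip": a 220 Hz tone that fades in
t = np.linspace(0, 1, 16000, endpoint=False)
audio = np.sin(2 * np.pi * 220 * t) * t
curve = mouth_open_curve(audio)
print(len(curve), round(float(curve.max()), 2))  # 30 frames, peak 1.0
```

A mapping this naive is exactly why the synthesized mouth motion in the paper is hard to get right: loudness alone cannot distinguish an "m" from an "a", let alone produce the gestures and hesitations our expression-reading brains expect.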
