❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers
📝 The papers are available here:
https://research.nvidia.com/labs/toronto-ai/compact-ngp/
https://image-sculpting.github.io/
https://github.com/ProjectNUWA/DragNUWA
https://people.eecs.berkeley.edu/~evonne_ng/projects/audio2photoreal/
📝 My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD
Or, this is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers
#nvidia
Contents (2 segments)
Segment 1 (00:00 - 05:00)
Today we are going to create absolutely incredible virtual worlds with these new papers. First, NVIDIA did something here, but if the quality does not seem to be much better, then how does this really help? We'll find out together. Then, we are going to re-sculpt an image with this collaboration between Intel and New York University. Then, we will become a movie director and give directions to, not people, but get this: images. Oh yes. And then, we won't even need to direct these images. This AI technique directs the video by itself. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

So, with NeRFs, we are able to gather a bunch of photos and have a technique stitch them together into a virtual world. There are models that can do this extremely quickly, for instance, Instant Neural Graphics Primitives. This converges in a matter of seconds, which is kind of insane. And its quality is often even better than its predecessors'. You see a great deal more detail in the hair and the sweater. And now, let's see the new technique! Get ready to be blown away. Wait a second… this looks nearly the same! So is this better? If it is better, how? Well, what you see here is quality, but quality is just half of the story! The other half is size. We haven't talked about that yet. The first technique is reasonably sized, but the quality is lacking. Then comes Instant Neural Graphics Primitives: the quality is much better, but the size is much larger. And now, look at the new technique, which looks roughly the same, but, oh my, it packs the same quality into one fifth the size. Fantastic. In this sense, this is even better than the legendary new technique, Gaussian Splatting, which can create, and now even animate, virtual worlds, and this new one is 50 times more compact than that. Crazy.

Now, let's sculpt some images. Second paper. Here, the goal is to take an image, any image, and then convert the people or objects in it into a 3D model, but not to create a video game character from them, although that's quite nice too. No, not here! Here, we now have knowledge about the backside of this model too, so we can choose a new pose for our character and apply some more magic to put it back into the image with the new pose. We can even rotate them, you name it. Shifting these objects to new positions is also possible. And wait, these are 3D models, so we can even apply deformations to them. Carve out that bad boy, and there we go! Apart from some suspect artifacts around the mouth region, this one is almost perfect. Or placing new ducklings or fish into an image? Not a problem. And this concept gives us a great deal of control over these images. For instance, how many cherries would you like? How about this one? And another one? The consistency between the images is not perfect, but they are nearly the same. And just imagine what we will be capable of just two more papers down the line. My goodness. What a time to be alive!

And we are not done with magic for today, not even close. With this other work, we can apply some more artistic direction to already existing images. Just look at the arrows: they indicate our wishes as to how the image should be moving, and bam! We get a video. This works great for camera movement, but you know what? I wonder what happens if I instruct this horse to move. That is so much more complex than just camera movement. So, what happens then? Now hold on to your papers, Fellow Scholars, and… my goodness. Look at that.
The AI understands how a horse should move, and synthesizes exactly that. It is not perfect, not even close, but this is once again an excellent opportunity to invoke the First Law of Papers. What is that? Well, it says that research is a process. Do not look at where we are, look at where we will be two more papers down the line. Remember what DALL-E 1 could do in terms of text to image, and then, DALL-E 2 dropped and blew it out of the water. Just
Segment 2 (05:00 - 07:00)
imagine what a DALL-E 2 moment for this kind of video synthesis could be. Wow. And now, check this out. Here, this AI technique looked at videos of people in real conversations, and then, all we need is our audio input. Then, get this, it creates virtual characters, mouth movements and even gestures automatically, so we can have conversations in virtual worlds more easily. I have to say the synthesized movements are often expressive, I'll give you that, but they are also sometimes a little stiff, and the mouth movement is not that accurate yet. Still, it is very impressive that all of this can be synthesized from just the audio. Once again, just two more papers down the line, and you might start seeing this out there in the real world.

I think this work is a really good showcase of how difficult this problem is. You see, our brains are wired to look at each other and read each other's expressions. Thus, if even a little hesitation or a tiny smirk is off, if just the slightest thing is off, we immediately know that something is wrong. We are wired for that. So making this work properly will be incredibly difficult, but if anything can do it, human ingenuity and the power of AI will.