❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers
📝 The papers are available here:
SceneScape: https://scenescape.github.io/
Image to 3D: https://mrtornado24.github.io/DreamCraft3D/
Dehaze: https://algolzw.github.io/daclip-uir/
AI film festival: https://aiff.runwayml.com/
📝 My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD
Or here is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Bret Brizzee, Gaston Ingaramo, Gordon Child, Jace O'Brien, Jie Yu, John Le, Kyle Davis, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Putra Iskandar, Richard Sundvall, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers
#nvidia #gemini
Contents (4 segments)
Intro
Today is a good day, Fellow Scholars. Why is that? Because we are going to talk about this, this and this, three incredible new papers that will help you unleash your creativity like never before. And a word about Google DeepMind’s Gemini AI at the end of the video.
Virtual Film Director
First, let’s try to become a virtual film director, and create videos. But wait, one of our previous episodes was exactly on that. Yes…but here, you get to be a director, just not in the way you think. So, how? Well, first, let’s start with a photograph. Now get this, Google’s earlier AI helps us fly into this photo. This is incredible, it even supports curved camera motions and long-term videos. However, there is a problem. What is the problem? Long-term coherence is not the greatest. The scene seems to deviate further and further away from the original photo, of course, because it has to make up a lot of stuff. There is a follow-up paper on this that is better, but the same issues remain. But we learned something important here: whatever the new solution is going to look like, now we know that we should be evaluating it on quality and coherence. Now let’s have a look at a new text-to-video technique. It promises zero-shot scene generation. This is incredible. What is that? It means that it is able to generate scenes from these prompts, kind of like an artist painting a landscape that they have never seen before. It did not have access to similar videos in its training data, and yet, here it is. I love it. And, back to quality and coherence: in my opinion, we have higher-quality results, and coherence is really good too. All we need is a piece of text input. So, how? How is this even possible? Well, I will tell you in a moment. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Now let’s compare it to previous techniques, this is against Runway’s GEN-1. We showcased this in an earlier video too. Hmm, do you see what I see here? Yes, it is not your eyes, the output seems a little slower, GEN-1 indeed generates a few duplicate frames. Otherwise, not bad. This is a tool that is already being used out there to create movie projects. Not feature-length movies, but do not let that fool you.
AI film festivals are already being held where people put together videos of 1-10 minutes in length. Not 5-second videos anymore. Prizes are also given out, and if that fires you up, you can submit your own work too. Everybody can become an artist and take part in this. What an incredible time to be alive, right? So that was GEN-1. I will note that GEN-2 also exists, to the best of my knowledge, without a paper. Now, let’s compare GEN-1 to the new technique…and the new technique, wow, this is an incredible jump in just one paper. And GEN-1 appeared not years ago, but less than a year ago. Such improvement in less than one year. And as always, just imagine what we will be capable of just two more papers down the line. My goodness. But, we are not done yet. Not even close! Now hold on to your papers, Fellow Scholars, and let’s see how it performs against another technique from this year. And for me, this is once again, wow. Really, this kind of improvement in just one paper is very impressive. Now, look here, because this is super important. What is this? Well, Fellow Scholars, this is the mesh representation of the scene. Digital 3D geometry. And I hear you asking: great, now we can put it into our video games, that is important, but Károly, is that super important? Yes it is. You see, meshes are essential in understanding this work. And here comes one of my favorite things about this paper. First, it starts out with a mesh structure, digital geometry, and over time, this AI is like a sculptor working on a clay model. Yes, as the camera moves around, it “grows” this mesh outwards. Loving it! Now, this was text to video. And also, text to 3D geometry.
Two Minute Papers
But now, wait a second. Text prompts are great, but what if we already have an image of our scene instead? That’s just a 2D image, and we would like 3D geometry. Well, have a look at this paper. Oh yeah! It promises that it takes just this one image, and it guesses what is around it and what is behind it, in a coherent manner. Well, it promises that. But, I will believe it when I see it. Whoa, look at that! It’s not perfect by any means, but it is really good. And “good” is an important keyword. Why? Well, take it from the legendary chip engineer, Jim Keller. That was an incredibly good summary of what you see here. And a huge honor. Thank you.
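The mesh-growing idea described earlier (render the current mesh from a new camera pose, fill in the newly revealed region, lift it to 3D, and fuse it back into the geometry) can be sketched very loosely in code. To be clear, everything below is an illustrative stand-in, not the paper’s actual implementation: the function names are made up, and the inpainting and depth models are replaced by trivial stubs.

```python
# Very loose, hypothetical sketch of "growing" a scene mesh as the
# camera moves. The inpainting model and depth estimator that a real
# system would use are stubbed out with constants.

def inpaint_missing(frame, mask):
    """Stand-in for a generative inpainting model (assumption)."""
    return [1.0 if missing else px for px, missing in zip(frame, mask)]

def estimate_depth(frame):
    """Stand-in for a monocular depth estimator (assumption)."""
    return [1.0 for _ in frame]

def grow_scene(num_steps, width=4):
    # Start from the geometry of the very first view.
    mesh_vertices = [(x, 0.0, 1.0) for x in range(width)]
    for step in range(1, num_steps + 1):
        # Render the current mesh from the new camera pose; pixels with
        # no geometry behind them are marked as missing (simplified here
        # to "everything is newly revealed").
        frame = [0.5] * width
        mask = [True] * width
        frame = inpaint_missing(frame, mask)  # hallucinate new content
        depth = estimate_depth(frame)         # lift it to 3D
        # Unproject the inpainted pixels into new vertices and merge
        # them in: the mesh is extended, never regenerated from scratch,
        # which is what keeps the scene coherent over long camera paths.
        mesh_vertices += [(x, float(step), d)
                          for x, d in zip(range(width), depth)]
    return mesh_vertices

mesh = grow_scene(3)
print(len(mesh))  # 4 initial vertices + 3 steps x 4 new vertices = 16
```

The point of the sketch is the control flow, not the stubs: each camera step only ever adds geometry for the newly revealed region, so earlier parts of the scene stay fixed.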
Conclusion
So, yes, if it’s good, two more papers down the line it will be great. So we can now create these videos from text, or from a photograph. But get this, you might not even need a photograph to perform all this. Yes, that’s right. With the third paper for today, you can take a photo that is almost completely destroyed, and it will be able to restore it really well. This can then go into the photo-to-3D techniques. Can you imagine, before watching Two Minute Papers, someone telling you that from this photo and two AI research papers, it can be restored with amazingly high quality, and then you can even put it into a video game? No one would have believed it. And yet, here we are. Wow. Now note that, of course, image inpainting techniques already exist, there are heaps and heaps of papers on them; for instance, here is a legendary paper, PatchMatch, from 14 years ago. All handcrafted, human ingenuity. A graphics paper, of course. That was really impressive. But compared to what we can do today…wow. I am out of words. What a time to be alive! Now, as promised, a word about Gemini. We discussed it in detail a few videos ago, and I made sure that we mainly discussed results not from the marketing materials, but from the research paper itself. We are Fellow Scholars here, and paper results are more detailed and easier to verify. However, while I was talking, I also showed you some materials like this. In the meantime, Google discussed how this footage was made. And it was made not with the AI’s real-time feedback on a video; instead, they used still images from the video, plus wrote a prompt to give to the AI. The answers were then shown here. These prompts range from simple, like “what do you see here?”, to something more detailed. However, we did not see these prompts. I think the best would have been if it were presented in the way you see here, and second best would have been adding a disclaimer to the footage that still images and additional prompting were used.
The paper results that span the majority of our videos are, to the best of my knowledge, accurate. I wanted to revisit this to make sure that you always get accurate information about these works. That is The Way of The Scholar.