❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers
📝 The papers/works are available here:
https://firefly.adobe.com
https://rich-text-to-image.github.io/
https://mesh-aware-rf.github.io/
📝 My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD
Or, here is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bret Brizzee, Bryan Learn, B Shang, Christian Ahlin, Gaston Ingaramo, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Kenneth Davis, Klaus Busse, Kyle Davis, Lukas Biewald, Martin, Matthew Valle, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Putra Iskandar, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's research works: https://cg.tuwien.ac.at/~zsolnai/
Twitter: https://twitter.com/twominutepapers
Table of contents (2 segments)
Segment 1 (00:00 - 05:00)
Three incredible things just happened. One, have a look at Adobe’s Firefly text-to-image AI. The first version was not bad. It was kinda usable. But good news, Firefly 2 is already here, and it is so much better. Two, we can now control text-to-image AIs using rich text. This seems like a crazy idea, so I wonder, what is this for? Why is this useful? And three, from now on, we will not only visualize, but even alter the world around us.

So first, Firefly 2. It can perform text-to-image generation. This is where absolute magic happened. Look how version two compares to version one. Not even close. These go from a solid improvement, to “I am not even sure what is going on here”, to, my goodness. Look. Now that is a huge leap in capabilities. For macro images, I love this one. And just imagine what we will be capable of just a couple more papers down the line. I have to be honest, I can’t even imagine. Can you? Their gallery of results has also improved by leaps and bounds. And don’t forget, this tool is being handed out to millions and millions of people all around the world in Photoshop and Adobe’s other tools. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

But sometimes, we don’t want to create an entire image; sometimes we have an image that is just right, but we would like to spice it up with an underground city, waves, new clothes, new planets, or this adorable thing. In these cases, generative fill is going to be excellent. Now, Photoshop has been used for decades to put text with all kinds of options into designs, but here, they introduced something more exciting: text effects. From now on, you don’t need to be a proper artist to create letters made out of gingerbread, yarn, woven fabric, you name it. They also showcased generative recolor. This is for vector images, where the boundaries of objects are known exactly, so they can be recolored from a text description. This is a useful feature, but in my opinion, it could be done with traditional techniques too; no AI is required. Not too impressive, but a useful addition nonetheless. This video is not sponsored by Adobe; we do not have any business ties with them. That is why I can keep saying that this is good, but this is not that impressive. So all in all, solid techniques, not necessarily the best text-to-image on the market, we’ve seen better, but the overall package is good, and it is going to be put in the hands of millions and millions of artists.

Now, outside of Photoshop, when it comes to text prompts, we have always used plain text. But here is a crazy idea from a really cool paper: what if we could use rich text formatting in our prompts, and as a result, make our images more expressive? Sounds great. But wait, is this really useful? I mean, you can add some colorful letters here to the hair, and the hair gets the appropriate color, but we could do this anyway. Well, if you are not a believer, check this out. We can also use font styles, which then translate to artistic styles. Hmm, I like it, now we are getting there. Or if we want to make just one object pop, there we go. But let’s go a bit crazier. Footnotes! These allow us to prescribe more details about an object without having to rewrite the whole prompt. If we don’t like it, we change the footnote, and there we go. That’s a good one too. But let’s go even crazier. Links. We can now link to a particular image if we wish to get a particular breed of dog, or a particular backpack product, into our images.
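For my Fellow Scholars who like to tinker: here is a minimal sketch of how those rich-text attributes could map onto per-span generation controls. Everything in it, the RichSpan class, spans_to_controls, and the control names, is a hypothetical illustration of the idea, not the paper’s actual interface. The point is simply that formatting carries extra signals a plain string cannot: a color or a footnote stays attached to the exact words it decorates.

```python
# A minimal sketch of rich-text prompting, assuming a hypothetical
# parser output -- not the actual API of the rich-text-to-image paper.
from dataclasses import dataclass

@dataclass
class RichSpan:
    text: str                    # the words in this span of the prompt
    color: str | None = None     # font color -> target color of the object
    font: str | None = None      # font family -> artistic style for the span
    footnote: str | None = None  # footnote -> extra detail for this object only
    size: float = 1.0            # font size -> how strongly the span is weighted

def spans_to_controls(spans: list[RichSpan]) -> dict:
    """Flatten rich-text attributes into per-span generation controls."""
    controls = {"base_prompt": " ".join(s.text for s in spans), "regions": []}
    for s in spans:
        region = {"phrase": s.text, "weight": s.size}
        if s.color:
            region["target_color"] = s.color
        if s.font:
            region["style"] = s.font       # e.g. a script font -> painterly look
        if s.footnote:
            region["detail_prompt"] = s.footnote
        controls["regions"].append(region)
    return controls

# "a girl with red hair" -- the font color of "hair" becomes the hair color,
# and the footnote adds detail without rewriting the rest of the prompt.
prompt = [RichSpan("a girl with"),
          RichSpan("hair", color="red", footnote="long, wavy, windswept")]
print(spans_to_controls(prompt))
```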
And now, something really simple, but really dreadful… I mean powerful. You’ll see in a moment. Let’s play with text size. So, would you like pineapples on your pizza, or would you like PINEAPPLES on your pizza? You have to choose one. Choose wisely, Fellow Scholars. You know what, being the good Fellow Scholars you are, I’ll let you get away with more pepperoni or mushrooms. I would
Segment 2 (05:00 - 09:00)
absolutely love to see this in future products. Not the pineapple pizza, but the text size. So good.

And now, hold on to your papers, Fellow Scholars, because three, we will now not only visualize, but even alter the world around us. First, we capture a few photos of a real-world scene, then create a virtual copy of it. So far so good, NeRFs do that, and we have seen lots of works on this already. But here, step number two, the geometry of the scene is estimated, and then the lighting is estimated too. And now, we create a virtual object and put it into this scene, and, oh my, that metal basket will actually reflect the light in the scene properly, and will cast a beautiful shadow too. And it works elsewhere too; reflections and shadows appear as they should. What a wonderful paper! And these objects can also be part of a simulation in a virtual world. That really gets my mind going. This way, we could capture a real-world place around us and start playing with it as if it were a video game. Or insert ourselves into a virtual world, train self-driving cars safely within a simulation that is a copy of the real world, you name it. Now, for this to be really useful, the quality of the results would need to improve a bit, but the concept is now out there. And it is an incredibly potent concept. Why?

Well, you Fellow Scholars know very well that one of my favorite phenomena in research is sim2real. What is that? Sim2real is when we would like to train a robot to do something in the real world, but when starting out, this little AI is not doing very well. It might fail a great deal before it becomes competent at its job. Take self-driving, for example. Or even a vacuum-cleaning robot. Now imagine that we can create a computer simulation, and it can train in this simulation, safely, and, my other favorite, efficiently. You see, one second in real life is one second. However, with a powerful computer, that same one second of real time can be used to simulate hours, or even days of progress. If we wait just a bit, we might get a masterful AI that has trained essentially for years and years.

And here, we add a twist to it. What if the virtual world is based on the real world? We can scan the real world and create a simulated video game from it. With that, these AIs are able to train in the real world, yet they do it safely and efficiently. And with this new paper, add another twist: we have the real world, and now we can even alter it. We can create scenarios that would normally be almost impossible in the real world, and the AI will be able to train on them for years and years. I love it. And just imagine what we will be capable of just two more papers down the line. What a time to be alive!
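For the curious, here is a rough, runnable sketch of how the stages of that object-insertion pipeline fit together. Every function is a hypothetical stub standing in for a real reconstruction, geometry, or lighting module; it mirrors the steps described above, not the paper’s actual code.

```python
# A high-level sketch of the object-insertion pipeline: capture -> radiance
# field -> geometry -> lighting -> composite. All stubs are hypothetical.

def reconstruct_radiance_field(photos):
    """Stub: fit a NeRF-style radiance field to the captured photos."""
    return {"photos": photos}

def extract_mesh(radiance_field):
    """Stub: estimate explicit scene geometry from the radiance field."""
    return {"triangles": []}

def estimate_lighting(radiance_field, scene_mesh):
    """Stub: recover the scene's lighting for consistent shading."""
    return {"environment_map": None}

def insert_and_render(photos, virtual_object, pose):
    # Stage 1: a virtual copy of the real scene from a few photos.
    field = reconstruct_radiance_field(photos)
    # Stage 2: explicit geometry, so the inserted object knows what to
    # stand on, occlude, and catch shadows from.
    mesh = extract_mesh(field)
    # Stage 3: lighting, so the object reflects the scene and casts
    # shadows that match it.
    light = estimate_lighting(field, mesh)
    # Stage 4: composite the virtual object into the scene and render.
    return {"field": field, "mesh": mesh, "light": light,
            "object": virtual_object, "pose": pose}

frame = insert_and_render(["img_001.jpg"], "metal_basket", pose=(0, 0, 0))
print(frame)
```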
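And the back-of-the-envelope math behind “years and years” of training: the speedup factor below is made up purely for illustration, only the arithmetic is the point.

```python
# Sim2real speedup, back of the envelope. SIM_SPEEDUP is a made-up
# illustrative number, not a measured figure from any paper.
SIM_SPEEDUP = 10_000     # hypothetical: simulated seconds per wall-clock second
wall_clock_hours = 24    # one day of training on a powerful machine

simulated_seconds = wall_clock_hours * 3600 * SIM_SPEEDUP
simulated_years = simulated_seconds / (3600 * 24 * 365)
print(f"{wall_clock_hours} h of wall-clock time ≈ "
      f"{simulated_years:.1f} years of experience")
# -> 24 h of wall-clock time ≈ 27.4 years of experience
```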