# Text To Image AIs Just Got Supercharged!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=7rvjZQy_RQs
- **Date:** 06.12.2023
- **Duration:** 9:02
- **Views:** 89,544
- **Source:** https://ekstraktznaniy.ru/video/12880

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

📝 The papers/works are available here:
https://firefly.adobe.com
https://rich-text-to-image.github.io/
https://mesh-aware-rf.github.io/

📝 My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bret Brizzee, Bryan Learn, B Shang, Christian Ahlin, Gaston Ingaramo, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Kenneth Davis, Klaus Busse, Kyle Davis, Lukas Biewald, Martin, Matthew Valle, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Putra Iskandar, Richard

## Transcript

### Segment 1 (00:00 - 05:00)

Three incredible things just happened. One, have a look at Adobe’s Firefly text to image AI. The first version was not bad. It was kinda usable. But good news, Firefly 2 is already here, and it is so much better. Two, we can now control text to image AIs using rich text. This seems like a crazy idea, so I wonder what this is for? Why is this useful? And three, from now on, we will not only visualize, but even alter the world around us.

So first, Firefly 2. It can perform text to image. This is where absolute magic happened. Look how version two compares to version one. Not even close. These go from a solid improvement, to I am not even sure what is going on here, to, my goodness. Look. Now that is a huge leap in capabilities. For macro images, I love this one. And just imagine what we will be capable of just a couple more papers down the line. I have to be honest, I can’t even imagine. Can you? Also, their gallery of results has improved by leaps and bounds. And don’t forget, this tool is being handed out to millions and millions of people all around the world in Photoshop and Adobe’s other tools.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

But sometimes, we don’t want to create an entire image. Sometimes we have an image that is just right, but we would like to spice it up with an underground city, waves, new clothes, new planets, or this adorable thing. In these cases, generative fill is going to be excellent.

Now, Photoshop has been used for decades to put text with all kinds of options into designs, but here, they introduced something more exciting: text effects. From now on, you don’t need to be a proper artist to create letters made out of gingerbread, yarn, woven fabric, you name it.

They also showcased generative recolor. This is for vector images, where the boundaries of objects are known exactly, from a text description.
This is a useful feature, but in my opinion this can be done with traditional techniques too; no AI is required. Not too impressive, but a useful addition nonetheless. This video is not sponsored by Adobe, we do not have any business ties with them. That is why I can keep saying that this is good, but this is not that impressive.

So all in all, solid techniques. Not necessarily the best text to image on the market, we’ve seen better, but the overall package is good, and it is going to be put in the hands of millions and millions of artists.

Now, outside of Photoshop, when it comes to text prompts, we have always used plain text. But here is a crazy idea from a really cool paper. What if we could use rich text effects in our prompts, and as a result, make our images more expressive?

Sounds great. But wait, is this really useful? I mean, you can add some colorful letters here to the hair, and the hair gets the appropriate color, but we could do this anyway. Well, if you are not a believer, check this out. We can also use font styles, which then translate to artistic styles. Hmm, I like it, now we are getting there. Or if we want to make just one object pop, there we go.

But let’s go a bit crazier. Footnotes! These allow us to prescribe more details about an object without having to rewrite the whole prompt. If we don’t like the result, we change the footnote, and there we go. That’s a good one too.

But let’s go even crazier. Links. We can now link to a particular image if we wish to get a particular breed of dog, or a particular product for a backpack in our images. And now, something really simple, but really dreadful… I mean powerful. You’ll see in a moment. Let’s play with text size. So, would you like pineapples on your pizza, or would you like PINEAPPLES on your pizza? You have to choose one. Choose wisely, Fellow Scholars.
You know what, being the good Fellow Scholars you are, I’ll let you get away with more pepperoni or mushrooms. I would

### Segment 2 (05:00 - 09:00)

absolutely love to see this in future products. Not the pineapple pizza, but the text size. So good.

And now, hold on to your papers, Fellow Scholars, because three, we will now not only visualize, but even alter the world around us. First, we capture a few photos of a real-world scene, then create a virtual copy of it. So far so good, NeRFs do that, and we have seen lots of works on this already. But here, in step number two, the geometry of the scene is estimated, and then the lighting is estimated too. And now, we create a virtual object, then put it into this scene, and, oh my, that metal basket will actually reflect the light in the scene properly, and will cast a beautiful shadow too. And it works elsewhere too; reflections and shadows appear as they should. What a wonderful paper!

And these objects can also be a part of a simulation in a virtual world. That really gets my mind going. This way, we could capture a real-world place around us, and start playing with it as if it were a video game. Or insert ourselves into a virtual world, train self-driving cars safely within a simulation that is a copy of the real world, you name it. Now, for this to be really useful, the quality of the results would need to improve a bit, but the concept is now out there.

However, this is still an incredibly potent concept. Why? Well, you Fellow Scholars know very well that one of my favorite phenomena in research is sim2real. What is that? Sim2real is when we would like to train a robot for something in the real world, but when starting out, this little AI is not doing very well. It might fail a great deal before it becomes competent at its job. Take self-driving, for example. Or even a vacuum-cleaning robot. Now imagine that we can create a computer simulation, and it would train in this simulation, safely, and, my other favorite, efficiently. You see, one second in real life is one second.
However, with a powerful computer, that same one second of real time can be used to simulate hours, or even days of progress. If we wait just a bit, we might get a masterful AI that has trained essentially for years and years. And here, we add a twist to it. What if the virtual world is based on the real world? We can scan the real world, and create a simulated video game from it. With that, these AIs are able to train in the real world, yet they do it safely and efficiently.

And with this new paper, add another twist. We have the real world, and now we can even alter it. We can create scenarios that would normally be almost impossible in the real world, and the AI will be able to train on them for years and years. I love it.

And just imagine what we will be capable of just two more papers down the line. What a time to be alive!
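The speed argument above can be made concrete with some back-of-the-envelope arithmetic. All numbers here are made up for illustration; the video does not quote specific speedups:

```python
# Illustrative sim2real arithmetic: if a simulator steps N times faster
# than real time across M parallel environments, the agent accumulates
# far more experience per hour of wall-clock time than it could in the
# real world. Numbers are hypothetical.

def simulated_experience(wall_clock_s: float, speedup: float, n_envs: int) -> float:
    """Seconds of in-simulation experience gathered in one wall-clock run."""
    return wall_clock_s * speedup * n_envs

one_hour = 3600.0
# e.g. a simulator stepping 100x real time on 32 parallel environments:
exp = simulated_experience(one_hour, speedup=100.0, n_envs=32)
print(exp / 3600.0, "hours of experience")        # 3200.0 hours
print(exp / (3600.0 * 24 * 365), "years")         # ~0.37 years per wall-clock hour
```

Under these (made-up) settings, a single day of wall-clock training already yields several simulated years of experience, which is the “trained essentially for years and years” effect described above.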
