# OpenAI GLIDE AI: Astounding Power! 🤖

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=ItKi3h7IY2o
- **Date:** 10.02.2022
- **Duration:** 8:06
- **Views:** 122,047

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "OpenAI GLIDE: Astounding Power, Now Even Cheaper!" is available here:
https://github.com/openai/glide-text2im
https://arxiv.org/abs/2112.10741

Try it here (note that this appears to be a substantially reduced model compared to the one in the paper). Leave a comment with your results if you find something cool! https://github.com/openai/glide-text2im

📝 Our material synthesis paper with the fluids:
https://users.cg.tuwien.ac.at/zsolnai/gfx/photorealistic-material-editing/

🕊️ My twitter: https://twitter.com/twominutepapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, Christian Ahlin, Eric Martel, Gordon Child, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://discordapp.com/invite/hbcTJu2

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=ItKi3h7IY2o) Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to play with a magical AI where we just sit in our armchair, say the words, and it draws an image for us. Almost anything we can imagine. Almost. And before you ask, yes, this includes drawing corgis too.

In the last few years, OpenAI set out to train an AI named GPT-3 that could finish your sentences. Then they made Image GPT, which could even finish your images. Yes, not kidding. It could identify that the cat here likely holds a piece of paper and finish the picture accordingly, and it even understood that if we have a droplet here and we see just a portion of the ripples, then a splash must be filled in.

And it gets better: then they invented an AI they call Dall-E. This one is insanity: we just tell the AI what image we would like to see, and it will draw it. Look. It can create a custom storefront for us, and it understands the concepts of low-polygon rendering, isometric views, clay objects, and more.

And that's not all. It could even invent clocks with new shapes when asked. The crazy thing here is that it understands geometry, shapes, even materials. For instance, look at this white clock here on the blue table. It did not only put it on the table; it also made sure to generate appropriate glossy reflections that match the color of the clock.

And get this: Dall-E was published just about a year ago, and OpenAI already has a follow-up paper that they call GLIDE. Believe it or not, this can do more, and it can do it better. Well, I will believe it when I see it, so let's go!

Now, hold on to your papers, and let's start with a hedgehog using a calculator. Wow, that looks incredible. It's not just a hedgehog plus a calculator. It really is using the calculator. Now, paint a fox in the style of the Starry Night painting.
I love the style, and even the framing of the picture is quite good; there is some space left to make sure that we see that starry night. Great decision making. Now, a corgi with a red bowtie and a purple party hat. Excellent. And a pixel art corgi with a pizza.

These are really good, but they are nothing compared to what is to come, because it can also perform conditional inpainting with text. Yes, I am not kidding. Have a look at this little girl hugging a dog. But there is a problem with this. Do you know what the problem is? Of course, the problem is that this is not a corgi. Now it is. That is another great result.

And if we wish that some zebras were added here, that's possible too, and we can also add a vase here. Hmm, look at that. It even understood that this is a glass table and added its own reflection. Now, I am a light transport researcher by trade, and this makes me very, very happy. However, it is also true that it seems to have changed the material properties of the table; it is now much more diffuse than it was before. Perhaps this is the AI's understanding of a new object blocking reflections. It's not perfect by any means, but it is a solid step forward.

We can also give this gentleman a white hat. And as I look through these results, I find it absolutely amazing how well the hat blends into the scene.

That is very challenging. Why? Well, in light transport research, we need to simulate the paths of millions and millions of light rays to make sure that indirect illumination appears in a scene. For instance, look here. This is from one of our previous papers that showcases how fluids of different colors paint their diffuse surroundings with their own color. I find it absolutely beautiful. Let's switch the fluid to a different one, and you see the difference. The link to this work is available in the description below.
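The "edit only this region, keep the rest" behavior described above can be sketched in a few lines. This is not GLIDE's actual code; it is a minimal NumPy illustration (with hypothetical names) of the masked-blending idea commonly used by diffusion inpainters: the model proposes a full image, but pixels outside the user-drawn mask are reset to the original, so only the masked region is actually repainted.

```python
import numpy as np

def masked_blend(original, generated, mask):
    """Keep `original` where mask == 0, take `generated` where mask == 1.

    `mask` is 1 inside the region to repaint (e.g. the dog we want to
    turn into a corgi) and 0 everywhere else. Diffusion inpainters
    apply a blend like this at every denoising step, so the untouched
    pixels stay pinned to the input image.
    """
    return mask * generated + (1.0 - mask) * original

# Toy 4x4 "image": repaint only the top-left 2x2 corner.
original = np.zeros((4, 4))
generated = np.ones((4, 4))   # stand-in for the model's full-image proposal
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

result = masked_blend(original, generated, mask)
print(result)
```

Running this, only the masked corner takes the generated values; everything else remains identical to the original, which is why the rest of the photo of the girl and the dog is untouched.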
And, you see, simulating these effects is very costly and very difficult.
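To give a feel for why this is costly: even one diffuse surface point requires averaging incoming light over a whole hemisphere of directions. Below is a minimal Monte Carlo sketch in pure Python (not the method from the paper) that estimates the irradiance E = ∫ L·cos(θ) dω under a uniform sky of radiance L = 1, whose analytic value is π.

```python
import math
import random

def estimate_irradiance(num_samples, seed=0):
    """Monte Carlo estimate of E = integral of L * cos(theta) over the hemisphere.

    Directions are drawn uniformly over the hemisphere (for uniform
    solid-angle sampling, cos(theta) is itself uniform on [0, 1]), so
    each sample is weighted by the hemisphere's solid angle, 2*pi.
    With a uniform sky of radiance L = 1, the analytic answer is pi.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        cos_theta = rng.random()   # uniform hemisphere direction
        total += 1.0 * cos_theta   # radiance L = 1
    return (2.0 * math.pi) * total / num_samples

estimate = estimate_irradiance(200_000)
print(estimate)  # close to pi
```

The point: this is just one shading point under the simplest possible lighting. A full render needs many such samples per pixel, bounced recursively off other surfaces, which is why indirect illumination takes millions and millions of rays.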

### [5:00](https://www.youtube.com/watch?v=ItKi3h7IY2o&t=300s) Segment 2 (05:00 - 08:00)

But this is how proper light transport simulations need to be done. And this GLIDE AI can put new objects into a scene and make them blend in so well that, to me, this also suggests a proper understanding of light transport. I can hardly believe what is going on here. Bravo.

But wait, how do we know if this is really better than Dall-E? Are we supposed to just believe it? No, not at all! Fortunately, comparing the results against Dall-E is very easy. Look. We just use the same prompts and see that there is no contest. The new GLIDE technique creates sharper, higher-resolution images with more details, and it even follows our instructions better. The paper also showcases a user study where human evaluators favored the new technique.

Now, of course, we are not done here; even this technique is not perfect. Look. We can request a cat with 8 legs, and... wait a minute. It tried some multiplication trick, but we are not falling for it. A+ for effort, little AI, but of course, this is clearly one of the failure cases.

And once again, this is an AI where a vast body of knowledge lies within, but it only emerges if we can bring it out with properly written prompts. It almost feels like a new kind of programming that is open to everyone, even people without any programming or technical knowledge. If a computer is a bicycle for the mind, then OpenAI's GLIDE is a fighter jet. Absolutely incredible. Soon, this might democratize creating illustrations and paintings, and maybe even help in inventing new things.

And here comes the best part: you can try it too. The notebook for it is available in the video description. Make sure to leave your experiment results in the comments, or just tweet them at me. I'd love to see what you ingenious Fellow Scholars bring out of this AI.

Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13662*