# OpenAI DALL-E 2: Top 10 Insane Results! 🤖

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=X3_LD3R_Ygs
- **Date:** 21.04.2022
- **Duration:** 12:35
- **Views:** 585,417

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "Hierarchical Text-Conditional Image Generation with CLIP Latents" is available here:
https://openai.com/dall-e-2/
https://www.instagram.com/openaidalle/

❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: 
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Gordon Child, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Paul F, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Chapters
00:00 Intro
00:34 GPT-3 - OpenAI's Text Magic
01:18 Image-GPT Was Born
01:55 Dall-E
02:44 Dall-E 2!
03:30 1. Panda mad scientist
03:55 2. Teddy bear mad scientists
04:20 3. Teddy skating on Times Square
05:05 4. Nebula dunking
05:30 5. Cat Napoleon
05:57 6. Flamingos everywhere!
06:49 7. Don't forget the corgis!
07:43 8. It can do interior design!
08:50 9. Dall-E 1 vs Dall-E 2
09:28 10. Not perfect
09:57 Bonus: Hold on to your papers!
10:18 It draws itself
10:42 One more thing
11:07 Another legendary paper

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

#OpenAI #dalle #dalle2

## Contents

### [0:00](https://www.youtube.com/watch?v=X3_LD3R_Ygs) Intro

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today I am so excited to show you this. Look. We are going to have an AI look at 650 million images on the internet, and then ask it to generate the craziest synthetic images I have ever seen. And wait, it gets better: we will also see what this AI thinks it looks like. Spoiler alert: it appears to be cuddly. You'll see. So what is all this about?

### [0:34](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=34s) GPT-3 - OpenAI's Text Magic

Well, in June 2020, OpenAI created GPT-3, a magical AI that could finish your sentences. Among many incredible examples, it could generate website layouts from a written description. This opened the door for a ton of cool applications, but note that all of these applications are built on understanding text. However, no one said that these neural networks can only deal with text. And sure enough, a few months later, scientists at OpenAI thought: if we can complete sentences, why not try to complete images too?

### [1:18](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=78s) Image-GPT Was Born

And thus, Image-GPT was born. The problem statement was simple: we give it an incomplete image, and we ask the AI to fill in the missing pixels. If we give it this image, it understands that these birds are likely standing on something. And it even has several ideas as to what that might be! Look: a branch, a stone, or they can even stand in the water, and amazingly, the AI even creates their mirror images.

### [1:55](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=115s) Dall-E

But then, scientists at OpenAI thought: why not have the user write a text description and get back a really well-done image of exactly that? That sounds cool, and it gets even cooler the crazier the ideas we give it. The name of this technique is a mix of Salvador Dalí and Pixar’s WALL-E. So please meet Dall-e. It could do a ton. For instance, it understands styles and rendering techniques. Being a computer graphics person, I am so happy to see that it learned the concepts of low-polygon-count rendering, isometric views, and clay objects, and we can even add an X-ray view to the owl. Kind of.

### [2:44](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=164s) Dall-E 2!

And now, just a year later, would you look at that! Oh wow. Here is Dall-e 2! Oh my, I cannot tell you how excited I am to have a closer look at the results. Let's dive in together! So what can it do? Well, that’s not the right question. By the end of this video, I bet you will think that the more appropriate question is “what can’t it do?” It can take descriptions so specific that perhaps even a good human artist might have trouble with them. Now, hold on to your papers, and have a look at 10 of my favorite examples.

### [3:30](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=210s) 1. Panda mad scientist

“A panda mad scientist mixing sparkling chemicals”. Wow, look at that! This is something else. It even has sunglasses for extra street cred, and the reflections of the questionable substance it is researching also appear on its sunglasses. A+. But the mad science doesn’t stop there.

### [3:55](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=235s) 2. Teddy bear mad scientists

“Teddy bears mixing sparkling chemicals as mad scientists”. But at this point, we already know that this would be too easy for the AI. Let’s do it in multiple styles: first steampunk, second a 1990s Saturday morning cartoon, and third digital art. It can pull off all of these.

### [4:20](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=260s) 3. Teddy skating on Times Square

Number 3. Now, about variants. Give me “A teddybear on a skateboard in Times Square.” This is interesting for multiple reasons. For instance, you see that it can generate a ton of variants. That is fantastic. As a light transport researcher, I cannot resist mentioning what a nice depth-of-field effect it is able to make, and would you look at that! It also knows about the highly sought-after signature effect of lights blurred into these beautiful balls in the background. The AI understands these bokeh balls, and the fact that it can harness this kind of knowledge is absolutely amazing.

### [5:05](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=305s) 4. Nebula dunking

Number 4. And if you think that is too specific, you have seen nothing yet. Check this out! “An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula”. I love it. It also has a nice Space Jam quality to it. Well done, little AI. So good!

### [5:30](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=330s) 5. Cat Napoleon

Number 5. You know what? I want even more specific and even more ridiculous images. “a propaganda poster depicting a cat dressed as french emperor napoleon holding a piece of cheese”. Now that is way too specific. Nobody can pull that off… there is no way that… wow! I think we have a winner here. When the next election is coming up, you know what to do.

### [5:57](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=357s) 6. Flamingos everywhere!

Number 6. And still, once again, believe it or not, you have seen nothing yet. Yes, that’s right! We can get even more specific. So much so that we can even edit an image that is already done. For instance, if we feel that this image is missing a flamingo, we can request that one be placed there, and we can even specify its location. And even the reflections are created for it, and they are absolutely beautiful. Now, note that if there are reflections here, then perhaps there should have been reflections here too. A perfect test for one more paper down the line, when Dall-e 3 arrives. Make sure to subscribe and hit the bell icon; you really don’t want to miss it.

### [6:49](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=409s) 7. Don't forget the corgis!

Number 7. This one puts on a clinic in understanding the world around us. This image is missing something. Missing what? Well, corgis, of course. And I cannot believe this. If we specify the painting as the location, the result will not only have a painterly style, but one that already matches the painting on the wall. This is true for the other painting too. This is incredible. I absolutely love it. And, last test, does it? Yes, it does! If we are outside of the painting, in the photo part, this good boy becomes photorealistic. Requesting variants is also possible here, so what do you think? Which is the best boy? Let me know in the comments below!

### [7:43](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=463s) 8. It can do interior design!

And, number 8. If we can place any object anywhere, this is an excellent tool for interior design. We can put a couch wherever we please, and I am really looking forward to inspecting the reflections here. Oh yes. This is very difficult to compute: this is not a matte, diffuse object, and not even a mirror-like specular surface, but a glossy one somewhere in between. But it gets worse: this is a textured object, which also has to be taken into consideration, and proper shadows also have to be generated in a difficult situation where light comes from a ton of different directions. This is a nightmare, and the results are not perfect, but my goodness, if this is not an AI that has a proper understanding of the world around us, I don’t know what is. Absolutely incredible progress in just one year. I cannot believe my eyes.

### [8:50](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=530s) 9. Dall-E 1 vs Dall-E 2

You know what? Number 9. Actually, let’s look side by side at how much it has improved since Dall-e 1. Now, there is no contest here. Dall-e 2 is on a completely different level from its first iteration. This is so much better, and once again, such improvement in just a year. What a time to be alive! And what do you think Dall-e 3, if it appears, will be capable of? What would you use this, or Dall-e 3, for? Please let me know in the comments below; I’d love to know what you think.

### [9:28](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=568s) 10. Not perfect

Now, of course, not even Dall-e 2 is perfect. Look at that. Number 10. Inspect the pictures and tell me: what do you think the prompt for this must have been? Not easy, right? Let me know your guesses in the comments below. Well, it was “A sign that says deep learning”. A+ for effort, little AI, but this is certainly one of the failure cases.

### [9:57](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=597s) Bonus: Hold on to your papers!

And, you know what, I cannot resist. +1. If you have been holding on to your papers, now, squeeze that paper at least as hard as these scholars are squeezing their papers. So, which one are you? Which one resembles your reaction to this paper the best? Let me know in the comments below.

### [10:18](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=618s) It draws itself

And, yes, as promised, here is what it thinks of itself. It is very soft and cuddly. Or at least, it wants us to think that it is so. Food for thought! And, if you speak robot and have any idea what this writing could mean, make sure to let me know below. And one more thing.

### [10:42](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=642s) One more thing

We noted that this AI was trained on 650 million images and uses 3.5 billion parameters. These are not rookie numbers by any stretch of the imagination. However, I am hoping that with this, there will be a chance that other, independent groups will also be able to train and use their own Dall-E 2.
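To put the 3.5 billion figure in perspective, here is a back-of-the-envelope estimate of how much memory the weights alone would occupy. This is a sketch under our own simplifying assumptions and does not describe how OpenAI actually stores or runs the model. Every parameter is counted as a plain 32-bit (fp32) or 16-bit (fp16) float. Activations, gradients, and optimizer state are ignored, even though they make training considerably more demanding.

```python
# Back-of-the-envelope estimate of weight storage for a model of a given size.
# Assumption: each parameter is a plain float (4 bytes in fp32, 2 bytes in fp16);
# activations, gradients, and optimizer state are deliberately ignored.
PARAMS = 3_500_000_000  # parameter count quoted in the video

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Return the storage needed for the weights, in gigabytes (10**9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.1f} GB")
```

Under these assumptions, the weights alone take about 14 GB in fp32 and about 7 GB in fp16. That is before counting any of the extra memory that training would require.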

### [11:07](https://www.youtube.com/watch?v=X3_LD3R_Ygs&t=667s) Another legendary paper

Just in the last few years, OpenAI has already given us legendary papers: for instance, an AI that can play hide and seek, one that can solve math tests, and one that plays a game called DOTA 2 at a world champion level. Given these, I hereby induct Dall-E into the pantheon of these legendary works. And I have to say, I am super excited to see what they come up with next. Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13591*