OpenAI DALL-E: Fighter Jet For The Mind! ✈️
8:09

Two Minute Papers · 16.01.2021 · 222,171 views · 13,345 likes


Video description
❤️ Check out Perceptilabs and sign up for a free demo here: https://www.perceptilabs.com/papers 📝 The blog post on "DALL-E: Creating Images from Text" is available here: https://openai.com/blog/dall-e/ Tweet sources: - Code completion: https://twitter.com/gdm3000/status/1151469462614368256 - Website layout: https://twitter.com/sharifshameem/status/1283322990625607681 - Population data: https://twitter.com/pavtalk/status/1285410751092416513 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Haro, Alex Serban, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Eric Haddad, Eric Lau, Eric Martel, Gordon Child, Haris Husic, Jace O'Brien, Javier Bustamante, Joshua Goller, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh. If you wish to support the series, click here: https://www.patreon.com/TwoMinutePapers Thumbnail background image credits: https://pixabay.com/images/id-3202725/ Károly Zsolnai-Fehér's links: Instagram: https://www.instagram.com/twominutepapers/ Twitter: https://twitter.com/twominutepapers Web: https://cg.tuwien.ac.at/~zsolnai/ #openai #dalle #dalle2

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. In early 2019, a learning-based technique appeared that could perform common natural language processing operations, for instance, answering questions, completing text, reading comprehension, summarization, and more. This method was developed by scientists at OpenAI, and they called it GPT-2. The key idea of GPT-2 was that all of these problems could be formulated as different variants of text completion problems, where all we need to do is provide it with an incomplete piece of text, and it tries to finish it.

Then, in June 2020, came GPT-3, which supercharged this idea, and among many incredible examples, it could generate website layouts from a written description. However, no one said that these neural networks can only deal with text information. And sure enough, a few months later, scientists at OpenAI thought that if we can complete text sentences, why not try to complete images too? They called this project Image GPT, and the problem statement was simple: we give it an incomplete image, and we ask the AI to fill in the missing pixels. It could identify that the cat here likely holds a piece of paper and finish the picture accordingly, and it even understood that if we have a droplet here and see just a portion of the ripples, then a splash must be filled in.

And now, right in January 2021, just 7 months after the release of GPT-3, here is their new mind-blowing technique that explores the connection between text and images. But finishing images already kind of works, so what new thing can it do? In just a few moments, you will see that the more appropriate question would be "what can't it do?" For now, well, it creates images from our written text captions, and you will see in a moment how monumental a challenge that is. The name of this technique is a mix of Salvador Dalí and Pixar's WALL-E. So please meet DALL-E. And now, let's see it through an example.
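The key idea mentioned above — casting many different NLP tasks as text completion — can be sketched in a few lines. This is a minimal illustration, not OpenAI's code: the prompt templates below are assumptions chosen for clarity, showing how a question, a summary request, or a translation all become the same operation, namely handing the model an unfinished text to continue.

```python
def as_completion_prompt(task: str, **fields) -> str:
    """Frame a task as an incomplete text for a language model to finish.

    The templates are illustrative; a real system would tune them carefully.
    """
    templates = {
        "question_answering": "Q: {question}\nA:",
        "summarization": "{article}\nTL;DR:",
        "translation": "English: {text}\nFrench:",
    }
    # Every task ends with an open-ended cue, so the model's job is
    # always the same: continue the text from where it stops.
    return templates[task].format(**fields)


prompt = as_completion_prompt("summarization", article="GPT-2 is a language model.")
print(prompt)
# The model would then be asked to generate the tokens after "TL;DR:".
```

In this framing, the same pre-trained model handles all tasks; only the prompt changes — which is exactly why GPT-2 and GPT-3 could perform tasks they were never explicitly trained on.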
For neural network-based learning methods, it is easy to recognize that this text says OpenAI, and what a storefront is. Images of both of these exist in abundance. Understanding that is simple. However, generating a storefront that says OpenAI is quite a challenge. Is it really possible that it can do that? Well, let's try it. Look, it works! Wow! Now, of course, if you look here, you immediately see that it is by no means perfect, but let's marvel at the fact that we can get all kinds of 2D and 3D texts, look at the storefronts from different orientations, and it can deal with all of these cases reasonably well. And of course, it is not limited to storefronts; we can request license plates, bags of chips, neon signs, and more. It can really do all that.

So, what else? Well, get this, it can also kind of invent new things. So let's put our entrepreneurial hat on and try to invent something here. For instance, let's try to create a triangular clock. Or a pentagonal one. Or, you know, just make it a hexagon. It really doesn't matter, because we can ask for absolutely anything and get a bunch of prototypes in a matter of seconds. Now, let's make it white, and… look! Now we have a happy, happy Károly. Why is that? It is because I am a light transport researcher by trade, so the first thing I look at when seeing these generated images is how physically plausible they are. For instance, look at this white clock here on the blue table. It did not only put it on the table, but it also made sure to generate appropriate glossy reflections that match the color of the clock. It can do this too! Loving it.

Apparently, it understands geometry, shapes, and materials. I wonder what else it understands. Well, get this: it even understands styles and rendering techniques. Being a graphics person, I am so happy to see that it learned the concept of low polygon count rendering, isometric views, clay objects, and we can even add an X-ray view to the owl. Kind of.
And now, if all that wasn’t enough, hold on to your papers, because we can also commission

Segment 2 (05:00 - 08:00)

artistic illustrations for free, and not only that, but even have fine-grained control over them. I also learned that if manatees wore suits, they would wear them like this, and after a long and strenuous day walking their dogs, they can go for yet another round… in pajamas. But it does not stop there: it can not only generate paintings of nearly anything, but we can even choose the artistic style and the time of day as well. The night images are a little on the nose, as most of them have the moon in the background, but I'll be more than happy to take these. And the best part is that you can try this yourself right now through the link in the video description.

In general, not all results are perfect, but it's hard to even fathom all the things this will enable us to do in the near future, when we can get our hands on these pre-trained models. This may be the first technique where the results are not limited by the algorithm, but by our own imagination. Now, this is a quote that I said about GPT-3, and notice that the exact same thing can be said about DALL-E. Quote: "The main point is that working with GPT-3 is a really peculiar process where we know that a vast body of knowledge lies within, but it only emerges if we can bring it out with properly written prompts. It almost feels like a new kind of programming that is open to everyone, even people without any programming or technical knowledge. If a computer is a bicycle for the mind, then GPT-3 is a fighter jet. Absolutely incredible." I think this kind of programming is going to be more and more common in the future.

Now, note that these are some amazing preliminary results, but the full paper is not available yet. So this was not two minutes, and it was not about a paper. Welcome to Two Minute Papers! Jokes aside, I cannot wait for the paper to appear, and I'll be here to have a closer look whenever it happens.
Make sure to subscribe and hit the bell icon to not miss it when the big day comes. And until then, let me know in the comments what crazy concoctions you came up with! Thanks for watching and for your generous support, and I'll see you next time!
