# Google’s New AI: OpenAI’s DALL-E 2, But 10X Faster!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=2AsoWS2t484
- **Date:** 04.02.2023
- **Duration:** 8:02
- **Views:** 251,292

## Description

❤️ Train a neural network and track your experiments with Weights & Biases here: http://wandb.me/paperintro

📝 The paper "Muse: Text-To-Image Generation via Masked Generative Transformers" is available here:
https://muse-model.github.io/

Stable Diffusion interpolation: https://twitter.com/xsteenbrugge/status/1558508866463219712
Full video of interpolation: https://www.youtube.com/watch?v=Bo3VZCjDhGI

My latest paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or here is the original Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=2AsoWS2t484) Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to see progress in text-to-image research that is so incredible it hardly seems believable. So, what is text-to-image? Simple: these are AI-based techniques where a text prompt from us goes in, and a beautiful image comes out.

There is already a large set of techniques that can perform text-to-image really well. For instance, OpenAI's DALL-E 2 can do it, where we wait approximately 10 seconds for each image. Or we can even run it on our own hardware with the free and open-source Stable Diffusion.

So, are we done? What else is there to invent here? Why write more papers? Well, by the time you finish this video, I hope you will agree with me that the only possible answer is: my goodness, there is so much to be done and so much that has just been improved.

For instance, this new technique from Google, which they call Muse, can perform mask-free editing too. What is that? To be able to appreciate it, let's look at what mask-based editing looks like. Imagine that we have a wonderful photo here, but we would like to change the background. No matter: let's just highlight this region, which we will call a mask, and ask for a different background. And yes, it can repaint the image as if it were in New York, Paris, or San Francisco. Wonderful. This is image inpainting using a mask and a text prompt.

And now, let's do this, but without a mask! What? How is this even possible? Well, we can tell the AI: come on, you are smart enough to know where these objects are and what they are, so you do the masking yourself. Automatically. So, can it? Let's see.

There is a cake in this image, and a coffee latte. But, psst! Don't tell the AI where it is. Confidential information. Let it find out itself.
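The mask-based editing described above can be sketched at its simplest level: the model fills in new content for the masked region, and the result is composited with the untouched original pixels. Here is a minimal NumPy sketch of that final compositing step; the `sky` array is a hypothetical stand-in for the text-conditioned model's output, which is not part of the original video.

```python
import numpy as np

def composite_inpainting(image, mask, generated):
    """Blend generated content into the original image.

    image:     (H, W, 3) original photo
    mask:      (H, W) boolean array, True where new content goes
    generated: (H, W, 3) model output for the masked region
    """
    m = mask[..., None]  # broadcast the mask over the color channels
    return np.where(m, generated, image)

# Toy example: repaint the top half of a flat gray image.
image = np.full((4, 4, 3), 128, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True                               # top half is the mask
sky = np.full((4, 4, 3), 200, dtype=np.uint8)    # stand-in for model output

result = composite_inpainting(image, mask, sky)
```

The untouched region keeps the original pixels exactly, which is why a good inpainting edit is seamless everywhere outside the mask.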
Now, little AI, please change the cake to a croissant, if it even exists, who knows, and the latte art should form a flower. And... wow. Look at that. Who could say that this was not the original image? It has done an absolutely amazing job. And, as I am a light transport researcher by trade, I cannot resist mentioning that the specular highlights on the new plate are also excellent. Good job, little AI!

But its mask-free editing capabilities can do even better. Look. We can change our clothes super easily, even with text. Remember, synthesizing text properly was quite difficult for previous techniques, and this one just does it easily. So good!

Now, get this, because we can also do this with drawings, and if we do, something amazing happens. Look. We can start out from a crude drawing of a cat and ask it to morph into different animals. Through this, it can become a dog, a pig, a raccoon, or even other animals.

This is possible because this new technique is not your usual diffusion-based process like many previous image generators. What does that mean? It means that it does not start out from noise and does not reorganize this noise to get a coherent image. It does not think in terms of noise at all.

But we are not done yet, not even close! It can also perform image outpainting. That is, taking this part of the image and replacing the entirety of the image around it using a text prompt. Travel around the world with just one text prompt. So cool!

Now, have a look at these images and their prompts. Great works, right? But there is something that ties them together. Do you know what? Well, hold on to your papers, because each of these images took just approximately one second to generate. That's right, this is up to 10 times faster than previous techniques! And all this less than a year after DALL-E 2 was published. That is insanity.
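The "no noise at all" remark refers to masked generative transformers: instead of denoising, the model works on discrete image tokens, starting with every token masked and, over a handful of steps, predicting all of them in parallel while keeping only the most confident predictions each round. This is a large part of why generation takes roughly a second rather than tens of seconds. A toy sketch of that decoding schedule, where `fake_model` is a random stand-in for the real transformer (an assumption for illustration, not the paper's code):

```python
import numpy as np

MASK = -1  # sentinel for a still-masked token

def fake_model(tokens, vocab_size, rng):
    """Stand-in for the transformer: a predicted token and a
    confidence score for every position (random here)."""
    preds = rng.integers(0, vocab_size, size=tokens.shape)
    conf = rng.random(size=tokens.shape)
    return preds, conf

def parallel_decode(num_tokens=16, vocab_size=8, steps=4, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.full(num_tokens, MASK)
    for step in range(steps):
        preds, conf = fake_model(tokens, vocab_size, rng)
        conf[tokens != MASK] = -np.inf   # already-decided tokens stay fixed
        # Unmask a fixed share of the remaining masked tokens each round.
        still_masked = int(np.sum(tokens == MASK))
        keep = max(1, still_masked // (steps - step))
        best = np.argsort(conf)[-keep:]  # most confident positions this round
        tokens[best] = preds[best]
    return tokens

tokens = parallel_decode()
```

Because every step fills in many tokens at once, a full image needs only a few forward passes, compared with the dozens to hundreds of sequential denoising steps a diffusion model performs.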
And it can also perform things that previous techniques had a great deal of trouble with. What are those? Well, two examples: cardinality and composition. Cardinality means

### [5:00](https://www.youtube.com/watch?v=2AsoWS2t484&t=300s) Segment 2 (05:00 - 08:00)

that if we ask for three elephants standing on top of each other, we really get three. If we ask for four bottles of wine, we get four. And if we ask for 10 bottles of wine, wait a minute... yes, apparently not even this technique is perfect.

It also does well when it comes to composition. If we ask for the two baseballs to be to the left of the tennis balls, the AI understands that, and thus they will likely end up being there.

However, I also love how it combines all of these concepts together. For instance, here we can do mask-free editing while keeping the composition of the original image the same. This way, we can transform our cat into a dog, change a small basketball into an American football, make our cat yawn, or even change these flowers. And note that we did not need a mask for this, no highlighting the regions where the cats and roses are; we just write what we want and the AI does it! Also note once again that the composition of the original image remains intact. I absolutely love this. Such an amazing tool, and now a really fast one too. One second for each of these? Sign me up right now!

And just imagine: two more papers down the line, perhaps all this will be possible in real time. We might be able to create little virtual worlds with the speed of thought. How cool is that! What a time to be alive!

So, what do you think? What would you use this for? Let me know in the comments below!

Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13298*