# Google’s Imagen AI: Outrageously Good! 🤖

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=HyOW6fmkgrc
- **Date:** 11.06.2022
- **Duration:** 7:31
- **Views:** 546,300

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding" is available here:
https://gweb-research-imagen.appspot.com/

🕊️ Follow us on Twitter for more DALL-E 2 and Imagen-related content: https://twitter.com/twominutepapers

❤️ Watch these videos in early access on our Patreon page or join us here on YouTube: 
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Geronimo Moralez, Gordon Child, Ivo Galic, Jace O'Brien, Jack Lukic, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Chapters:
0:00 Google Imagen
0:21 What is DALL-E 2?
1:04 Google Imagen enters the scene
1:26 Pandas and guitars
2:07 What is new here?
2:29 Finally, text!
2:52 Oh my, refraction too!
3:21 AI reacts to another AI
3:45 Imagen VS DALL-E 2
5:08 More tests
5:35 So much progress in so little time
6:19 More results

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

#google #imagen

## Contents

### [0:00](https://www.youtube.com/watch?v=HyOW6fmkgrc) Google Imagen

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. I cannot tell you how excited I am by this paper. Wow. Today you will see more incredible images generated by an AI. However, not from OpenAI, but Google! Just a few months ago,

### [0:21](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=21s) What is DALL-E 2?

OpenAI’s image generator AI called DALL-E 2 took the world by storm. You could name almost anything: Cat Napoleon, a teddy bear on a skateboard in Times Square, a basketball player dunking as an explosion of a nebula, and it was able to create an appropriate image for it. However, there was one interesting thing about it. What do you think the prompt for this must have been? Hmm. Not easy, right? Well, it was “A sign that says deep learning”. Oh yes, this was one of the failure cases. Please remember this. Now, we always say that in research, do not look at where we are, always look

### [1:04](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=64s) Google Imagen enters the scene

at where we will be two more papers down the line. However, we didn’t even make it two more papers down the line. What’s more, we barely made it two months down the line, and here it is: this is Google Research’s incredible image generator AI, Imagen.

### [1:26](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=86s) Pandas and guitars

This technique also looks at millions of image and text description pairs, and learns what people mean when they say this is a guitar, or this is a panda. But the magic happens here. Oh yes, it also learned how to combine these concepts together, and how a panda would play a guitar. The frets and the strings seem a little wacky, but what do we know; this is not human engineering. This is panda engineering. Or a robot engineering something for a panda. We are living in crazy times indeed.

### [2:07](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=127s) What is new here?

So, what is different here? Why do we need another image generator AI? Well, let’s pop the hood and look inside. Oh yes. Two things that will immediately make a difference come to mind. One, this architecture is simpler. Two, it learns on longer text descriptions,

### [2:29](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=149s) Finally, text!

and hopefully that also means that it can generate text better. Let’s have a look. What? Are you seeing what I am seeing? That is not just some text, that is a beautiful piece of text, exactly what we were looking for. Absolutely amazing. And these are not the only advantages. You know, I am a light transport researcher by trade, and I

### [2:52](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=172s) Oh my, refraction too!

promised myself that I’ll try not to flip out. But… hold on to your papers, and… holy mother of papers. Look at that. It can also generate beautiful refractive objects. That duck is truly a sight to behold. My goodness. Now, I will note that DALL-E 2 was also pretty good at this.
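For the curious Fellow Scholars: samplers like these steer the diffusion model toward the text prompt with classifier-free guidance, and one of the tricks described in the Imagen paper, dynamic thresholding, is what keeps images from saturating at the large guidance weights it uses. Here is a minimal NumPy sketch of both ideas, on toy arrays standing in for the real denoiser's outputs (the weight and percentile values are illustrative, not the paper's exact settings):

```python
import numpy as np

# Classifier-free guidance: blend the unconditional and text-conditional
# denoiser predictions, pushing toward the conditional direction with weight w.
def guided_eps(eps_uncond, eps_cond, w):
    return eps_uncond + w * (eps_cond - eps_uncond)

# Imagen-style dynamic thresholding: pick s as a high percentile of |x0|,
# clip the predicted image to [-s, s], then divide by s so every value
# lands back in [-1, 1] instead of saturating.
def dynamic_threshold(x0, percentile=99.5):
    s = max(np.percentile(np.abs(x0), percentile), 1.0)
    return np.clip(x0, -s, s) / s

rng = np.random.default_rng(0)
eps_u = rng.normal(size=(8, 8))        # stand-in unconditional prediction
eps_c = rng.normal(size=(8, 8))        # stand-in text-conditional prediction
eps = guided_eps(eps_u, eps_c, w=7.5)  # large weights blow values far out of range...
x0 = dynamic_threshold(eps)            # ...which thresholding pulls back into [-1, 1]
assert x0.min() >= -1.0 and x0.max() <= 1.0
```

With a static clip to [-1, 1] instead, large guidance weights tend to produce flat, oversaturated regions; the data-dependent scale s is what lets the weight be cranked up without that artifact.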

### [3:21](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=201s) AI reacts to another AI

And, if we plug in DeepMind’s new Flamingo language model, would you look at that! Is this really happening? Yes, that is an AI commenting on a different AI’s work. What a time to be alive! We will have a look at this paper too in the near future, so make sure to subscribe and hit the bell icon; you really don’t want to miss it when it comes.

### [3:45](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=225s) Imagen VS DALL-E 2

And, you know what, let’s test it some more against OpenAI’s amazing DALL-E 2. See how they stack up against each other. The first prompt will be a couple of glasses sitting on the table. Well, with Google’s Imagen, oh my, these ones are amazing, once again, proper refractive objects, loving it. And what about DALL-E 2? There is one with the glasses that has an interesting framing, but I see both reflections and refractions, so apart from the framing, I am liking this. And the rest… well, yes, those are glasses sitting on the table. But when we say a couple of glasses, we probably mean these and not these. But that’s really interesting: the two AIs are having a linguistic battle here. Imagine showing this to someone just ten years ago. See how they would react. Loving it. Also, I bet that in the future, these AIs will work like brands and products today, where people will have strong opinions as to which ones they prefer. The analog warmth of Imagen, or the three-year warranty on DALL-E 4? And wait, you are all experienced Fellow Scholars here, so you also wish to see the

### [5:08](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=308s) More tests

two tested against each other a little more rigorously. And we’ll do exactly that. The paper checks the new technique against previous results mathematically. Or, we can ask a bunch of humans which one they prefer. And, wow. The new technique passes with flying colors on both.

### [5:35](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=335s) So much progress in so little time

And once again, DALL-E 2 appeared just about two months ago, and now, a new follow-up paper from Google tests really well against it. This is not two more papers down the line. Not even two more years down the line. This is just two more months down the line. And a year before, we had DALL-E 1, and see how much of a difference OpenAI made in just a year. Now, I am almost certain that this paper had been in the works for a while, and they added the comparisons against DALL-E 2 at the end. But still, a follow-up paper this quickly? The pace of progress in AI research is absolutely incredible. What a time to be alive!

### [6:19](https://www.youtube.com/watch?v=HyOW6fmkgrc&t=379s) More results

So, does this get your mind going? What else would you use this new technique for? Let me know in the comments below!

Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/13540*