# These Neural Networks Have Superpowers! 💪

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=o7dqGcLDf0A
- **Date:** 16.02.2021
- **Duration:** 7:29
- **Views:** 149,521
- **Source:** https://ekstraktznaniy.ru/video/13976

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://www.wandb.com/papers 
❤️ Their mentioned post is available here: https://wandb.ai/ayush-thakur/taming-transformer/reports/-Overview-Taming-Transformers-for-High-Resolution-Image-Synthesis---Vmlldzo0NjEyMTY

📝 The paper "Taming Transformers for High-Resolution Image Synthesis" is available here:
https://compvis.github.io/taming-transformers/

Tweet links:
Website layout: https://twitter.com/sharifshameem/status/1283322990625607681
Plots: https://twitter.com/aquariusacquah/status/1285415144017797126?s=12
Typesetting math: https://twitter.com/sh_reya/status/1284746918959239168
Population data: https://twitter.com/pavtalk/status/1285410751092416513
Legalese: https://twitter.com/f_j_j_/status/1283848393832333313
Nutrition labels: https://twitter.com/lawderpaul/status/1284972517749338112
User interface design: https://twitter.com/jsngr/status/1284511080715362304

🙏 We would like to thank our generous Patreon supporters.

## Transcript

### Intro [0:00]

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. I got so excited by the amazing results of this paper. I will try my best to explain why, and by the end of this video, there will be a comparison that blew me away, and I hope you will appreciate it too.

With the rise of neural network-based learning algorithms, we are witnessing the advent of image generation techniques. What you see here is a set of breathtaking results created with a technique called StyleGAN2. It can generate images of humans, cars, cats, and more. As you see, the progress in machine learning-based image generation is just stunning.

### OpenAI's GPT-3 [0:45]

And don’t worry for a second about the progress in text processing, because that is also similarly amazing these days. A few months ago, OpenAI published their GPT-3 model, which they unleashed to read the internet and learn not just our language, but much, much more. For instance, the internet also contains a lot of computer code, so it learned to generate website layouts from a written description. But that’s not all, not even close: to the joy of technical PhD students around the world, it can properly typeset mathematical equations from a plain English description as well. And get this: it can also translate a complex legal text into plain language, or the other way around. And it does many of these things nearly as well as humans.

So what was the key to this work? One of the keys of GPT-3

### Transformer Networks [1:42]

was that it uses a neural network architecture called the transformer network. These really took the world by storm in the last few years, so our first question is: why transformers? One, transformer networks can typically learn on stupendously large datasets, like the whole internet, and extract a lot of information from them. That is a very good thing. And two, transformers are attention-based neural networks, which means that they are good at learning and generating long sequences of data.

Okay, but how do we benefit from this? Well, when we ask OpenAI’s GPT-3 to continue our sentences, it is able to look back at what we have written previously. And it looks at not just a couple of characters, no-no, it looks back at up to several pages of writing to make sure that it continues what we write the best way it can.

This sounds amazing. But what is the lesson here? Just use transformers for everything and off we go? Well, not quite. They are indeed good at a lot of things when it comes to text processing tasks,
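The "attention" idea mentioned above can be illustrated with a tiny sketch: every position in a sequence computes a relevance score against every other position, then mixes their values accordingly, which is what lets a transformer look many pages back. This is a minimal NumPy sketch of scaled dot-product self-attention, not the video's or any specific model's implementation; the dimensions and random weights are illustrative assumptions.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model) token embeddings. Every output position is a
    weighted mix of *all* positions, so nothing is out of reach.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq, seq) pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 8                                       # toy embedding size (assumption)
x = rng.standard_normal((5, d))             # a toy 5-token sequence
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape)                            # (5, 8): one mixed vector per token
```

In a real transformer this runs with many attention heads, learned weights, and thousands of positions, but the mixing step is the same.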

### Image Generation [2:53]

but they don’t excel at generating high-resolution images at all. Can this be improved somehow? Well, this is exactly what this new technique does, and much, much more. So let’s dive in and see what it can do!

First, we can give it an incomplete image and ask it to finish it. Not bad… but! OpenAI’s Image-GPT could do that too, so what else can it do? Oh boy, a lot more! And by the way, we will compare the results of this technique against Image-GPT at the end of this video. Make sure not to miss that; I almost fell off the chair, and you will see in a moment why.

Two, it can do one of my favorites: depth-to-image generation. We give it a depth map, which is very easy to produce, and it creates a photorealistic image that corresponds to it, which is very hard. We do the easy part, the AI does the hard part. Great! And with this, we not only get a selection of these images, but since we have their depth maps, we can also rotate them around as if they were 3D objects. Nice!

Three, we can also give it a map of labels, which is, again, very easy to do. We just say here goes the sea, put some mountains here, and the sky here, and it will create a beautiful landscape image that corresponds to that. I can’t wait to see what the amazing artists all over the world will be able to get out of these techniques. These results are already breathtaking… but research is a process, and just imagine how good they will become two more papers down the line. My goodness!

Four, it can also perform super resolution. This is the CSI thing where in goes a blurry image, and out comes a finer, more detailed version of it. Witchcraft.

And finally, five, we can give it a pose, and it generates humans that take these poses.

Now, the important thing here is that it can supercharge transformer networks to do all of these things at the same time, with just one technique.
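How can a transformer, which works on sequences, handle a whole image at once? The paper's core trick is a two-stage design: first compress the image into a short grid of discrete codes from a learned codebook, then let the transformer model that code sequence instead of raw pixels. The sketch below shows only the quantization step in NumPy, with a random "codebook" and random patch features standing in for the learned ones; the sizes are illustrative assumptions, not the paper's.

```python
import numpy as np

def quantize(patches, codebook):
    """Map each patch feature vector to the index of its nearest codebook entry."""
    # Broadcast to (n_patches, n_codes, dim), then squared distance per pair.
    d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)                 # one discrete token per patch

rng = np.random.default_rng(1)
codebook = rng.standard_normal((16, 4))      # 16 learned codes (hypothetical size)
image_patches = rng.standard_normal((9, 4))  # a 3x3 grid of 4-dim patch features
tokens = quantize(image_patches, codebook)
print(tokens.shape)                          # (9,): a short token sequence
```

A 9-token sequence is something a transformer handles easily, and conditioning signals like a depth map or label map can be tokenized the same way and prepended to the sequence.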

### Comparison [5:17]

So how does it compare to OpenAI’s image completion technique? Well, remember, that technique was beyond amazing and set a really high bar. So let’s have a look together! They were both given the upper half of this image and had to fill in the lower half. Remember, as we just learned, transformers are not great at high-resolution image synthesis. So here, for OpenAI’s Image-GPT, we expect heavily pixelated images… and… oh yes, that’s right.

So now, hold on to your papers, and let’s see how much more detailed the new technique is. Holy mother of papers! Do you see what I see here? Image-GPT came out just a few months ago, and there is already this kind of progress. So there we go, just imagine what we will be able to do with these supercharged transformers just two more papers down the line. Wow. And that’s where I almost fell off the chair when reading this paper. I hope you held on to yours. It truly feels like we are living in a science fiction world. What a time to be alive!

Thanks for watching and for your generous support, and I'll see you next time!
