# Can an AI Learn The Concept of Pose And Appearance? 👱‍♀️

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=Z6iTo7KY7lw
- **Date:** 05.11.2019
- **Duration:** 4:31
- **Views:** 38,350

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "HoloGAN: Unsupervised learning of 3D representations from natural images" is available here:
https://www.monkeyoverflow.com/#/hologan-unsupervised-learning-of-3d-representations-from-natural-images/

❤️ Pick up cool perks on our Patreon page: https://www.patreon.com/TwoMinutePapers

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Haro, Anastasia Marchenkova, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Bryan Learn, Christian Ahlin, Claudio Fernandes, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Matthias Jost, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil.

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=Z6iTo7KY7lw) Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. I apologize for my voice today, I am trapped in this frail human body, and sometimes it falters. But you remember from the previous episode, the papers must go on.

In the last few years, we have seen a bunch of new AI-based techniques that specialize in generating novel images. This is mainly done through learning-based techniques, typically a Generative Adversarial Network, a GAN in short, which is an architecture where a generator neural network creates new images and passes them to a discriminator network, which learns to distinguish real photos from these fake, generated images. These two networks learn and improve together, so much so that the outputs of many of these techniques have become so realistic that we sometimes can’t even tell they are synthetic images unless we look really closely. You see some examples here from BigGAN, a previous technique that is based on this architecture.

Now, normally, if we are looking to generate a specific human face, we have to generate hundreds and hundreds of these images, and our best bet is to hope that sooner or later, we’ll find something that we were looking for. So, of course, scientists were interested in trying to exert control over the outputs, and with follow-up works, we can now kind of control the appearance, but, in return, we have to accept the pose in which they are given. And this new project is about teaching a learning algorithm to separate pose from identity.

Now, that sounds kind of possible with proper supervision. What does this mean exactly? Well, we have to train these GANs on a large number of images so they can learn what a human face looks like, what landmarks to expect and how to form them properly when generating new images. However, when the input images are given with different poses, we will normally need to add additional information to the discriminator that describes the rotations of these people and objects.
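To make the generator/discriminator interplay described above a bit more concrete, here is a minimal sketch of the classic adversarial losses, assuming the discriminator outputs a probability that an image is real. The function names and the example scores are hypothetical, and this is the textbook binary cross-entropy formulation, not necessarily the loss used by BigGAN or by the paper discussed in this video.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Low when the discriminator is fooled, i.e. fakes score near 1."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# Hypothetical discriminator scores for a small batch.
scores_on_real = [0.9, 0.8, 0.95]
scores_on_fake = [0.2, 0.1, 0.3]

print("discriminator loss:", discriminator_loss(scores_on_real, scores_on_fake))
print("generator loss:", generator_loss(scores_on_fake))
```

In training, the two losses are minimized in alternation: the discriminator gets better at spotting fakes, which in turn pushes the generator toward more realistic images.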
Well, hold on to your papers, because that is exactly what is not happening in this new work. This paper proposes an architecture that contains a 3D transform and a projection unit, you see them here with red and blue, and these help us in separating pose and identity. As a result, we have much finer artistic control over these during image generation. That is amazing. So as you see here, it enables a really nice workflow where we can also set up the poses. Don’t like the camera position for this generated bedroom? No problem. Need to rotate the chairs? No problem. And we are not even finished yet, because when we set up the pose correctly, we’re not stuck with these images; we can also choose from several different appearances. And all this comes from the fact that this technique was able to learn the intricacies of these objects. Love it.

Now, it is abundantly clear that as we rotate these cars, or change the camera viewpoint for the bedroom, a flickering effect is still present. And this is how research works. We try to solve a new problem, one step at a time. Then, we find flaws in the solution, and improve upon that. As a result, we always say, two more papers down the line, and we’ll likely have smooth and creamy transitions between the images.

The Lambda sponsorship spot is coming in a moment, and I don’t know if you have noticed at the start, but they were also part of this research project. I think that is as relevant a sponsor as it gets. Thanks for watching and for your generous support, and I'll see you next time!
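The idea of a 3D transform followed by a projection unit, mentioned in the segment above, can be pictured with a toy example. This is only an illustrative sketch, not the paper's implementation: it assumes the "identity" is a fixed set of hand-written 3D points (the actual system works with learned 3D representations), it assumes the pose is a single rotation angle about the vertical axis, and it uses a plain orthographic projection as a stand-in for the projection unit. All names below are hypothetical.

```python
import math


def rotate_y(points, azimuth_deg):
    """3D transform: rotate (x, y, z) points about the vertical y-axis."""
    a = math.radians(azimuth_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def project_orthographic(points):
    """Projection stand-in: drop the depth coordinate z, keep (x, y)."""
    return [(x, y) for x, y, _ in points]


def render(identity_points, azimuth_deg):
    """Pose and identity are separate inputs; only the pose changes the view."""
    return project_orthographic(rotate_y(identity_points, azimuth_deg))


# Hypothetical "identity": three points standing in for an object's 3D content.
shape = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

for azimuth in (0, 45, 90):
    view = render(shape, azimuth)
    print(azimuth, [(round(x, 3), round(y, 3)) for x, y in view])
```

Because the pose enters only through `rotate_y`, the same `shape` can be viewed from any angle, and a different `shape` can be rendered in the same pose. That is the kind of separation between pose and identity the video describes.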

---
*Source: https://ekstraktznaniy.ru/video/14226*