# This AI Creates Dogs From Cats…And More!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=TrdmCkmK3y4
- **Date:** 25.07.2020
- **Duration:** 5:25
- **Views:** 128,984

## Description

❤️ Check out Weights & Biases and sign up for a free demo here: https://www.wandb.com/papers 

Their instrumentation of this paper is available here:
https://app.wandb.ai/stacey/stargan/reports/Cute-Animals-and-Post-Modern-Style-Transfer%3A-Stargan-V2-for-Multi-Domain-Image-Synthesis---VmlldzoxNzcwODQ

📝 The paper "StarGAN v2: Diverse Image Synthesis for Multiple Domains" is available here:
- Paper: https://arxiv.org/abs/1912.01865
- Code: https://github.com/clovaai/stargan-v2
- Youtube Video: https://youtu.be/0EVh5Ki4dIY

The paper with the latent space material synthesis is available here:
https://users.cg.tuwien.ac.at/zsolnai/gfx/gaussian-material-synthesis/

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Daniel Hasegan, Eric Haddad, Eric Martel, Gordon Child, Javier Bustamante, Lorin Atzberger, Lukas Biewald, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh.
More info if you would like to appear here: https://www.patreon.com/TwoMinutePapers

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=TrdmCkmK3y4) Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today, we have a selection of learning-based techniques that can generate images of photorealistic human faces for people that don’t exist. These techniques have come a long way over the last few years, so much so that we can now even edit these images to our liking, by, for instance, putting a smile on their faces, making them older or younger, adding or removing a beard, and more.

However, most of these techniques are still lacking in two things. One is diversity of outputs, and two, generalization to multiple domains. Typically, the ones that work on multiple domains don’t perform too well on most of them. This new technique is called StarGAN v2 and addresses both of these issues.

Let’s start with the humans. In the footage here, you see a lot of interpolation between test subjects, which means that we start out from a source person and generate images that morph them into the target subjects, not in any arbitrary way, but in a way that all of the intermediate images are believable. In these results, many attributes from the input subject, such as pose, nose type, mouth shape and position, are also reflected in the output. I like how the motion of the images on the left reflects the state of the interpolation. As this slowly takes place, we can witness how the reference person grows out a beard.

But we're not nearly done yet. We noted that another great advantage of this technique is that it works for multiple domains, and this means, of course, none other than us looking at cats morphing into dogs and other animals. In these cases, I see that the algorithm picks up the gaze direction, so this generalizes even to animals. That's great. What is even better is that the face shape of the tiger appears to have been translated to the photo of this cat, and, if we have a bigger cat as an input, the output will also give us… this lovely, and a little plump, creature. And... look!
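The interpolation described above — generating believable in-between faces by moving between a source and a target — can be sketched in a few lines. This is a minimal, hypothetical illustration: the function `lerp`, the 16-dimensional latent size, and the idea that a generator would decode each code into an image are assumptions for illustration, not the paper's actual pipeline (StarGAN v2 interpolates learned style codes, and real GAN latents are typically 512-dimensional).

```python
import numpy as np

def lerp(z_src, z_tgt, t):
    """Linearly interpolate between two latent codes.
    t = 0.0 returns the source code, t = 1.0 the target code."""
    return (1.0 - t) * z_src + t * z_tgt

# Hypothetical latent codes for a source and a target face
# (16-D here purely for illustration).
rng = np.random.default_rng(0)
z_source = rng.standard_normal(16)
z_target = rng.standard_normal(16)

# Five intermediate codes; in a real system, each would be decoded by the
# generator into a believable in-between face, giving the morphing footage
# shown in the video.
frames = [lerp(z_source, z_target, t) for t in np.linspace(0.0, 1.0, 5)]
```

Because the endpoints of the sweep are exactly the source and target codes, the morph starts and ends on the two real subjects, with the believability of the in-between frames resting entirely on the generator.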
Here, the cat in the input is occluded in this target image, but that is not translated to the output image. The AI knows that this is not part of the cat, but an occlusion. Imagine what it would take to prepare a handcrafted algorithm to distinguish these features. My goodness.

And now, onto dogs. What is really cool is that in this case, bendy ears have their own meaning, and we get several versions of the same dog breed, with or without them. And it can handle a variety of other animals too. I could look at these all day.

And now, to understand why this works so well, we first have to understand what a latent space is. Here you see an example of a latent space that was created to be able to browse through fonts, and even generate new ones. This method essentially looks at a bunch of already existing fonts and tries to boil them down into the essence of what makes them different. It is a simpler, often incomplete, but more manageable representation for a given domain. This domain can be almost anything; for instance, here you see another technique that does something similar with material models.

Now, the key difference in this new work compared to previous techniques is that it creates not one latent space, but several of these latent spaces for different domains. As a result, it can not only generate images in all of these domains, but can also translate different features, for instance, ears, eyes, and noses from a cat to a dog or a cheetah in a way that makes sense. And the results look like absolute witchcraft.

Now, since the look on this cheetah’s face indicates that it has had enough of this video, just one more example before we go. As a possible failure case, have a look at the ears of this cat. It seems to be in a peculiar midway-land between a pointy and a bent ear, but it doesn’t quite look like either of them. What do you think? Maybe some of you cat people can weigh in on this. Let me know in the comments.
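The key idea above — one shared network emitting a separate style code per domain — can be sketched as follows. This is a toy stand-in, not the paper's architecture: the dimensions, the linear `domain_heads` matrices, and the function names are all illustrative assumptions (StarGAN v2's actual mapping network is an MLP with a shared body and per-domain output branches).

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_DOMAINS, LATENT_DIM, STYLE_DIM = 3, 16, 8  # e.g. cat / dog / wildlife; sizes are illustrative

# One output head per domain: the network shares an input latent space
# but emits a *domain-specific* style code, i.e. several latent spaces
# instead of one.
domain_heads = [rng.standard_normal((STYLE_DIM, LATENT_DIM))
                for _ in range(NUM_DOMAINS)]

def style_code(z, domain):
    """Map a shared latent z into the style space of one domain
    (a linear stand-in for the mapping network)."""
    return domain_heads[domain] @ z

z = rng.standard_normal(LATENT_DIM)
s_cat, s_dog = style_code(z, 0), style_code(z, 1)
# The same z yields a different style code in each domain, so a single
# generator conditioned on these codes can synthesize diverse images
# across all domains — and translate features between them.
```

The design choice this illustrates is why one model suffices for cats, dogs, and wildlife at once: diversity comes from sampling `z`, while the per-domain heads keep each domain's style space distinct.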

### [5:00](https://www.youtube.com/watch?v=TrdmCkmK3y4&t=300s) Segment 2 (05:00 - 05:25)

Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/14098*