# AI Learns Real-Time 3D Face Reconstruction | Two Minute Papers #245

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=m9XyXiL6n8w
- **Date:** 26.04.2018
- **Duration:** 2:35
- **Views:** 36,964
- **Source:** https://ekstraktznaniy.ru/video/14477

## Description

The paper "Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network" and its source code are available here:
https://arxiv.org/abs/1803.07835
https://github.com/YadiraF/PRNet

Addicted? Pick up cool perks on our Patreon page! - https://www.patreon.com/TwoMinutePapers

A few comments with some of the best applications:
Lowell Camp - "This technology could be used for consumer-budget markerless facial motion capture, and if a follow-up paper enhances it with audio analysis for tongue posing, then it would require very little touch-up beyond a little temporal filtering." 
Milleoiseau - "VOIP in game but with face tracking."
Evan - "Could this be used for some kind of automatic lip-reading system for deaf viewers to view live events?"
Matan - "Monitor emotions for product improvement."
Idjles Erle - "Reconstructing ancestors' faces from photos that are 150 years old. Working out from old photos who is most likely related to whom."
Morph Verse - "Maybe create a too

## Transcript

### Segment 1 (00:00 - 02:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Today we have two extremely hard problems on the menu. One is facial alignment and the other is 3D facial reconstruction. For both problems, we have an image as an input; in one case, the output should be a few lines that mark the orientation of the jawline, mouth and eyes, and in the other case, we are looking for a full 3D computer model of the face. And all this should happen automatically, without any user intervention. This is extremely difficult, because it means that we need an algorithm that takes a 2D image and somehow captures 3D information from this 2D projection, much like a human would.

This all sounds great and would be super useful in creating 3D avatars for Skype calls, or scanning real humans to place them in digital media such as feature films and games. That would be amazing, but is this really possible? This work uses a convolutional neural network to accomplish it, and it not only provides high-quality outputs, but creates them in less than 10 milliseconds per image, which means that it can process a hundred of them every second. That is great news indeed, because it also means that doing this for video in real time is a possibility!

But not so fast, because if we are talking about video, new requirements arise. For instance, it is important that such a technique is resilient against changes in lighting. This means that if we have different lighting conditions, the output geometry the algorithm gives us shouldn't be wildly different. The same applies to camera and pose as well. This algorithm is resilient against all three, and it has some additional goodies. For instance, it finds the eyes properly through glasses, and can deal with cases where the jawline is occluded by the hair, or infer its shape when one side is not visible at all.
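The paper's title points at the representation behind this one-pass reconstruction: a position map, an image-like array in which each pixel stores the 3D coordinates of a point on the face surface, so a single network output yields both the dense 3D shape and the alignment. A minimal sketch of reading 3D vertices out of such a map follows; the function name, array sizes, and face mask here are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def reconstruct_vertices(position_map, face_mask):
    """Collect the 3D face vertices from a regressed UV position map.

    position_map : (H, W, 3) array; each pixel holds an (x, y, z) point.
    face_mask    : (H, W) boolean array marking which UV pixels belong
                   to the face surface (not all of the map is used).
    Returns an (N, 3) array of 3D points.
    """
    return position_map[face_mask]

# Toy usage: stand-in for a network output on an 8x8 UV map,
# pretending the face covers a central 4x4 patch.
pos_map = np.random.rand(8, 8, 3)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
verts = reconstruct_vertices(pos_map, mask)
```

Because the map is just an image, the same convolutional architecture that does segmentation-style dense prediction can regress it in one forward pass, which is where the sub-10-millisecond runtime comes from.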
One of the key ideas is to give additional instruction to the convolutional neural network to focus more of its effort on reconstructing the central region of the face, because that region contains more discriminative features. The paper also contains a study that details the performance of this algorithm. It reveals that it is not only five to eight times faster than the competition, but also provides higher-quality solutions. Since these techniques are likely to be deployed in real-world applications very soon, it is a good time to start brainstorming about possible uses. If you have ideas beyond animated movies and games, let me know in the comments section. I will put the best ones in the video description. Thanks for watching and for your generous support, and I'll see you next time!
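The "focus on the central region" instruction amounts to weighting the training loss per pixel. A minimal sketch of such a weighted position-map loss, assuming a simple mean-squared-error form; the toy sizes and the 4x weight on the center are illustrative choices, not the paper's exact mask:

```python
import numpy as np

def weighted_position_map_loss(pred, target, weight_mask):
    """Per-pixel weighted MSE between predicted and ground-truth position
    maps. The weight mask upweights the central face region (eyes, nose,
    mouth), steering the network's capacity toward the most
    discriminative features."""
    se = np.sum((pred - target) ** 2, axis=-1)  # (H, W) squared error per pixel
    return float(np.mean(se * weight_mask))

# Toy example: 4x4 position maps, with the central 2x2 patch
# counting four times as much as the rest.
pred = np.zeros((4, 4, 3))
target = np.ones((4, 4, 3))
mask = np.ones((4, 4))
mask[1:3, 1:3] = 4.0
loss = weighted_position_map_loss(pred, target, mask)
```

With uniform weights this reduces to ordinary MSE; raising the central weights simply makes errors there cost more, so gradient descent spends more effort driving them down.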
