# This Neural Network Optimizes Itself | Two Minute Papers #212

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=6JZNEb5uDu4
- **Date:** 06.12.2017
- **Duration:** 4:32
- **Views:** 59,143

## Description

The paper "Hierarchical Representations for Efficient Architecture Search" is available here:
https://arxiv.org/pdf/1711.00436.pdf

Genetic algorithm (+ Mona Lisa problem) implementation:
1. https://users.cg.tuwien.ac.at/zsolnai/gfx/mona_lisa_parallel_genetic_algorithm/
2. https://users.cg.tuwien.ac.at/zsolnai/gfx/knapsack_genetic/

Andrej Karpathy's online demo:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html

Overfitting and Regularization For Deep Learning - https://www.youtube.com/watch?v=6aF9sJrzxaM
Training Deep Neural Networks With Dropout - https://www.youtube.com/watch?v=LhhEv1dMpKE
How Do Genetic Algorithms Work? - https://www.youtube.com/watch?v=ziMHaGQJuSI

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dave Rushton-Smith, Dennis Abts, Eric Haddad, Esa Turkulainen, Evan Breznyik, Kaben Gabriel Nanlohy, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Michael Orenstein, Raul Araújo da Silva, Robin Graham, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payments:
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/ 

Thumbnail background image credit: https://pixabay.com/photo-2692456/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=6JZNEb5uDu4) Intro

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. As we know from the series, neural network-based techniques are extraordinarily successful at defeating problems that were considered to be absolutely impossible as little as ten years ago. When we'd like to use them for something, choosing the right kind of neural network is one part of the task, but usually the even bigger problem is choosing the right architecture. Architecture typically, at a bare minimum, means the type and number of layers in the network, and the number of neurons to be used in each layer. Bigger networks can learn solutions for more complex problems. So it seems that the answer is quite easy: just throw the biggest possible neural network we can at the problem and hope for the best. But if you think that it is that easy or trivial, you need to think again. Here's why. Bigger networks come at a cost: they take longer to train, and even worse, if we have a network that is too big, we bump into the problem of overfitting.
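To make the cost of "just go bigger" concrete, here is a minimal sketch (not from the video) of how the parameter count of a fully connected network grows with its architecture. The layer sizes below are made-up examples:

```python
def param_count(layer_sizes):
    """Parameters of a fully connected net: for each consecutive pair of
    layers, a weight matrix (n_in * n_out) plus one bias per output unit."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# A small net vs. a wider, deeper one on the same 784-dim input, 10 classes:
print(param_count([784, 32, 10]))        # 25450
print(param_count([784, 512, 512, 10]))  # 669706
```

That is roughly a 26x jump in parameters between the two example architectures: more memory, longer training, and more capacity to memorize rather than generalize.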

### [1:00](https://www.youtube.com/watch?v=6JZNEb5uDu4&t=60s) Overfitting

Overfitting is the phenomenon where a learning algorithm starts essentially memorizing the training data without actually doing any learning. As a result, its knowledge does not generalize to unseen data at all. Imagine a student in a school who has a tremendous aptitude for memorizing everything from the textbook. If the exam questions happen to be the same, this student will do extremely well, but in the case of even the slightest deviations, well, too bad. Even though people like to call this rote learning, there is nothing about the whole process that resembles any kind of learning at all. A smaller neural network, a less knowledgeable student who has done their homework properly, would do way, way better. So this is overfitting, the bane of so many modern learning algorithms. It can be kind of defeated by using techniques like L1 and L2 regularization or dropout; these often help, but none of them are silver bullets. If you would like to hear more about these, we've covered them in an earlier episode - actually, two episodes. As always, the links are in the video description for the more curious Fellow Scholars out there.
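As a rough illustration of the two defenses mentioned above, here is a minimal, framework-free sketch of inverted dropout and an L2 penalty term. The drop probability and regularization coefficient are arbitrary example values, not taken from the paper:

```python
import random

def dropout(activations, p_drop, training=True):
    """Inverted dropout: during training, zero each activation with
    probability p_drop and scale survivors by 1/(1-p_drop), so the
    expected activation stays the same and test time needs no rescaling."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0 for a in activations]

def l2_penalty(weights, lam):
    """L2 regularization adds lam * sum(w^2) to the loss, nudging the
    network toward smaller weights and smoother, less memorized fits."""
    return lam * sum(w * w for w in weights)

random.seed(0)
acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, p_drop=0.5))       # each unit is either zeroed or scaled to 2x
print(l2_penalty([0.5, -1.5], 0.01))   # 0.01 * (0.25 + 2.25) = 0.025
```

Both techniques fight overfitting the same way: they make it harder for the network to lean on any one weight or neuron to memorize the training set.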

### [2:08](https://www.youtube.com/watch?v=6JZNEb5uDu4&t=128s) Architecture Search

So, the algorithm itself is learning, but for some reason, we have to design its architecture by hand. As we discussed, some architectures, like some students, significantly outperform others, and we are left to perform a lengthy trial and error to find the best ones by hand. So, speaking of learning algorithms, why don't we make them learn their own architectures? This new work on architecture search does exactly that. I'll note that this is by far not the first crack at this problem, but it definitely is a remarkable improvement over the state of the art. It represents the neural network architecture as an organism and makes it evolve via genetic programming. This is just as cool as you would think it is, and not half as complex as you may imagine at first - we have an earlier episode on genetic algorithms, and I wrote some source code as well, which is available free of charge for everyone. Make sure to have a look at the video description for more on that, you'll love it!
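To get a feel for how such an evolutionary search works, here is a toy sketch in the spirit of the genetic-algorithm episode mentioned above. This is not the paper's hierarchical method: the fitness function below is a made-up stand-in for "train the network and measure validation accuracy", and the layer widths and target shape are arbitrary:

```python
import random

random.seed(42)

def random_arch():
    """An architecture here is just a list of hidden-layer widths."""
    return [random.choice([8, 16, 32, 64]) for _ in range(random.randint(1, 4))]

def fitness(arch):
    """Stand-in for 'train this network and measure validation accuracy'.
    Real architecture search spends almost all of its compute here; this
    toy score simply prefers about 3 layers of width near 32."""
    depth_score = -abs(len(arch) - 3)
    width_score = -sum(abs(w - 32) for w in arch) / 32.0
    return depth_score + width_score

def mutate(arch):
    """Randomly tweak one layer's width, add a layer, or remove one."""
    child = list(arch)
    op = random.random()
    if op < 0.5:
        i = random.randrange(len(child))
        child[i] = random.choice([8, 16, 32, 64])
    elif op < 0.75 and len(child) < 6:
        child.append(random.choice([8, 16, 32, 64]))
    elif len(child) > 1:
        child.pop(random.randrange(len(child)))
    return child

# Evolution loop: keep the fitter half, refill with mutated copies.
population = [random_arch() for _ in range(20)]
for step in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = max(population, key=fitness)
print(best, fitness(best))  # typically converges toward three 32-wide layers
```

The real algorithm follows the same select-mutate-evaluate loop, except each fitness evaluation actually trains a network, which is why the search needs so much hardware.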

### [3:08](https://www.youtube.com/watch?v=6JZNEb5uDu4&t=188s) Results

In this chart, you can see the number of evolution steps on the horizontal x axis, and the performance of these evolved architectures over time on the vertical y axis. Finally, after taking about 1.5 days to perform these few thousand evolutionary steps, the best architectures found by this algorithm are only slightly inferior to the best existing neural networks for many classical datasets, which is bloody amazing. Please refer to the paper for details and comparisons against state-of-the-art neural networks and other architecture search approaches; there are lots of very easily readable results reported there. Note that this is still preliminary work and uses hundreds of graphics cards in the process. However, if you remember how it went with AlphaGo, the computational costs were cut down by a factor of ten within a little more than a year. And until that happens, we have learning algorithms that learn to optimize themselves. This sounds like science fiction. How cool is that? Thanks for watching and for your generous support, and I'll see you next time!

---
*Source: https://ekstraktznaniy.ru/video/14545*