# Google DeepMind's Deep Q-Learning & Superhuman Atari Gameplays | Two Minute Papers #27

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=Ih8EfvOzBOY
- **Date:** 22.11.2015
- **Duration:** 3:49
- **Views:** 49,916
- **Source:** https://ekstraktznaniy.ru/video/14915

## Description

Google DeepMind implemented an artificial intelligence program using deep reinforcement learning that plays Atari games and improves itself to a superhuman level. The technique is called deep Q-learning; it uses a combination of deep neural networks and reinforcement learning, and it is capable of playing many Atari games as well as or better than humans. After they presented their initial results with the algorithm, Google almost immediately acquired the company for several hundred million dollars, hence the name Google DeepMind. I am sure that this is one of the biggest triumphs of deep learning, especially given the fact that the first few successful experiments for 3D games are now out there!

________________________

The Nature paper "Human-level control through deep reinforcement learning" is available here:
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
http://www.cs.swarthmore.edu/~meeden/cs63/s15/nature15b.pdf

The code is available here:
https://sites.google.

## Transcript

### Segment 1 (00:00 - 03:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This one is going to be huge, certainly one of my favorites. This work is a combination of several techniques that we have talked about earlier. If you don't know some of these terms, it's perfectly okay; you can remedy this by clicking on the popups or checking the description box, but you'll get the idea even watching only this episode.

First, we have a convolutional neural network, which helps process images and understand what is depicted in them, and a reinforcement learning algorithm, which helps create strategies, or, to be more exact, decides what our next action should be: which buttons we push on the joystick. This technique mixes these two concepts together, and we call it deep Q-learning. It is able to learn to play games the same way a human would: it is not exposed to any additional information in the code, and all it sees is the screen and the current score.

When it starts learning to play an old Atari game, Breakout, at first the algorithm loses all of its lives without any sign of intelligent action. If we wait a bit, it becomes better at playing the game, roughly matching the skill level of an adept player. But here's the catch: if we wait longer, we get something absolutely spectacular. It finds out that the best way to win the game is to dig a tunnel through the bricks and hit them from behind. I really didn't know this, and this is an incredible moment: my computer, this box next to me, is able to create new knowledge, to find out things I hadn't known before. This is completely absurd. Science fiction is not the future; it is already here.

It also plays many other games. The percentages show the game scores compared to a human player: above 70% means it's great, and above 100% it's superhuman. As a follow-up work, scientists at DeepMind started experimenting with 3D games, and after a few days of training it could learn to drive on ideal racing lines and pass others with ease. I've had my driving license for a while now, but I still don't always get the ideal racing lines right. Bravo!

I have heard the complaint that this is not real intelligence, because it doesn't know the concept of a ball or what exactly it is doing. Edsger Dijkstra once said that the question of whether machines can think is about as relevant as the question of whether submarines can swim. Beyond the fact that rigorously defining intelligence leans more into the domain of philosophy than science, I'd like to add that I am perfectly happy with effective algorithms. We use these techniques to accomplish different tasks, and they are really good problem solvers. In the Breakout game, you as a person learn the concept of a ball in order to use this knowledge as machinery to perform better. If this is not the case, then whoever knows a lot but can't use it to achieve anything useful is not an intelligent being but an encyclopedia.

What about the future? There are two major unexplored directions: the algorithm doesn't have long-term memory, and even if it had, it wouldn't be able to generalize its knowledge to other, similar tasks. Super exciting directions for future work. Thanks for watching and for your generous support, and I'll see you next time!
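The learning loop described in the transcript can be sketched in miniature. This is not DeepMind's implementation: the real DQN replaces the table below with a convolutional network over raw screen pixels and adds a target network, while this sketch keeps only the core Q-learning update and a toy experience replay buffer. The four-state "corridor" environment and all constants are made up for illustration.

```python
# Minimal tabular Q-learning sketch (illustrative; the environment is a
# hypothetical 4-state corridor, not an Atari game).
import random

N_STATES, N_ACTIONS = 4, 2        # actions: 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
replay = []                        # replay buffer of (s, a, r, s2, done)

def step(s, a):
    """Moving right from the last state wins (+1); every other move gives 0."""
    if a == 1 and s == N_STATES - 1:
        return s, 1.0, True
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, 0.0, False

def act(s):
    """Epsilon-greedy: mostly exploit current Q-values, sometimes explore."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])

def learn(batch):
    """One Q-learning update per sampled transition:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    for s, a, r, s2, done in batch:
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        a = act(s)
        s2, r, done = step(s, a)
        replay.append((s, a, r, s2, done))
        learn(random.sample(replay, min(len(replay), 8)))
        s = s2

# After training, "right" dominates "left" in every state.
assert all(Q[s][1] > Q[s][0] for s in range(N_STATES))
```

Replay sampling, which the sketch keeps, is one of the paper's key ingredients: learning from randomly drawn past transitions rather than only the most recent one breaks the correlation between consecutive frames and stabilizes training.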
