# DeepMind's AI Learns Complex Behaviors From Scratch | Two Minute Papers #239

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=veWkBsK0nwU
- **Date:** 27.03.2018
- **Duration:** 3:13
- **Views:** 26,511
- **Source:** https://ekstraktznaniy.ru/video/14491

## Description

The paper "Learning by Playing - Solving Sparse Reward Tasks from Scratch" is available here:
https://arxiv.org/abs/1802.10567

Our Patreon page: https://www.patreon.com/TwoMinutePapers

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dennis Abts, Emmanuel, Eric Haddad, Esa Turkulainen, Evan Breznyik, Frank Goertzen, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Nader Shakerin, Raul Araújo da Silva, Robin Graham, Shawn Azman, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payment links are available below. Thank you very much for your generous support!
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh
Ethereum: 0x002BB163DfE89B7aD0712846F1a1E53ba6136b5A
LTC: LM8AUh5bGcNgzq6HaV1jeaJrFvmKxxgiXg

Thumbnail background image credit: https://pixabay.com/photo-2009819

## Transcript

### Segment 1 (00:00 - 03:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Reinforcement learning is a learning algorithm that chooses a set of actions in an environment to maximize a score. This class of techniques enables us to train an AI to master a large variety of video games, and it has many more cool applications. Reinforcement learning typically works well when the rewards are dense. What does this mean? If we play a simple video game and, after making a mistake, we immediately die, it is easy to identify which action of ours was the mistake. However, if we are playing a strategy game that requires long-term planning and we lose, it is possible that we were outplayed in the final battle, but it is also possible that we lost the game way earlier due to building the wrong kind of economy. There are many other possible reasons, because we get feedback on how well we have done only once, and much after we have chosen our actions. Learning from sparse rewards is very challenging, even for humans. And it gets even worse: in this problem formulation, we don't have any teachers that guide the learning of the algorithm, and we have no prior knowledge of the environment. So this problem sounds almost impossible to solve. So what did DeepMind come up with to at least have a chance of approaching it?

And now, hold on to your papers, because this algorithm learns like a baby learns about its environment. This means that before we start solving problems, the algorithm is released into the environment to experiment and master basic tasks. In this case, our final goal is to tidy up the table. First, the algorithm learns to activate its haptic sensors and control its joints and fingers, then to grab an object, and then to stack objects on top of each other. In the end, the robot learns that tidying up is nothing else but a sequence of the elementary actions that it had already mastered. The algorithm also has an internal scheduler that decides which should be the next action to master, keeping in mind that the goal is to maximize progress on the main task, which is tidying up the table in this case.

And now, on to validation. When we are talking about software projects, the question of real-life viability often emerges. So the question is, how would this technique work in reality, and what else would be the ultimate test than running it on a real robot arm? Let's look here and marvel at the fact that it easily finds and moves the green block to the appropriate spot. And note that it learned how to do this from scratch, much like a baby would learn to perform such tasks. Also note that this was a software project that was deployed on the robot arm, which means that the algorithm generalizes well to different control mechanisms, a property that is highly sought after when talking about intelligence. And if earlier progress in machine learning research is indicative of the future, this algorithm may learn how to perform backflips and play video games on a superhuman level within two follow-up papers. I cannot wait to see that, and I'll be here to report on it for you. Thanks for watching and for your generous support, and I'll see you next time!
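The idea of an internal scheduler that picks the next auxiliary task to practice, judged by how much it helps the sparse main task, can be illustrated with a small bandit-style sketch. This is a simplified stand-in under stated assumptions, not DeepMind's implementation: the task names (`touch`, `grasp`, `stack`), the toy `simulate_episode` environment with its made-up success probabilities, and the softmax temperature are all hypothetical. The scheduler keeps a running average of the sparse 0/1 main-task reward ("table tidied") observed after practicing each auxiliary task, and samples the next task from a softmax (Boltzmann) distribution over those averages, so tasks that lead to more main-task progress are chosen more often.

```python
import math
import random

# Hypothetical auxiliary tasks; names are illustrative, not taken from the paper.
TASKS = ["touch", "grasp", "stack"]


def softmax_sample(values, temperature, rng):
    """Sample an index with probability proportional to exp(value / temperature)."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off


class Scheduler:
    """Bandit-style scheduler: tracks the mean sparse main-task reward per auxiliary task."""

    def __init__(self, tasks, temperature=0.1, seed=0):
        self.tasks = list(tasks)
        self.temperature = temperature
        self.counts = [0] * len(self.tasks)
        self.means = [0.0] * len(self.tasks)
        self.rng = random.Random(seed)

    def choose(self):
        # Favor tasks whose practice has led to higher main-task reward so far.
        return softmax_sample(self.means, self.temperature, self.rng)

    def update(self, index, main_reward):
        # Incremental mean of the main-task reward observed after this task.
        self.counts[index] += 1
        self.means[index] += (main_reward - self.means[index]) / self.counts[index]


def simulate_episode(task, rng):
    """Hypothetical toy environment returning a sparse 0/1 main-task reward.

    The success probabilities are made up purely for illustration.
    """
    success_prob = {"touch": 0.05, "grasp": 0.2, "stack": 0.6}[task]
    return 1.0 if rng.random() < success_prob else 0.0


def run(episodes=2000, seed=0):
    scheduler = Scheduler(TASKS, temperature=0.1, seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(episodes):
        i = scheduler.choose()
        reward = simulate_episode(TASKS[i], env_rng)
        scheduler.update(i, reward)
    return scheduler


if __name__ == "__main__":
    s = run()
    for task, n, m in zip(s.tasks, s.counts, s.means):
        print(f"{task}: chosen {n} times, mean main-task reward {m:.2f}")
```

Because the softmax never assigns zero probability to any task, the sketch keeps exploring early on and then concentrates on whichever task most reliably precedes main-task success. A low temperature makes it greedier; a high one makes it closer to a uniform random scheduler.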