# AIs Are Getting Too Smart - Time For A New “IQ Test” 🎓

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=nSHU-4Yt4eQ
- **Date:** 26 October 2019
- **Duration:** 4:40
- **Views:** 81,480

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems" is available here:
https://super.gluebenchmark.com
https://arxiv.org/abs/1905.00537

Our earlier video, "DeepMind's AI Takes An IQ Test":
https://www.youtube.com/watch?v=eSaShQbUJTQ

Our earlier video on the OpenAI Retro Contest is available here:
https://www.youtube.com/watch?v=2FHHuRTkr_Y

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Haro, Anastasia Marchenkova, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Brian Gilman, Bryan Learn, Christian Ahlin, Claudio Fernandes, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Matthias Jost, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

## Contents

### [0:00](https://www.youtube.com/watch?v=nSHU-4Yt4eQ) Segment 1 (00:00 - 04:00)

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. In a world where learning-based algorithms are rapidly becoming more capable, I increasingly find myself asking the question: “so, how smart are these algorithms, really?”. I am clearly not alone with this. To be able to answer this question, a set of tests was proposed, and many of these tests shared one important design decision: they are very difficult to solve for someone without generalized knowledge.

In an earlier episode, we talked about DeepMind’s paper where they created a bunch of randomized mind-bending, or in the case of an AI, maybe silicon-bending questions that looked quite a bit like a nasty, nasty IQ test. And even in the presence of additional distractions, their AI did extremely well. I noted that on this test, finding the correct solution around 60% of the time would be quite respectable for a human, whereas their algorithm succeeded over 62% of the time, and upon removing the annoying distractions, this success rate skyrocketed to 78%. Wow.

More specialized tests have also been developed. For instance, scientists at DeepMind also released a modular math test with over 2 million questions, in which their AI did extremely well at tasks like interpolation and rounding decimals and integers, whereas it was not too accurate at detecting primality and doing factorization.

Furthermore, a little more than a year ago, the GLUE benchmark appeared, which was designed to test the natural language understanding capabilities of these AIs. When benchmarking the state-of-the-art learning algorithms, its authors found that they were approximately 80% as good as their fellow non-expert human beings. That is remarkable. Given the difficulty of the test, they were likely not expecting human-level performance, which you see marked with the black horizontal line, to be surpassed so quickly, and yet it happened in less than a year.

So, what do we do in this case? Well, as always, of course: design an even harder test. In comes SuperGLUE, the paper we’re looking at today, which is meant to provide an even harder challenge for these learning algorithms. Have a look at these example questions here. This time around, reusing general background knowledge gets more emphasis in the questions, so the AI has to be able to learn and reason with more finesse to successfully address them. You can see that these are anything but trivial little tests for a baby AI: not all, but some of these are calibrated for humans at around college-level education.

So, let’s have a look at how the current state-of-the-art AIs fared in this one! Well, not as well as humans, which is good news, because that was the main objective. However, they still did remarkably well. For instance, the BoolQ package contains a set of yes-and-no questions, and on these, the AIs are reasonably close to human performance. On MultiRC, the multi-sentence reading comprehension package, they still do OK, but humans outperform them by quite a bit. Note that you see two numbers for this test; the reason is that there are multiple test sets for this package. Note that on the second one, even humans seem to fail almost half the time, so I can only imagine the revelation we’ll have a couple more papers down the line.

I am very excited to see that, and if you are too, make sure to subscribe and hit the bell icon to not miss future episodes. Thanks for watching and for your generous support, and I'll see you next time!
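For scholars who want to look at these tasks directly, here is a minimal sketch of loading the two SuperGLUE packages mentioned above, BoolQ and MultiRC. It assumes the Hugging Face `datasets` library and the `super_glue` dataset as published on the Hugging Face Hub; the field names below follow that distribution and may differ in other copies of the benchmark.

```python
# Minimal sketch: inspect two SuperGLUE tasks (BoolQ and MultiRC).
# Assumes the Hugging Face `datasets` library: pip install datasets
from datasets import load_dataset

# BoolQ: yes/no questions about a short supporting passage.
boolq = load_dataset("super_glue", "boolq", split="validation")
ex = boolq[0]
print(ex["passage"][:200])  # the supporting passage (truncated)
print(ex["question"])       # a yes/no question about it
print(ex["label"])          # 1 = yes, 0 = no

# MultiRC: multi-sentence reading comprehension. Each example is a
# (paragraph, question, candidate answer) triple, labeled as correct
# or incorrect; one question can have several correct answers.
multirc = load_dataset("super_glue", "multirc", split="validation")
ex = multirc[0]
print(ex["paragraph"][:200])
print(ex["question"])
print(ex["answer"], "->", "correct" if ex["label"] else "incorrect")
```

The benchmark site at https://super.gluebenchmark.com also hosts the raw data and the leaderboard, so results obtained on these splits can be compared against the human baselines discussed in the video.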

---
*Source: https://ekstraktznaniy.ru/video/14231*