❤️ Check out Weights & Biases and sign up for a free demo here: https://www.wandb.com/papers
❤️ Their mentioned post is available here: https://wandb.ai/openai/published-work/Learning-Dexterity-End-to-End--VmlldzoxMTUyMDQ
📝 The paper "Learning to Summarize with Human Feedback" is available here:
https://openai.com/blog/learning-to-summarize-with-human-feedback/
Reddit links to the showcased posts:
1. https://www.reddit.com/r/AskAcademia/comments/lf7uk4/submitting_a_paper_independent_of_my_post_doc/
2. https://www.reddit.com/r/AskAcademia/comments/l988py/british_or_american_phd/
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Haro, Alex Serban, Alex Paden, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bruno Mikuš, Bryan Learn, Christian Ahlin, Eric Haddad, Eric Martel, Gordon Child, Haris Husic, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Joshua Goller, Kenneth Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Mark Oates, Michael Albrecht, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Ramsey Elbasheer, Robin Graham, Steef, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://discordapp.com/invite/hbcTJu2
Thumbnail background image credit: https://pixabay.com/images/id-1989152/
Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/
Table of Contents (2 segments)
Segment 1 (00:00 - 05:00)
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. This paper will not have the visual fireworks that you see in many of our videos. Oftentimes, you get ice cream for the eyes, but today, you’ll get ice cream for the mind. And, when I read this new paper, I almost fell off the chair, and I think this work teaches us important lessons and I hope you will appreciate them too. So, with that, let’s talk about AIs and dealing with text! This research field is improving at an incredible pace. For instance, four years ago, in 2017, scientists at OpenAI embarked on an AI project where they wanted to show a neural network a bunch of Amazon product reviews and teach it to generate new ones, or continue a review when given one. Upon closer inspection, they noticed that the neural network had built up a knowledge of not only language, but had also learned to create a state-of-the-art sentiment detector. This means that the AI recognized that in order to be able to continue a review, it needs to be able to understand English, and efficiently detect whether the review seems positive or negative. This new work is about text summarization, and it really is something else. If you read reddit, the popular online discussion website, and encounter a longer post, you may also find a short summary, a TLDR of the same post, written by a fellow human. This is good not only for the other readers who are in a hurry, but, less obviously, it is also good for something else. And now, hold on to your papers, because these summaries also provide fertile ground for a learning algorithm to read a piece of long text and its short summary, and learn how the two relate to each other. This means that they can be used as training data and fed to a learning algorithm. Yum! And the point is that if we give enough of these pairs to these learning algorithms, they will learn to summarize other reddit posts. 
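The training-data idea above — pairing each long post with its human-written TLDR — can be sketched in a few lines. This is a minimal illustration, not the paper's actual preprocessing; the "TL;DR:" split marker and the example posts are assumptions for demonstration:

```python
# Sketch: turn reddit posts that contain a TLDR into (long_text, summary)
# training pairs. The "TL;DR:" marker convention is an assumption here.

def build_training_pairs(posts):
    """Keep only posts containing a 'TL;DR:' marker and split each
    into a (long_text, short_summary) pair."""
    pairs = []
    for post in posts:
        if "TL;DR:" in post:
            text, summary = post.split("TL;DR:", 1)
            pairs.append((text.strip(), summary.strip()))
    return pairs

# Hypothetical example posts, for illustration only.
posts = [
    "I spent three years on my thesis and it nearly broke me. "
    "TL;DR: finishing a PhD is hard.",
    "A post without any summary at all.",
]
print(build_training_pairs(posts))
```

Only the posts that actually carry a summary become training pairs; the rest are discarded.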
So, let’s see how well it performs. First, this method learned on about a hundred thousand well-curated reddit posts, and was also tested on other posts that it hadn’t seen before. It was asked to summarize this post from the relationship advice subreddit, so let’s see how well it did. If you feel like reading the text, you can pause the video here, or if you feel like embracing the TLDR spirit, just carry on and look at these two summarizations. One of these was written by a human, and the other one by this new summarization technique. Do you know which is which? Please stop the video and let me know in the comments below. Thank you! So, this was written by a human, and this by the new AI. And while, of course, this is subjective, I would say that the AI-written one feels at the very least as good as the human summary, and I can’t wait to have a look at the more principled evaluation in the paper. Let’s see…the higher we go here, the higher the probability of a human favoring the AI-written summary over a human-written one. And we have smaller AI models on the left, and bigger ones on the right. This is the 50% reference line: below it, people tend to favor the human’s version, and if it can get above the 50% line, the AI does a better job than the human-written TLDRs in the dataset. Here are two proposed models: this one significantly underperforms, while this other one is a better match. However, whoa! Look at that! The authors also proposed a human feedback model that, even at the smallest model size, handily outperforms human-written TLDRs, and as we grow the AI model, it gets even better than that. Now that’s incredible, and this is when I almost fell off the chair when reading this paper. But! We’re not done yet, not even close. Don’t forget, this AI was trained on reddit, and was also tested on reddit. So our next question is, of course, can it do anything else? How general is the knowledge that it gained? 
What if we give it a full news article from somewhere else, outside of reddit? Let’s see how it performs. Hmm…of course, this is also subjective, but I would say both are quite good. The human-written summary provides a little more information, while the AI-written one captures the essence of the article and does it very concisely. Great job.
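The 50% reference line in the preference graph is just a pairwise win rate: the fraction of head-to-head comparisons in which human judges preferred the AI-written summary. A minimal sketch, with entirely hypothetical judgment data:

```python
def win_rate(preferences):
    """Fraction of pairwise comparisons in which the judge preferred
    the AI-written summary over the human-written one.
    Above 0.5 means the AI summaries win on average."""
    ai_wins = sum(1 for choice in preferences if choice == "ai")
    return ai_wins / len(preferences)

# Hypothetical judgments: each entry records which summary the judge picked.
judgments = ["ai", "human", "ai", "ai", "human", "ai"]
print(win_rate(judgments))  # 4 of 6 comparisons favor the AI
```

Plotting this number for each model size against the 0.5 line gives exactly the kind of graph discussed here.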
Segment 2 (05:00 - 08:00)
So, let’s see the same graph for summarizing these articles outside reddit. I don’t expect the AI to perform as well as it did on the reddit posts, as it is outside its comfort zone, but…my goodness, this still performs nearly as well as humans. That means that it indeed derived general knowledge from a really narrow training set, which is absolutely amazing. Now, ironically, you see this Lead-3 technique dominating both humans and the AI. What could that be? Some unpublished, superintelligent technique? Well, I will have to disappoint you: this is not a super sophisticated technique, but a dead simple one. So simple, in fact, that it just takes the first three sentences of the article, which humans seem to prefer a great deal. But note that this simple Lead-3 technique only works for a narrow domain, while the AI has learned the English language, probably knows about sentiment, and a lot of other things that can be used elsewhere. And now, the two most impressive things from the paper, in my opinion. One, this is not a neural network trained with plain supervised learning, but one trained with a reinforcement learning algorithm that learns from human feedback. Similar techniques have been used by DeepMind and other research labs to play video games or control drones, and it is really cool to see them excel in text summarization too. Two, it learned from humans, but derived so much knowledge from these scores that, over time, it outperformed its own teacher. And the teacher here is not humans in general, but the people who write TLDRs alongside their posts on reddit. That truly feels like something straight out of a science fiction movie. What a time to be alive! Now, of course, not even this technique is perfect: this human vs. AI preference score is just one way of measuring the quality of a summary, and there are more sophisticated methods that involve coverage, coherence, accuracy, and more. On some of these measurements, the AI does not perform as well. But just imagine what this will be able to do two more papers down the line. 
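The Lead-3 baseline mentioned above really is that simple. A minimal sketch, assuming a naive regex-based sentence splitter (the segment does not specify how sentences are actually tokenized in the evaluation):

```python
import re

def lead3_summary(article: str) -> str:
    """Lead-3 baseline: the 'summary' is simply the first three
    sentences of the article. The regex splitter below is a naive
    stand-in for a real sentence tokenizer."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:3])

article = (
    "AI text summarization keeps improving. "
    "New models learn from human feedback. "
    "They can even generalize beyond their training domain. "
    "The rest of this article goes into the details."
)
print(lead3_summary(article))
```

Because news writing front-loads the key facts, this trivial heuristic is a strong baseline on news articles — and useless in domains without that convention.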
Thanks for watching and for your generous support, and I'll see you next time!