❤️ Check out Cohere and sign up for free today: https://cohere.ai/papers
📝 The paper "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances" is available here:
https://say-can.github.io/
https://arxiv.org/abs/2204.01691
🕊️ Check us out on Twitter for more DALL-E 2 related content! https://twitter.com/twominutepapers
❤️ Watch these videos in early access on our Patreon page or join us here on YouTube:
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Geronimo Moralez, Gordon Child, Ivo Galic, Jace O'Brien, Jack Lukic, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail image: OpenAI DALL-E 2
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/
#google #imagen
Table of Contents (2 segments)
Segment 1 (00:00 - 05:00)
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today, you are going to see how one area of research can unlock new capabilities in a seemingly completely different area. We are now witnessing the advent of amazing AI-based language models. For instance, OpenAI's GPT-3 technique is capable of all kinds of wizardry: finishing our sentences, or creating plots, spreadsheets, mathematical formulae, and many other things. And their DALL-E 2 AI is capable of generating incredible-quality images from our written descriptions. Even if they are too specific. Way too specific. Now, note that all this wizardry is possible as long as we are dealing with text and images. But how about endowing a real robot, moving around in the real world, with this kind of understanding of language? I wonder what that could do? Well, check this out!

Scenario 1. This little robot uses GPT-3 and other language models, and it not only understands our requests, it can also use this knowledge to help us. Don't believe it? Have a look. For instance, we can tell it that we spilled the coke and ask it how it can help us out. And it recommends finding the coke can, picking it up, going to the trash can, throwing it out, and bringing us a sponge. Yes, we will have to do the rest of it ourselves, but still, wow, good job, little robot.

Now, the key here is that it not only understands what we are asking and proposes how to help, but, hold on to your papers, because here comes Scenario 2. Oh my! It also looks around, locates the most important objects required, and now it knows enough to make recommendations as to what to do. And of course, not all of these make sense. Look at that! It can say that it is very sorry about the mess. Well, thank you for the emotional support, little robot, but we also need a little physical help here. Oh yes, that's better. And now, look, it uses its eyes to look around. Yes, I have my hand. Well, does it work? Yes it does. Great. Now: coke can, trash can, sponge. Hmm, it's time to make a cunning plan. Perfect!

And it can also do plenty more. For instance, if we've been reading research papers all day and we feel a little tired, we can tell it, and it can bring us a water bottle, hand it to us, and even bring us an apple. Now I'd love to see this. Be gentle, and... oh my! Thank you so much! The amenities at Google HQ seem to have leveled up. And, believe it or not, these were just really simple things; it can do way better. Look. This is a plan that requires planning 16 steps ahead, and it does not get stuck and doesn't mess up too badly anywhere. This one is as close to a personal butler as it gets. Absolutely incredible. These are finally real robots that can help us with real tasks in the real world. So cool! What a time to be alive!

Now, this is a truly amazing paper, but make no mistake, not even this is perfect. For instance, the success rate for planning is about 70%, and it can properly execute the plans most of the time, but clearly not all the time. The longer-term planning results may also need to be a bit cherry-picked to get a good one; it doesn't always succeed. Also, note that all this is played at 10x speed, so it takes a while. Clearly, typing the message and waiting for the robot still takes longer than just doing the work ourselves. However, this is an excellent opportunity for us to apply the First Law of Papers, which says that research is a process. Do not look at where we are, look at where we will be two more papers down the line.
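For the technically curious Fellow Scholars: the core idea of the paper, SayCan, is that the language model scores how useful each robot skill would be as the next step ("say"), while learned value functions score how feasible each skill is from the robot's current state ("can"), and the robot greedily picks the skill with the best combined score, over and over, until the task is done. Below is a minimal, self-contained sketch of that loop under toy assumptions; the skill names, preconditions, effects, and scoring functions are illustrative placeholders I made up for this sketch, not the authors' actual code or API.

# Minimal sketch of SayCan-style planning (illustrative assumptions,
# not the paper's actual code). An LLM-style "say" score rates how
# useful a skill would be; an affordance-style "can" score rates how
# feasible it is right now; the robot picks the best product of the two.

SKILLS = [
    "find the coke can",
    "pick up the coke can",
    "go to the trash can",
    "put the coke can in the trash",
    "find a sponge",
    "pick up the sponge",
    "bring the sponge to the user",
    "done",
]

# Toy preconditions standing in for the learned affordance value functions.
PRECONDITIONS = {
    "find the coke can": set(),
    "pick up the coke can": {"coke can located"},
    "go to the trash can": {"holding coke can"},
    "put the coke can in the trash": {"at trash can"},
    "find a sponge": {"coke can trashed"},
    "pick up the sponge": {"sponge located"},
    "bring the sponge to the user": {"holding sponge"},
    "done": {"sponge delivered"},
}

# Toy effects standing in for the robot's low-level skill policies.
EFFECTS = {
    "find the coke can": {"coke can located"},
    "pick up the coke can": {"holding coke can"},
    "go to the trash can": {"at trash can"},
    "put the coke can in the trash": {"coke can trashed"},
    "find a sponge": {"sponge located"},
    "pick up the sponge": {"holding sponge"},
    "bring the sponge to the user": {"sponge delivered"},
    "done": set(),
}

def llm_score(instruction, history, skill):
    # Stand-in for the language model's "say" score. A real system reads
    # this off the LLM's likelihood of `skill` as the next step for
    # `instruction`; this toy version just discourages repeating skills.
    return 0.1 if skill in history else 1.0

def affordance_score(state, skill):
    # Stand-in for the learned "can" score: probability the skill can
    # actually succeed from the current state.
    return 1.0 if PRECONDITIONS[skill] <= state else 0.0

def saycan_plan(instruction, state, max_steps=16):
    plan = []
    for _ in range(max_steps):
        # Core SayCan idea: maximize say-score * can-score.
        skill = max(SKILLS, key=lambda s: llm_score(instruction, plan, s)
                                          * affordance_score(state, s))
        if skill == "done":
            break
        plan.append(skill)
        state = state | EFFECTS[skill]  # "execute" the skill in the toy world
    return plan

print(saycan_plan("I spilled my coke, can you help?", state=set()))

Running this toy version on the spilled-coke instruction reproduces the kind of plan shown in the video: find the coke can, pick it up, go to the trash can, throw it out, then fetch a sponge and bring it over. The max_steps=16 cap mirrors the 16-step plans mentioned above; the real system scores hundreds of learned skills rather than this hand-written list.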
Just a year ago, OpenAI's image generator AI, DALL-E, looked like this, and one paper later, it looks like this. Well, just imagine what this robot will be able to do a couple more papers down the line. And, what do you think? Does this get
Segment 2 (05:00 - 06:00)
your mind going? If you have ideas for cool applications, let me know in the comments below! Thanks for watching and for your generous support, and I'll see you next time!