This New AI Can Find Your Dog In A Video! 🐩

Two Minute Papers · 10.03.2022 · 56,962 views · 3,010 likes


Video description
❤️ Check out Fully Connected by Weights & Biases: https://wandb.me/papers 📝 The paper "MTTR - End-to-End Referring Video Object Segmentation with Multimodal Transformers" is available here: https://arxiv.org/abs/2111.14821 https://github.com/mttr2021/MTTR https://huggingface.co/spaces/MTTR/MTTR-Referring-Video-Object-Segmentation https://colab.research.google.com/drive/12p0jpSx3pJNfZk-y_L44yeHZlhsKVra-?usp=sharing 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Angelos Evripiotis, Benji Rabhan, Bryan Learn, Christian Ahlin, Eric Martel, Gordon Child, Ivo Galic, Jace O'Brien, Javier Bustamante, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi. If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers Thumbnail background image credit: https://pixabay.com/images/id-5953883/ Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu Meet and discuss your ideas with other Fellow Scholars on the Two Minute Papers Discord: https://discordapp.com/invite/hbcTJu2 Károly Zsolnai-Fehér's links: Instagram: https://www.instagram.com/twominutepapers/ Twitter: https://twitter.com/twominutepapers Web: https://cg.tuwien.ac.at/~zsolnai/

Table of contents (5 segments)

<Untitled Chapter 1>

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to perform pose estimation, with an amazing twist. You’ll love it! But wait, what is pose estimation? Well, simple: a video of people goes in, and the posture they are taking comes out. Now, you see here that previous techniques can already do this quite well, even for videos. So, by today, the game has evolved. Plain pose estimation is not that new; we need pose estimation plus something else. We need a little extra, if you will. So, what can that little extra be? Let’s look at three examples. For instance, one, NVIDIA has a highly advanced pose estimation technique that can refine its estimations by putting these humans into a virtual physics simulation. Without that, this kind of foot sliding often happens, but after the physics simulation

Kinematic Physics

not anymore. As a result, it can understand even this explosive sprinting motion. This dynamic tennis serve. You name it. All of them are very close. So, what is this good for? Well, many things, but here is my favorite - if we can track the motion well, we can

Virtual human

put it onto a virtual character, so we ourselves can move around in a beautiful, imagined world. So, that was one. Pose estimation plus something extra, where the something extra is a physics simulation. Nice. Now, two, if we allow an AI to read the Wi-Fi signals bouncing around in a room, it can perform pose estimation, even through walls. Kind of. Once again, pose estimation with something extra. And three, this is pose estimation with inertial sensors. This works when playing a friendly game of table tennis with a friend…or…wait. Or maybe, a not so friendly game of table tennis. And this works really well even in the dark. So, all of these are pose estimation plus something extra. And now, let’s have a look at this new paper, which performs pose estimation, plus…well, pose estimation, as it seems. Okay, I don’t see anything new here, really. Why is this work on Two Minute Papers? Well, now hold on to your papers, and check this out. Oh yes! So what is this? Well, here is the twist. We can give a piece of video to this AI, write a piece of text as you see up here, and it will not only find what we are looking for in the video and mark it, but even track it over time. Now that is really cool. We just say what we want to be tracked over the video, and it will do it automatically. It can find the dog and the capybara. These are rather rudimentary descriptions, but it is by no means limited to that.
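The core idea described here — score every candidate object in every frame against the text query and keep the best match — can be sketched in a few lines. Note that the embeddings and region labels below are made up for illustration; a real model like MTTR learns text and video representations jointly inside a multimodal transformer rather than comparing fixed vectors.

```python
# Toy sketch of text-guided tracking: per frame, pick the candidate
# region whose feature vector is most similar to the text embedding.

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def track_by_text(text_emb, frames):
    """frames: list of {region_id: embedding}; returns best region per frame."""
    track = []
    for regions in frames:
        best = max(regions, key=lambda rid: cosine(text_emb, regions[rid]))
        track.append(best)
    return track

# Hypothetical embeddings: the query should match the "dog" region in both frames.
query = [1.0, 0.1]
frames = [
    {"dog": [0.9, 0.2], "capybara": [0.1, 0.9]},
    {"dog": [0.8, 0.1], "capybara": [0.2, 1.0]},
]
print(track_by_text(query, frames))  # ['dog', 'dog']
```

Linking the winning region across frames is what turns per-frame matching into a track; MTTR's contribution is doing all of this end to end in one network.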

a man wearing a white shirt and blue shorts riding a surfboard a yellowish surfboard carrying a man with long brown hair

Look, we can also say a man wearing a white shirt and blue shorts riding a surfboard. And, yes! It can find it. And we can also add a description of the surfboard, and it can tell which is which. And I like the tracking too. This scene has tons of high-frequency changes, lots of occlusion, and it is still doing really well. Loving it. So, I am thinking that this helps us take full advantage of the text descriptions.
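Telling "which is which" when several descriptions refer to the same scene is an assignment problem: each text query should be matched to a distinct object so that the total similarity is maximized. Here is a minimal sketch of that matching step with exhaustive search over permutations; the similarity scores and labels are invented for illustration, and DETR-style models such as MTTR solve the equivalent problem with Hungarian matching over learned predictions.

```python
from itertools import permutations

def assign(queries, objects, score):
    """Match each text query to a distinct object, maximizing total score.
    score[q][o] is the similarity between query q and object o."""
    best, best_total = None, float("-inf")
    for perm in permutations(range(len(objects))):
        total = sum(score[q][o] for q, o in enumerate(perm))
        if total > best_total:
            best, best_total = perm, total
    return {queries[q]: objects[o] for q, o in enumerate(best)}

# Hypothetical similarity scores between the two prompts and two detections.
queries = ["man riding a surfboard", "yellowish surfboard"]
objects = ["person", "board"]
score = [[0.9, 0.3],
         [0.2, 0.8]]
print(assign(queries, objects, score))
# {'man riding a surfboard': 'person', 'yellowish surfboard': 'board'}
```

Exhaustive search is fine for a handful of objects; for larger sets, a polynomial-time assignment solver would replace the permutation loop.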

a skateboard being ridden by a person a skateboarder wearing black pant and white shirt

Look. We can ask it to mark the parrot and the cockatoo, and it knows which is which. So I can imagine more advanced applications where we need to find the appropriate kind of animal or object among many others, and we don’t even need to know what to look for. Just say what you want, and it will find it! I also liked how this is done with a transformer neural network that can jointly process the text and video in one elegant solution. That is really cool. Now, of course, every single one of you Fellow Scholars can see that this is not perfect. Not even close. Depending on the input, temporal coherence issues can arise; these are the jumpy artifacts from frame to frame. But, still, this is swift progress in machine learning research. This is what we could do in 2018, and we were very happy about it. Just a couple of papers down the line, we now simply say what we want and the AI will do it. And just imagine what we will have a couple more papers down the line. I cannot wait. So, what would you use this for? Please let me know in the comments below! And wait, the source code, an interactive demo, and a notebook are available in the video description. So, you know what to do. Yes, let the experiments begin! Thanks for watching and for your generous support, and I'll see you next time!
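The "jumpy artifacts from frame to frame" mentioned above can often be reduced with post-processing. The sketch below is not from the paper — it is a generic temporal-smoothing idea: run an exponential moving average over per-pixel mask probabilities before thresholding, so a pixel that flickers off for a single frame stays in the mask. Masks are flattened to 1-D lists here to keep the example short.

```python
def smooth_masks(prob_masks, alpha=0.6):
    """Exponential moving average over per-frame mask probabilities.
    prob_masks: list of frames, each a flat list of per-pixel probabilities.
    Returns binary masks after smoothing and thresholding at 0.5."""
    smoothed, prev = [], None
    for m in prob_masks:
        if prev is None:
            prev = m[:]
        else:
            # Blend the running estimate with the current frame.
            prev = [alpha * p + (1 - alpha) * c for p, c in zip(prev, m)]
        smoothed.append([1 if p >= 0.5 else 0 for p in prev])
    return smoothed

# A single-pixel mask that flickers off for one frame survives the smoothing.
masks = [[0.9], [0.2], [0.9]]
print(smooth_masks(masks))  # [[1], [1], [1]]
```

The trade-off is familiar: higher `alpha` gives steadier masks but reacts more slowly to genuine motion, which is exactly the temporal-coherence tension the video points out.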
