Google’s New AI: Fly INTO Photos! 🐦

Two Minute Papers · 11.09.2022 · 619,023 views · 17,562 likes


Video description
❤️ Train a neural network and track your experiments with Weights & Biases here: http://wandb.me/paperintro
📝 The paper "Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image" is available here: https://infinite-nature.github.io/
❤️ Watch these videos in early access on our Patreon page or join us here on YouTube:
- https://www.patreon.com/TwoMinutePapers
- https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg/join
🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Geronimo Moralez, Gordon Child, Ivo Galic, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Matthew Allen Fisher, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi. If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers
Thumbnail background image credit: https://pixabay.com/images/id-1761292/
Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu
Károly Zsolnai-Fehér's links:
Instagram: https://www.instagram.com/twominutepapers/
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today, we are able to take a bunch of photos, and use an AI to magically create a video where we can fly through these photos. It is really crazy, because this is possible today; for instance, here is NVIDIA's method that can be trained to perform this in a matter of seconds.

Now, I said that we can fly through these photos. But here is an insane idea: what if we used not multiple photos, but just one photo, and we don't fly through it, but fly into this photo. Now, you are probably asking, Károly, what are you talking about? This is completely insane, and it wouldn't work with these NeRF-based solutions like the one you see here; these were not designed to do this at all! Look. Oh yes. That.

So, in order to fly into these photos, we would need to invent at least 3 things. One, image inpainting. Look, if we are to fly into this photo, we will have to be able to look at regions between the trees. Unfortunately, these are not part of the original photo, and hence, new content needs to be generated intelligently. That is a formidable task for an AI, and luckily, image inpainting techniques already exist out there. Here's one.

But inpainting is not nearly enough. Two. As we fly into a photo, completely new regions should also appear that are beyond the image. This means that we also need to perform image outpainting, creating these new regions. Continuing the image, if you will. Luckily, we are entering the age of AI-driven image generation, and this is also possible today, for instance, with this incredible tool.

But even that is not enough. Why is that? Well, three! As we fly closer to these new regions, we will be looking at fewer and fewer pixels and from closer and closer, which means…this. Oh my, another problem. So, we surely can't solve this, right? Well, great news - we can!
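The inpainting and outpainting ideas from the video can be caricatured in a few lines of NumPy. This is only a toy sketch to show what the two operations mean on pixel arrays; the paper and the tools mentioned above use learned generative models, not the neighbor-averaging and edge-replication used here.

```python
import numpy as np

def toy_inpaint(image, mask, iterations=50):
    """Fill masked (missing) pixels by repeatedly averaging known neighbors.

    image: 2-D float array; mask: boolean array, True where pixels are missing.
    A generative model would synthesize plausible content instead.
    """
    filled = image.copy()
    filled[mask] = filled[~mask].mean()  # crude initialization
    for _ in range(iterations):
        # Average the 4-neighborhood, then write back only the missing pixels.
        padded = np.pad(filled, 1, mode="edge")
        neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                     padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        filled[mask] = neighbors[mask]
    return filled

def toy_outpaint(image, margin):
    """Extend the canvas by `margin` pixels on each side.

    Here we just replicate edge pixels; real outpainting generates new
    content for the border region that continues the image.
    """
    return np.pad(image, margin, mode="edge")

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
hole = np.zeros_like(img, dtype=bool)
hole[3:5, 3:5] = True                  # knock out a 2x2 region
restored = toy_inpaint(img, hole)      # same size, hole filled smoothly
extended = toy_outpaint(img, margin=2) # canvas grown from 8x8 to 12x12
```

The point of the toy version is the interface, not the quality: both operations take an image plus a description of where content is missing, and return an image with that content filled in.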
Here is Google's diffusion-based solution to super resolution, where the principle is simple: have a look at this technique from last year. In goes a coarse image or video, and this AI-based method is tasked with…this! Yes. This is not science fiction. This is super resolution, where the AI starts out from noise and synthesizes crisp details onto the image.

So, this might not be such an insane idea after all! So, does the fact that we can do all three of these separately mean that this task is easy? Well, let's see how previous techniques were able to tackle this challenge. My guess is that this is still sinfully difficult to do. And…oh boy. Well, I see a lot of glitches and not a lot of new, meaningful content being synthesized here. And note that these are not some ancient techniques; these are all from just two years ago. It really seems there is not a lot of hope here.

And now, hold on to your papers, and let's see how Google's new AI puts all of these together, and lets us fly into this photo. Wow, this is so much better! I love it. Clearly not perfect, but I feel that this is the first work where the flying-into-photos concept really comes to life.

And it has a bunch of really cool features too. For instance, one, it can generate even longer videos, which means that after a few seconds, everything that we see is synthesized by the AI. Two, it supports not only this boring linear camera motion, but these really cool, curvy camera trajectories too. Putting these two features together, we can get these cool animations that were not possible before this paper. Now, the flaws are clearly visible for everyone, but this is a historic episode where we can invoke the three Laws of Papers to address them.

The First Law of Papers says that research is a process. Do not look at where we are; look at where we will be two more papers down the line. With this concept, we are roughly where DALL-E 1 was about a year ago.
That is an image generator AI that could

Segment 2 (05:00 - 08:00)

produce images of this quality. And, just one year later, DALL-E 2 arrived, which could do this! So, just imagine what kind of videos this will be able to create just one more paper down the line.

The Second Law of Papers says that everything is connected. This AI technique is able to learn image inpainting, image outpainting, and super resolution at the same time, and even combine them creatively. We don't need 3 separate AIs to do these, just one technique. That is very impressive.

And finally, the Third Law of Papers says that a bad researcher fails 100% of the time, while a good one only fails 99% of the time. Hence, what you see here is always just 1% of the work that was done. Why is that? Well, for instance, this is a neural network-based solution, which means that we need a ton of training data for these AIs to learn on. And hence, scientists at Google also needed to create a technique to gather a ton of drone videos on the internet and create a clean, labeled dataset from them. The labels are essentially depth information, which shows how far different parts of the image are from the camera. And they did it for more than 2 million images in total. So, once again, if you include all the versions of this idea that didn't work, what you see here is just 1% of the work that was done.

And now, we can not only fly through photos, but also fly into photos. What a time to be alive!

What do you think? Does this get your mind going? What would you use this for? Let me know in the comments below!

Thanks for watching and for your generous support, and I'll see you next time!
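Why do the depth labels matter so much? Depth is what lets you lift a flat photo into 3-D so a camera can actually move into it. Below is a minimal pinhole-camera sketch of that lifting step; the focal lengths and principal point are made-up example values, not anything from the paper.

```python
import numpy as np

def unproject(depth, fx, fy, cx, cy):
    """Turn a depth map into a per-pixel 3-D point cloud (camera coordinates).

    depth: (H, W) array of distances along the camera's z-axis.
    fx, fy: focal lengths in pixels; cx, cy: principal point.
    Standard pinhole model: x = (u - cx) / fx * z, y = (v - cy) / fy * z.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)

depth = np.full((4, 6), 2.0)   # toy depth map: a flat wall 2 units away
points = unproject(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)

# Moving the camera forward is now just shifting the z coordinate and
# re-projecting; the regions that become newly visible are exactly what
# the inpainting/outpainting machinery has to fill in.
moved = points.copy()
moved[..., 2] -= 0.5
```

This is only the geometric half of the story; the paper's contribution is making the generated content stay coherent as this move-and-fill loop repeats for seconds of video.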
