NVIDIA’s New AI: Paint Like Bob Ross!


Two Minute Papers · 30.12.2022 · 85,390 views · 3,641 likes


Video description
❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

📝 The paper "eDiff-I: Text-to-Image Diffusion Models with Ensemble of Expert Denoisers" is available here: https://deepimagination.cc/eDiff-I/

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Edward Unthank, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonas, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Luke Dominique Warner, Matthew Allen Fisher, Matthew Valle, Michael Albrecht, Michael Tedder, Nevin Spoljaric, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi. If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Mastodon: https://sigmoid.social/@twominutepapers
Twitter: https://twitter.com/twominutepapers
Web: https://cg.tuwien.ac.at/~zsolnai/

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to look at NVIDIA's new AI research work which, as they say, allows us to paint with words. So, let's see. Yes, this runs a generative denoising process, or in other words, it starts out from a bunch of noise, and over time, uses our text prompt to rearrange it into an image that we described. Great. Then, it subjects these coarse images to a super-resolution technique, which means that in goes this coarse image, and out comes an image with so much more detail. And hence, it can generate images of this quality.

Now, wait a minute. We are experienced Fellow Scholars over here, so we know that OpenAI's DALL-E 2 can do this. And Google's Imagen can do this. The free and open source Stable Diffusion can also do this. We have a number of papers that can pull this off really well. So, is NVIDIA a little late to the party? Why publish this paper? What is new here? Well, let's push it to its limits through three really fun experiments and find out together.

One, it gives us something that many of us Fellow Scholars desire. And that is, of course, more control over the synthesized images! For instance, here, we wish to create an image of boxing squirrels. Yes, you heard it right. But wait, here is the more granular control part: we can draw exactly where each squirrel and the boxing gloves go. And, there we go! Loving it. Or, if we wish to create a rabbit who is also a magician, we can also specify that it should stand on clouds, and we noted that it should cast a fireball. Now we can tell it where exactly that fireball should go.

Two, it also follows our instructions really well when it comes to requesting styles. We can ask it to paint this penguin in the style of many famous artists, with really cool results. However, sometimes just saying which artist we are looking for is not that helpful.
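The generative denoising process described here can be sketched as a toy loop. This is only a simplified illustration under loose assumptions, not NVIDIA's actual model: `denoise_step` is a hypothetical stand-in for a trained neural network conditioned on text embeddings, and the super-resolution stages are omitted.

```python
import random

def denoise_step(x, t, prompt_strength):
    # Hypothetical denoiser: shrinks the remaining noise a little and nudges
    # each value toward the prompt. A real model would predict the noise with
    # a neural network guided by a text embedding, not a scalar.
    return [0.9 * p + 0.01 * prompt_strength for p in x]

def generate(prompt_strength, steps=50, n_values=8):
    # Start out from pure noise, then iteratively denoise (reverse diffusion),
    # letting the prompt rearrange the noise into the described content.
    x = [random.gauss(0.0, 1.0) for _ in range(n_values)]
    for t in reversed(range(steps)):
        x = denoise_step(x, t, prompt_strength)
    return x  # coarse result; eDiff-I would then apply super-resolution stages
```

After many steps the random starting values are almost entirely washed out and the prompt term dominates, which mirrors how the initial noise is gradually replaced by prompt-driven structure.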
You know, which phase of the artist are we talking about? Or which particular work should it be based on? And hold on to your papers, because this can even help out when our words fail us. How? Well, in this case, we can also use an image instead. We can still add a text prompt, and it will create the new image in the style of this one. This is especially useful in cases when we have a style in mind that is really hard to explain. I love this one. So cool!

Three, so how does it compare to the usual suspects? Well, let's have a look at some teapots. Of course, to the surprise of no one, Stable Diffusion and DALL-E 2 are both capable of this task; however, look. We did not get a painting of a panda. And, with the new technique, look at that! Oh yes! Once again, it follows our instructions better. Now note that text-to-image AIs are not easy to evaluate, as all models can generate a ton of different images for the same prompt. However, further comparisons reveal that there indeed is a pattern here.

Now, have you noticed? There is a pattern in this video too. I keep saying that this new technique follows our instructions better, so here is the most important question. Why? How is all this wizardry possible?

Well, this was one of my favorite parts of the paper. Have a look at this. The authors claim that as the classical text-to-image AIs start out from noise, they follow our prompt closely; however, later on in the image synthesis process, not so much. If we change the prompt for the last few percent of the noise diffusion process, look. Ouch! It completely ignores it. And here comes the best part: the authors trained multiple separate denoiser networks that are suited to different parts of the generation process. Hence, yes, you guessed it right, these can follow our instructions better throughout later parts of the image generation process, and thus, give us better artistic control.
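The ensemble-of-expert-denoisers idea can be illustrated with a short sketch. All names here are hypothetical and the experts are placeholder callables; the paper's actual contribution is training a separate specialized network for each interval of the noise schedule.

```python
def expert_for_step(t, total_steps, n_experts=3):
    # Map a diffusion timestep to an expert index. High t means a very noisy
    # input (coarse composition, where the prompt matters most); low t means
    # an almost-clean image (fine-detail refinement).
    interval = total_steps / n_experts
    return min(int(t / interval), n_experts - 1)

def generate_with_experts(experts, x, prompt, total_steps=50):
    # experts: one denoiser callable per noise-level interval, so late
    # low-noise steps are handled by a dedicated specialist instead of one
    # network stretched across the whole process.
    for t in reversed(range(total_steps)):
        denoiser = experts[expert_for_step(t, total_steps)]
        x = denoiser(x, t, prompt)
    return x
```

Because a different specialist handles the late, low-noise steps, prompt guidance is not diluted there, which matches the video's explanation of why the new technique follows our instructions better.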
Now I am sure that the next generation of text-to-image AIs are going to be even more powerful two more papers down the line, but this concept may live on to improve even these subsequent

Segment 2 (05:00 - 06:00)

versions. I am very excited to see if this will really be the case. What a time to be alive!

Thanks for watching and for your generous support, and I'll see you next time!
