# [ML News] Stable Diffusion Takes Over! (Open Source AI Art)

## Metadata

- **Channel:** Yannic Kilcher
- **YouTube:** https://www.youtube.com/watch?v=xbxe-x6wvRw
- **Date:** September 19, 2022
- **Duration:** 27:27
- **Views:** 82,860
- **Source:** https://ekstraktznaniy.ru/video/12634

## Description

#stablediffusion #aiart #mlnews 

Stable Diffusion has been released and is riding a wave of creativity and collaboration. But not everyone is happy about this...

Sponsor: NVIDIA
GPU Raffle: https://ykilcher.com/gtc

OUTLINE:
0:00 - Introduction
0:30 - What is Stable Diffusion?
2:25 - Open-Source Contributions and Creations
7:55 - Textual Inversion
9:30 - OpenAI vs Open AI
14:20 - Journalists be outraged
16:20 - AI Ethics be even more outraged
19:45 - Do we need a new social contract?
21:30 - More applications
22:55 - Helpful Things
23:45 - Sponsor: NVIDIA (& how to enter the GPU raffle)

References: https://early-hair-c20.notion.site/Stable-Diffusion-Takes-Over-Referenes-7a2f45b8f7e04ae0ba19dbfcd2b7f7c0

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :)

## Transcript

### Introduction [0:00]

Stable Diffusion has been released to the public, and the world is creative as never before. It's an explosion of creativity, collaboration and open improvement. But not everyone is happy. Today we'll look at how Stable Diffusion works, how it impacts the world, and what people say about it. Welcome to a special edition of ML News. You may remember Emad Mostaque, who I had as an interview guest here on the

### What is Stable Diffusion? [0:30]

channel. The founder of Stability AI announced on August 22nd the public, open-source release of Stable Diffusion. Stable Diffusion is a text-to-image model: you give it a piece of text, and it makes an image. And the images it creates are stunning. This image right here, these images, are created by Stable Diffusion. This is not Photoshop; this doesn't just adjust an existing image a little bit; it creates images from pure text.

The cool thing about Stable Diffusion is that while similar models have only been available behind an API, like OpenAI's DALL·E, this one is completely in the open. You can just download the model and do whatever you want with it. A small point: there is actually a license on it, but it's very permissive, so almost whatever you want. Specifically, you can change it, you can update it, you can monetize it, and all of that stuff. It's been trained on a subset of the LAION-5B dataset that's been filtered specifically for aesthetically pleasing images, and that is a big part of why the results are so amazing.

And the craziest thing about all of this is that this model does not need a data center to run. It can actually run on a single GPU. Look, this thing right here is enough to run the model and give you the most beautiful images. This enables so many people to take part. And by the way, if you want this 3090, I'm giving away one of them. Hey, Yannic from the future here, quick addendum: it's actually a 3090 Ti, not just a 3090, so even better. All right, back to me in the past. Not only one: I'm giving away one that's signed by Jensen Huang, the CEO of NVIDIA. All you've got to do to take part is stay until the end of the video; I'll tell you exactly how you can get it.

So here is how something like this would work: you go to the Hugging Face demo or to the Stable Diffusion DreamStudio, and you enter a prompt — a bird with a funny hat — and look at that.
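The "just download the model and run it yourself" part really is that short. Here is a minimal sketch using Hugging Face's diffusers library (which comes up again later in the video); the model ID, defaults, and hardware remarks are my assumptions, not something stated in the video:

```python
def generate(prompt: str, steps: int = 50, guidance: float = 7.5):
    """Render one image for `prompt` with Stable Diffusion v1.4.

    Heavy dependencies are imported lazily, so this sketch can be read
    (and the function defined) without them installed. Assumes a CUDA
    GPU and that you have accepted the model license on Hugging Face.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    # Trades a little speed for a lot of memory -- the kind of trick
    # the community used to squeeze the model onto smaller GPUs.
    pipe.enable_attention_slicing()
    out = pipe(prompt, num_inference_steps=steps, guidance_scale=guidance)
    return out.images[0]


# Usage (downloads several GB of weights on first run):
#     generate("a bird with a funny hat").save("bird.png")
```

On first call this downloads the full weights; `enable_attention_slicing` is one of the community memory-reduction tricks alluded to in the next section.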

### Open-Source Contributions and Creations [2:25]

Birds with funny hats! And you know what happens when you release a model to the open, when you release software for anyone to just use and adapt? Great things. People almost immediately started improving this thing. Look at that: all of a sudden, someone figures out how to use only half as much memory; well, now the model runs on even more devices. Look at that: someone built an ONNX exporter; well, now I can throw it on SageMaker, throw it into a Triton server. People are writing tutorials on how to run the model locally and in a Colab. Oh, look at that: it's a little tool to make a collage — picture one, picture two, picture three — and the overlapping regions will just match. Look at that: inpainting, amazing. What, it's an anime series about Oprah in Kyoto? And look, people are figuring out how to run it on an M1 Max GPU — no wait, M2 — in less than 30 seconds. Look at this stuff, this is created on a laptop. Incredible. I guess we're doing videos now. Look, here's a bunch of bubbles and formulas. All right, biomorphic video; this is certainly trippy. Memento mori. A video: consistency, different styles — looks amazing. Oh look, there's a Hugging Face space called "Diffuse the Rest". What, you draw something? Look at that: house. House. Diffuse the rest. Look at that house. Nice house. House. And the biomorphic thing is still going.

And this enables so much. Look here: a children's drawing, cool art. Look at that squirrel. Dragon. You see what's happening here: people are taking this and making all kinds of stuff, improving it in various ways, and they are infinitely creative. This is an explosion of creativity. All of a sudden, you don't need the skills of a painter anymore; you don't need Photoshop skills or anything like that. Look at that: it's Lexica, a search engine where you can search through previously generated images along with their prompts. Look at this stuff, this is so cool, and it's all accessible, it's all available.

And people are becoming so good at prompting these models. Look at this one: it essentially has a few of the prompt tricks, like "stunning", "gorgeous", "much detail", "much wow", but the actual content of the picture is just a bunch of emojis: a burger, a bunch of houses, a tiger, a fountain. Harry Styles as a manga cover. And this is just the beginning. People are making web UIs for the model. You remember how DALL·E proudly presented the fact that you could make variations of images using their API? You can do that too; it's a simple Gradio app away. Look at that: input image, submit, get your variations. Absolutely crazy. You remember CLIP-guided diffusion? Well, how about CLIP-guided Stable Diffusion: a bear holding a lollipop over the rooftops of Hong Kong, looking at a UFO.

Oh look, Hugging Face has a library called diffusers. Oh look, Stable Diffusion is now in diffusers. "Dad, why is my sister's name Rose?" "Because your mother loves roses." "Thanks, Dad." "No problem." Stable Diffusion: the evolution of the typical American living room from 1950 to 2040, according to Stable Diffusion. Look at that: 50s, 60s, 70s. Tell me this is not crazy. Look, Stable Diffusion is now in Midjourney, and the quality is so good. Oh, what, people are building Photoshop plugins. Look at that: inpaint and paint around. Well, this seems pretty cool too; I don't know what it is, but pretty nice. This is what happens when you give people the opportunity and the tools to build, when you give them access and the freedom to make what they want: they make absolutely great things. This thing here is an alternative web UI. Well, why rely on only one company making a web UI? Why not give users options, and let them choose the best? These models are so good and versatile. Look at this stuff, it's amazing. I don't know what this is, but nice. So people are experimenting with this stuff, figuring out what's going on right here, which parameters do what. Lots of investigation into the model, because it's just accessible. There are entire notebooks just trying to figure out what the individual parts of the model do, how you change stuff, and what happens when you change stuff.
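The image-variations app mentioned above really is "a simple Gradio app away". Here is a hypothetical sketch of what such an app looks like — the model ID, the `strength` default and the UI layout are my guesses, not the actual community code:

```python
def variations(image, prompt: str, strength: float = 0.6):
    """One img2img pass: start from `image` and drift toward `prompt`.

    Heavy dependencies are imported lazily so the module can be loaded
    without them; assumes a CUDA GPU and an accepted model license.
    """
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    # strength in [0, 1]: higher means the output may drift further
    # from the input image.
    return pipe(prompt, image=image, strength=strength).images[0]


def build_app():
    """Wrap the function in the 'input image, submit, variations' UI."""
    import gradio as gr
    return gr.Interface(variations, inputs=["image", "text"], outputs="image")
```

Calling `build_app().launch()` would serve the little web UI locally — which is roughly all the community apps shown here are doing.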
Not only do people build great things around the model, people also understand the model much better, and are therefore able to push it and improve it at a much greater speed. This one's called visual-grounding-guided inpainting. So up here you have an astronaut, you say the part that you want to replace — "helmet" — and what to replace it with — "flower" — and, I mean, it's not exactly only the helmet, but you can see where this is going. These are just the first iterations of an entire age that we are about to begin. Note how crazy this is: just a combination of two or three of these models made it such that I don't even have to click anywhere in the image. I can interact with these things via text, via just natural language. To how many people does this make art, design, and creative endeavors in general accessible? Oh wow, it's Jeff-Elon-Zucker-Gates. Look at all the variations of things that are in there. This is crazy. Now, as I said, we're only at the start, and people are improving this day by day. One improvement that I

### Textual Inversion [7:55]

would specifically like to highlight is called textual inversion. Textual inversion is a technique where you take a bunch of images — a very few images, five images, ten images — of a thing, and you teach the model about that thing. Once you've done that, the model kind of knows the concept of that thing and can then make new generations according to the theme. So here's what I mean. For example, here you give it a bunch of images of a yoga pose, and you teach the model that this is a new concept. You can give it a name; in this case they call it S*, because if you could use any name in the world, obviously you would choose S* as a name. In any case, now you can give this S* to the model along with a prompt, and the model will create images according to that concept. So this is a great way to teach this model new things that it didn't know about. You can't do it with anything and everything, but you can sort of teach it a concept.

And look, textual inversion is already in Hugging Face diffusers. And look, there is already a library of pre-made things that people have taught the Stable Diffusion model. So all of these are concepts that people have previously run textual inversion on, and therefore you can simply take these concepts and generate images according to them. Super Mario World map? Yeah, let's use that. An island, SNW map. Not exactly, but this is my very first try, so we'll get there.
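The core trick of textual inversion is that the model's weights stay frozen and only one new token embedding is optimized. This toy example demonstrates that principle on a stand-in linear "model" — real textual inversion backpropagates through the full frozen diffusion model, and all shapes and numbers here are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))    # frozen "model" weights -- never updated
target = rng.normal(size=8)    # stand-in for "what images of S* look like"
e = np.zeros(4)                # the new token embedding for S*, to be learned

def loss(emb):
    """Squared error between the frozen model's output and the target."""
    return float(np.sum((W @ emb - target) ** 2))

before = loss(e)
lr = 0.01
for _ in range(500):
    grad = 2 * W.T @ (W @ e - target)  # gradient w.r.t. the embedding ONLY
    e = e - lr * grad                  # W stays exactly as it was

after = loss(e)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

The loss drops even though the "model" never changes — the new embedding simply moves to where the frozen model already maps it close to the target, which is why five to ten images suffice to teach a concept.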

### OpenAI vs Open AI [9:30]

Now, about a week after the release of Stable Diffusion, OpenAI released a blog post saying they're introducing outpainting to their DALL·E API — DALL·E being the model that they've trained and keep behind their API; they let you interact with it if you are on the beta users list. So now you can take a picture and sort of outpaint from it, generate the surroundings of that picture according to DALL·E. Guess what: instead of waiting for OpenAI to build this into their API, with Stable Diffusion someone can just go and make it. Someone can just take the model and build a little UI that does outpainting. Look at that: give it a prompt, click — there's a window, there's a girl.

Now, I can't say whether this is in response to Stable Diffusion or just by accident, but OpenAI also updated their pricing recently to make it significantly cheaper to use their text APIs. DALL·E, the image generator, is still in beta, but there, too, they now have a commercial model: for 115 generations you're paying $15, but in return you're allowed to commercialize the images that you get out of DALL·E. As you can see right here, in the official UI of Stable Diffusion — the one from Stability AI — an image costs one credit, and one credit is one cent. That's over ten times cheaper than DALL·E. And keep in mind, you can just download the model and run it yourself, although I'm pretty sure the electricity is going to cost more than a cent per image. And Stable Diffusion images that you make, you've obviously been able to commercialize from the day the model was publicly released.

But the battle between the API model of OpenAI and the open model of Stability doesn't end there. OpenAI has recently announced that they are now reducing bias and improving safety in DALL·E 2.
They released a blog post where they say they're implementing a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world's population. They simply say "a new technique", and they give an example: when you generate "a photo of a CEO", you see it's just men, and with their new technique it is a rainbow of people of different ethnicities and genders and so on. Again, they don't say what the new technique is, but people were wondering, because it's not that easy to mitigate this kind of stuff.

People found that there are some rather interesting side effects of this. For example, if they generate "a professional DSLR color photograph of British soldiers during the American Revolution", it seems to be, let's say, historically rather inaccurate. And now it shows again how creative people are. In order to figure out what's running — since we can't inspect the code — people came up with the idea: maybe they're just modifying your prompt. So people entered as a prompt the sentence "a person holding a sign that says". That's the prompt, and this picture comes out of it. Other people have reproduced this; the prompt here says "pixel art of a person holding a text sign that says", and the picture is that. So it turns out that the technique OpenAI is advertising is that they simply have a predefined list of things, and they append these things to your prompt — thereby potentially completely destroying your prompt. But neither will they say what the technique is, nor do they let you opt out of it. In the name of safety, they don't trust you. They could just say: you know, we actually found that this pretty simple thing mitigates a lot of the bias; if you just append these kinds of words to the prompt, it works pretty well and you'll get a pretty diverse result; if you want to, take it under consideration, use it in our API — we even made a button for you to append these words automatically.
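The community's reconstruction of the "new technique" — a fixed list of terms, one of which gets appended to the prompt — can be sketched in a few lines. The terms and the sampling logic below are illustrative guesses; OpenAI never published the actual list or mechanism:

```python
import random

# Illustrative guesses -- not OpenAI's actual (unpublished) list.
DIVERSITY_TERMS = ["female", "male", "Black", "Asian", "Hispanic", "elderly"]

def rewrite_prompt(prompt, rng=None):
    """Append one randomly chosen demographic term to the user's prompt.

    This is why probing prompts like 'a person holding a sign that says'
    render the appended term: the model dutifully puts it on the sign.
    """
    rng = rng or random.Random()
    return f"{prompt} {rng.choice(DIVERSITY_TERMS)}"

print(rewrite_prompt("a photo of a CEO", random.Random(0)))
```

The point of the sketch is how blunt the intervention is: the appended word silently becomes part of the image content, which is exactly what the sign-probing experiments exposed.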
This would have been so much better than just saying "we have a new technique" and "no, we're not gonna let you opt out of the technique". Whenever you enter a prompt that says "beautiful summer morning, a person meditates on the top of Mount Fuji, watching the calm sunset, the birds fly across a river, and the air is so pure in this blue nice sky" — "Hindu", "elderly man". It is, how shall I say, a philosophy. It is "we know what's good for you". Overheard in Silicon Valley: "safety, safety, open source". Stability AI, on the other hand, is partnering up with institutions around the world to make localized models of Stable Diffusion. That seems a much more sensible way to get all of the world to participate: you go to places, and you let the people there improve the model and make their own models, so that at the end it works

### Journalists be outraged [14:20]

for those people too. But oh man, it did not take long for people to not be happy about this at all. Simply giving people the tools and the opportunity to be creative — that doesn't sit well with some people. Kotaku writes: "AI creating art is an ethical and copyright nightmare". TechCrunch writes: "This startup is setting a DALL·E 2-like AI free, consequences be damned". You mean the consequences that anyone has the ability to make their own stuff? Oh yeah, those be damned; rather, we write a hit piece on people. But the same author at the same publication wasn't quite satisfied, so about ten days later, another article: "Deepfakes for all: uncensored AI art model prompts ethics questions". Wow, really? Two articles, two hit pieces. Gotta milk it, gotta milk those ethical questions that are "raised", right?

But don't worry, the exact same author writes pieces such as "Rephrase.ai lands fresh investment to grow its synthetic media platform" — a quite positive piece about a company that makes synthetic media. Gee, synthetic media, like image and video generation — I wonder what's the difference. Oh right, this one is actually controlled behind an API, can be sold, and can be controlled by just having one or two people at the correct places in a large company, or in the app store or the play store — the appropriate journalistic channels, right? Here's another one: "Winn.AI launches out of stealth with an AI assistant for sales calls". Oh wait, like a bot that makes sales calls for, you know, salespeople — like the most annoying calls you'll ever get, and now it's an AI doing it for them. I guess at least you can now swear at them without having to feel bad for them, or something like that. Again, completely positive coverage. I don't know — the model that can make Oprah Winfrey as an anime, that's

### AI Ethics be even more outraged [16:20]

the problem. Consequences be damned. And of course the AI ethics community isn't happy at all, because what's ethical about giving people access to tools and the opportunity to make great things? That's terrible. You can always just pull one of the five standard insults from the drawer and accuse anyone you don't like of one of them. "When you've got N engineers cheerfully putting out models they know to be racist, you've got a company with N racists." You hear that, Stability AI? That's all of you. That's it, that's what it means — everyone taking part in it. "We need organizations like Hugging Face, who is hosting Stable Diffusion for public download, to act with courage and bring their might to the firefighting effort." And, addressing Emad Mostaque directly: "If these scholars are nobody to you, you are not qualified to work in this space." Well, that's the thing about stuff being open and the market being free: he doesn't need to be qualified, he can just do it. It's fine.

But it's very clear what's going on. Some people enjoy the level of power that they have in big organizations. If there are just a few big organizations, a few big machine learning conferences, a few publications, then you have a pretty solid grasp on power. You can make noise on Twitter and make sure that whatever happens needs to go through one of those people, at least to get approval. Distributing an open model to anyone — where anyone can improve it, anyone can do their thing and build their stuff in a decentralized fashion — means that power vanishes. No one has to ask any one person anymore whether they're allowed to do something, whether something is ethical in their view or not. "I can't believe Stable Diffusion is out there for public use and that's considered as okay." Yes, that's okay. Now, as you can see, the pressure on Hugging Face from all of these people is getting pretty intense, because how dare they just give something to people. Well, here is what a member of their ethics team has to
say: "I'm concerned about these things being overstatements that function to give an impression that the release is something that ethics-minded AI people, at least at Hugging Face, signed off on. We do not and did not sign off on anything. We advise within an open-source community; that means we are working on licensing, documentation and release strategies, which any contributor can take or leave. We are a resource, not approvers." Really? I recall, I recall that was quite different a few months ago. The evolution of centralized AI ethics: "don't be evil"; "we decide what is evil"; "we decide you are evil". But what are they actually saying right here? Well, you know, if you have this model, you could make any image that you want. Any image. You could make a bad image. Essentially — essentially what they're saying is: this pen right here — the fact that you can buy it in the store is terrible, because you know what someone could do? Someone could, like, someone could... someone could write a dirty word with it.

### Do we need a new social contract? [19:45]

But all that being said, please let me know what you think. There are absolutely issues around things like copyright here. Maybe we need a new social contract. Like, you as an artist obviously put a lot of work into making these images — is it okay if the machine simply grabs them into the training dataset? Obviously it's okay for humans to be inspired by other pictures, but in a world where machines can consume and produce millions and billions of images, it tends to be a bit of a different story. So maybe society needs to evolve a little bit there. Nevertheless, I feel the explosion of creativity is great. People are infinitely creative with these things, and that is just such a good thing overall. And the fact that someone can use it to make a nasty picture, or the fact that it doesn't work exactly the same for all kinds of pictures — to me that's just such a non-starter, and it seems a quite dishonest argument that is just aimed at further centralization of power. Some people just don't like that things are available to the public, to anyone, without having to ask them first whether something is okay. I'm not hating on OpenAI or others who decide to put their models behind an API — but don't at the same time talk about democratizing AI. It's completely cool: you train a cool model, you ask for money for people to use it, that's fine. But this is democratizing AI: democratizing means giving people access to everything, allowing people to take things for themselves, make them better, and give back to the community. The explosion

### More applications [21:30]

of applications that we've seen is absolutely great. Look at this: a tool that creates a color palette from text. Nobody at OpenAI came up with this. I'm fairly sure this is such a unique application, but such a great thing: you give it a bunch of words, you get a color palette out. How awesome is that? And that's what happens when you give people the tools, access and freedom — and even better, when the model runs on a consumer GPU, so anyone can use it. Hello, it's me from the editing room. There's so much stuff coming out; I really thought this should make this video, but it appeared literally today, so I saw it today. This is Dream Textures, an endless texture generator in Blender — directly in Blender — using Stable Diffusion to create unique and seamless textures. This is a playlist of Stable Diffusion tutorials on YouTube. This is CHARL-E, an app that brings Stable Diffusion onto an M1 or M2 Mac in a single click. And this is Stable Diffusion implemented using TensorFlow and Keras by Divam Gupta. Props to Divam for implementing this; it's a serious effort, not to be joked about. All right, back to me in the

### Helpful Things [22:55]

past. But as I said, let me know what you think. All right, just a few things that might be helpful to you, and then the video is over. Div Garg on Twitter announces the first-ever transformers seminar by Stanford. It's a seminar called Transformers United, and all the lectures are on YouTube, so if you want to know something about transformers from an academic perspective, that's the place to go. Another thing — because it starts just about now — is the Shifts Challenge 2022, which evaluates robustness and uncertainty on real-world data. Projects include things like white matter multiple sclerosis segmentation and marine cargo vessel power estimation. So this is real-world data, you have to act under uncertainty and distribution shifts, and it's a challenge. If you're into challenges, this one is starting right now. All right, so now I'm gonna tell you how you enter the raffle

### Sponsor: NVIDIA (& how to enter the GPU raffle) [23:45]

for the GPU. This video is kindly sponsored by NVIDIA. Specifically, they want you to know about GTC 2022, fall edition. GTC is NVIDIA's developer conference, one of the largest of its kind. It's free to attend, and it's full of amazing content. Of course, the keynote by Jensen Huang is the biggest event, and Jensen's gonna tell you all about the future plans of NVIDIA and what's happening in the world of deep learning, GPU computing, and everything around it. With NVIDIA being the market leader that it is, I'd say that's a pretty cool thing to attend. Of course, the focus is going to be on things like more efficient deep learning, but also things like the metaverse, VR, and collaborations such as this one: NVIDIA and Siemens partner up to enable what they call the industrial metaverse. This connects NVIDIA's Omniverse platform — essentially a virtual-reality platform to simulate the real world as closely as possible, in order to design, to train and to make forecasts — to the Siemens Xcelerator, Siemens being the hardware and sensor company that it is, a platform for IoT-enabled hardware and software. So you can imagine that as more and more of these companies pair up their systems and team up, we're gonna get a richer and richer digital-and-real hybrid world. I think this comes pretty close to the vision that Mark Zuckerberg had for the metaverse, and I'd say in many ways closer than, you know, strapping on a VR headset and running around in VRChat. So it's pretty cool to see the industrial applications of this. GTC is going to be full of unique demos and workshops that you can attend, and of course a lot of talks. Next to the keynote, there's also a fireside chat with the Turing Award winners — they are all going to be there: Yann LeCun, Geoffrey Hinton and Yoshua Bengio — and for a full hour they'll share their opinions about the current state and future of AI research. Okay, here's how you get into the raffle for the GPU: go to ykilcher.
com/gtc. Now, it's important that you sign up to GTC using my link — this will track you in their system. But once you've done that, it's not enough: you actually need to attend GTC. I obviously suggest you attend the keynote, but you can attend any session — it just needs to be at least one session of the GTC conference. Once you've done that, you'll be entered into the raffle for the GPU, and I'll notify the winner as soon as I know. Now, there's one caveat: this only counts for people in EMEA — Europe, the Middle East and Africa. If you happen to live there, great, enter the raffle. If you don't live there, I'm sorry, I don't have power over this, but what I can do is raffle out a bunch of merch, such as shirts like these. So if you don't live in EMEA, you can enter the raffle there and maybe get a shirt, or whatever you want, essentially. In any case, the link is ykilcher.com/gtc. And even if you do not live in EMEA, if you enter the raffle, it'd be absolutely great if you still attended the developer conference — as long as you sign up using the link, they'll still be able to track you, and that gives me brownie points with NVIDIA. So again: ykilcher.com/gtc, sign up to the conference using that link, attend at least one session, and you'll be entered into the raffle automatically. All right, that was it. Thank you so much, NVIDIA, for sponsoring this video. I'll see you at the GTC conference, or in the next video. Bye-bye. ...I was gonna write "fun". What did you think?
