Stable Diffusion Tutorial: How to generate your own images from text
Duration: 9:33


AssemblyAI · 24.08.2022 · 21,464 views · 235 likes


Video description
Text-to-image generation has been one of the hot topics in AI, if not the most popular topic, over the last year. Unfortunately, up until now, advanced models like DALL-E 2 or Imagen have not been open to public use, so only a lucky few who managed to get beta access have been generating images. But not anymore! The weights of Stable Diffusion have just been released, and now anyone can use them to generate images. In this video, we go through a Google Colab notebook that generates images given a text prompt. If you'd like to instead run Stable Diffusion locally on your computer (GPU required), check out Ryan O'Connor's article here: https://www.assemblyai.com/blog/p/c0cd220f-c10a-4e3f-8b54-bd91b0a4dd5b/

📒 Find the notebook here: https://colab.research.google.com/drive/1uWCe41_BSRip4y4nlcB8ESQgKtr5BfrN?usp=sharing
🎇 Generated images: https://drive.google.com/drive/folders/1U5jbkPAGRQZ3MKJLSZb1Ge-upIj8Y5Vg
✍️ Ryan O'Connor's article: https://www.assemblyai.com/blog/p/c0cd220f-c10a-4e3f-8b54-bd91b0a4dd5b/
✅ List of released models: https://huggingface.co/CompVis

How is Stable Diffusion different from DALL-E 2 or Imagen? All of these models are based on diffusion models, with differences in how the architecture is set up and the kind of data used to train them. The biggest difference, though, is that the weights of the trained Stable Diffusion model have been released for public use.

What is Stable Diffusion based on? Stable Diffusion uses a technique called diffusion models to achieve its results. To learn more about how diffusion models work, check out this video: https://youtu.be/yTAMrHVG1ew

How does Stable Diffusion work? Stable Diffusion works by adding noise to an image and then learning to undo this addition and recover the original image. Guided by text information, the model learns how to go from pure noise to an image that reflects the given prompt.
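The "add noise, then learn to undo it" description above corresponds to the standard diffusion-model formulation. As a background sketch in common DDPM notation (this math is not shown in the video; the symbols are the usual ones, with c standing for the text-prompt conditioning):

```latex
% Forward process: gradually add Gaussian noise to the image x over T steps,
% with a variance schedule \beta_t
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)

% Learned reverse (denoising) process, conditioned on the text prompt c;
% sampling starts from pure noise x_T and steps back to an image x_0
p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t, c),\ \Sigma_\theta(x_t, t, c)\right)
```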
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬ 🖥️ Website: https://www.assemblyai.com 🐦 Twitter: https://twitter.com/AssemblyAI 🦾 Discord: https://discord.gg/Cd8MyVJAXd ▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1 🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ #machinelearning #DeepLearning

Table of contents (3 segments)

Intro

Stable Diffusion is a text-to-image diffusion model whose weights have been made public. The creators of Stable Diffusion, Stability AI, made this move with the aim of democratizing image generation, and now, with the public weights, anyone can generate their own images. Here are some examples of the images that we've created; some really good ones are "a selfie taken on Mars", "good versus evil", or the realistic image of Iron Man making breakfast. You can see all of these images and more in our Drive folder; I will leave a link to that Drive in the description below. There are two options for how you can run Stable Diffusion and generate images. The first one is locally on your computer, but for that you're going to need a GPU, and I don't have a GPU, so I'm going to show you the second option, which is to run it on Google Colab. One caveat there: you're going to need Google Colab Pro, because we need a bit more RAM than what is available with the free version of Google Colab to be able to create these images. The steps to run it locally on your computer or on Google Colab are more or less the same, but as I said, in this video I'm going to show you the Google Colab version. If you want a more detailed tutorial on how to do it locally, you can check out my colleague Ryan O'Connor's article; I will leave a link below this video, and also somewhere in the corner here. All right, let's get started, and don't forget that if you create something fun, you can share it on Twitter and tag us; our handle is @AssemblyAI. This Colab notebook is also

Tutorial

available for you through the link in the description, by the way, so you can go ahead and follow along as I go through it. The first step is to change the runtime settings. As I said, once you upgrade to Google Colab Pro, you can go to Runtime > Change runtime type and make sure that your hardware accelerator is GPU. You should also make sure that your runtime shape is High-RAM, not Standard, because we need a lot of RAM to be able to create the images. The next requirement is to install Miniconda. Here we are working with Python 3.7, so we are going to download and install the Miniconda version that works with it, and then initialize the conda environment with the command conda init bash. Once that's done, we can clone the Stable Diffusion repository, then create and activate an environment for this model to run in. The last thing we want to do here is download the weights of the model. One thing to note: you might have noticed we're getting version 1.4 of the Stable Diffusion weights, but if you go to their release notes, you will see that they have a bunch of different versions. Version 1.4 is the highest one, as you can see here, and the other versions are trained a little less, with less data. The explanation they give is that each one was created from the checkpoint of the previous version and was trained for additional steps on specific variants of the dataset. So basically, the higher the number, the more trained a model you're going to get, and the one we're getting right now is 1.4. Once this is done, we can actually already generate images, and this is the line that we should use to generate them, which also includes the prompt. I will run this example and then tell you what all of these things mean. All right, this is what you should see if everything went well and you generated
an image. Where you're going to find the images is in the stable-diffusion folder: there's going to be a samples folder, and inside it we're going to find our image. When I download it, here is what that image looks like. My prompt was, let's see, "a painting of a beluga whale sitting at a cafe". It's not exactly sitting, it's lying on the table, but it is a beluga whale and it is a cafe, so it's quite impressive, and it's also in a painting style, not a photorealistic style. Let me go through the parameters, or arguments, that we're passing to this command: what they are and how you can change them. Ryan has prepared a really nice summary of what each of these things is, so let's go through them one by one. --prompt, of course, is the prompt that you're passing to the model, describing the image that you want to create. You can also pass a --from-file argument, giving the name of a file that includes many prompts, so if you want to pass a bunch of prompts at the same time, you can use --from-file. You can use --ckpt if you want to change the checkpoint that the model is run from, and --outdir to specify where you want to put the images that you created. Here our outdir is basically just the directory that we're in; that's why it creates a samples folder exactly here, but if you want to use another folder, you can go ahead and do that. Another important one is the number of diffusion steps. The default value is 50, and Ryan has a really good example of what changes when you use different numbers of diffusion steps. As you can see, if you pass only 10 diffusion steps, you're going to get less detail, and the higher you go, the better an image you're going to get, but after 50 diffusion steps there is not much of a difference, so you might as well use 50 diffusion steps, and that's also going to save you some time. Other than that, you can specify the
number of samples that you want. Right now we are creating only one sample per prompt, as you can see, but if you want to get more than one image per prompt, you can change this to whatever you want: two, three, five. Another thing you can change is the width and height of your image. By default these are 512 by 512 images. You can give them smaller numbers, but Ryan has observed in the article that the bigger the generated image, the better the image quality and also the caption alignment. You can see some examples here: with the prompt "Guy Fieri giving a tour of a haunted house", the smaller images cannot really capture the details of what's going on, and the bigger images are able to capture them better. One last thing you can change about width and height is the aspect ratio, meaning how the width and height relate to each other. Ryan has found that depending on the caption, you might need different aspect ratios: maybe a vertical image would give you better results, maybe a horizontal one, but for general purposes a square aspect ratio, that is, identical width and height, will probably work fine. These are the general things that you should know about generating images with this command: you can generate more samples, and you can change the height or width of the image. But one of the main parts of this command is the prompt. The prompt that you write directly affects the image that you're creating, and there is such a thing as prompt engineering: making sure that you write a prompt that will be understood by the machine, so that it can generate the best quality images. So let's think about that for a second.
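The steps walked through above can be sketched as a shell session. This is a minimal sketch assuming the CompVis/stable-diffusion repository layout and the flag names of its scripts/txt2img.py; the setup commands (cloning, conda environment, weights download) are commented out because they need a GPU machine and a large download, and your paths may differ:

```shell
# 1. Clone the repository and set up its conda environment (run once):
#    git clone https://github.com/CompVis/stable-diffusion
#    cd stable-diffusion
#    conda env create -f environment.yaml && conda activate ldm

# 2. Download the v1.4 weights checkpoint (sd-v1-4.ckpt) from the
#    CompVis page on Hugging Face: https://huggingface.co/CompVis

# 3. Assemble the generation command with the arguments discussed above:
PROMPT="a painting of a beluga whale sitting at a cafe"
CMD="python scripts/txt2img.py --prompt \"$PROMPT\" --ckpt sd-v1-4.ckpt --outdir outputs --ddim_steps 50 --n_samples 1 --H 512 --W 512"

echo "$CMD"          # show the command that would run
# eval "$CMD"        # uncomment on a GPU machine with the weights in place
```

With these defaults, the generated images land under outputs/samples, matching where the video finds them.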

Tips

There are a couple of things that you should think about when writing your prompts. For example, if you write a prompt and you're not getting good results, that might mean you need to add phrases like "an image of", "a photorealistic image of", or "a drawing of" at the beginning of your prompt. Another piece of information you can pass to the model is what kind of style you want the image to be created in. For example, if you're creating a painting, you can tell it whether you want it in an expressionist or a surrealist style; or if you're creating an image or an illustration, you can tell it what kind of aesthetic you want, for example a cyberpunk aesthetic or a cottagecore aesthetic, et cetera. Another thing that works really well is to pass it some style information based on the person who made it, that is, an artist; this could be a sculptor or a painter. You can say "in the style of van Gogh", and because van Gogh has a very specific style, you will be able to see it in the generated images. Or Salvador Dali, for example, has a very specific style, so when you say "an image in the style of Salvador Dali", you're probably going to get some good results. All right, that's it about generating images with Stable Diffusion. If you have this notebook, your job is going to be super simple: you just need to run through it and write some prompts that are going to be fun for you. If you have any questions, don't forget to leave a comment below, and once again, if you create something fun, don't forget to share it with us on social media; you can find us on basically any social media at @AssemblyAI. We're looking forward to seeing what you're going to create with this.
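The prompt-writing tips above pair naturally with the --from-file flag mentioned earlier. Here is a small sketch that writes several hypothetical variants of one made-up subject, each applying one tip (framing phrase, aesthetic, artist style), into a file that scripts/txt2img.py could consume; the actual generation line is commented out since it needs a GPU and the weights:

```shell
# Hypothetical prompt variants illustrating the tips above; the subject is invented.
BASE="a cat reading a newspaper"
PROMPTS="a photorealistic image of $BASE
a drawing of $BASE, cyberpunk aesthetic
a painting of $BASE in the style of van Gogh"

# One prompt per line, the format --from-file expects:
printf '%s\n' "$PROMPTS" > prompts.txt
cat prompts.txt

# python scripts/txt2img.py --from-file prompts.txt   # run on a GPU machine
```

Comparing the images generated from such variants side by side is a quick way to see which framing phrase or style cue helps for your subject.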
