# Generate Images with Your Voice Using DALL-E | Tutorial

## Metadata

- **Channel:** AssemblyAI
- **YouTube:** https://www.youtube.com/watch?v=fRa2rmDvOCY
- **Date:** 28.04.2022
- **Duration:** 21:40
- **Views:** 13,546
- **Source:** https://ekstraktznaniy.ru/video/13126

## Description

Create your own DALL-E image generation app in Python with Streamlit and DALL-E Mini.

DALL-E Playground: https://github.com/saharmor/dalle-playground
DALL-E Mini: https://github.com/borisdayma/dalle-mini
DALL-E explanation: https://youtu.be/F1X4fHzF4mQ
Code: https://github.com/AssemblyAI-Examples/dalle-mini-python-app

Get your Free Token for AssemblyAI Speech-To-Text API 👇 https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_pat_32

Timestamps:
00:00 Intro
00:20 DALL-E Playground
01:33 Backend part
02:54 Streamlit frontend part
14:40 Adding voice commands

#MachineLearning #DeepLearning #DALL-E

## Transcript

### Intro [0:00]

Hi everyone. In this video I show you how you can use your own DALL-E version to create images from a text prompt. We're going to build a Streamlit application that runs a DALL-E model in the background, and then we go even one step further and use speech recognition to build a speech-to-image application. So, without further ado, let's get started.

### DALL-E Playground [0:20]

So I found this awesome repository, DALL-E Playground, that lets you play around with DALL-E on your machine. This is based on DALL-E Mini, which is another great open-source project that tries to re-implement DALL-E, and this works pretty well. What this DALL-E Playground is doing is that it consists of a backend and a frontend. The backend is actually running in a Google Colab, so this will give you free GPU support, and then it simply runs DALL-E Mini in a backend application. It also has a frontend that will then call the backend as a service, basically, and then here it will generate the images.

So what we're going to do now in this video is that we use this backend in a Colab, so this will then run locally on our machine. The DALL-E Playground frontend is actually implemented in JavaScript, so what I'm going to do here is that I reverse engineer this part and implement it in Python with Streamlit, because I love Python and this gives us a lot of flexibility to build on top of this project. So yeah, let's do this. So we start with

### Backend part [1:33]

the backend, and for this we simply have to click on this link, which will open the Google Colab. This is actually very short: it will just install a few packages, and then it runs the application on our localhost, port 8000, and for this it uses localtunnel to use our own localhost. The backend application is actually also already implemented in Python with Flask. So yeah, let's run the code.

All right, so our app is running, and here at the top we also get a URL. This is a public URL that we can use to call our backend, so let's copy this. Now first let's try out this playground in here. For this you can simply click on this link here, and then it will open the playground. Let's copy and paste the backend URL, and now it checked that our backend service is running, so we can enter a text prompt. For the start let's try "cat and rainbow", and then you can also specify the number of images. Now let's hit Enter and see if this works. All right, and here we get our two images with a cat and rainbow. So, pretty interesting stuff. And now as a next step let's implement

### Streamlit frontend part [2:54]

our own Python app to call our backend service, which is now still running here. For this, let's open our editor and create one file, `main.py`. As a first thing we want to reverse engineer the backend part, so for this I actually want to create another file that I call `dalle.py`. Now let's look at the code again to reverse engineer this.

Basically there are two approaches we can take. We can either look at the backend and have a look at the app, and then we find out that this is a Flask app, and we can have a look at the endpoints that it provides. Here we see that we have one at `/dalle`, and this is a POST method, and this will generate the images. Then we have one at the home base route, and this is a GET method, and this is simply a health check. So we could have a look at this, or we can also have a look at the frontend code. If we go in here, we see this backend API JavaScript file, and this has two functions. One is called DALL-E service, and this is using the backend URL plus `/dalle`. So here again we see this is a POST method with some headers, and it sends the text and the number of images. The second function is check if valid backend, and this simply calls the base URL, also with the headers. So now we have to implement these two functions in Python, and this is basically all we have to do to reverse engineer this.

So let's start by implementing this in Python, and for this we use the `requests` module. We need one function that we call `check_if_valid_backend`, which gets a URL, and in here we use a try-except block. In the `try` block we call the URL and see if this works, so we say `response` equals `requests.get` and then the base URL. Then we also specify a timeout and say this should time out after five seconds, and then we also use some headers. For the headers we can go back to the JavaScript code and grab this header object here, so let's copy and paste this in here. Now we have to make sure that this is valid Python code. This is a dictionary with these two values: the one is responsible that we don't have a tunnel reminder for our localtunnel, so this is basically just to avoid a warning, and then we also say no-cors. So now this is a valid Python dictionary, and we can say `headers` equals `headers`. If this works, we should get a response, and then we want to say return `response.status_code == 200`. So if this works correctly, we return `True` in this function, and otherwise we want to catch one exception, `requests.exceptions.Timeout`, and if we have a timeout then we return `False`. So this is the first function we have to use.

Now the second function. We call this `call_dalle`, and this gets the URL as well, then it gets the text prompt and the number of images, so let's say by default this is one. Here we again have a look at what is happening in the JavaScript code: it uses JSON and passes the text and the number of images. So let's do the same here. We create a data object, and again we use a Python dictionary with the key for the text, and this gets the text from the parameter, and then we use the number of images key, and here we put in the number of images. Then again we call our backend service, so we get a response by saying `response` equals `requests`, and now we need a POST request, and the URL is the base URL plus `/dalle`. Again we need to pass in the headers, so we say `headers` equals `headers`, and we also need to pass in the JSON data, so we say `json` equals `data`. Now this is basically all we need. So again we can say: if `response.status_code` equals 200, then we simply return the response here, and otherwise this function will return `None`. And now we have reverse engineered everything that we need.

So now we can start implementing our frontend. Here, I think this was automatically inserted, so I actually don't need this. Now let's create one more helper function to call these two functions and then display the images. For this let's create a function `create_and_show_images`, and this needs the text prompt and the number of images. For this we use Streamlit, so up here I say `import streamlit as st`. First let's check if we have a valid backend, so we say `valid` equals `check_if_valid_backend`, and it needs the URL, so let's create the URL up here. We say `url` equals, and for this we need to go back to our playground and grab the URL from here. This is what we're going to use, so copy and paste, and now we can use this. Now let's check if this is valid, and then we say: if not valid, we want to display an error, so we say `st.write` and then let's say "the backend service is not running". Otherwise we can go on, so we say `else`, and then we do the second call. We say `response` equals `call_dalle`, and again it needs the URL, then it gets the text and the number of images. Then we check if the response is not `None`. If we return this here, then we can extract the JSON from this and go over it, so we can say `for data in response.json()`. I already played around with this, so I know this is going to be a list with one element for each image that we want, so we can iterate over this. We have this as a base64-encoded string, so now we need to get an image representation for it, and for this we have to decode it again. For this we can import `base64`, and then here we can say `image_data` equals `base64.b64decode`, and we simply want to decode the data. Now this is valid image data, so we could store this into a JPEG file, for example, or we can simply display it in Streamlit by saying `st.image` and `image_data`. And this is all that we need here.

So now we can go back to the main file, and here we implement our main app. Here again we say `import streamlit as st`, then we also want to say from `dalle` we import the helper function `create_and_show_images`. Then we give our app a title, so we say `st.title` and we call this "DALL-E Mini". Then let's first create a text field, so I say `text` equals `st.text_input`. Here we can put in our prompt, so let's give it a label, "What should I create?". Then let's also give it a slider to select the number of images, so `num_images` equals `st.slider`, and here we can give it a label as well, "How many images?", and we can also give it a min and a max value, so let's say from one to six. Then let's give it a button, so we say `ok` equals `st.button`, and the button simply says "GO". Then we say `if ok`, so if we click the button, we want to call our helper function `create_and_show_images` with the text and the number of images. And this is all that we need for our app.

So now we can go to the terminal, and to run this we can say `streamlit run` and then the name of the file, `main.py`. Let's execute this, and now our app is running at localhost, port 8501, so we can go to our browser and paste in this URL, and here we have our Streamlit app. Now we can give it the same prompt, rainbow, or what did we use in the beginning, "cat and rainbow", and we want two images. Then let's click on GO and cross fingers... and it's working. How awesome is this? As you can see, we get two images where we can see a cat and a rainbow, sort of. So yeah, this is pretty cool. Now I want to do one more step to
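The two client functions dictated above can be sketched roughly as follows. This is a minimal sketch, not the video's exact file: the header names and values and the JSON key `num_images` are assumptions recalled from the playground frontend and should be checked against its code, and `build_payload` and `decode_images` are hypothetical helpers added here to keep the pieces separately checkable.

```python
import base64

import requests

# Assumed header names/values; verify them against the playground's JavaScript client.
HEADERS = {"Bypass-Tunnel-Reminder": "go", "mode": "no-cors"}


def check_if_valid_backend(url):
    """Return True if the health check at the base URL answers with 200."""
    try:
        response = requests.get(url, timeout=5, headers=HEADERS)
        return response.status_code == 200
    except requests.exceptions.Timeout:
        # As in the walkthrough, only a timeout is treated as "not running".
        return False


def build_payload(text, num_images=1):
    """JSON body for the image endpoint (key names are assumptions)."""
    return {"text": text, "num_images": num_images}


def call_dalle(url, text, num_images=1):
    """POST the prompt to <url>/dalle; return the response, or None on failure."""
    response = requests.post(
        url + "/dalle", headers=HEADERS, json=build_payload(text, num_images)
    )
    if response.status_code == 200:
        return response
    return None


def decode_images(encoded_images):
    """Decode a list of base64 strings into a list of raw image bytes."""
    return [base64.b64decode(item) for item in encoded_images]
```

In the Streamlit helper, each element of `decode_images(response.json())` would then be passed to `st.image`. Other connection failures (for example a refused connection) are not caught here, which mirrors the tutorial; catching `requests.exceptions.RequestException` would be a broader alternative.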

### Adding voice commands [14:40]

make this application even cooler, and now use speech recognition and a voice command to create this command here. So let's do this. To use a voice command we are going to use AssemblyAI, which offers a speech-to-text API and also a real-time API. For this we can sign up for free, and then we simply have to grab the API key, and then we can close this again and jump back to the code. Now let's create one file, `configure.py`, and here I create one variable, `api_key` equals a string, and now I paste in my key. Now we can use this, so in the code we say `from configure import api_key`.

We already have one tutorial on our channel that explains how to use the real-time API in Python step by step, so I'm simply going to copy and paste the code from the other tutorial in here. You can also find the code on GitHub, so I will put the link in the description below. What we need for this is: we need to import `websockets`, then `asyncio`, `json`, and here we also need `base64`, and then to capture our microphone input we need PyAudio. So of course you have to install these libraries, PyAudio, websockets and Streamlit, and you can simply do this with pip.

Now we are going to use a mechanism in Streamlit to store a session state. The first thing we do is we check if the key `text` is not in the session state yet, so not in `st.session_state`, and then we initialize this by saying `st.session_state`, and we put in the `text` key, and in the beginning this is an empty string. Then we also want to store a variable that we call `run`, which defines if we are currently recording with the microphone or not, so in the beginning this is `False`. Then we need one more button here, so above the text input we say `st.button` and then we say "Say something", and this should start the microphone recording. For this we use a callback, so we say `on_click` equals `start_listening`, and for this we need to define one function. We call this `start_listening`, and this simply sets the session state of `run` to `True`. Then we want one more thing: we want to give the text input a default value, so by default we say the value equals the session state with the `text` key. Then we check if we click on the button and if we have a valid text, so no second `if` here: `if ok and text`, then we create and show the images. And now this is all that we need.

Now let's quickly go over the rest of the code here. Here we capture the microphone with PyAudio, then we have one function, `send_receive`, that uses websockets and makes a connection to the AssemblyAI real-time API. In here we have a `while True` loop, and now we exchange this to: while the session state `run` is true, we want to send the data to the WebSocket. Then we have a receive function, so here again we say while the session state `run` is true, and here we wait for the message to get back from the web service. If we have a final transcript, then we store this in the session state, so this is the text. Then we also set `run` to `False` again, and then we call `st.experimental_rerun`. What this is doing is that it will run this from the top again, but now we have the session state set, so now we have our value in here, and then we have the text, and then it will create and display the images.

So now let's save this, and we can actually simply rerun our app. If I click on reload again, then we get an error, "`auth_key` is not defined". So let's click on this. We have a typo here: here I use `api_key`, and here I also use `api_key`, so I assume that in the code I called this `auth_key`. Let's search for this. Here we have to use `api_key`, and yeah, this is the only place where we have to use it. So let's go back and rerun this again, and now let's click on the button. "Cat and rainbow." And it inserted the text in here, so now let's again select two images and click on GO, and it's working. So now we have our app with voice command.

Let's try this again. "A red car on the moon." And voice control is working fine, so let's click on the GO button again. All right, and here we have our two images. So yeah, this sort of looks like a red car on the moon, maybe not this one, but still we can see the red color. And yeah, this is working super nice, and now we have a speech-to-image application. How cool is this? This is even cooler than text-to-image. I hope you enjoyed this tutorial. If you did, then please hit the like button and consider subscribing to our channel, and then I hope to see you in the next video. Bye!
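The receive step described above (store the final transcript, stop recording, then rerun) can be sketched as a small pure function. This is only an illustration under assumptions: `handle_message` is a hypothetical helper, the `message_type` and `text` field names are assumed from the real-time API's message format and should be checked against the AssemblyAI documentation, and a plain dict stands in for Streamlit's session state.

```python
import json


def handle_message(raw_message, state):
    """Update `state` from one real-time message.

    Returns True when a final transcript was stored, which is the point
    where the app in the walkthrough would trigger a rerun.
    `state` is any dict-like object with "text" and "run" keys.
    """
    message = json.loads(raw_message)
    # Assumed field name and value; partial results are ignored.
    if message.get("message_type") != "FinalTranscript":
        return False
    state["text"] = message.get("text", "")  # default value for the text input
    state["run"] = False                     # stops the send/receive loops
    return True


# Example with a plain dict standing in for the session state.
state = {"text": "", "run": True}
handle_message(json.dumps({"message_type": "PartialTranscript", "text": "cat and"}), state)
handle_message(json.dumps({"message_type": "FinalTranscript", "text": "cat and rainbow"}), state)
print(state)
```

Keeping this logic separate from the WebSocket loop makes it easy to check without a microphone or a network connection; in the app, the same updates would be applied to `st.session_state` before the rerun call.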
