# Build Hour: Image Gen

## Metadata

- **Channel:** OpenAI
- **YouTube:** https://www.youtube.com/watch?v=8kM5aDD5gLI
- **Date:** 03.09.2025
- **Duration:** 52:16
- **Views:** 9,955
- **Source:** https://ekstraktznaniy.ru/video/11274

## Description

Over 130M people created images with Image Gen in its first week inside ChatGPT. Now, with Image Gen in the API, developers can build the same capabilities directly into their own apps and platforms. This Build Hour walks through gpt-image-1 in the API, with demos on streaming, editing, and masking for real-world apps.

Bill Chen (Applied AI) covers:
- What’s new: text rendering, world knowledge, image inputs
- New capabilities: streaming, multi-turn editing, masking
- Best practices: picking the right API (image vs responses), customizing outputs, handling latency & UX tradeoffs 
- Live demo: building an AI-powered photobooth from scratch
- Customer spotlight: create AI presentations using Gamma (https://gamma.app/)
- Live Q&A

👉 Follow along with the code repo: https://github.com/openai/build-hours
👉 Check out the Image Gen Guide: https://platform.openai.com/docs/guides/image-generation
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours

## Transcript

### Segment 1 (00:00 - 05:00)

Welcome back to Build Hours. I'm Christine on the startup marketing team, and today I'm joined by Bill. — Hi everyone, I'm Bill. I'm a solutions architect on the startups team. — We have a really fun topic for you today on image gen. But first, a quick refresher on the purpose of Build Hours for anyone new joining us. The goal of Build Hours is to empower you with the best practices, tools, and AI expertise to scale your company using OpenAI APIs and models. On the right of your screen, we have both a chat function and a Q&A function, and we'd love for you to submit any questions you have as you're building with us today. We'll also include a link to the code we'll be using, so you can follow along. If anything comes up, feel free to put it in the Q&A; our team is in the room ready to answer your questions, and we'll leave some to answer live toward the end. I also want to drop our homepage link at the bottom of the screen, where you can find upcoming Build Hours. We're constantly taking your feedback and suggestions and adding more, and we have three more coming up in June and July with new topics, so definitely check that out.

To set the scene on image gen: we released image gen first in ChatGPT back in March. I don't think you could have gone on social media without seeing one of those Studio Ghibli images; they were all over the feed. In just the first week alone, 700 million images were generated by over 130 million users. This was super exciting, and we knew we wanted to take it a step further and get it into the hands of developers. So less than a month later, we launched it in the OpenAI API with just one directive: build cool stuff. We saw everyone from startups to Fortune 500 companies take image gen to market, incorporating it into their tools and platforms. We actually have one of our startup customers, Gamma, here with us, who will demo some of the things they've built later in this Build Hour.

Here's what we're going to talk about today: the new capabilities, things like text rendering, world knowledge, and multi-turn editing; a really fun live demo, where we build a photo booth so you can see what image gen can generate; then Gamma will come on stage, with their head of AI engineering, Jordan, to share a bit more about what they've built; and my favorite part, Q&A. We really do read all of your questions and incorporate your feedback, so feel free to submit them on the right. Without further ado, I'll hand it off to Bill.

— Thank you, Christine, for that introduction. Wow, it's really been a while since I've done something like this; the last time was all the way back in high school, and fun fact, I believe Christine and I have both done something like this back in high school. It's really nice to be back, just in front of a larger audience. I'm assuming most of you have probably heard about image gen, and better yet, played around with it yourself.
For clarity's sake, it helps to define what image gen really is and how it differs from our previous generations of text-to-image models. We'll also do a quick demo walkthrough by yours truly, and afterwards we'll let Gamma take the stage and show you the cool stuff they've built.

First of all, how is it different? What is image gen? Image gen is different from previous image generation models like DALL·E, which you've probably heard of; those are diffusion-based models. The main difference is that image gen is GPT-4o-native: it's the same GPT-4o architecture behind the scenes powering everything, and the generation of images happens autoregressively, meaning it generates images the same way GPT-4o generates text.

### Segment 2 (05:00 - 10:00)

The generation of the image happens almost like next-token prediction. That sounds like it shouldn't work that well, but in practice we've found it works quite well, and it brings a host of benefits: rendering text on top of images well, improved instruction following, granular image editing, and editing based on image inputs. Here's a quick overview of all the benefits on offer; I won't get too much into it, for fear of sounding too salesy. Feel free to pause here and take a quick screenshot; all of this will be available in the recording as well.

A couple of examples of what improved text rendering can look like: handwritten text, or typed text on different surfaces. Since we're on the topic of high school, I remember running for student council four times in a row and not getting elected all four times. In the process, I had to create a lot of posters, putting ten hours at a time into posters like these. So, in the rare event that some of you in the audience are looking to do something like that, you can now do it within ten minutes; hopefully that saves you a lot of time.

The added world knowledge is also helpful here. We've found a lot of folks making great educational materials, like science posters that explain concepts such as photosynthesis (shown on the slide here) and cellular structures, directly from simple one-line instructions without additional context. Image gen gets all of that because it's based on GPT-4o, and GPT-4o has all of that world knowledge imbued during training. You can also make photorealistic renderings of real-world places because of that real-world knowledge.

Image inputs are also nice. For example, you can use multiple images combined with a prompt to generate a final image that incorporates all of those inputs; as you can see here, it combines all of those images into a cohesive gift basket. And to take you out of the presentation really quickly: we have an image gallery on our website, where you can see some of the images I've shown you, along with the prompts and inputs that went into generating them; some of those inputs are image inputs combined with text. It's worth playing around with yourself.
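Since the gift-basket example above combines several input images with one prompt, here is a minimal sketch of that flow, assuming the Node SDK and gpt-image-1's support for an array of input images on the edits endpoint; the file names are placeholders.

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Combine several product shots into one composed scene.
const result = await client.images.edit({
  model: "gpt-image-1",
  image: [
    fs.createReadStream("bath-bombs.png"),
    fs.createReadStream("soap-bar.png"),
    fs.createReadStream("incense-kit.png"),
  ],
  prompt: "Create a photorealistic gift basket containing all of these items.",
});

// gpt-image-1 returns base64-encoded image data.
const b64 = result.data?.[0]?.b64_json;
if (b64) fs.writeFileSync("gift-basket.png", Buffer.from(b64, "base64"));
```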
Going back to the presentation: I've talked enough about the capabilities, which are all available in ChatGPT. How can you build with them? As you might have suspected, they're available in the API as gpt-image-1, which brings the experience to the API. I'll talk briefly about its capabilities and how you might best use them. To get things out of the way: we actually released a couple of exciting new features last week as part of the Responses API improvements. Image gen is now available as a built-in tool inside the Responses API, and the improvements include streaming, multi-turn editing, multi-tool image generation, and masking. I'll get into each of these here.

Streaming is quite self-explanatory. Image gen does take a decent amount of time to finish generating an image, anywhere between 30 seconds and a minute depending on your settings, so to enable you to build responsive user experiences, we've added a streaming feature to the Responses API that lets you stream partial renderings of the image as they become available, before the full image is completed.

Multi-turn editing is also, incidentally, quite self-explanatory. You can pass an image in, either through its ID or by uploading it wholesale, to combine it with an additional text prompt, resulting in a different image. The way it works is that the Responses API gives you an image ID (or a previous response ID) with every response, and you can pass that back into the next request, and so on, resulting in a multi-turn editing user experience.
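Here's a minimal sketch of that multi-turn loop, assuming the Node SDK; the prompts are illustrative, and `previous_response_id` is the mechanism the image generation docs describe for carrying image state across turns.

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Turn 1: generate an initial image with the built-in tool.
const first = await client.responses.create({
  model: "gpt-4.1",
  input: "Generate a watercolor poster of a lighthouse at dawn.",
  tools: [{ type: "image_generation" }],
});

// Turn 2: reference the previous response instead of re-uploading the
// image; the API resolves the prior image from server-side state.
const second = await client.responses.create({
  model: "gpt-4.1",
  previous_response_id: first.id,
  input: "Make the sky a darker shade of green; keep everything else the same.",
  tools: [{ type: "image_generation" }],
});

for (const item of second.output) {
  if (item.type === "image_generation_call" && item.result) {
    console.log("edited image:", item.result.length, "base64 chars");
  }
}
```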

### Segment 3 (10:00 - 15:00)

Multi-tool image generation in Responses: you can now use other built-in tools together with image gen as well. Rather than explaining it with a slide, I think it's best to just show you how you can play with it yourself. Let me open up the OpenAI playground. For those of you familiar with it, this is where you can play around with the latest models we have, prompt them, and see what they produce. Make sure you're on the Responses API, because the Responses API is the one that offers built-in tools. You can select the image gen tool and add it, and then also select the web search tool, just for demonstration purposes. As I mentioned before, image gen has a really capable model of the world; it has internal knowledge of how the world works, but it still doesn't have access to real-time information. For that, we can use the web search tool to look things up on the internet.

What that looks like is a prompt such as "look up the weather in New York City right now and generate a poster image with the information." I can send that in, and as you can see, by giving it access to the web search tool as well as the image generation tool, the Responses API decided intelligently on its own, using the GPT-4.1 model, to call the web search tool, and it looked up the proper weather. Today is May 29th: a low of 61°F, a high of 70°F, cloudy in the morning, then intervals of clouds and sunshine with showers in places this afternoon. And you can see the image gen tool starting to generate and stream in the response with up-to-date information directly. All of this is available directly out of the API itself; you do not have to define custom functions to implement it. It works out of the box.

Masking is also quite self-explanatory: you can create masks and build inpainting experiences. For example, here we create a mask indicating that only a certain area is available for editing, and as you can see, only that area was edited and nothing else changed; we have a flamingo right there. Image generation as a whole used to be just a text-to-image experience; with these advances in model capabilities, we can now truly say that design can be thought of as a dialogue.
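Pulling that playground setup into code: a minimal sketch with both built-in tools, assuming the Node SDK. The web search tool type is named `web_search_preview` in the Responses API docs of this period; verify the name against the current reference.

```ts
import OpenAI from "openai";

const client = new OpenAI();

// The model decides on its own to search first, then feeds the result
// into the image generation call.
const response = await client.responses.create({
  model: "gpt-4.1",
  input:
    "Look up the weather in New York City right now and " +
    "generate a poster image with the information.",
  tools: [{ type: "web_search_preview" }, { type: "image_generation" }],
});

for (const item of response.output) {
  if (item.type === "image_generation_call" && item.result) {
    console.log("poster generated:", item.result.length, "base64 chars");
  }
}
```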
That's enough talking about capabilities; now I'd like to quickly go over some ideas for what you can build with it, just to dump a couple of use cases on you. You can use it for marketing and brand design: generating posters and marketing material for products on the fly has never been this easy. E-commerce and retail as well: this image, which I pulled straight out of the gallery, was generated by combining a photo of a model with a product image of the dress she's wearing. Now imagine you're an e-commerce store that sells dresses; how cool would it be to let your customers try on the style with their own photos before they buy? You can certainly make a lot of cool educational posters out of it too. As a child, I used to read a lot of books, and yes, I would have loved to have this one. This next slide is a bit meta, since I used image gen to generate the image in it, to tell you that you can generate images to use in presentations; it's quite self-explanatory: great for presentations. And this is a bit of a personal idea for me as well. I love games and have tried to build games of my own. Back in high school I used something called RPG Maker, I believe, and the biggest pain was finding the right character assets, the sprites to put into the game. If you're like me, you know that pain. Now is a perfect time to go back to those ideas and finish building that game you wanted to build.
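Before moving on to best practices: the masking flow from the flamingo example above, as a minimal sketch with placeholder file names, assuming the Node SDK and the guide's convention that fully transparent mask pixels mark the editable region.

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

// The mask must match the base image's dimensions and contain an alpha
// channel; fully transparent pixels mark the area to edit.
const edited = await client.images.edit({
  model: "gpt-image-1",
  image: fs.createReadStream("pool-scene.png"),
  mask: fs.createReadStream("pool-scene-mask.png"),
  prompt: "Add a flamingo floatie in the masked area; change nothing else.",
});

const b64 = edited.data?.[0]?.b64_json;
if (b64) fs.writeFileSync("pool-scene-flamingo.png", Buffer.from(b64, "base64"));
```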

### Segment 4 (15:00 - 20:00)

This is definitely not an exhaustive list; these are just what I could think of while putting this deck together in under an hour, in a somewhat sleep-deprived state. You're probably smarter than I am, so I'll let you take it away and build cool things with it. But definitely let me know what you're trying to build.

Now that we're talking about building, it's useful to get into best practices. (I had a lot of fun making this slide as well; you can obviously see I generated a lot of these images myself.) First, choosing the right API format: we offer image gen in two API formats, Responses and Images. You might be familiar with the Images API if you've used it to generate images with DALL·E. We recommend the Images API for single-turn, straightforward text-to-image tasks, but only that. For anything else, and in fact for most use cases, I'd recommend Responses, because it has built-in multi-turn, multi-tool experiences that might require additional reasoning on top, since a base model orchestrates all of those tools together.

Next, you can customize the image output. Size and quality affect the number of tokens used, and the model is built on tokens; you can fiddle with the output parameters to get the format you want. One thing to note: you can only use a transparent background with certain formats, like PNG and WebP. All of this is in our image gen documentation.

This last part seems straightforward, but folks often forget it: the user experience. Image gen takes a little longer to generate. What should the user experience be while the image is generating? Should it be streamed? These are all questions you should answer before building. There are also certain limitations image gen has; I'll call those out in the next few slides. Knowing what those limitations are and putting guardrails in place is something folks often miss.

Speaking of limitations, there are a couple worth calling out. For one, generation speed is quite a bit slower than before, though with streaming you may be able to improve the user experience. Text rendering is good, but definitely not perfect yet: the other day I was trying to generate a poster with some Chinese characters on top, and as someone who speaks Chinese, I wasn't able to understand a couple of them. So for text rendering in languages other than English, you might run into some of that. Consistency across multi-turn image generation is good, but not perfect either.

One last note on moderation: as with all the models we make public, we put a significant amount of consideration into safety and moderation. All generated images are produced in accordance with our content policy, which is publicly available; that means no violence, abuse, or anything dangerous. There's a moderation parameter you can pass in, but "low" is as low as it gets, and you can tune the sensitivity yourself.
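As a sketch of those output knobs on the single-turn Images API, assuming the Node SDK; the parameter names follow the image generation guide linked in the description, and the exact combination here is illustrative.

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

const result = await client.images.generate({
  model: "gpt-image-1",
  prompt: "A flat sticker-style illustration of a flamingo",
  size: "1024x1024",         // 1024x1024, 1536x1024, 1024x1536, or "auto"
  quality: "high",           // low | medium | high; drives token usage
  output_format: "png",      // transparency requires png or webp
  background: "transparent",
  moderation: "low",         // "low" is the least strict setting offered
});

const b64 = result.data?.[0]?.b64_json;
if (b64) fs.writeFileSync("flamingo.png", Buffer.from(b64, "base64"));
```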
Even then, it might still refuse certain generations, and so it may not be a good fit if you're looking to generate certain types of content, despite good intentions or, for example, artistic context.

Now that we're done with the concepts, we're getting to the fun part: let's build something together. Well, I've already built a lot of it, so I'll talk you through it, but we'll also be adding a couple of new features. Let's get right into it; here's the demo slide, and let me show you what the front end looks like. This is an app we built for our Execs Summit, an event about three weeks ago where a bunch of Fortune 500 CEOs came to San Francisco to see what we've built here at OpenAI and discuss what's happening. For it, we built a photo booth app.

### Segment 5 (20:00 - 25:00)

I repurposed it for this demo, but I've also added a lot of the cool new features I talked about during the presentation. This is basically a very simple Next.js photo booth app. What I can do here is upload a pre-prepared picture of myself. That's me, yours truly; look at the way he smiles at the camera. Sorry, it's a little embarrassing. I have a set of modifiers available to choose from here, so this is where I turn to Christine, my partner in crime, to pick. — Okay, so we have to do the Ghibli style, that's kind of a given. Knitted cozy scene, definitely. Japanese anime movie poster. And last one, maybe the mini figure. — Awesome, those sound amazing. So why don't we kick this off, since as I said, image generation does take a while to complete. This is also where we clasp our hands together and pray to the demo gods, because we don't can any of our demos.

As you can see, things actually look to be streaming in, and this is where it's helpful to open the Chrome developer tools to see what's happening behind the scenes (I'll get into the code in a bit). This is the first new feature I built into this demo; it wasn't available at the Execs Summit, so it's new since last week. All of the images are streaming in. At the Execs Summit, you had to wait until the last image was generated, so folks were left twiddling their thumbs. As you can see in the developer tools, the way streaming works here is that we pass partial images through. We send two types of images back to the front end: partial and final. Partial images are, well, self-explanatory: they're partial images, and with the ID we're able to update the image in each one of those panels.

Now that the images have been generated, let's look at the second thing I added to the demo. Let's take this Ghibli-style image and say we want to make certain changes to it. A few weeks ago you couldn't do that in this demo, but now, because of the Responses API, we can add additional prompts to change it. I'm kind of curious what everyone's favorite color is; I want to give that one to the chat. — Yeah, we actually had an overwhelming number of people vote for green; another vote just came in. So I think we have to change the background to green. — Okay, the background already seems pretty green, so let's make it more green then: "Make the background more green, and actually, make it a darker shade of green." Let's see what else I can do: we can add something like "keep everything else the same," and click Modify. As you can see, we're passing the image back for it to generate a new image. While it's modifying, let's not twiddle our thumbs and wait; let's dive right into the code we've built.
For the initial generation, here's the code logic for what happens when you click one of those buttons: the front end passes back the modifiers we selected, which then go through a mapping that turns them into a prompt. This is a very basic prompt that I typed out in about ten minutes and completely vibe-coded, so I'm sure there's lots of room for improvement. This is, again, where I mention that the repository is public in our Build Hours repo; I've made some changes and will push the updates right after this session, so you can read the prompts yourself, pull the code, host it yourself, and prompt-engineer to your heart's content. Once the mapping is complete, we generate the full system prompt, which says: maintain the original composition of the subject; generate an image of the subject with the following modifiers (the mapped prompt); maintain the original composition and subject. And afterward, this is where the magic happens: as you can see here, we're using the Responses API to create a response.
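A minimal sketch of what that call can look like, assuming the Node SDK; the selfie file, prompt text, and event handling are placeholders rather than the repo's actual route code, and the parameters are walked through next.

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();
const photoB64 = fs.readFileSync("selfie.png").toString("base64"); // placeholder

const stream = await client.responses.create({
  model: "gpt-4.1",
  input: [
    {
      role: "user",
      content: [
        {
          type: "input_text",
          text: "Maintain the original composition and subject. Restyle this photo as a Studio Ghibli scene.",
        },
        {
          type: "input_image",
          image_url: `data:image/png;base64,${photoB64}`,
          detail: "auto",
        },
      ],
    },
  ],
  tools: [{ type: "image_generation", quality: "high", partial_images: 3 }],
  tool_choice: "required", // with a single tool, this forces the image call
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.image_generation_call.partial_image") {
    // Low-fidelity preview frames; forward these to the front end.
    fs.writeFileSync(
      `partial-${event.partial_image_index}.png`,
      Buffer.from(event.partial_image_b64, "base64"),
    );
  }
}
```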

### Segment 6 (25:00 - 30:00)

We pass in our input image, the base64 encoding of it, along with the prompt, and we give it the image generation tool. We pass a couple of parameters as well: the size; the quality (I definitely want high quality); partial images, set to three; and we also set stream to true. What that allows you to do is open a stream and handle the different events as they're sent back, so we can take each partial image and send it to the front end, and then take the final image and pass it back together with the response ID associated with the image.

The other thing worth calling out here is the tool choice. I've set it to "required" because we're using the Responses API. You can certainly prompt it to call tools, but by setting tool choice to "required," it's forced to make one or more tool calls, and since we've only given it one tool, that means the Responses API is forced to call the image generation tool. That's just one caveat to keep in mind when you use the Responses API.

Let's get back to the image that's finished generating. Wow, that sure is a dark green; it almost looks like nighttime, if anything. But it's not a very good showcase of multi-turn if we only show one turn of it, right? So why don't we add something else. I can't exactly tell where this handsome gentleman works. Christine, where does he work? — I think he works at OpenAI. — Of course. And how can we fix that? Maybe we should add an OpenAI logo to his shirt, like the one on my hoodie right now. So let's do "add an OpenAI logo to his shirt" and click Modify once again. Hopefully this works.

While it's generating, we can shift our attention to how the edit logic works. It's in another route.js file. For those of you well versed in Next.js (you're probably more well versed in it than I am), you can just download the repo and look at it yourself. For everyone else, this is basically where all the logic for the edit endpoint resides, and we see very similar code logic: we have the input text, which is the prompt we passed in, and we have the image URL, which is the base64 encoding of the image. One thing I'd love to call out, and for you to take note of: if you recall from the presentation, instead of passing the base64 encoding of the image wholesale back to the backend, you can just pass the previous response ID back. There's no one right way of doing this, but that way we manage all of the state for you, and there are fewer whole images to send back and forth. To give you a quick look at how this is done, you can go to the image generation docs on our website and look at multi-turn image generation. There we provide an example: a first response and a follow-up response, and instead of passing in the entire image in base64 encoding, we just pass in the response ID.
And that's basically how it works; there are certainly things you can do to simplify. Here we see it's now very clear where this lovely gentleman works. One other thing I'd like to add: what if I want access to live, real-world information that's only available on the internet? I've shown you how to do it in the playground, but we're all builders, so that means we have to code it. Where should I put this, exactly? I'll take a moment here and see if any of you folks have figured it out. I see a few folks have indeed figured it out, but just to give you a hint: it's not here.

### Segment 7 (30:00 - 35:00)

And I put those comments there because I kept messing it up by putting it in the wrong place. As you can see, we have a tools field, and it's an array. This is where you can define your custom tools as well as hosted tools. What we can do here (oh, looks like Cursor just read my mind) is add a one-line entry, and with this, we've basically added capabilities that use the web search tool. Let's look up some knowledge that's only available recently. I just recently got into following basketball, and since the Golden State Warriors lost, I've been following the Knicks, since I lived in New York for five years. We don't talk about Knicks versus Pacers, but I'm wondering about Knicks versus Celtics and how they did. So let's do something like "look up the latest score of Knicks versus Celtics and add that score to the background of the image; keep everything else the same," and click Modify. Since this is the last part of our demo, we're once again going to clasp our hands together and pray to the demo gods that it runs properly. Let's give it a second and see what it generates.

In the meantime, I can talk you through what exactly is happening. This is basically the same as what you saw in the playground: if we say something like "look up the latest score," it's going to decide that it needs to call a web search tool before it can generate an image. And indeed, the Knicks have beaten the Celtics and gone on to the Eastern Conference Finals. With that, that concludes the demo I have prepared. Again, the entire repo is available as part of our Build Hours repository, so feel free to play around with it, hack it yourself, maybe even build a product to production with it. I'm really excited to see what you build. And now I'll pass it back to Christine.

— Yeah, of course. I'm really excited for this next part: we're welcoming Jordan, the head of AI engineering at Gamma, onto the stage with us. Hi Jordan, how's it going? Thanks for joining us today. — It's going great. Thanks for the intro, Christine. — Of course. I'll let you take it away; I know you have some really cool features to show, so feel free to share your screen.

— Great. For my talk today, I'll be showing a lot of very similar technologies to what Bill just showed, in Gamma, and how we use them to power our app. At Gamma, our mission is to help bring your ideas to life. We do that by helping you make presentations, documents, websites, and, recently added, social media posts, and we do all of this with AI. Across all these mediums, Gamma is a visual platform; it's a visual medium. So the three pillars for us have always been charts and diagrams, visualizations and layouts, and lastly, AI-generated images. Over the last two years we've generated a lot of AI images: we do about 700,000 AI-generated presentations per day, and every presentation has several images in it, so we just recently crossed 1 billion AI-generated images through our platform, through various providers, one of them being the new image gen model.
But historically, AI images have had a lot of problems. We were definitely an early adopter of using AI-generated images in our presentations, and we kept saying, "one day it'll get better." Through that process we've had issues that I think everyone here has seen in some form or another: hands being weird, limbs being weird. This third image, of an "OpenAI Build Hour," is something I generated just yesterday, and even today, some of the newer models that are faster and cheaper still can't get text right.

### Segment 8 (35:00 - 40:00)

But in the last month or two, I'd say there's been a lot of good news, with a lot of new models coming out; AI images have gotten a lot better, so much so that we're able to use them in contexts in our presentations that we couldn't before. This is an example of what an AI image from November 2023 looked like when we asked for, I think, "delicious sushi." Kind of weird; I don't think the egg goes in the sushi like that. To contrast, this is what yesterday looks like. Much better. The quality is just outstanding, and it's now something we can include with more confidence in the presentations generated through Gamma.

As part of this, I'd like to show how we use AI-generated images in Gamma and give a brief overview of the platform in general. I'm going to create a presentation about today's topic, the OpenAI Build Hour. With Gamma, you can easily create a presentation from just a single line: we go to the generate option, I type in "OpenAI Build Hour: Image Gen," and then I give the date to pin it to this Build Hour. One of the other interesting things, not actually related to image gen, is web search. Having web search as a native tool... oh, looks like this maybe stalled out; I'll try that again. Having web search as a native tool built into language models really helps with topics like these, things past the models' training dates. Before, current events would just not show up here; it would basically be a hallucinated outline of a presentation where all the details were made up. But as you can see, when we search for today's Build Hour, we get real information based on the actual web page it came from.

So I'm going to make a couple of tweaks to improve this. I'd like a separate page for speakers, and then I'll add some titles; I can make this a little bigger. For Bill: OpenAI, solutions architect. And for myself: Gamma, head of AI engineering. And because I don't want Gamma to AI-generate these people, I already have images I can include, so I'll add one of Bill and one of me. This basically tells our AI: use these images instead of generating new ones. Hopefully it does that.

The next part is, once I have an outline I'm okay with, I choose a theme. For this demo I've actually made an OpenAI theme, so I'll search for that and select it, and then I'll make sure to use the GPT image model for this. With this theme, it's attempting to match the styling of OpenAI, but also the imagery, the kind of abstract gradients that I think the OpenAI brand uses. So I'll go ahead and generate this deck. Like Bill mentioned earlier, image generation is definitely a slow AI operation; when we generate decks, the language model is actually pretty quick at generating the content, and we're usually waiting on the images. As this deck streams in, we take the outline we had before, the one that used the web search tool to find information about the Build Hour, and pass it to another language model to do the full deck generation, along with choosing the layouts and the visual representation we want.
As we can see, it looks like it used the images up here. I'm going to go ahead and change the layout of this and get rid of this "our audience" section; I think we just want this to be about speakers. And then one thing I can do here is switch the layout.

### Segment 9 (40:00 - 45:00)

I think this one's probably a little better. Let's go ahead and look at some of the images it generated. This one is maybe not the best with text. This is quite a good image; I'd say it got the text right. This is an interesting one; a lot of these images, I'd say, lean toward showing full websites.

The last part I'd like to show, another feature we've been building on top of image gen, is the ability to do maskless editing. If we have an image we don't like, we can open this feature, which opens a chat with AI, and the chat actually has the context of this image. So one thing I can try is "regenerate this image to be an abstract gradient," and we'll see what it does. From this menu, we can do one of two things with images: create them or edit them. In this case we're creating a new image, because I don't think this is an image I necessarily want to tweak; it kind of missed the mark. In other cases, image editing might be the right solution. I believe this is using the GPT image model at medium quality internally, and we've seen latencies of about 30 seconds. I'm definitely looking forward to implementing some of the streaming functionality, as I think that would make this a much better experience. So I'm going to pick this one, and it will update in my presentation.

Then let's try this: "edit this to just remove the text below the laptop" and see how it does. This is using maskless editing: before, when you wanted to do image edits, you also needed to supply a mask. Now it just takes your text, interprets it, figures out where to edit, and edits it. And that was pretty good. Let's also remove the top bar and see if that works. It's going to remove that, and hopefully we'll get an image with just the laptop. There we go. Obviously, if this were a real deck, I'd make more tweaks, but I think this is enough to show the development cycle of building with Gamma: you supply a general style and a general outline for the deck you want, and then use AI image editing to refine the images.

The last thing I want to talk about is maskless editing. One of the big wins we've had really recently, which only makes sense to call out at an OpenAI Build Hour, is that we actually switched from other models that do maskless editing to GPT image, and we saw basically an overnight 27% improvement, based on users rating images after they've been edited. These types of wins, where it's basically a one-line code change and we get huge improvements, have made us really happy. And if you'd like to try Gamma, you can do it for free at gamma.app. All right, I'll hand it back to Christine and Bill.

— Thanks so much, Jordan. Actually, we had a question come in from the audience. — I'm sorry, Christine, you cut out on my end; could you repeat that? — Yeah, sure. We had a question come in from the chat: if I want to make a strategic sales deck, would you recommend I get the strategy from ChatGPT first, then give that to Gamma? — Yeah, so we actually support many different ways to import your content into Gamma. I'd say if you're in a more professional use case, a lot of people use our paste-in mode.
That allows you to paste in content directly.

### Segment 10 (45:00 - 50:00)

You can paste in a full outline, or full pages of research, and have Gamma either condense it or preserve it in the pitch deck you want. We see a lot of people using language models like ChatGPT to first synthesize their thoughts, do research, and generate images, and then bring that into Gamma and have Gamma handle splitting it into slides and doing the visualizations. — Thank you again for joining us, Jordan. We'll move into live Q&A now. Let's see the first question: "I'd love tips around consistency and granular control, around things like object references and style."

— Yeah, that's a very common question, Christine, and I'm glad you all asked it. A couple of things here. For consistency and granular control, there are a couple of knobs you can turn with the new image gen API, especially because it's now native. Prompting is very important: follow the best practices for prompting, be as specific as you can, and don't give image gen conflicting instructions. As for granular controls like object references and styles: "reference" is the one thing I'll pull out of this question, because we let you pass images in as part of the input, and reference images are hugely helpful for informing what the model should generate. If you have reference images of objects you want to place in a certain scene, or of styles you want the image generated in, you can pass all of them as image inputs. So prompting well and providing references are, I think, the two biggest levers for this question. Shall we move on to the next one?

— Yeah: how would you resolve mutation problems, especially ones coming from the prompts? — Speaking of prompts: "mutations" sounds like a scary word, but I assume you mean things like unexpected generation results caused by mistakes in prompts. I'd say it's actually easier than most folks imagine to spot those issues and fix them. If there are contradictions inside your prompt, definitely fix those. One tip I'll offer: before I generate images, I put all of my prompts through GPT-4.1 or o3 first, because I think the models nowadays are smarter than me, at least at anything writing-related, and prompts are writing. You don't have to do it alone; do it together with the other models, I guess.

— Great. The next question: what would you suggest as best practice for quality and cost if you want to generate a story with text interspersed with images that are all consistent? — Another great question, and now we're getting into use cases, which gets me really excited. Generating a story with text and interspersed images: I'm not exactly sure what your specific use case is, but I have a few concrete ones in mind, for example generating children's books or educational materials on the fly. Image gen would be a great fit for that. A couple of things you can do here.
I'm thinking out loud here, since I didn't prepare for this exact question, but the first thought is that everything I said before about reference images and passing images back applies. Because of the Responses API, you can generate images in a multi-turn manner, so you can either pass the images back as references, or pass the previously generated images in via their image generation IDs, and the Responses API will see what you previously generated, put the pieces together, and generate images that are consistent in style. As for quality and cost, also a great question: the recommendation I'd give is to use the highest quality you can, latency and cost permitting, because that way you'll see the ceiling of what the model is capable of.
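A minimal sketch of the reference-image approach just described, assuming the Node SDK; the file names and prompt are placeholders, and later pages could chain via `previous_response_id` as shown earlier.

```ts
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();
const toDataUrl = (path: string): string =>
  `data:image/png;base64,${fs.readFileSync(path).toString("base64")}`;

// Pass character and style references as image inputs so each page of
// the story stays visually consistent with them.
const page = await client.responses.create({
  model: "gpt-4.1",
  input: [
    {
      role: "user",
      content: [
        {
          type: "input_text",
          text: "Illustrate page 1: the fox finds a lantern in the snow. Match the character sheet and style reference exactly.",
        },
        { type: "input_image", image_url: toDataUrl("fox-character-sheet.png"), detail: "auto" },
        { type: "input_image", image_url: toDataUrl("style-reference.png"), detail: "auto" },
      ],
    },
  ],
  tools: [{ type: "image_generation", quality: "high" }],
  tool_choice: "required",
});

console.log("response id for the next turn:", page.id);
```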

### Segment 11 (50:00 - 52:00)

Then, once you're sure image gen is the right fit for whatever use case you have in mind, despite the limitations I mentioned during the presentation, you can start to tune things like the output format and the quality, going from high to medium or low, and see which is the best fit for you.

Next one: is it possible to leave part of an image intact and isolate the area being edited? Great question. If you remember, I mentioned masking very briefly (it's a bit hard to put together a demo around it). You can do this quite easily by passing in a mask image together with the image you want to edit. The mask image has an alpha channel; I can't exactly recall which way around it is, it's all in our docs, but I believe the transparent areas are the ones that get modified, and the parts you don't want modified you leave fully opaque in the alpha channel. It could be the other way around; playing with it yourself would be my recommendation.

— Awesome. That's all we have time for today, but we really appreciate all your questions; we are reading them and taking your feedback, like I mentioned. I'll leave you with some parting gifts: some links that we'll share in a follow-up email along with the recording, which are really helpful for trying this out. And as Bill mentioned, he'll be updating the code we used today; I saw some requests from you for different tweaks you want to play with yourselves. So, with that, we'll see you June 17th for our next Build Hour on voice agents. Thank you everyone for joining us, and we'll see you then.
