OpenAI’s New ImageGen is Unexpectedly Epic … (ft. Reve, Imagen 3, Midjourney etc)

11:16

AI Explained · 25.03.2025 · 94,262 views · 3,683 likes


Video description

I’ve spent quite a while testing the new 4o ImageGen from OpenAI and comparing it to models released just yesterday, like Reve, Midjourney and Imagen 3, as well as models not yet out.
https://app.grayswan.ai/ai-explained
AI Insiders ($9!): https://www.patreon.com/AIExplained

Rarely in AI is one model so much better than the rest, as we can see on the chatbot side of things. Yes, I have a video imminent on Gemini 2.5 and DeepSeek. But for ImageGen, I was very impressed, as you’ll see. It’s still not perfect (don’t show it a mirror, for example) and definitely not photorealistic, but it is incredibly obedient. You’ll see what I mean. What Sam Altman calls ‘Images in ChatGPT’ will apparently be available to everyone, even free users. There are some filters, but I am sure everyone will soon have access to an unfiltered model of its strength, and it’s easy to imagine what will come of that.

Chapters:
00:00 - Intro
01:07 - Prompt Adherence, vs Reve, Midjourney, Imagen 3 + one other
03:39 - Idioms
04:20 - Thumbnails?
05:56 - Captions / Infographics
07:20 - Filters and Public Figures + Gray Swan
08:30 - Sora?
08:49 - Ethnicities/hands
09:09 - Where’s Waldo?
10:33 - Selfies and Photorealism

Images with ChatGPT/4o ImageGen: https://chatgpt.com/
Imagen 3: https://labs.google/fx/tools/image-fx
Reve: https://preview.reve.art/app
Altman Announcement: https://x.com/sama/status/1904598788687487422
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

Contents (10 segments)

Intro

I have spent quite a while testing the new 4o ImageGen from OpenAI and comparing it to models released just yesterday, for example, as well as models that aren't even publicly out yet. Rarely for me in AI is one model so much better than the rest. No, of course it's still not perfect, and don't even think about showing the model a mirror, because it will frankly have a breakdown. But the word that comes to mind for me about this new ImageGen (and I know calling it new is a little bit of a stretch, because it's been worked on for more than two years) is obedient. "Depict six people of six completely different ethnicities doing jazz hands": that was my prompt here. Okay, you could quibble that you can't quite see the hands of the people in the back, but this is not bad. Think of just how recently hands were such a problem for AI. This new tool, which Sam Altman is calling "Images in ChatGPT", will apparently be available to everyone, even free users, and it will also be coming to the API. So I thought it deserved its own video featuring comparisons with Reve,

Prompt Adherence, vs Reve, Midjourney, Imagen 3 + one other

Midjourney, and I might even sneak another model in there. I'll also cover image editing, which I know is not unique to this model (you can do it with Gemini in Google's AI Studio), but it's still a notch above. The first comparison, I think, is pretty illuminating, and the prompt I used was "three apples balanced on the trunk of a blue elephant with three legs, standing beside five weeping willow trees in El Jem, Tunisia". Obviously that is an incredibly difficult prompt to adhere to, but I think the model did insanely well. It captured the colosseum in El Jem that I visited, and you've got a blue elephant, three apples on the trunk in every image, and kind of five trees in every image, depending on how you count them. I know there are quite a few more in the background, far into the background, but if you're being generous, some of these (1, 2, 3, 4, 5) are pretty accurate. I'm also just noticing that the shadows are fairly consistent, which I think is pretty impressive. But obviously it didn't manage the three legs on the elephant. It's a little bit like my common-sense reasoning benchmark, SimpleBench, in that three legs on an elephant is a twist on a common scenario, and the model just doesn't expect it and can't really do it; it's been trained on too many images of elephants with the normal four legs. Imagen 3, which is Google's best text-to-image model, struggled somewhat. Again, no three legs on the elephant, but this time the number of apples is kind of wrong, not all of them are on the trunk, and you're not getting much of a sense of location. Then of course I wanted to test Reve, previously codenamed Half Moon, which the company claims is the best image model in the world. The way I'd phrase it is that it's very good: if it weren't for 4o ImageGen, I'd say it probably is the best image model in the world, but for now I'm going to call it second on this particular prompt. You may even prefer it, even though there are only four trees that I can see; it's a really good image with a great sense of location. It'll slightly more often get the number of apples wrong, but overall, despite occasional shadow issues, the images are pretty vivid and engaging, so massive credit to Reve here. I'm now going to show you a sneak peek of a model releasing tomorrow, and I think this is a brilliant image: not quite what I was going for, but nevertheless very interesting. All of the images from this model were fairly similar: engaging, but not quite what I was looking for. Okay, you might like this next one, because I'm sure you guys are going to see plenty of comparisons online, but I wanted to go

Idioms

one meta layer higher. I asked all the models to illustrate the idiom "hold your horses". That's a pretty tough test, because it's not just about the visuals of literally holding a horse; it's also about the meaning of the idiom: hold your horses, slow down. Only OpenAI's 4o ImageGen understood the metaphor and conveyed it appropriately in every image, plus it gave some really great text too. Reve, as well as having some slightly dodgy image details, just didn't really understand the metaphor in any of the images. Imagen 3 from Google couldn't do this at all, and, as you can see at the top, nor could Midjourney. Okay, this next one's not going to be a comparison, but I think

Thumbnails?

it shows off the capabilities of 4o ImageGen really quite well. Here is one of my classic thumbnails; I gave it to 4o ImageGen and said "make it 3D". I think you have to admit that, with the slight exception of the Anthropic logo down here, the overall results are darn impressive. Just for a moment, let's focus on the fact that, aside from possibly a little line here next to "stumbles" in one of the images, the text is incredibly accurate. Then look at this one in the top right (I'm actually going to zoom in): the effect of the whale coming out of the water, drawn from my thumbnail as inspiration, is pretty darn impressive. Now, I'm not saying that I'm going to immediately drop my traditional thumbnail approach, but for my just-released Patreon video, which was about Claude 3.7 having theory of mind and knowing it's being tested, I did want to try it out. So I took my existing thumbnail and ran it through 4o ImageGen to see what it would come up with, and as you can see, you get this lab-like image with the thumbnail projected onto the wall. I don't normally like AI thumbnails, but this is probably the first tool that has tempted me. The next test is what I can see being the most common use case for image generation in ChatGPT. You could call it images with captions, or basic infographics, and it does really quite well here. I asked it to "depict a four-panel journey showing the stages of a human life". Not only did I get that journey, but for each panel I got labels that I didn't even ask for, which, I've just noticed, aren't quite perfect: you can see "elderly" is spelled wrong in the top right. But again, you'd be hard pressed to say for some of these that

Captions / Infographics

there are any clear mistakes. Now, because I love the UI, all of these tests were done on Sora, but of course we can't forget image editing. That is either unavailable or a whole set of extra steps with other image generators, but not so with Images in ChatGPT. I picked out one of these images and said "add glasses to each character", and that got me this image, where you can see the original image is preserved; the characters just now have glasses. All of the other image generators had problems with the four stages of life, although Reve came the closest with this image. I mean, it kind of skips everything from the age of 21 to 81, but it's not bad. Midjourney went super metaphorical and artistic, but I did say human life, and I can't really see humans here. The unreleased model went in a completely different direction, which I kind of like, but I'm a bit confused by it. Now, I missed my opportunity earlier to talk about native image generation and editing in Google AI Studio with Gemini 2.0 Flash, but now I've got a chance, and the comparison isn't quite as favorable. I again said "depict a four-panel journey showing the stages of a human life" and got this, and it makes me wonder: before I was born, was I a robo-dog with a stick in my back quarter? Anyway, we can edit the images, so I said "change the baby on the right to being an old man", and as you can see, I got this. Okay, now for some disclaimers,

Filters and Public Figures + Gray Swan

and a few times I was denied permission for an image, so there are filters on the new ImageGen. It did allow me to submit a photo of the Google CEO and Sam Altman, the CEO of OpenAI, and I said "make these two people arm wrestle". Even though the fidelity to how they look isn't perfect, this image in the top left isn't bad, and I thought this generation would be denied, but it wasn't. You can let me know in the comments whether you think slightly less filtering is a good thing, but for me, true safety is about things like bioweapons and cyberweapons. That's why, through my new link in the description, you can, and possibly should, enter the Gray Swan Arena if you have any interest or aptitude in jailbreaking models and testing whether they can do these kinds of things. And yes, appropriately, that now includes visual vulnerabilities: breaking models through the images you submit to them. Or, if you're just interested in big prize pools, do check out the link in the description; as you may have noticed, the prize pools are getting a bit out of control. In case you're wondering, because I'm doing this in Sora, I can turn any image into a video, but honestly I wouldn't quite recommend it, even when

Sora?

you're using storyboards; the results aren't exactly lifelike. The six people of different ethnicities doing jazz hands was probably one of the most impressive outputs I saw from ImageGen, mainly because that was a stark weakness of image generation models going back

Ethnicities/hands

to last year and the year before, and also just because it was so much better than the other models at this particular prompt. Midjourney struggled hard, Google's Imagen 3 denied me entirely, and Reve wasn't bad: we do have six different people, though I wouldn't exactly call this jazz hands. One thing I do have to mention, of course, is that when you're using ChatGPT to generate images, it is

Where’s Waldo?

typically going to be slower than the other models. But here was another test that I hope you guys like, just so you can see it. I said: "Create a difficult 'Where's Wally?' (as we call it in Britain) or 'Where's Waldo?'-style image, with an italic caption telling the viewer what to look for. It should take at least 10 seconds to solve." Now I'm going to scroll through the images, and you can of course pause the video. The generations were artistically very interesting, but for me they all suffered from the same problem, which is that they didn't actually display the thing they told you to look for, unless you want to be very generous and count this thing here as a tiger, which is a big stretch. You know what, in this one I'm actually going to give it to Imagen 3, because I can see that it's saying "find the time traveler in the medieval marketplace", and even though the text is kind of screwed up and the traveler is very easy to spot, at least it's there, and it's kind of cool. Reve created very beautiful images, but again, I think they suffered from that same problem of not actually containing the thing you're supposed to look for. Honestly, don't waste too much of your time, but if you do spot them, let me know. I'm going to give Reve a pass on this image, because it said "find the pirate hiding among the beachgoers", and I'm going to say this is the pirate: not really hiding, but there we go. I guess this serves to illustrate the point, which is that the logic, the kind of brains, of 4o ImageGen is just noticeably better than the others'. The artistry is probably similar. Obviously, most people will just use this

Selfies and Photorealism

to turn their selfies into charcoal sketches or Dragon Ball Z characters; that's pretty obvious. But the fact that we now have AI models capable of producing an image like this one, with incredibly accurate text and genuine logic behind what it's portraying, is a true moment in AI. It's worth a dedicated video, because sometimes incremental change can add up to big change. So is this a storm in a teacup or a true moment in AI? Will you never use this tool, or use it hundreds of times, like I'm expecting to? Let me know. Thank you so much for watching, see you in the next video, which should be coming very soon, and have a wonderful day.
