# Meta's New TEXT-TO-IMAGE Takes Everyone By SURPRISE! (Now Announced!) CM3leon

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=gcHzu2Fw3ds
- **Date:** 19.07.2023
- **Duration:** 10:18
- **Views:** 12,732

## Description

Meta's New TEXT-TO-IMAGE Takes Everyone By SURPRISE! (Now RELEASED!)

Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos.

00:00 Introduction
01:30 Start of Video
02:00 Meta Vs Midjourney V5
05:10 Meta Vs Midjourney V4
05:43 Surprising Feature
08:00 Meta Vs Pix2pix
08:09 Meta Picture To Text
09:00 Other features

https://ai.meta.com/blog/generative-ai-text-images-cm3leon/
https://huggingface.co/spaces/timbrooks/instruct-pix2pix

Was there anything we missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
#IntelligentSystems
#Automation
#TechInnovation

## Contents

### [0:00](https://www.youtube.com/watch?v=gcHzu2Fw3ds) Introduction

So Meta just released something that caught everybody off guard: Meta actually released CM3leon, a more efficient, state-of-the-art generative model for text and images, or, if you don't want to complicate things, essentially text-to-image. Now, text-to-image has largely already been conquered by tools like Midjourney, which provide what I would call state-of-the-art text-to-image generation that I genuinely don't see anyone even coming close to, which is why this kind of technology is quite surprising. But I would still argue that some of the features you see in this text-to-image software are going to be somewhat different from what we see in Midjourney, because there are a few things that are just a little bit different when we compare them, and you might actually be surprised by some of the features, because I genuinely didn't expect some of them to be in this software, nor did we expect it to be released now.

For some of you who are confused, you have to understand that Meta is a giant AI company, and they've actually pivoted towards the AI space after spending billions on the metaverse and realizing that it wasn't that great an investment. So Meta basically has an AI team dedicated to providing the community with open-source and other relatively great AI technologies that they hope will shape the rest of the AI landscape, and that's why we now have this text-to-image software. I do think that sometime later it's likely going to be open-sourced. So you can see here on the web page it

### [1:30](https://www.youtube.com/watch?v=gcHzu2Fw3ds&t=90s) Start of Video

actually comes up with this from Meta AI: "a small cactus wearing a straw hat and neon sunglasses in the Sahara Desert." So of course, if we input the same prompt into Midjourney, you can see that this is what we get. Now, I wanted to make a quick distinction, because of course I know the comparisons are going to be there: what's the point of even covering something like this if it isn't as good as Midjourney? Well, if you keep watching the video, I'll show you the new features that might make Midjourney feel some kind of pressure

### [2:00](https://www.youtube.com/watch?v=gcHzu2Fw3ds&t=120s) Meta Vs Midjourney V5

to release certain features. So this one you're looking at right here is actually Midjourney version 4, hence the reason the quality isn't as good; of course, these ones aren't upscaled either. But I'm also going to show you Midjourney version 5, which you can see looks a lot more realistic and is just ridiculously high quality. For example, you can even see the reflections in the sunglasses of what appears to be the rest of the background; if that isn't truly incredible, I don't know what is. So when you look at version 4, it actually does look pretty similar to this.

What we can see here is that Meta is definitely trying to increase their capabilities, and not only in the text department but in pretty much every single department. Meta now has many different things: they earlier released Make-A-Video, an AI system that generates video from text, so they have text-to-video, text generation, audio generation, and now of course the final frontier, which is image generation. It seems like Meta might be loading up for their very own AI model, which could be completely multimodal.

If you continue to read the web page, you're going to see something very interesting. We see something that says text-to-image: given a prompt text with potentially high compositional structure, generate a coherent image that follows the prompt. For example, the following four images were created from the prompts "a small cactus wearing a straw hat and neon sunglasses", which is what we saw in the first one, and "a close-up of a human hand, hand model, high quality", which we can also see here. Interestingly enough, this one is actually quite impressive, because many AI models for some reason definitely struggle to render human hands with the correct number of fingers. So Meta actually chose this not out of sheer randomness; they wanted to demonstrate that their text-to-image model is clearly superior when it comes to certain things, because as you know, this was something that was only recently solved in Midjourney version 5, and in the Stable Diffusion version, I think 0.9, or whichever Stable Diffusion version was just released, which honestly blows everything out of the water.

Then we can see here "a raccoon main character in an anime preparing for an epic battle with a samurai sword, battle starts". It looks pretty decent, but one of the most interesting ones I want to show you is this last one right here: "a stop sign in a fantasy style with the text 1991". And I want you guys to really pay attention here, because as you know, AI development is very interesting: largely we don't understand how lots of these models work, and one of the large problems with these models is that they struggle with text. Now, Stable Diffusion doesn't struggle with text as much as Midjourney does, and I'm going to show you exactly what Midjourney did when I used this prompt. In order to be fair, I do know that Midjourney has different variations, and some versions are more artistic; I chose to use V4 because V4 is the most artistic, and you can see that in version 4 of Midjourney this is what the same prompt will give you. Now, I do know that you can alter certain things

### [5:10](https://www.youtube.com/watch?v=gcHzu2Fw3ds&t=310s) Meta Vs Midjourney V4

that might add stylization or whatever, but Midjourney's one fatal flaw is that it doesn't do text well: although the text here does look high quality, it doesn't really say anything. And if you're wondering what version 5 would look like, this is what we get from Midjourney version 5. I wanted to keep that in there just to show why new AI technologies are still being developed: even if they don't excel at everything, there are some areas where other AI models simply fall short, which means that for certain applications, if you do want text in an image, you might actually have to use something that you don't normally

### [5:43](https://www.youtube.com/watch?v=gcHzu2Fw3ds&t=343s) Surprising Feature

use. Now, this was the area with the biggest surprise. You see, this is what we have, and it's called text-guided image editing. This is where you take a default image and then essentially edit that image via a text prompt. Now, there are some programs like Adobe's Generative Fill that can actually do this, and you can see right here that with Adobe's Generative Fill, all you need to do is essentially click Generative Fill and then add a text prompt like this. This is something you can actually use; we've done several videos on this before. Essentially, you just need to sign up for Adobe's beta program, but it is something that works really effectively, and it's something we haven't seen in anything other than Photoshop. So now that Facebook is incorporating something like this, it will be interesting to see if this is going to be a feature in future, more advanced text-to-image AI models.

So you can see we have the input image here, then it says "what would she look like as a bearded man", and we can see it instantly changes her to a man with a beard, which is actually rather funny. Then of course we can see "put on a pair of sunglasses", then "she should look 100 years old", and "apply face paint". Now, it would be interesting to test this against Photoshop's Generative Fill, because I don't doubt that the results would likely be very similar; Photoshop's Generative Fill has taken the industry by storm, with many different creatives using it for many different things. What they also state is that this happens within the same model, meaning that it isn't a separate model; it's simply one entire system, which, as they reference, is quite similar to InstructPix2Pix, except that InstructPix2Pix was actually a separate model, which is why they talk about how this is all bundled into one AI model. And of course we can see the similarities here: "swap the sunflowers with roses", "add fireworks to the sky", "replace the fruits with cake", "turn it into a still from a western", all very interesting capabilities that we will see in future advanced AI models. If you're wondering how Pix2Pix would actually handle this, I just tried it with the InstructPix2Pix demo, and you can see that when I asked this software to put on a pair of sunglasses, these were the sunglasses it put on, so it's actually quite effective at doing this. What I do hope is that, just like Pix2Pix, Meta will hopefully release this software at least in some kind of
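The InstructPix2Pix demo linked in the description is public, so the "put on a pair of sunglasses" experiment can be reproduced in code. Below is a minimal sketch using Hugging Face `diffusers` with the open `timbrooks/instruct-pix2pix` checkpoint; CM3leon itself has no public API, and `resize_dims` is a hypothetical helper for typical diffusion-model sizing, not part of any library:

```python
# Sketch: text-guided image editing with the open InstructPix2Pix model
# (an assumption-laden stand-in for CM3leon, which is not publicly available).

def resize_dims(w: int, h: int, target: int = 512, multiple: int = 64):
    """Hypothetical helper: scale so the longer side is ~target px and
    both sides land on multiples of `multiple`, as diffusion models expect."""
    scale = target / max(w, h)
    return (max(multiple, int(w * scale) // multiple * multiple),
            max(multiple, int(h * scale) // multiple * multiple))

def edit_image(image, instruction: str):
    """Apply an edit such as 'put on a pair of sunglasses' to a PIL image."""
    import torch  # heavy dependencies kept local to this function
    from diffusers import StableDiffusionInstructPix2PixPipeline

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
    ).to("cuda")
    image = image.resize(resize_dims(*image.size))
    # image_guidance_scale controls how closely the output keeps the input image
    return pipe(instruction, image=image,
                guidance_scale=7.5, image_guidance_scale=1.5).images[0]
```

With a portrait loaded via PIL, `edit_image(img, "put on a pair of sunglasses")` roughly reproduces the experiment shown in the video.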

### [8:00](https://www.youtube.com/watch?v=gcHzu2Fw3ds&t=480s) Meta Vs Pix2pix

demo form where users can actually test it. I'm not sure why they haven't released this as a product just yet, because I'm pretty sure that if it was released to the internet just like chat

### [8:09](https://www.youtube.com/watch?v=gcHzu2Fw3ds&t=489s) Meta Picture To Text

GPT was, many people would find vast uses for this software, because it's easy to use and honestly the only limit is your creativity. Now, another thing the CM3leon model can do is analyze what's inside an image. You can see it says "describe the given image in very fine detail", and it answers: "In this image there is a dog holding a stick in its mouth. There is grass on the surface. In the background of the image there are trees." This is something we'll increasingly see from a variety of AI models, and I would argue it's really good, because although Midjourney does have a feature called /describe, where you input an image and it essentially generates a prompt describing that image, this feature is going to be much better in terms of the wider range of applications. Now, this next one is pretty crazy to me: it says that, given a text description of the bounding box segmentation of the

### [9:00](https://www.youtube.com/watch?v=gcHzu2Fw3ds&t=540s) Other features

image, generate an image. So essentially what you're asking it to do is generate a high-quality image of a room that has a sink and a mirror in it, with a bottle at this location, with a sink at this location, and with a bed at this location. Essentially, they've trained this AI model to precisely place certain objects at certain locations in the final generated image, and that is something we haven't seen before with Midjourney. This technology could give us access to much finer and more detailed image creations if we wanted it to; this is something we just haven't seen before, and when I saw it, it honestly blew my mind.

Then of course we have segmentation-to-image. Essentially, they had an input image of a duck on a lake, then they extracted the different segments of that image: they separated the image into solid colors based on the different objects in it, so they made the water blue and the duck pink, and then they reconstructed the image based on that segmentation. You can see that the reconstructed image in generation 1 and generation 2 does look rather accurate to the original. And the reason this is actually pretty scary: imagine you had an extremely low-resolution image, then you extracted the segmentation and regenerated the image; you could arguably find numerous applications for this software. Then of course the paper goes on to showcase how they actually
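The bounding-box conditioning described above can be made concrete with a small sketch. Nothing below is CM3leon's actual interface (which is not public); it only illustrates how a layout of labeled boxes, like the room example, can be rasterized into the kind of coarse spatial grid a layout-to-image model could be conditioned on. The label set and grid size are illustrative assumptions:

```python
# Sketch: turning a "sink here, bed there" layout description into a
# coarse label grid. A hypothetical conditioning format, not CM3leon's.

LABELS = ["background", "sink", "mirror", "bottle", "bed"]

def rasterize_layout(boxes, size=16):
    """boxes: list of (label, (x0, y0, x1, y1)) with coordinates in [0, 1].
    Returns a size x size grid of label indices; later boxes overwrite
    earlier ones, and every box covers at least one cell."""
    grid = [[0] * size for _ in range(size)]
    for label, (x0, y0, x1, y1) in boxes:
        idx = LABELS.index(label)
        for r in range(int(y0 * size), max(int(y0 * size) + 1, int(y1 * size))):
            for c in range(int(x0 * size), max(int(x0 * size) + 1, int(x1 * size))):
                grid[r][c] = idx
    return grid

# The room from the example: a bed along the bottom, a sink top-left.
room = rasterize_layout([
    ("bed",  (0.0, 0.5, 1.0, 1.0)),
    ("sink", (0.0, 0.0, 0.25, 0.25)),
])
```

Color-coding such a grid per label is essentially the flat-color segmentation map from the duck-on-a-lake example, so the same structure covers both conditioning modes discussed here.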

---
*Source: https://ekstraktznaniy.ru/video/14777*