# Meta's 'Movie Gen' AI Just SHOCKED The AI World (Text To Video AI)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=naMt59O_a6M
- **Date:** 04.10.2024
- **Duration:** 19:43
- **Views:** 75,464

## Description

Meta's 'Movie Gen' AI Just Shocked The AI World (Text To Video AI) 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00:00 - Introduction to Meta's Movie Gen
00:01:23 - Comparison to other AI systems like Runway Gen 3 Alpha
00:02:29 - Comparison to Sora and surprising advancements
00:03:25 - Overview of Movie Gen model sizes and features
00:04:08 - Analysis of video examples, focusing on lighting and physics
00:05:15 - Example of a sloth in a pool with detailed analysis
00:06:59 - Example of a monkey in a hot spring
00:08:28 - Example of a girl running on a beach with a kite
00:09:52 - Introduction to video editing capabilities
00:11:20 - Example of editing a man running in the desert
00:12:39 - Discussion on future applications for video effects
00:14:45 - Introduction to personalized videos feature
00:15:33 - Introduction to video-to-audio feature
00:16:30 - Demonstration of generated audio for videos
00:17:36 - Explanation of how the audio generation works
00:19:29 - Conclusion on the quality of the generated soundtracks

Links From Today's Video:
https://ai.meta.com/research/movie-gen/

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=naMt59O_a6M) Introduction to Meta's Movie Gen

So Meta is introducing Meta's Movie Gen. Take a look at this short video, and then I will dive into all the specific details that might surprise you.

### [1:23](https://www.youtube.com/watch?v=naMt59O_a6M&t=83s) Comparison to other AI systems like Runway Gen 3 Alpha

Now that you've seen that short video, I'm sure you can understand that Meta is clearly one of the underrated leaders when it comes to AI. Meta's Movie Gen actually came as a surprise to many in the AI space, and having watched that short video you can largely understand why. In it, Movie Gen was compared to other state-of-the-art systems like Runway Gen 3 Alpha, and we can see it outperforms them on a variety of different tasks. This was rather surprising to many individuals in the AI space, myself included, because we didn't really expect Meta to come out with a foundation model for video that would surpass companies solely focused on video generation. It took us off guard: while we thought Meta was busy with Llama 3, it seems they were also busy generating videos in really high quality. We can see

### [2:29](https://www.youtube.com/watch?v=naMt59O_a6M&t=149s) Comparison to Sora and surprising advancements

that there are many examples of this software even being compared to Sora, in many cases currently excelling and surpassing what that model is able to do. It seems that Meta has innovated in its efforts to do this effectively and somehow managed to come out on top. This is one of the most surprising updates in AI, and I think it's one of those examples showing that this technology is advancing, once again, more efficiently than we could have predicted. Many of these examples are simply outstanding, but let's dive into the four key features that Meta's Movie Gen has to offer. Quickly, I'm going to show you the model sizes so you don't get confused, and then we're going to dive into all of the features. First, we have a 30-billion-parameter model, which is the main Movie Gen model.

### [3:25](https://www.youtube.com/watch?v=naMt59O_a6M&t=205s) Overview of Movie Gen model sizes and features

Then we have Movie Gen Audio, a 13-billion-parameter model which generates audio clips. Then we've got the personalized Movie Gen Video, which is a post-training extension, and then we've got Movie Gen Edit, which lets you edit the videos and is another post-training extension. So it's two models, and one of those two models has two extensions that allow you to customize your videos. Now, we're going to be taking a look at some of these video examples, and during these videos what I want you to do is pay attention to the things you might not normally notice: pay attention to the lighting, pay attention to the physics, because these are things that video generators particularly do

### [4:08](https://www.youtube.com/watch?v=naMt59O_a6M&t=248s) Analysis of video examples, focusing on lighting and physics

struggle with. For example, in this video I think the physics and overall lighting look really good. It is quite hard to get lighting done well in AI video generators, because lighting affects many different things, and it's hard to render realistically when you've got many different surfaces and many different reflections. In this video right here we can see that the lighting around his body and the lighting on the ground actually match up with exactly what we're seeing in front of him. Those are the things I look at when I'm analyzing a video to see if it's actually effective, and this one is definitely up there. The text input for this example was: the camera is behind a man; the man is shirtless, wearing a green cloth around his waist; he is barefoot; with a fiery object in his hand, he creates wide circular motions; a calm sea is in the background; the atmosphere is mesmerizing, with the fire dance. So you can see that the prompt adherence is actually really high. We have

### [5:15](https://www.youtube.com/watch?v=naMt59O_a6M&t=315s) Example of a sloth in a pool with detailed analysis

another example here, and if I'm being honest, out of all the examples showcased on the Meta website this is by far my favorite: we just have a sloth drinking some tropical drink, looking as relaxed as one can be. The text input is: a sloth with pink sunglasses lays on a donut float in a pool; the sloth is holding a tropical drink; the world is tropical; the sunlight casts a shadow. I think this one is really nice because most people might not pay attention to the background area on the left-hand side, but we can see that the water reflections on both the left-hand side and the right-hand side look remarkably effective, which is, like I said before, one of the key details I pay attention to. I'm also paying attention to the shadows. I didn't notice this at first, but upon further analysis, something I found really cool was that as the sloth moves from right to left across the water, the shadows actually shift across its face: we can see the shadow and the sun rays coming in and out, and the lighting dynamically changes based on it being underneath a tree. These shadows look effective as they track underneath the donut float, and that was something I thought adds to the realism, which is why I think this model is really effective. Sometimes models may struggle with these small things and we might not notice them, but when everything works in our favor, these are the kinds of things that make a system like this look super realistic. We also have some more wild

### [6:59](https://www.youtube.com/watch?v=naMt59O_a6M&t=419s) Example of a monkey in a hot spring

animals, and this is a prompt that says: a red-faced monkey with white fur is bathing in a natural hot spring; the monkey is playing in the water with a miniature sail ship in front of it, made of wood with a white sail and a small rudder; the hot spring is surrounded by lush greenery, with rocks and trees. Another really effective demonstration. There is some very minor morphing, but that's nothing I haven't seen from other models; it honestly is one of the most surprising videos I've seen in quite some time, considering how hard this would be to get done correctly. For example, if we look at the reflections of the monkey, we can see that they are quite accurate, and of course the reflection of the boat; we can also see the ripples as it moves forward, which is really nice. The legs of the monkey are also there, and we do see the hand just underneath the surface of the water, so this is really effective. I mean, it's incredible that this technology is going to be available in the near future, and other companies are also working on their own iterations of this, but like I said before, when we really look at the details and realize that what we're watching isn't real, it's something that should not be overlooked at all. There's one last clip here of a girl running across a beach holding a kite; she's

### [8:28](https://www.youtube.com/watch?v=naMt59O_a6M&t=508s) Example of a girl running on a beach with a kite

wearing jean shorts and a yellow t-shirt, and the sun is shining down. I think this one is once again really effective, and I think this clip was meant to showcase two things. First, they wanted to show that it has really good dynamic physics. One of the things I've constantly seen video generators struggle with is the ability to render legs that run effectively: sometimes in video clips longer than 2 seconds we'll see the legs morph, or look like they're speeding up and slowing down, and this issue just isn't present in this video at all. I think that was something they wanted to showcase, as it's quite hard to get right. There was also something else I picked up on that you may have missed: as the girl runs through the sand, we can see footsteps picking up the sand, and those footsteps are left behind as her feet land in it. Another example of really accurate physics, which is really nice. Now, I do want to say quickly, before we move on, that this demonstration didn't actually showcase any areas where the model might have failed, so when we do get access to this, maybe we'll see the areas where it does fail, but right here this is a really nice example. Now I

### [9:52](https://www.youtube.com/watch?v=naMt59O_a6M&t=592s) Introduction to video editing capabilities

think Meta's Movie Gen is severely underrated, because being able to have these kinds of abilities immediately is incredible considering the consistency of this kind of editing. So, editing video with text is here, and we can see just how effective this looks: the original video is in the top left, with three other iterations of it. We can see fire sparklers being added to his hands, and like I said before, let's take a look at the lighting. As his hands come down, if we pay attention to his face, we can see it gets a little more orange to account for the incoming light, and as the sparklers move away, the lighting fades; the face is correctly shaded, with the light source down here and all the areas of his face lit correctly, which is really nice. Here as well, we can see that the background maintains its perspective, which is really effective, and the background also isn't in focus, which adds to the realism of this video; I think it's really nice how that worked. And of course you can see the last one: change the sky to the Northern Lights, which is really nice. Now, I will say, in this one it does say change the background to an outdoor stadium, but it doesn't say change his shirt to a blue shirt, so I'm

### [11:20](https://www.youtube.com/watch?v=naMt59O_a6M&t=680s) Example of editing a man running in the desert

not sure if the model was meant to do that, but nonetheless it still looks great. There's another example of this as well, and I think this one is definitely one of the most effective. The original video, of a man running in the desert, looks really nice, and then we have the text input: add blue pom-poms to his hands. Not only are the pom-poms added to his hands, but we get decently accurate physics of how those pom-poms would look if this person were running with them. We can also see you're able to turn it into a cactus desert, and you're able to replace the running clothes with an inflatable dinosaur costume. The best thing about this is that, like I said before, it doesn't just paste the change onto the video; it actually makes it look exactly how it would if that person were wearing that outfit, because if you've ever seen an inflatable dinosaur costume, you'll know that its head bobs in exactly that way, which is really funny to see happening. And then of course, for the pom-poms one, seeing him running like this is pretty cool; it's fun to see how each video differs with subtle changes. Now, this is where we start to get into the area where you can see how this may have applications for future uses, such as generative video effects. For

### [12:39](https://www.youtube.com/watch?v=naMt59O_a6M&t=759s) Discussion on future applications for video effects

example, we can see you're able to add pouring rain right here, and it does look somewhat realistic. I will say, in this example we don't get the added physics of, you know, the clothes getting wet and all those minor details; those things are going to be pretty difficult to add, so it's likely we'll get them in the near future. But the other details we do get, such as changing the background to a carnival, look really nice in terms of the effects in the back and the blur that we get. And of course, making them wear 3D glasses: I think this one deserves more credit, because if you've ever tried to 3D-track something onto an object, you'll know the software tries to get all of the data points on that object. In traditional tracking, you track multiple different points and then pin a 3D object to them, and with AI it seems we're going to get software that just does that automatically, which is incredible because it's going to save people a lot of time. We also have one last example here of the text-prompt area, and we can see again just how effective this is. I do think this is remarkably effective; it's going to have so many different use cases, and in the future this is going to be how video editing is done. Surely, when you're in Premiere Pro, or whatever your editing software is, whether it's CapCut or DaVinci Resolve, you're going to have buttons where you can just say, okay, let me change this and that, and I think this is definitely going to change how certain movies are made. It might be a few years before it's completely polished, but this is most certainly a timesaver, especially right now for filmmakers. Another example we have that is quite fascinating is the personalized videos: many different AI systems allow us to personalize things with personalized images, so it's no surprise that Meta has come out of the gate with personalized videos. We can see

### [14:45](https://www.youtube.com/watch?v=naMt59O_a6M&t=885s) Introduction to personalized videos feature

that it's able to simply use one image here and accurately resemble what this man looks like. The fact that it can use just one image is really cool, because usually you have to provide many different images to get this looking remotely effective. We can also see some more examples of this personalization in effect, which is really nice; I'm sure certain content creators are going to really enjoy this. I'll definitely be testing it with a few people's faces; maybe we're going to see some funny videos of Mark Zuckerberg riding horses or doing something he probably otherwise wouldn't be doing. So next, we're going to take a look at one of the best things to actually come out of this,

### [15:33](https://www.youtube.com/watch?v=naMt59O_a6M&t=933s) Introduction to video-to-audio feature

and that was actually something that caught me off guard: the video-to-audio feature in Movie Gen. Now, I've seen a few examples of software like this before, but this one was rather fascinating. Basically, what we have here is a system that was trained on a massive dataset of videos with matching soundtracks, and the model has learned how to match different sounds to different specific videos. I'm going to show you a few videos, and then I'm going to talk in more detail about how this all works, but first I want to show you the different sounds that were generated by the model when given video as the input prompt.
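The matching described here, where sounds are learned to co-occur with what's on screen, can be caricatured as a lookup from detected visual events to candidate sound effects. To be clear, this is only a toy sketch of the concept; the real Movie Gen Audio model is a learned 13-billion-parameter network that generates raw waveforms, and every name below is my own invention:

```python
# Toy illustration of video-to-audio matching: map visual events that
# might be detected across video frames to candidate sound effects.
# The real model generates audio directly; this only shows the idea.
EVENT_SOUNDS = {
    "car_driving": ["engine hum", "tire squeal", "ambient city noise"],
    "water_jump": ["splash"],
    "firework": ["bang", "crackle"],
}

def sounds_for(events: list[str]) -> list[str]:
    """Collect candidate sound effects for the events seen in a clip."""
    sounds: list[str] = []
    for event in events:
        sounds.extend(EVENT_SOUNDS.get(event, []))
    return sounds

print(sounds_for(["car_driving", "firework"]))
# → ['engine hum', 'tire squeal', 'ambient city noise', 'bang', 'crackle']
```

The actual system learns these associations from data rather than from a hand-written table, which is why it can also produce non-diegetic mood-setting audio and not just literal effects.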

### [16:30](https://www.youtube.com/watch?v=naMt59O_a6M&t=990s) Demonstration of generated audio for videos

so hopefully you've seen that the audio was really good. How this works is that the model takes in the video frames and processes them to predict sounds that should accompany those visuals. For example, if a video shows a car driving, the model will generate sound effects like the car engine, the tires squealing, or ambient city noise, and it does this

### [17:36](https://www.youtube.com/watch?v=naMt59O_a6M&t=1056s) Explanation of how the audio generation works

by understanding what actions are happening in the video and then generating the appropriate sound effects. Now, the system is capable of generating different types of audio. We've got diegetic sounds, which come directly from what's happening in the scene, and then of course we've got non-diegetic sounds, which are things like background music or mood-setting audio, such as intense music during a car-chase scene that fits the tone but isn't actually happening in the video. The Movie Gen model can generate sound at high quality, specifically 48 kHz, which is the standard for cinematic sound. This means the audio generated by this model is clear and of professional quality, making it suitable for use in films, games, or other media content. The model not only matches the sound to the video but also produces long, coherent audio tracks for videos up to several minutes long, and it's designed to create sounds that naturally extend over time, so the audio feels continuous and realistic. The audio model was trained on millions of hours of video and sound data, so during training it learned the physical relationship between what's happening on screen and the sounds that should be produced. It also learned the psychological aspect of audio in video, such as how specific sounds can heighten emotion, create tension, or establish the mood of a scene. The system manages to generate really effective sounds, like a splash when someone jumps into water, and to get it to be as precise as it is, to where you can hear the firework exploding exactly as it explodes, the model was fine-tuned after the initial training with a smaller set of high-quality video and audio data. This fine-tuning step helps improve the overall quality of the generated sounds, making them feel polished and more cinematic, similar to

### [19:29](https://www.youtube.com/watch?v=naMt59O_a6M&t=1169s) Conclusion on the quality of the generated soundtracks

what you'd hear in a high-end production. So overall, I think this is a remarkable step, because we have something that can pretty much automatically do soundtracks and background music, and it's something I just didn't expect at this level of quality.
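To put the 48 kHz cinema-standard sample rate mentioned above in perspective, here is a quick back-of-the-envelope calculation. The arithmetic is my own, and the 24 fps frame rate is an assumed typical cinematic value, not something stated in the video:

```python
SAMPLE_RATE_HZ = 48_000  # cinema-standard audio sample rate (48 kHz)
VIDEO_FPS = 24           # assumed typical cinematic frame rate

# Audio samples the model must produce per video frame to stay in sync.
samples_per_frame = SAMPLE_RATE_HZ // VIDEO_FPS
print(samples_per_frame)  # 2000

# For a clip "up to several minutes long", say 2 minutes:
clip_seconds = 2 * 60
total_samples = SAMPLE_RATE_HZ * clip_seconds
print(total_samples)  # 5760000 samples that must stay coherent end to end
```

That's thousands of audio samples per frame, millions per clip, which is what makes generating long, continuous, frame-aligned soundtracks a hard problem.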

---
*Source: https://ekstraktznaniy.ru/video/14039*