# Mochi 1: The BEST Open Source Video Generation AI Yet!  (Genmo AI)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=vbzgWa6Ms4E
- **Date:** 23.10.2024
- **Duration:** 13:53
- **Views:** 12,511

## Description

Prepare for AGI with me - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

0:00 Model Introduction
0:25 Playground Launch
0:59 AGI Vision
1:31 Model Performance
1:59 Key Areas
2:33 Prompt Adherence
3:09 Evaluation Methods
3:36 Performance Rankings
4:00 Motion Improvements
4:33 Quality Scores
5:15 Industry Impact
5:39 Technical Details
6:14 Physics Simulation
7:00 Evaluation Criteria
7:33 Model Architecture
8:03 Technical Specs
8:37 Efficient Design
9:05 Computing Requirements
9:32 Video Processing
10:02 Visual Focus
10:24 Language Model
10:50 Token Management
11:14 Design Features
11:43 Future Updates
12:13 Known Limitations
12:39 Performance Examples

Links From Today's Video:
https://www.genmo.ai/blog
https://www.genmo.ai/play

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=vbzgWa6Ms4E) Model Introduction

Today an exciting new step in video generation has arrived with the release of Mochi 1. This open-source model is pushing the boundaries of what AI can do in video creation, showing impressive advances in how smoothly characters move and how well it follows your prompts. Mochi is designed to be accessible to everyone, whether you're working on personal projects or something commercial.

### [0:25](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=25s) Playground Launch

What's great is that this is not just another AI announcement. If you're curious to try out this model, Genmo has launched a free hosted playground where you can experiment with it, and of course, if you want to dive deeper into exactly what's going on, the weights are available on Hugging Face. Now, what's interesting is that on their blog they've actually stated that their goal is to unlock the right brain of artificial general intelligence, or AGI. Just like the right side of the human brain is associated with creativity and imagination, Genmo wants to bring those

### [0:59](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=59s) AGI Vision

qualities to AI. Now, Mochi 1 is the first tangible step towards this vision, enabling AI not only to generate videos but to act as an immersive world simulator capable of imagining anything, whether it exists in reality or not. By focusing on creativity, Genmo aims to create AI that can visualize new possibilities, tell compelling stories, and bring imaginative ideas to life in ways that were previously out of reach. Now, the model is

### [1:31](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=91s) Model Performance

clearly stunning; we've never seen open source perform at this level before, but Genmo did something rather different. You see, today there is an enormous gap between video generation models and reality. The current models that we do have (and this is a problem I've spoken about on several occasions) often struggle with making movements look natural and accurately following the user's instructions. These

### [1:59](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=119s) Key Areas

are two key areas, motion quality and prompt adherence, where videos are still falling short, which results in video outputs that seem jerky or fail to match what the user envisioned. Now, Mochi 1 sets a new best-in-class standard for open-source video generation and also competes with the leading closed models. What we do have is the 480p preview of Mochi 1, and this version excels in the following areas. So we do

### [2:33](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=153s) Prompt Adherence

have prompt adherence. Mochi 1 demonstrates exceptional alignment with the prompt it is given, meaning that the videos it generates closely match the instructions provided by the user. This allows for detailed control over elements like your characters, your settings, and the actions. To ensure this high level of accuracy, Mochi 1 was benchmarked using an automated metric: a vision-language model, similar to the approach used by OpenAI with DALL-E 3, acts as a judge to evaluate how well the generated content matches the prompts,
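The video gives no details of the judging API, so the harness below is only a minimal sketch: `judge` is a hypothetical callable standing in for whatever vision-language-model call is actually made, and the aggregation is a plain average, not Genmo's actual pipeline.

```python
def prompt_adherence(judge, samples):
    """Average a judge's 0-1 alignment scores over (prompt, video) pairs.

    `judge` is a hypothetical stand-in for a real vision-language model
    that rates how well a generated video matches its prompt.
    """
    scores = [judge(prompt, video) for prompt, video in samples]
    return sum(scores) / len(scores)

# Stub judge for illustration: pretend the model already scored three clips.
stub_scores = {"clip_a": 0.9, "clip_b": 0.7, "clip_c": 0.8}
avg = prompt_adherence(
    lambda prompt, video: stub_scores[video],
    [("a cat", "clip_a"), ("a dog", "clip_b"), ("rain", "clip_c")],
)
print(round(avg, 2))  # 0.8
```

In practice the judge would be an API call to a multimodal model; the harness only shows how per-sample scores roll up into a leaderboard number.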

### [3:09](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=189s) Evaluation Methods

and they used Gemini 1.5 Pro to evaluate how these generated videos performed, ensuring they're consistent with the user's intended descriptions. For those of you who might not understand how good this new open-source model is, we can look at the prompt adherence leaderboards: take a look at Open-Sora, Pyramid Flow, Pika Labs, Runway ML Gen-3, even Kling and Luma Dream Machine, and at the

### [3:36](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=216s) Performance Rankings

top, the new leading model is Genmo's Mochi 1 preview. I think that deserves a round of applause, although there aren't enough people to do that. Being able to surpass all of these other models in such a short space of time with an open-source model shows that you can still catch up to the leading labs even if you are a different team. This is

### [4:00](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=240s) Motion Improvements

where you take different routes, such as focusing on prompt adherence and actually ensuring the user gets exactly what they want, and we can see by this graph that this is exactly what occurs. Now, this also performs well on motion quality. Mochi 1 delivers significant improvements here; the smoothness of character movements in generated videos has been pretty challenging for most AI models, which leads to unnatural or robotic-looking actions, and Mochi 1 addresses

### [4:33](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=273s) Quality Scores

these issues by creating fluid, lifelike motion that enhances the overall realism of the content, making it more engaging and visually pleasing. And when we once again take a look at the motion quality Elo score, we can see that Genmo's Mochi 1 preview is right up there with the likes of Kling, managing to surpass even the well-known Runway Gen-3 and the Luma Dream Machine, which is rather surprising, and which means that Mochi 1 and the Genmo team have done something absolutely incredible. If you would have

### [5:15](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=315s) Industry Impact

told me that an open-source team was about to deliver a new video model that could potentially surpass the rivals in the industry in terms of prompt adherence and motion quality, I would have said there's absolutely no way that could happen. But we're seeing once again today that the video arena is being constantly challenged by more and more competitors, and oftentimes

### [5:39](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=339s) Technical Details

this results in a much better experience for the user, because now you have a lot more choice. Now, some of the results might surprise you. You see, Mochi 1 produces videos at 30 frames per second, which helps create a smooth visual experience. The videos that Mochi 1 generates can last up to 5.4 seconds, and they maintain their temporal coherence, meaning that the motion flows naturally from one frame to the next without abrupt jumps or inconsistencies. Now, they

### [6:14](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=374s) Physics Simulation

also have realistic motion dynamics and physics simulation. Mochi 1 simulates realistic physics such as fluid dynamics (the movement of liquids), fur and hair simulation, and natural human actions. This makes the animations of any characters look more lifelike. For example, if there is a scene that involves water or an animal with fur, Mochi 1 ensures that these details move in a realistic way, adding a layer of believability that crosses the uncanny valley, a point where AI-generated visuals become so realistic that they start to evoke an emotional response from viewers. Now, Mochi 1 focused on

### [7:00](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=420s) Evaluation Criteria

the motion quality. When assessing Mochi 1's performance, human evaluators were specifically asked to concentrate on the quality of movement rather than the details of each individual frame. They used criteria such as the interestingness of the motion, how realistic it seemed, and how fluidly it was portrayed. And of course, to measure that performance, the Elo scores were computed using a protocol similar to the LMSYS Chatbot
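The Elo scoring described here can be sketched as a standard pairwise rating update; the K-factor and starting ratings below are illustrative choices, not values from the video.

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """One Elo update after a head-to-head video comparison.

    score_a is 1.0 if model A's video was preferred, 0.0 if B's,
    and 0.5 for a tie. k controls how fast ratings move
    (k=32 is an illustrative choice).
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Two equally rated models; A's clip wins the vote.
a, b = elo_update(1000.0, 1000.0, 1.0)
print(a, b)  # 1016.0 984.0
```

Because gains and losses are symmetric, the total rating mass is conserved; repeated over thousands of human votes this produces the leaderboard rankings mentioned above.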

### [7:33](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=453s) Model Architecture

Arena, and that kind of scoring is often used in gaming and ranking competitions, which basically means they just put these models side by side, and of course Mochi managed to come out on top in most of these scenarios. Now, I think this model is absolutely incredible; they've managed to do something that most people didn't think was even possible, and I think a lot of that can be put down to the architecture they used, which is rather fascinating too. So when we

### [8:03](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=483s) Technical Specs

actually take a look at the architecture they used, we can see that Mochi 1 represents a significant advancement in open-source video generation. It uses a massive 10-billion-parameter diffusion model based on an architecture called the Asymmetric Diffusion Transformer, or AsymmDiT, and what that tells you in simpler terms is essentially that this model is incredibly powerful because it has so many parameters, which are basically the tiny settings that help the model understand and generate video content,

### [8:37](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=517s) Efficient Design

and AsymmDiT is a new kind of architecture designed to make the entire process more efficient. Mochi 1 is built entirely from scratch, meaning that it's a brand-new system and not just an upgrade of something old. It's also the largest video generation model that has been openly released, and the design is simple enough that developers can tweak or hack it to fit their needs. Now, of

### [9:05](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=545s) Computing Requirements

course, running these models can take a lot of computing power, so Genmo focused on making Mochi 1 as efficient as possible. Alongside Mochi 1, Genmo is also releasing something called a video VAE (variational autoencoder), and the VAE is quite important because it compresses the video information down to a much smaller size, 128 times smaller to be exact, and it does this by

### [9:32](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=572s) Video Processing

breaking down the video spatially and temporally and reducing the complexity. This allows people to use less computing power to run Mochi 1, making it much more accessible. Now, if you want to know how AsymmDiT actually works: this architecture processes both the user prompts and the video tokens in a streamlined way, and it focuses a lot of its processing power on understanding the visual part of the video rather than just the text. It uses

### [10:02](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=602s) Visual Focus

something called multimodal self-attention, which means it can look at both the text prompts and the video content at the same time to understand how they should work together. This is similar to how Stable Diffusion 3 works, but with one key difference: Mochi 1 gives a lot more focus to the video part. By having more parameters dedicated to the visuals, the model ends up being
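The core idea of multimodal self-attention, where both token types attend to each other in one pass, can be sketched with NumPy; the single head and toy dimensions below are simplifications, not Mochi's actual layer.

```python
import numpy as np

def joint_self_attention(text_tokens, video_tokens, wq, wk, wv):
    """Single-head self-attention over concatenated text + video tokens.

    Concatenating the two streams lets every video token attend to the
    prompt and vice versa, which is the basic idea behind multimodal
    self-attention (dimensions here are toy values, not Mochi's).
    """
    x = np.concatenate([text_tokens, video_tokens], axis=0)  # (T+V, d)
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over rows
    return weights @ v

rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(4, d))     # 4 prompt tokens
video = rng.normal(size=(16, d))   # 16 video tokens
w = [rng.normal(size=(d, d)) for _ in range(3)]
out = joint_self_attention(text, video, *w)
print(out.shape)  # (20, 8)
```

The "asymmetric" part of AsymmDiT would show up as separately sized parameter stacks for the text and video streams; here a single shared projection keeps the sketch short.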

### [10:24](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=624s) Language Model

better at generating realistic and detailed video content. Now, interestingly, many video generation models use several pre-trained language models to understand prompts, which can be quite complex. However, Mochi 1 simplifies this by using a single powerful language model, T5-XXL, to handle all the prompts. This makes the model much more straightforward and

### [10:50](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=650s) Token Management

efficient, while ensuring it can still understand and generate based on user inputs effectively. Now, Mochi 1 is designed to handle a very large amount of video information, up to 44,520 video tokens at once to be precise. A token is basically just a small part of the data that makes up the video, and to make sure it knows where each piece should be, Mochi 1 uses a
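The 44,520 figure can be reproduced from one plausible breakdown, though every factor below is an assumption for illustration (a roughly 5.4-second, 163-frame clip at 480x848, a VAE with 6x temporal and 8x8 spatial downsampling, and 2x2 patchification); none of these numbers are stated in the video itself.

```python
# Hypothetical breakdown of Mochi 1's 44,520-token budget.
frames, height, width = 163, 480, 848   # assumed: ~5.4 s at 30 fps, 480p

latent_frames = (frames - 1) // 6 + 1   # assumed causal 6x temporal compression -> 28
latent_h = height // 8                  # assumed 8x spatial compression -> 60
latent_w = width // 8                   # -> 106

# assumed 2x2 patchify: each token covers a 2x2 patch of the latent grid
tokens = latent_frames * (latent_h // 2) * (latent_w // 2)
print(tokens)  # 44520
```

The point of the arithmetic is just that the token count grows with duration and resolution, which is why the VAE's aggressive compression matters so much for compute.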

### [11:14](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=674s) Design Features

technique called learnable rotary position embeddings (RoPE), which can be extended to work in three dimensions, covering both space and time, and this helps the model keep track of everything that's happening in the video so it can generate coherent and well-structured scenes. Now, this also benefits from some of the latest advancements in AI model design, like SwiGLU feed-forward layers, which help the model learn better and faster. It uses query-key normalization
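Rotary embeddings extended to three axes can be sketched as below. This uses fixed sinusoidal frequencies for simplicity, whereas the video says Mochi's are learnable, and splitting the feature dimension evenly across the three axes is an assumption.

```python
import numpy as np

def rope_3d(vec, pos, base=10000.0):
    """Rotate feature pairs by angles derived from a (t, y, x) position.

    The feature dimension is split evenly across the three axes, and each
    axis's slice is rotated exactly like standard 1-D RoPE. Fixed
    frequencies are used for simplicity (Mochi's are reportedly learnable).
    """
    d = vec.shape[-1]
    per_axis = d // 3                 # slice per axis; must be even
    half = per_axis // 2
    out = []
    for axis, p in enumerate(pos):
        seg = vec[axis * per_axis:(axis + 1) * per_axis]
        freqs = base ** (-np.arange(half) / half)
        angles = p * freqs
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = seg[:half], seg[half:]
        out.append(np.concatenate([x1 * cos - x2 * sin,
                                   x1 * sin + x2 * cos]))
    return np.concatenate(out)

v = np.arange(12, dtype=float)
rotated = rope_3d(v, pos=(2, 5, 7))
# Rotations preserve length, so the token's norm is unchanged.
print(np.allclose(np.linalg.norm(rotated), np.linalg.norm(v)))  # True
```

Because the angles depend only on the (time, height, width) position, two tokens' dot product depends on their relative offset, which is what lets the model keep spatial and temporal order straight across the whole clip.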

### [11:43](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=703s) Future Updates

to make the training more stable, and sandwich normalization to keep internal activations (which are essentially the parts of the model that light up when it's working) under control, and these tweaks help ensure that the model runs smoothly and produces high-quality outputs without instability. Now, Genmo have said that a technical paper will follow with all the details to encourage progress in video generation, but I think that this is absolutely incredible. Now,

### [12:13](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=733s) Known Limitations

for those of you who are wondering what is coming next from the Genmo team, well, it is here: Mochi 1 HD. Mochi 1 HD will support 720p video generation with enhanced fidelity and even smoother motion, addressing edge cases such as warping in complex scenes. Now, whilst Genmo's Mochi 1 is absolutely stunning, there are a few known limitations. For

### [12:39](https://www.youtube.com/watch?v=vbzgWa6Ms4E&t=759s) Performance Examples

example, the initial release generates videos at 480p, which is of course not HD, and in some edge cases with extreme motion, minor warping and distortions can also occur. And if you're wondering what exactly to use this for, Mochi 1 is optimized for photorealistic styles, so it doesn't perform well with animated content. They're also anticipating that the community will fine-tune the model to suit various aesthetic preferences, which means it's likely that in the coming weeks and months we're going to be getting specialized versions of this video model that could be even better. And if you head on over to the web page, one of the top creations was one that I really do like; it actually tests how well this model performs against OpenAI's Sora, and we can see that the stylish woman walking through downtown Tokyo looks rather good. Now, there are other examples on this web page that you can of course check out, depending on the different things that entertain you, but I think that this model is one that is really fascinating, considering the level of quality and the level of control over

---
*Source: https://ekstraktznaniy.ru/video/13938*