# China's New TEXT TO VIDEO AI SHOCKS The Entire Industry! New VIDU AI BEATS SORA! - Shengshu AI

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=IOlKL26FOC4
- **Date:** 29.04.2024
- **Duration:** 14:46
- **Views:** 17,814

## Description

CHINA STRIKES AGAIN! New VIDU AI BEATS SORA! - Shengshu AI - Text To Video AI

How To Not Be Replaced By AGI https://youtu.be/AiDR2aMye5M
Stay Up To Date With AI Job Market - https://www.youtube.com/@UCSPkiRjFYpz-8DY-aF_1wRg 
AI Tutorials - https://www.youtube.com/@TheAIGRIDAcademy/ 

🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

Links From Today's Video:
https://www.shengshu-ai.com/home

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=IOlKL26FOC4) Segment 1 (00:00 - 05:00)

A couple of days ago there was an announcement from another Chinese technology company, Shengshu Technology, an AI firm that, together with Tsinghua University, developed Vidu, China's first text-to-video AI model. Vidu is capable of generating high-definition 16-second videos in 1080p resolution with a single click, and it is positioned as a competitor to OpenAI's Sora text-to-video model, with the ability to understand and generate Chinese-specific content like pandas and dragons. What you're about to see is the full demo in which they showcase these clips, and I personally find it rather surprising, so take a look.

That was the demo, and it has been received with mixed reactions, for a variety of reasons. I'm someone who's open to many different AI technologies, and I've looked at many different AI video generators, and I have to say this is a lot better than you might think. I know some people are saying it isn't great, but trust me, video generation is extremely hard, and that's why many state-of-the-art models you can currently use for free can't do what we're seeing in these clips, or the things we've seen from Sora.

What we have here is a clear indication that China has been slowly but surely ramping up its AI efforts. That isn't surprising in itself, but this has probably been one of the most surprising weeks in terms of what China has been able to do in AI. First, they showed a robot that is genuinely state-of-the-art in robotics. Second, they released an LLM that is close to state-of-the-art across vision systems, small models, and large language models approaching GPT-4. And the third announcement from China was Vidu, a text-to-video AI model that arguably surpasses the state of the art among what is freely available.

Some people might say the demonstrations shown here are cherry-picked, but of course they are: with any kind of AI generation, some outputs don't look right, so in any demo things are going to be cherry-picked, and I don't think that's unusual at all. There are also some key details that eagle-eyed viewers may have missed, so I'm going to point those out. One thing you probably missed about this trailer is right here: the creators of this text-to-video model clearly know that Sora is their biggest competitor in AI text-to-video, and because of that they've deliberately positioned certain clips in the trailer. One of those clips was, of course, the clip of the woman and the man walking down a busy street at night in Tokyo. The one from OpenAI's Sora looks very good in terms of temporal consistency and everything else. As for the Vidu version, when they showed it, it was only

### [5:00](https://www.youtube.com/watch?v=IOlKL26FOC4&t=300s) Segment 2 (05:00 - 10:00)

around 3 seconds in the trailer, but I have to say it's pretty good motion for their first system. Maybe it's not their first system, but it's the first one that's gained notoriety due to the level of detail and consistency. Of course, as you can easily tell, OpenAI's Sora is ahead; I wouldn't actually say miles ahead, I'd say a decent bit ahead, and maybe with version two they could catch up to Sora. If we go back to the trailer, you'll see several instances where things are quite similar. For example, right here there's a shot quite similar to the woman walking in Tokyo, and if we look to the right there is some morphing on the hands and the legs. But I think we have to give credit where credit's due, because if we look at the skirt, as the legs move up on the right-hand side there's a nice bit of deformation on the skirt that actually looks very natural and correct. Another example is the jacket: when the guy's walking, the jacket swings around, and the hips and the motion look pretty convincing. I know a lot of people were dunking on this, saying it's objectively mediocre, and I've seen many tweets saying exactly that, but I'd have to say this is not mediocre at all; this is a state-of-the-art-level system, because if an AI company in the West released this right now, it would likely be heralded as a Sora killer. So maybe what we have here is a situation where things look very good, but people aren't really appreciating it just because it's not available for use yet and, I'm guessing, because Sora exists.

There was another demonstration that was also pretty interesting. If you remember, among the clips initially released with Sora there was one of a Land Rover-type vehicle driving around a hillscape, and it was pretty good. When you compare it to the new Vidu clip, the Vidu one of course doesn't look as good in the trailer, but I have to say it's still pretty decent. One thing this does get right that other systems don't is temporal consistency: in this clip, the bushes actually stay coherent and move past, along with the trees. The one caveat is that I downloaded the video, and I don't know if a higher-resolution version is available online, so I can't really comment on the quality right now; the video has been shared around so many times that the original footage is hard to source. Even at the resolution I have, there are clear artifacts, like light breaking up, which usually happens when a video has been re-downloaded and re-shared many times. It's hard to find the original 1080p clips, and once they're released on an official channel we'll be able to see how much better they really are. A lot of people might say the quality and the temporal consistency aren't good; I'd disagree, and like I said, as the videos get shared around, the resolution drops, and that's going to affect how people perceive this. That's important to remember if you're trying to view this without bias.

Now, here's what I'm getting at: OpenAI's Sora isn't actually released yet, which means this is effectively a state-of-the-art system among what's available, especially given that Sora reportedly requires magnitudes more compute than we could even think about. If you compare Sora against Runway Gen-2, with Runway arguably being the second-best option, Runway Gen-2 doesn't really have the kind of temporal consistency we've just seen, and I'd argue Vidu is actually quite a bit better than Runway Gen-2. Yes, Gen-2 has some good features in terms of moving smoothly, but there isn't much motion; it's more of a really slow pan effect. Here's a key example: compare OpenAI's Sora, this one right here, to Runway Gen-2, and then to what Vidu has done (they've published direct comparisons because they're a direct competitor). In the Sora clip the water moves well and the ships move well, and it looks good, but in Runway Gen-2 the waves don't look right at all. And then, if we go back to

### [10:00](https://www.youtube.com/watch?v=IOlKL26FOC4&t=600s) Segment 3 (10:00 - 14:00)

Vidu, this is what I'm talking about when I say this model is pretty important: in this demo they've shown us, there's actually decent consistency in the waves and how things move around them. There's another example in the clip, right here, where the waves crashing around the boat look pretty realistic; they're not morphing into the boat. I know this is just a short demo, but that's not something you see from Runway at all, that kind of motion with waves crashing around, and we don't see it in Pika Labs either. That kind of temporal consistency, along with the motion where the characters were walking and the skirt deformation we saw, means they're definitely a step ahead.

In terms of architecture, Vidu was actually proposed as early as September 2022, predating the Diffusion Transformer (DiT) architecture used by Sora. So Vidu is quite different: it utilizes a Universal Vision Transformer (U-ViT), and that architecture allows it to create realistic videos with dynamic camera movements, detailed facial expressions, and adherence to physical-world properties like lighting and shadows. I think they've done something pretty amazing here, considering they're not even using the same architecture.

Like I said before, let's take one more example, because a lot of people look at this and say it doesn't look that good, but you have to compare it to the state of the art we can actually get our hands on now. Take OpenAI's Sora's clip of the TVs moving around: you can see all the screens flashing. With Runway Gen-2, by contrast, it's not that impressive; it's very slow, and there's not much movement. But if we go back to Vidu (and this isn't favoritism), there's a clear example here of the camera moving around the objects, and that is really hard to do; honestly, it's impressive. I do need to get the HD versions of these, because, like I said, it's a bit disingenuous to judge without the full-resolution clips, although I really did try to find them. But look at the images in the background: they stay in place, they're not deforming or meshing together, and all of this camera motion is happening while the TVs move correctly, compared to what we just saw at the end of the other clips. You have to say this is pretty incredible in terms of what they've been able to accomplish. Another thing: if you take a look at AI videos from literally one year ago, and then look at what we have now with Sora and this kind of technology, I think we have to appreciate how far we've come in such a short time. You could argue it's not really a short time, since these systems build on earlier architectures and decades of research, but things are clearly starting to accelerate, and that's something to keep an eye on.

It does seem like China genuinely has taken the lead here, because, like I said, there is probably an actual 1080p version out there and the videos have just been re-downloaded too many times. And remember, we don't even have access to Sora; it was given to people in the film industry, who have said it takes about 10 to 20 minutes per render, and you can render anywhere from a 3-second clip up to a minute-long clip. That's an important point to keep in mind. So let me know what you think; I think this is pretty game-changing stuff, and I think it's absolutely incredible from them. In the future we're definitely likely to see more and more competition, and once again, what I find surprising is that China has been able to pretty much catch up to state-of-the-art models in a remarkably short amount of time, and I think they're definitely going to prioritize this technology. So I wonder where that leaves the US in terms of priorities: are they going to speed up their acceleration, or slow down and regulate it in different ways? I honestly have no idea, but I do think the USA will probably speed up development now that they've seen China can catch up across the board. This is definitely going to create some kind of AI race, if not quite an arms race, and it will be interesting to see how this technology is deployed in the future. With that being said, let me know what you think; I think it's pretty good and pretty surprising, and it's been a pretty crazy week for the China-versus-US AI race.
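To make the U-ViT point above concrete, here is a minimal toy sketch of the idea as it's usually described: every input (noised image patches, the diffusion timestep, the condition) is treated as a token, and long skip connections link shallow transformer blocks to deep ones, U-Net style. This is not Vidu's actual code; `block`, `uvit_forward`, and the scalar "tokens" are hypothetical stand-ins (a real model uses attention and MLP layers over vectors), but the token layout and skip wiring follow the published U-ViT design.

```python
# Toy sketch of the U-ViT pattern: all inputs become tokens, and the
# second half of the network consumes long skips from the first half.
# block() is a hypothetical stand-in for a real transformer block.

def block(tokens, skip=None):
    """Stand-in for one transformer block; optionally fuses a long skip."""
    if skip is not None:
        # Real U-ViT concatenates the skip and projects back down; here we
        # just average token-wise to keep the toy dependency-free.
        tokens = [(a + b) / 2 for a, b in zip(tokens, skip)]
    return [t + 0.1 for t in tokens]  # pretend transformation

def uvit_forward(patch_tokens, time_token, cond_token, depth=6):
    tokens = [time_token, cond_token] + patch_tokens  # everything is a token
    skips = []
    for _ in range(depth // 2):           # "down" half: record activations
        tokens = block(tokens)
        skips.append(tokens)
    tokens = block(tokens)                # middle block
    for _ in range(depth // 2):           # "up" half: consume skips in reverse
        tokens = block(tokens, skip=skips.pop())
    return tokens[2:]                     # outputs for the image patches only

out = uvit_forward(patch_tokens=[0.0, 1.0, 2.0], time_token=0.5, cond_token=0.25)
print(len(out))  # 3 -- one output per input patch token
```

The contrast with Sora's DiT, as reported, is mainly that DiT injects the timestep and conditioning through normalization layers, while U-ViT simply appends them to the token sequence and relies on these long skips for training stability.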

---
*Source: https://ekstraktznaniy.ru/video/14363*