Game OVER! China's New AI Video Tool BEATS SORA! (KLING AI Text-To-Video)
23:49


TheAIGRID · 07.06.2024 · 38,990 views · 780 likes


Video description
Join My Private Community - https://www.patreon.com/TheAIGRID 🐤 Follow Me on Twitter https://twitter.com/TheAiGrid 🌐 Check out My website - https://theaigrid.com/ Links From Today's Video: https://kling.kuaishou.com/ Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Table of contents (5 segments)

Segment 1 (00:00 - 05:00)

So China just went ahead and released their text-to-video AI tool, and it is pretty incredible. I'm going to show you a quick sample of some of the clips, and then we'll dive into the good stuff. What you just saw was, of course, the very impressive Kling AI video generation tool, launched by Kuaishou, a major Chinese technology company founded in 2011 and headquartered in Beijing. I genuinely have to say that on some of these demos, I would argue it actually surpasses Sora in the consistency and quality of its clips. Trust me when I say you need to watch this video until the end, because once you truly understand how good this system is at generating high-quality clips with a very decent amount of scene consistency, keeping characters stable and coherent, and producing clips like this one that display a real understanding of what's going on, you'll see that AI is reaching a point where other nations are truly starting to catch up, and in certain areas like text-to-video, even surpass some of the state-of-the-art models.

So let's dive into exactly what makes this system so effective, how it works, and how the team managed to crack this in such a short time frame. There are six different things they talk about on their web page, and I'm going to show you exactly what they are.

One of them is 3D spatio-temporal attention. The prompt here is "a man riding a horse in the Gobi desert with a beautiful sunset behind him, movie-quality scene." Essentially, they've adopted a 3D spatio-temporal attention mechanism that can better model complex spatio-temporal motion and generate video content with larger movements while conforming to the laws of motion. Now, this isn't their best clip by far; in fact, it's probably the worst clip you're going to see in this entire video. But their point is that when you're generating clips with many moving parts, it's very difficult to keep everything consistent, and in this clip things do remain consistent: the rider's body moves the way riders actually move on a horse, we have the dust trails, the horse's legs move in sync with the rest of the clip, and the background pans correctly. That is remarkably impressive.

They also demonstrate another example here: "an astronaut runs on the lunar surface; the low-angle shot shows the vast background of the moon, and the movements are smooth and light." So this is a clip of an astronaut running across the moon, a very decent one. I don't think this is their most capable model; I'm guessing they just wanted to showcase what happens when the camera pans from below all the way up, showing character consistency across different camera angles. While the quality isn't incredible, I still think it shows what this system can do, because if you look at, for example, the shadows, which most people wouldn't even think to check, they look remarkably accurate.

Now let's take a look at another example of their 3D spatio-temporal attention mechanism in action. This is by far one of the most interesting demos you'll see in the entire video, and there are a lot more that are remarkably impressive. I was truly shocked by this, and I know we have "shocked" as a meme on this channel amongst the AI community, but this was genuinely pretty

Segment 2 (05:00 - 10:00)

surprising: that they managed to catch up to, or at least reach the level of, Sora in such a short time frame. This is where they say that thanks to their efficient training infrastructure, extreme inference optimization, and scalable infrastructure, the Kling large model can generate videos up to 2 minutes long at a rate of 30 frames per second. That's the info on their website, and I think this is arguably more impressive than the OpenAI Sora videos, because what we're seeing here is a 2-minute-long video that is remarkably consistent in its background footage. I would argue this is much longer than some of the Sora demonstrations, which as far as I know were limited to around one minute. They might be working on Sora 2, but what we're seeing here is a truly remarkable level of temporal consistency, because the AI system must understand exactly what's going on over a longer period of time. You have to understand that the longer the context is, the harder it is for these AI systems to stay consistent, and we can see that consistency maintained across the entirety of this 2-minute-long generation.

Now, there was another example, but I chose not to include it because it doesn't show what's going on as well as this one. In this one you can even see literal rail lines as the train goes across. Maybe the background doesn't fully make sense, because one stretch looked like Rome and another looked like the Arctic, so it may miss some small details, but video generation up to 2 minutes long with this level of consistency matters. Usually, with the kinds of AI systems we work with, the longer they generate for, the more errors you start to see, because information gets lost as it is processed through the system. That's why, early on, a lot of the AI video systems we saw produced only 2 to 3 seconds of footage, and now we've got clips up to 2 minutes long with no real glitchiness or loss of quality. I think that is remarkably impressive, because it shows the system can look at the scenery and generate consistent footage for whatever scene comes next, along with all of the motion involved.

Now, one of the most impressive things we saw with other AI systems was their ability to simulate the physical world's properties. This was discussed in the Sora technical report, where it was hailed as an emerging capability that we didn't really expect. As these AI systems try to predict the next frame, or generate the whole video in one go, which is usually the architecture we know they use, they have to understand how the physical world works in order to create a clip that actually looks realistic. Whatever kind of world model they have internally, this shows they're able to simulate the physical properties of the real world and generate videos that conform to the laws of physics. Here the prompt is "carefully pour the milk into the cup; the milk flows steadily and the cup is gradually filled with milky white." That's the actual prompt extracted from the website, and what we see is remarkable consistency in such a short clip.

There's another clip I want to show you that is remarkably impressive; if there's one clip you take away from this video, it's going to be this one. Take a look: "a Chinese man sitting at a table, eating noodles with chopsticks." I would argue that if I personally saw this clip at 480p on a forum somewhere, I wouldn't for the life of me think it was AI generated at all, but we can clearly see here that it actually is AI generated,

Segment 3 (10:00 - 15:00)

but it looks remarkably impressive, because at the start the man doesn't actually have sauce around his lips, and as he slurps up the noodles you can see the mess build up around his mouth, from the orange sauce at the bottom of the bowl. I think it's rather impressive that such a subtle detail is captured, which in my opinion is truly remarkable, because it shows these systems aren't missing out on the finer details we expect from traditional video footage. This was one of the clips that truly showed people this system is really up there in its ability to generate impressive footage. Unless you focus on the hands, which don't look as realistic and show just enough inconsistency to let you know you're watching something AI generated, this clip is remarkable in itself, especially the way the noodles move and how realistic the man's expressions look.

There was also this example, a chef chopping onions in the kitchen while preparing a dish. I would argue that it isn't as good or as long as the previous one, but it's still a demonstration of simulating the physical world's properties. The reason they've likely included it is that in this clip you are changing the physical nature of the onion: the model has to understand what happens to an onion when it's cut by the blade, and you can see that as it's cut, more onion pieces are produced and pushed aside by the knife. That shows a decent level of understanding by this AI system, and I would say it is very, very hard to get this kind of consistency with whatever AI system you are using.

Now, there were also other examples of this system generating high-quality footage and doing a whole bunch of other useful things we may not have even thought about. One of the features they spoke about was strong concept combination ability: this system is remarkably good at combining different concepts. Here we have "a white cat driving a car through a busy downtown street, with tall buildings and pedestrians in the background." The reason they've done this example is that footage like this doesn't exist; a cat driving a car through a busy city street has never been recorded, and it doesn't live on anyone's hard drive or in those large databases housing millions of royalty-free stock videos. So I'm guessing they're demonstrating the system's ability to generate new and interesting videos that haven't existed before, combining existing concepts into new material, which is fascinating because it shows this is a system that doesn't fail when it tries to mimic the real world. The background is very consistent, and even the subtle movements of the cat as it looks around and drives the car seem quite realistic, if I say so myself.

Once again they've demonstrated this ability here, where we have a macro-lens shot of a volcano erupting in a coffee cup, a scene you would of course never see unless you somehow managed to have a
volcano erupting in your coffee cup. What we have here is a demonstration of exactly how great this system is: it's not just good at replicating footage we've seen before; it manages to show how the liquid from the volcano transfers into a coffee-style liquid and runs along the cup's edge. And one of my personal favorites from this entire strong-concept-combination ability was this Lego character

Segment 4 (15:00 - 20:00)

visiting an art gallery. I thought the reason this was so good was that the clip actually captured the nuances of how Lego characters walk; if you've ever seen a Lego movie, you'll know the characters walk exactly like this, and it's remarkably surprising that the model accurately captured how this Lego character moves. You can even see, as a little Easter egg, another Lego character on the right. What's fascinating is that the character on the right starts in focus, and as the main Lego character keeps walking forward, it shifts out of focus. As I said before, if you've never worked in media, you might miss some of these subtle details as well as the mistakes, but you can grow to appreciate them, which is why some of these clips genuinely impress me. This one was really cool because it showed the ability to capture specific details across many different clips.

Now, one of the things I really did like, and I have to say this is personally my favorite feature of this video system, is movie-quality image generation. One of the biggest gripes I've personally had with video AI systems is that they just don't deliver good image quality. Yes, temporal consistency is something we look for in these clips, but the problem right now is that the quality just isn't there. You can see here, with the prompt we have, a very high-quality clip that matches the description remarkably well. I want to show you this clip instead, because this is the one that showcases just how good the quality is. To be completely honest, the quality here might not look as good as it can be, because I've downloaded the clip, recorded my screen, processed the video again, and uploaded it to YouTube, where it's been compressed yet again. So trust me, the raw video looks remarkably impressive, higher quality than anything I've seen, and I'm not just saying that for the video. Of course, with post-processing and whatever upscaling software you want to use, I don't think this will be a difficult problem to solve in the future, but a system that can natively output high-quality footage is going to be a game changer for industries. You can see right here a prompt of a chimney under the sunset, and this is where you start to see that the high-quality nature of this AI system isn't just for show; it's truly impressive. When we take all of these factors together, plus the fact that the system is apparently in alpha testing, meaning some people are actively able to use it, it shows us that China is rapidly advancing with their video models and the other models they are currently building.

Another feature they spoke about was varied aspect ratios. They say that Kling adopts a variable-resolution training strategy, which allows it to output a variety of video aspect ratios for the same content during inference, meeting the need for video material in richer scenarios. That's what the website says; essentially, we have the clip here in a 1080x1080 scene, and on the left we have it in a 920x1080 scene, which is basically the
portrait edition, and then of course we have the square edition. There was a landscape edition too, but I didn't include it because I'm sure you get the picture of what this AI system is doing. I genuinely have to say that this clip is one of the most realistic I've seen. And when we look at some of these others, for example this bird being very high quality, this road showing a kind of consistency I just wouldn't expect, and this one showing real-world physics being demonstrated, a

Segment 5 (20:00 - 23:00)

very consistent fish underwater, and of course one of my favorites, the panda playing the guitar. Now, there was one clip I actually forgot to add, but I'm going to show it to you now, because its consistency is remarkable, even though something similar was demonstrated a little earlier. This is a clip of a little boy eating a burger, and take a look at what happens, because there was also a clip like this from Sora, and I would argue this one is remarkably impressive for the same reason: he takes a bite of the burger, and you can see that as he does, there's quite a lot of mess around his mouth, which I think is remarkably accurate for how kids eat. It's managing to simulate the crumbs that would be left on his face, and I just thought this was an eerily realistic generation for such an AI system.

So overall, I think what this is going to do for the dynamics of the AI marketplace is show us that China can compete quickly and efficiently with the state of what the United States is doing in AI development, and in some instances even manage to surpass it. Now that China is focusing a lot of its efforts on these kinds of systems, and given the variety of advancements we've seen across many different domains, I genuinely wouldn't be surprised if in a couple of months we get a bunch of Chinese AI tools that are far superior to what the United States has. That may create an even worse terminal race condition, where nations fight to develop the very best AI systems, which could lead to detrimental outcomes. I know this is literally just a text-to-video tool, but it shows us that this kind of technology, which we heralded as completely impossible just 18 months ago, now produces results some people would call remarkably realistic.

So I would ask: what does this do for your timelines in terms of where you think AI is going to go? If it weren't for Sora or Google's recent Veo, we might have been even more shocked by this kind of demo. For me personally, this makes me believe that the capable systems we're going to get in the future, even if they come not from the United States but from another country, are definitely going to be absolutely incredible, because if another country releases a tool like this and a lot of people use it, then the fight for customers and the marketplace is going to be very interesting to watch. With that being said, let me know: what was your favorite demo? Was it the man eating noodles with chopsticks, the high-quality blue rose petals in HD, the chimney under the sunset, the very long video generation up to 2 minutes with remarkable consistency, or the cat driving around the city with tall buildings and pedestrians in the background? I'd love to know what you thought, whether you think this is a major AI update or not worth your time. Otherwise, I'll see you in the next video.
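The "3D spatio-temporal attention" mentioned at the start of the video isn't specified in any detail on Kling's page, but the general idea behind such mechanisms in video models is to let every image patch attend over space and time jointly, rather than frame by frame. Here is a minimal single-head NumPy sketch of that idea; the shapes, the single head, and the random projections standing in for learned weights are all illustrative assumptions, not Kling's actual architecture.

```python
import numpy as np

def spatio_temporal_attention(tokens, d_k=None):
    """Single-head self-attention over flattened (time, height, width) tokens.

    tokens: array of shape (T, H, W, d) -- a tiny stand-in for a video's patch
    embeddings. Flattening T*H*W into one sequence lets every patch attend to
    every other patch in BOTH space and time, which is the core idea behind
    "3D spatio-temporal attention". (Hypothetical sketch, not Kling's code.)
    """
    T, H, W, d = tokens.shape
    x = tokens.reshape(T * H * W, d)            # one joint space-time sequence
    d_k = d_k or d
    rng = np.random.default_rng(0)
    # Illustrative random projections; a real model learns these weights.
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d_k)             # (T*H*W, T*H*W) affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all tokens
    out = weights @ V
    return out.reshape(T, H, W, d_k)

# Tiny example: 4 frames of an 8x8 patch grid with 16-dim embeddings.
video_tokens = np.random.default_rng(1).standard_normal((4, 8, 8, 16))
attended = spatio_temporal_attention(video_tokens)
print(attended.shape)  # (4, 8, 8, 16)
```

The point of the joint flattening is that a patch in frame 1 can directly attend to a patch in frame 100, which is what makes long-range consistency (a rider's posture, a horse's gait) even possible; the trade-off is that attention cost grows with the square of T*H*W, which is why efficient training and inference infrastructure matters so much for 2-minute clips.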

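The varied-aspect-ratio feature discussed above comes down to simple resolution bookkeeping at inference time. Here is a sketch assuming a fixed 1080-pixel frame height, matching the 1080x1080 square example in the demo; the even-width rounding and the specific ratios are my assumptions, not documented Kling behavior.

```python
def frame_size(aspect_w: int, aspect_h: int, height: int = 1080) -> tuple[int, int]:
    """Width x height for a given aspect ratio at a fixed frame height.

    Width is rounded down to an even number, since most video codecs require
    even dimensions. The 1080-pixel height mirrors the 1080x1080 square
    example from the demo; everything else here is an illustrative assumption.
    """
    width = round(height * aspect_w / aspect_h)
    return width - (width % 2), height

print(frame_size(1, 1))    # square edition
print(frame_size(9, 16))   # portrait edition
print(frame_size(16, 9))   # landscape edition
```

Generating the same content at several of these sizes from one prompt is what the "variable resolution training strategy" quote appears to enable: the model sees many resolutions during training, so at inference it can fill any of these frames natively instead of cropping one master render.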