GROUNDBREAKING New TEXT TO VIDEO Stuns The ENTIRE AI Industry!

11:16

TheAIGRID · 22.07.2023 · 12,252 views · 265 likes

Video description

PIKALABS GROUNDBREAKING New TEXT TO VIDEO Stuns The ENTIRE AI Industry!
https://research.nvidia.com/labs/toronto-ai/VideoLDM/
https://twitter.com/Artoid_XYZ/status/1680419222130946048
https://twitter.com/thedorbrothers/status/1679818718187077632
https://twitter.com/ammaar/status/1679552378301739008
Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos. Was there anything we missed?
(For Business Enquiries) contact@theaigrid.com
#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience #IntelligentSystems #Automation #TechInnovation

Contents (3 segments)

Segment 1 (00:00 - 05:00)

So text to video just got a major upgrade. As many of you know, text to video is one of the AI fields that isn't as widely covered, but nonetheless it's still receiving an ample amount of research and development and, of course, substantial amounts of investment. Now, one company that recently came out of absolutely nowhere, it seems, is Pika Labs. Essentially, they've managed to pioneer a new way to create text to video, and honestly it's a lot better than some of the previous models we have seen. The only other one that was really good was Zeroscope. This is what I found on Twitter; it was just a video posted by a user, and I'll leave a link to it in the description.

The point is that text-to-video AI, as you know, has always been pretty difficult. One of the only other companies, which I will come back to for comparisons, is of course Runway. This has been the number one company people have looked to as the gold standard for text to video. They have, you know, four different modes, including text to video, text and image to video, and stylization, but that's not what we're talking about. The AI tool I'm actually talking about is Pika Labs. There's not much on the website in the way of demos, because you can't actually click the videos, but I will show you the demos that people have managed to create with this software. Honestly, it seems like we're getting somewhere: this does seem like something that is actually usable in a finished film production. That doesn't mean every other text-to-video tool was bad before, but I do think this method is particularly effective. In case you're wondering, this isn't just raw text to video: what we have here is a driving image, and then that image is animated in a certain way.

Now, for those of you wondering whether there is a research paper on this webpage: I don't see one. My best guess is that they haven't released a paper because this company is maybe building on the back of other research. For example, if we take a look at Nvidia's Toronto AI Lab, they have this thing called Align Your Latents, which is essentially their own version of text to video. Why this is so good is that they're using a different method, and I do think that when companies read this research paper they might figure a few crazy things out. Essentially, they'll build their own approach and might fine-tune their own model. I don't think they want to release a research paper, because this is a product that is eventually likely going to be something like Midjourney.

So enough with the dibble-dabble; let's take a look at how good this software really is and the kind of examples it's been generating. This is the demo that they posted on their socials page, and you can see right here that the majority of the content from Pika Labs is going to be using a driving image. Some of you might say that isn't text to video, but I'm going to show you that the examples they showcase are actually very effective. On the left-hand side we have the image input, which is a young woman wearing a white shirt, and then we have the prompt. Essentially, the prompt conditions the image and moves it in a certain way. Looked at objectively, this seems pretty effective.

Current text-to-video models do struggle with two main factors. Number one is the frames per second: as you can all see, there is some kind of lag on the hair, which is particularly noticeable considering that many of us watch high-frame-rate content. For example, the video you're currently watching should be at minimum 30 frames per second, and this seems to be at around 15 to 20 frames per second, which can cause visual issues. Another thing AI text-to-video content suffers from is lower resolution. I'm not sure what causes this, but I'm guessing it's maybe a lack of processing power or a lack of efficiency in how these AI systems create the final output. That's something we've seen across the board. Even when we look at Nvidia's paper, where they have all of these videos, a lot of them aren't extremely high quality; they're around 480p, which means there's something that still needs to be solved.

But despite these drawbacks of AI text to video, I would still regard this as something that looks very coherent, and some other examples we show you will showcase this. So take a look at what this Twitter user posted. You can always see the Pika Labs watermark in the bottom right-hand corner, and essentially you can see that this does

Segment 2 (05:00 - 10:00)

look pretty interesting. I'm guessing this might have been a specific image which they used as a driving image. But compared with the text-to-video technologies we have, and considering it's ants and it's pretty close up (these are like macro shots), it does look pretty decent in terms of what we've seen before. One thing I want you to pay attention to is the smoothness of the frames. One thing many text-to-video AI tools have in common is that they struggle with the continuous flow of certain things, leading to many things just merging together into one. So I think AI struggles to keep certain objects separate, which is something this software does seem to do a little bit better.

Now, this is by another user, Ammaar Reshi. You can take a look at this example right here, and he's very nicely given us the prompts that he used. I can't wait to access the beta for this software, because it means we will be able to do a more detailed analysis and test which kinds of examples this software works with more effectively. As you know, certain AIs work better at certain things: for example, ChatGPT is great at writing scripts and it excels at writing code, but at certain things it just isn't good. We all know AI image generators struggled with hands for quite some time, and eventually they managed to work on it.

Then we had this prompt right here, "the Golden Gate Bridge on a sunny day", and I'm guessing the prompt he used was of course from Midjourney. Pay attention to the driving image: we can see the quality. Then, once the driving image is used in the video, you can see, like we talked about before, that there is a substantial drop-off in quality. That's something we hope gets solved in the future.

Then we had this one, and this one was my absolute favorite. This is the prompt: "Batman: The Animated Series style cartoon illustration of Batman standing on a rooftop, dark deco style of Bruce Timm". The animation actually looks pretty decent; you can see the cape waving in the background. These animations aren't that long, but if this were some kind of strange noir-style cartoon animation that I saw on someone's channel, I wouldn't immediately think it could be AI. You have to understand how creative people can get with just these base videos: they can add filters, a zoom-in effect, a soundtrack, smoke. It's going to be a pretty effective way to create content in the future, and I'm pretty sure we're about to get some fully automated YouTube channels that just post these kinds of AI videos. If you think this isn't happening, you can see here that this channel called AI Commercials is already gaining substantial traction from making these text-to-video AI videos, which is pretty interesting.

Anyway, we can see right here the prompt of a man on a train. It's very effective at localizing where certain things are: it doesn't move the train itself but moves the things behind the train, which gives us the illusion of the train moving forwards or backwards. So it's really effective at that, and I would argue this might be even better than Runway's image to video, because although this input image here is pretty decent, we can see that the final version does look a little bit distorted. One thing I would like to do, given the chance, is put these two AIs to a head-to-head test with the exact same prompts and the same input images, to see if they're exactly the same or if they differ in some regards.

Then we can see another prompt right here, of someone walking. The walking animation isn't that great, but it still gets the other things, for example the mist, the clouds, the fog, the atmosphere. This prompt is really decent, and with the sound effects it actually does seem pretty good. Like we stated before, this is eventually going to be some kind of Midjourney-style subscription. I'm guessing they might have a free version that leaves this watermark on every single video, and then the paid version might be something like 30 a month.

And like we talked about before, in this specific example you can see that the bubbles actually stay intact, which is very good, because in many videos there's a lot of merging. As we can see here, this does happen sometimes, which is quite unfortunate. I'm guessing certain AI video generators just struggle in this area, because for some reason objects can sometimes go missing, but we will overcome this problem in the future.

Now, this was something I found interesting: one Twitter user managed to compare every single AI text-to-video generator in four different examples. So here you can see the prompt: "a cat is walking in the grass, dramatic lighting at sunset". Now here

Segment 3 (10:00 - 11:00)

you're going to see that there are four different AI tools being compared. We have Runway Gen-2, we have Fulljourney.ai, we of course have the newly released Pika Labs, and then one we might not have seen before: a tool called Zeroscope XL. It's integrated with Stable Diffusion, or rather it uses the Stable Diffusion engine to generate the pictures that drive it, and you're about to see what we have.

So remember that the prompt is "a cat is walking in the grass, dramatic lighting at sunset". If we pay attention to what looks the best, we can see that Fulljourney.ai doesn't do that well in terms of the merging or meshing that we talked about. But when we look at Pika Labs and at Zeroscope XL, we can see that these look pretty decent. In terms of what got the different factors right, Runway Gen-2 got the dramatic lighting at sunset, and Pika Labs did as well. But if I had to choose one video, it would be the one on the bottom left, which is Pika Labs, because it actually follows the prompt. This one is two cats, and this one the cat looks weird and isn't moving. But this shows us that the rate of improvement in AI is not something to scoff at.
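
The frame-rate and resolution points from the first segment can be put into rough numbers. What follows is a minimal Python sketch and not anything taken from Pika Labs or the video: the 3-second clip length and the 16:9 aspect ratio are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The clip length and the 16:9 aspect ratio
# are assumptions, not figures stated in the video.


def frame_interval_ms(fps: float) -> float:
    """How long each frame stays on screen, in milliseconds."""
    return 1000.0 / fps


def frame_count(fps: float, seconds: float) -> int:
    """Number of frames a clip of the given length needs at this frame rate."""
    return round(fps * seconds)


def pixels_per_frame(height: int, aspect_w: int = 16, aspect_h: int = 9) -> int:
    """Pixel count of one frame, deriving the width from the assumed aspect ratio."""
    width = height * aspect_w // aspect_h
    return width * height


if __name__ == "__main__":
    clip_seconds = 3.0  # hypothetical clip length
    for fps in (15, 20, 30):
        print(
            f"{fps} fps: {frame_interval_ms(fps):.1f} ms per frame, "
            f"{frame_count(fps, clip_seconds)} frames in {clip_seconds:.0f} s"
        )

    # Compare the roughly 480p output mentioned in the video with 1080p.
    ratio = pixels_per_frame(1080) / pixels_per_frame(480)
    print(f"1080p has about {ratio:.1f}x the pixels of 480p")
```

Under these assumptions, a 15 fps clip holds each frame on screen twice as long as a 30 fps clip, which is one plausible reason fine motion such as hair appears to lag.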
