NVIDIA's NEW AI 'Text To Video' Takes the Industry By STORM! (NOW UNVEILED!)
10:28


TheAIGRID · 19.04.2023 · 211,144 views · 3,822 likes


Video description
NVIDIA's NEW AI 'Text To Video Takes the Industry By STORM! (NOW UNVEILED!) https://research.nvidia.com/labs/toronto-ai/VideoLDM/ Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos. Was there anything we missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience #IntelligentSystems #Automation #TechInnovation

Table of contents (10 segments)

<Untitled Chapter 1>

So NVIDIA, the company that has been powering the AI race, has just released an amazing new tool. Check out this new research paper in which they document text-to-video. Now, before I get into this insane research paper and all the examples and use cases, remember yesterday's video, where there were many comments insinuating that text-to-video is far away and that rapid advancements won't be coming anytime soon? Well, just like clockwork, another new AI research paper is here, so let's do a deep dive. It starts out by saying "High-Resolution Video Synthesis with Latent Diffusion Models," which essentially means they're creating text-to-video with models like Stable Diffusion, and you can see
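For anyone wondering what "latent diffusion" actually means here: the model adds noise to a small compressed latent code (not raw pixels) and learns to predict that noise, conditioned on the text prompt. This is not NVIDIA's code, just a toy numpy sketch of the forward-noising equation and of how a perfect noise prediction would invert it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent diffusion operates on a small compressed code, not raw pixels.
z0 = rng.standard_normal(8)                  # toy latent for one frame

T = 50
betas = np.linspace(1e-4, 0.05, T)           # noise schedule
alphas_bar = np.cumprod(1.0 - betas)

# Forward process at the last step: z_t = sqrt(abar)*z0 + sqrt(1-abar)*eps
t = T - 1
eps = rng.standard_normal(8)
zt = np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1 - alphas_bar[t]) * eps

# The trained network predicts eps from (zt, t, prompt). With a perfect
# prediction, inverting the forward equation recovers the clean latent.
z0_hat = (zt - np.sqrt(1 - alphas_bar[t]) * eps) / np.sqrt(alphas_bar[t])
print(np.allclose(z0_hat, z0))               # True
```

In the real model the network's prediction of `eps` is imperfect, so denoising is done in many small steps rather than one jump; the schedule values above are made up for illustration.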

Text prompt: "Sunset time lapse at the beach with moving clouds and colors in the sky, 4k, high resolution."

right here, from this first example, what we get. The prompt is at the bottom, which I've added, and it says "sunset time lapse at the beach, moving clouds and colors in the sky, 4K, high resolution." I don't know about you guys, but I think this is actually pretty close to as realistic as text-to-video generation currently gets. There are some other examples I'm going to show you that aren't as good as this one, but this goes to show how quickly the AI race is moving on things we never thought were possible, like text-to-video. So let's talk a little more about how this actually works, and then we can get into more of the examples, because they've provided a ton of them, and there is a lot included in the research paper. You can see from the abstract that there is a lot of text, but the piece I've highlighted illustrates exactly what's going on. It says: "In doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with a resolution of up to 1280 × 2048." So essentially, they've turned Stable Diffusion into a text-to-video generator, and the results seem pretty impressive. Now, I'm not sure when this is going to be released; I think they're still fine-tuning it. But take a look at some more of the examples they've provided. Before we get to the end of the video, I want to know if you all notice a common theme across these clips, because I'm starting to realize that with text-to-video there's a certain kind of clip that works and a certain kind that doesn't. But anyways, let's take a look. So here's one
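The core trick behind turning an image model into a video model is to keep the pretrained Stable Diffusion layers frozen, apply them to each frame independently, and interleave newly trained temporal layers that mix information across frames. Here is a toy numpy sketch of that interleaving; the layer functions and the `alpha` blending weight are illustrative stand-ins of my own, not the real architecture:

```python
import numpy as np

def spatial_layer(frames, w):
    # Frozen image-model layer: applied to each frame independently,
    # so it knows nothing about time. frames has shape (T, C).
    return np.tanh(frames @ w)

def temporal_layer(frames, alpha):
    # Newly added layer: mixes each frame with its neighbour in time,
    # then alpha-blends the mixed result with the per-frame input.
    mixed = 0.5 * (frames + np.roll(frames, 1, axis=0))
    return alpha * mixed + (1.0 - alpha) * frames

def video_block(frames, w, alpha):
    # Video-LDM-style block: frozen spatial pass, then trained temporal pass.
    return temporal_layer(spatial_layer(frames, w), alpha)

rng = np.random.default_rng(0)
T, C = 8, 16                         # 8 frames, 16 channels each
frames = rng.standard_normal((T, C))
w = 0.1 * rng.standard_normal((C, C))
out = video_block(frames, w, alpha=0.5)
print(out.shape)                     # (8, 16): still one latent per frame
```

With `alpha = 0` the temporal layer is the identity and the block behaves exactly like the original per-frame image model, which is why training only the new layers doesn't destroy what Stable Diffusion already knows.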

Text prompt: "A teddy bear is playing the electric guitar, high definition, 4k."

of the first prompts: a teddy bear playing the electric guitar, high definition, 4K. Now, this was one of their favorites, I guess, and you can see that this is clearly imagery based on Stable Diffusion, which they've used to generate this kind of video. Of course, this isn't the best kind of video, but it does show some progress from what we saw in yesterday's video on Google's Dreamix paper. And it's very interesting that they've actually used Stable Diffusion to do this, because it seems like all these text-to-video generators are doing this in a different

Text prompt: "A storm trooper vacuuming the beach."

way. Now, this is an interesting one, because this is a stormtrooper vacuuming on the beach, and I don't think this one is particularly bad. Of course, I'm going to state the obvious here: this is by no means ready for film-level production. But remember what DALL·E 2 was like in its early days and where it is now; we could certainly see that kind of improvement in this video generation. Definitely a very interesting example, but take a look

Text prompt: "A fantasy landscape trending on artstation, 4k, high resolution"

at this one. Now, I've got to be honest with you guys: this one right here, "a fantasy landscape trending on ArtStation, 4K, high resolution," is by far one of the nicer ones. I wouldn't say it's on the side of realism, but I could see it used for an in-game loading screen, some kind of cinematic, maybe a peaceful YouTube video. It could definitely pass as one of those if the quality was improved in certain areas. Now, we also have another one here, which is a

Text prompt: "Turtle swimming in ocean"

turtle swimming in the ocean, and something I've noticed about the clips released by these text-to-video models is that they seem to struggle with moving parts, moving animals I guess you could say; if something is moving in the image, it doesn't come out as well. But you should be the judge of that, because I think this one isn't as bad. Of course, it isn't at Midjourney's level of quality yet, but it still provides an early insight into what we might see in the future. Now, this one I really do like,

Text prompt: "Close up of grapes on rotating table, high definition"

because it's a close-up of grapes on a rotating table in high definition, and it actually does look pretty decent. One thing I notice on a lot of these clips is a little bit of grain, which I'd like to see removed in upcoming versions, but I'm not gonna lie, this one does seem pretty decent given the text prompt it

Text prompt: "A Koala bear playing piano in the forest"

was given. Now, this one right here is definitely one of the more interesting ones, and honestly it's quite hilarious. It says "a koala bear playing piano in the forest," and honestly, it does kind of look like a koala bear playing a piano in the forest. Stay tuned for some other examples, because there's something really cool that NVIDIA released alongside this that I haven't shown you yet. This one is hit and miss, but it definitely shows how the AI is trying to work it out. Now, this

Text prompt: "A traveller walking alone in the misty forest at sunset"

one right here is also definitely one of my favorites: a traveler walking alone in the misty forest at sunset. You can see that it gets the background very well; it captures that misty-sunset scene perfectly, which I guess comes from Stable Diffusion. The motion of the character walking on the left isn't absolutely perfect, but like I said, with some fine-tuning I'm pretty sure that eventually this is going to be

Text prompt: "Time lapse of aurora in the sky 4k resolution"

perfect. Now, here's the last one I'm going to show you before I talk about the other awesome thing NVIDIA released with this: a time lapse of an aurora in the sky, 4K resolution. Like I said, it seems that with examples like fantasy landscapes or simple time lapses, this kind of text-to-video software really excels.

Now, there was something else they included which I find very good, because it has some real use cases, so let's take a look at what they additionally released with this research paper. One thing they included was the idea of personalized video generation. What I'm showing on screen right now is DreamBooth, which is essentially fine-tuning text-to-image diffusion models for subject-driven generation. This basically means that if you input a certain number of images, you can then get generations with that object placed in different scenes. For example, on screen you can see the dog put into random locations: in a doghouse, in a bucket, swimming underwater, getting a haircut. Of course, this has a wide range of applications, because we all have our own objects and things we'd like to put into certain locations without actually going to those locations; I can't imagine the number of different things you could do with this. Now, the reason I brought up DreamBooth, which of course already exists, is that they tried to do this with the AI video creator. You can see right here that the training images they used were of Kermit the Frog, and then what we have here is the video. They used the text prompt "a sks frog playing a guitar in a band," and if you're wondering what "sks" stands for, it just denotes that specific object; it's the placeholder code word they use. So in this case it would be Kermit the Frog playing a guitar in a band, and although this doesn't look super realistic, the generated output looks far more realistic than some of the moving examples we saw at the beginning. It's clear that when you provide that driving image, your results are going to be far superior. Right here, Kermit the Frog writing a scientific research paper also looks more realistic. What this clearly does get right is that it actually puts the subject in the video, and I think that's what matters most: although the animations aren't completely perfect and the background isn't completely perfect, we do get a highly realistic visual of that specific character. What's also interesting is that they tried a very hard image here: you can see this teapot has many different colors, yet somehow the AI manages to get the text prompt correct. It says "a sks teapot floating in the ocean," and this one doesn't look that bad; the ripples look pretty good, and although it's quite hard to stabilize that teapot, I think the output was very decent.

Something else they did was driving-scene video generation, where they trained the Video LDM on real-world driving scenarios. They were able to generate videos, some of them literally up to five minutes long, but here are the ones they decided to provide. Here you can see a high-resolution video of someone driving, in America I think; honestly, I'm not sure, I'm not the guy who can just guess where you are in whatever country. But I do think these examples are very interesting. Yes, there is some distortion on the cars that go by, but for dash-cam footage it seems pretty accurate. Of course, like I said, there's distortion on the cars and a bit of warping on the trees and the highways, but nonetheless, I still think this video right here is quite impressive in terms of what it accomplishes. What's also interesting is that they did specific driving-scenario simulations: they built up the scene using bounding boxes and were able to recreate a simulation of a specific scenario, with the resulting video being what came out. Essentially, they're able to create specific video scenarios based on what they want, which is really cool, because it suggests use cases like, say, a new driver asking, "What happens if three cars are about to crash into me; what do I do, how's that going to work?" Maybe that could be a real use case. Definitely some very interesting stuff going on at NVIDIA; honestly, this is moving very quickly. One thing I do want to add: are we going to see a future where many of these text-to-video generators use Midjourney instead of Stable Diffusion? Because we know Midjourney right now is pretty much at the stage where it doesn't really need much more improvement; I think it's basically done, because you can do pretty much anything in terms of image-generation quality. So it will be interesting to see where Midjourney heads next, and also where these other models head.
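On the "sks" point: DreamBooth binds a rare placeholder token to your subject by fine-tuning on a handful of subject photos while the rest of the model is mostly left alone, so prompts like "a sks frog playing a guitar" reuse that learned meaning. Here is a deliberately tiny numpy sketch of that one idea, where only the "sks" embedding row is optimized toward a stand-in subject feature; the vocabulary, the target vector, and the squared-error loss are all illustrative, not the real training objective:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4                                             # toy embedding size
vocab = {"a": 0, "frog": 1, "playing": 2, "guitar": 3, "sks": 4}
emb = 0.1 * rng.standard_normal((len(vocab), D))  # token embedding table

# Hypothetical stand-in for "what the subject looks like".
subject = np.array([1.0, -1.0, 0.5, 0.0])

# Freeze every row except the rare placeholder "sks", and pull that one
# row toward the subject with gradient descent on 0.5 * ||e - s||^2.
for _ in range(200):
    grad = emb[vocab["sks"]] - subject            # dLoss/de
    emb[vocab["sks"]] -= 0.1 * grad               # gradient step

# "sks" now encodes the subject; the other tokens are untouched.
print(np.round(emb[vocab["sks"]], 2))
```

In the actual method the loss is the diffusion denoising loss on the subject photos (plus a prior-preservation term), but the "only the new token's meaning moves, everything else stays put" part is the essence of why the same subject can then be dropped into arbitrary prompts.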
