# Googles New VIDEO AI 'Magvit' Continues To To SHAKE UP The Industry

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=M2Yx7zwE9q4
- **Date:** 04.07.2023
- **Duration:** 14:11
- **Views:** 16,004
- **Source:** https://ekstraktznaniy.ru/video/14791

## Description

Googles New VIDEO AI 'Magvit' Continues To To SHAKE UP The Industry

https://magvit.cs.cmu.edu

Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos.

Was there anything we missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
#IntelligentSystems
#Automation
#TechInnovation

## Transcript

### Intro [0:00]

So Google Research have done it again, and in this video we're going to be taking a look at a research paper that was just released, which shows a number of uses for their new generative video transformer. As you may know, large language models have taken over the artificial intelligence landscape, but video is one area that is particularly difficult, though apparently not so for Google. In this paper they call it MAGVIT, a masked generative video

### What is Magvit [0:27]

transformer. And if I'm being honest with you, I've looked at tons and tons of different research papers, and so far we haven't actually seen anything like this before. So this is going to be very interesting, because the range of applications that we have here is particularly new, and I do think that once this is released in the future, perhaps next year, it will be a game changer in a variety of fields.

Google's research paper on MAGVIT does get a bit confusing, so we're not going to go over the research paper too much; it basically goes into the very fine details as to how they built this video tool. What you're likely going to want to see is of course the results, and they do show those clearly on another cool page, so that's what we're going to show you, because the results are truly incredible considering we haven't seen anything like this before in the video space. When you load up the MAGVIT page, you're going to be presented with this, and you can see right here that it shows you pretty much everything you need to know.

### Magvit Applications [1:28]

Now, one thing that you do find on the MAGVIT page is that there is simply a huge number of separate applications. You can see on this page there are currently around eight different applications that you can use MAGVIT for. So what we're going to do is break down every single application for MAGVIT, because honestly a lot of these have wider uses than you might actually think, and a lot of it is in video. This is actually something that is quite early on in its stages of development; video AI is definitely still relatively novel, so that means that over time we're likely to get further and further increases in the level of quality of this kind of software. So essentially

### Panoramic Video [2:07]

what we can see here is panoramic video. If you don't know what a panoramic video is, it's essentially a type of video that is very wide in its frame. Maybe you have an iPhone or an Android, but on your iPhone there is likely a setting you can use to take a panoramic photo, and that just means you're taking one long photo out of a bunch of images stitched together. Now, looking at MAGVIT, you can see that it states that given a small vertical shot, MAGVIT can turn it into a panoramic video by applying video outpainting multiple times on both sides. So what we can see here is the initial shot of the hot air balloon, and then we can see that MAGVIT actually has the ability to outpaint 10 times on each side, providing us with a panoramic video.

Essentially what that means is, let's say someone took a video on their phone, and as you know many people don't take videos horizontally because of how the phone is designed. You could then transform that video using this AI to get it outpainted into a full-scale panoramic video. Of course, right now the quality isn't that impressive, but this is the first type of outpainting that we are seeing on videos. Outpainting is essentially where you use the inner region to predict what the outer edges are going to be, and that is something that recently arrived in Adobe's Photoshop, where you were able to outpaint certain images and users were very satisfied with the quality. This is essentially a video version.

So what we're seeing here is one right here and another right here. It definitely looks very interesting, because this kind of content is going to be super useful in the future. I mean, take a look at what we actually see here: this is all we are seeing in terms of the initial footage, and then this is what we get. I've got to be honest with you guys, that is absolutely incredible, even if it is still in low resolution, because as you know artificial intelligence does move pretty quickly.
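To make the "outpaint 10 times on each side" idea concrete, here is a minimal Python sketch of the bookkeeping only. It is not MAGVIT's code: the function name, the 64-column starting width, and the 32 new columns per step are all hypothetical values chosen for illustration. It just tracks how the known region of the canvas grows as a model is asked to fill one new strip on each side per step.

```python
def outpaint_schedule(width, steps_per_side, new_cols_per_step):
    """Return the final canvas width and the known (left, right) span after each step.

    Hypothetical helper: columns are numbered from 0 on the final canvas,
    and `right` is one past the last known column.
    """
    canvas_width = width + 2 * steps_per_side * new_cols_per_step
    left = steps_per_side * new_cols_per_step  # original video sits in the middle
    right = left + width
    spans = [(left, right)]
    for _ in range(steps_per_side):
        # Each step would hand the model the current known region and ask it
        # to generate one new strip on the left and one on the right.
        left -= new_cols_per_step
        right += new_cols_per_step
        spans.append((left, right))
    return canvas_width, spans


# Assumed sizes, for illustration only.
canvas_width, spans = outpaint_schedule(width=64, steps_per_side=10, new_cols_per_step=32)
print(canvas_width)         # 704
print(spans[0], spans[-1])  # (320, 384) (0, 704)
```

With these assumed sizes, a 64-column vertical shot ends up as a 704-column panorama, and only the middle 64 columns come from the original footage; everything else would be predicted.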

### Further Shots [4:08]

Then of course we have the further shots, and you can see right here that this is where we get outpainting five times. So rather than just outpainting once and using a huge long shot, this is where we do it five times, and the results are quite surprising. Whilst it might not be completely accurate, it's definitely really good for merging videos together. You can see right here that essentially what it does is use the video and merge it into one larger frame, and this is going to be really effective for those kinds of videos where you wish you had shot in landscape. In order to outpaint like this, the AI has to use the initial frames on the inside and predict what's going on on the outside. From these examples that we've seen, although they are low resolution, outpainting five times does look like where you have something that's cropped in a 4x3 aspect ratio and then essentially you outpaint it.

### Smart Remover [4:55]

Then of course we have smart remover, and this is one of the most interesting things as well. This says that given a masked video, MAGVIT can generate coherent inpaintings to remove unwanted content. Remember how in Photoshop sometimes you might want to inpaint something, maybe there's a hat, maybe there's something on the ground that you want to simply remove, and you use a brush to get rid of it? This is basically like that for video. I'm guessing that what MAGVIT probably does is figure out what area it needs to remove and then somehow manage to use AI to seamlessly blend the images together, and what we can see here is a pretty seamless result. Now, as you know, AI video is still quite low resolution, but the applications for this are pretty interesting.

Here we can see a bunch of eggs on a table, and then we have this black square going across the image, which interrupts what you would otherwise see, and then we have this final image right here, which looks pretty incredible. I mean, I wouldn't be able to tell that there was actually a black square over this, so it just goes to show how great MAGVIT actually is. Then we have another example here with all of these lines, and it's very interesting how it actually manages to connect these lines perfectly without any real distortion. So you can see as well right

### Other Examples [6:09]

here, with this black line going across this badge, it's able to coherently fill this in as well. Now take a look at this, because I think you'll find that this part is particularly interesting. You see where this square is on this person's tie and shirt: even though MAGVIT inpaints this, we can also see that there is actually a shadow right here, and this looks really well done. I mean, when inpainting, it doesn't just use the surrounding bits outside to generate the inside. I'm guessing that it must understand the context of certain images in relation to one another to be able to generate this, because honestly, without this first video image there's no way that the AI is going to know exactly what to put here. So I'm guessing that they probably used a large amount of data to train the AI on what goes where.

Then of course we also have this one right here with a croissant and a cup of coffee, and we can see that with this black box over it, this is exactly what we see here, and to be honest with you guys, this does look absolutely incredible. So having a black box in the way of something doesn't mean that the video is completely ruined. From MAGVIT we can now see that you're going to be able to have a smart remover where you can simply remove objects in a video in real time, and this is something that we've really wanted. It will be really interesting to see if future applications like Adobe do add this.
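The masked-video idea can be sketched in a few lines of Python. This is only an illustration of the general inpainting setup, not MAGVIT's implementation: the function name is hypothetical, and the `predicted` frames are hand-written stand-ins for whatever a generative model would output. The point is that pixels outside the mask are kept from the original video and only the masked box is replaced.

```python
def composite_inpainting(frames, mask, predicted):
    """Keep original pixels where mask is 0; use the predicted pixel where mask is 1."""
    result = []
    for frame, pred in zip(frames, predicted):
        out_frame = []
        for y, row in enumerate(frame):
            out_frame.append(
                [pred[y][x] if mask[y][x] else px for x, px in enumerate(row)]
            )
        result.append(out_frame)
    return result


# Toy two-frame, 2x3 grayscale video; the 0 values in the middle column
# play the role of the black box covering the content.
frames = [[[5, 0, 7], [5, 0, 7]], [[6, 0, 8], [6, 0, 8]]]
mask = [[0, 1, 0], [0, 1, 0]]  # the same mask is applied to every frame
predicted = [[[9, 6, 9], [9, 6, 9]], [[9, 7, 9], [9, 7, 9]]]  # stand-in for model output

restored = composite_inpainting(frames, mask, predicted)
print(restored[0])  # [[5, 6, 7], [5, 6, 7]]
print(restored[1])  # [[6, 7, 8], [6, 7, 8]]
```

The hard part, of course, is producing a `predicted` video that is coherent over time; the compositing step shown here is the easy bit.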

### Autoflip [7:31]

Here we have something from MAGVIT called autoflip. This is where MAGVIT can uncrop a video on any side for an easy switch of aspect ratio. The way this works is essentially you have the original video with the black bars, and then the black bars are filled in to present an actual video moving like this. So what we can see here is that MAGVIT easily allows users to take their original video and then uncrop it. Now, what I do find interesting here is whether this technology is going to be added to tools like Runway's Gen-2 or other video editing software like Adobe Premiere Pro, as we do know that AI is rapidly moving in the direction of being integrated into pretty much every application there is for increased efficiency. And with this, imagine you have something and it's shot in the wrong size. I know filmmakers many times shoot things in the wrong aspect ratio; being able to change it very quickly with just a click of a button would be incredible. Then of

### Image to Animation [8:29]

course we have something called image to animation. It says that given a single image, MAGVIT can turn it into an animation by frame prediction, optionally with action conditions. Essentially what they mean here is that when you give it a certain image, MAGVIT potentially understands what that image is, and because it understands, it's going to perform the correct animation. Now, you have to understand that this is not just a click-of-a-button thing; this AI really understands what's inside the image. We know that this is a neon type of sign, and as you know, this kind of sign doesn't blink, it glows, which is why we can see the opacity increase and decrease. I think that key example shows that this kind of AI software is much smarter than we think.

For example, with this one right here, we know as well that this is a Christmassy, winter-themed image. If we wanted to animate this, we know that this would be snow and the snow would be falling, and the AI would have to understand this as well: that this tree isn't moving but the snow is, and that the snow is not moving up or sideways, it is only moving down. Then once again we have the image of a globe here, and we know that this is going to rotate. Honestly, I'm not even sure how it manages to do this, because interpolation of 3D objects onto 2D like this is absolutely incredible. So this is truly interesting; being able to do this is, honestly, quite surprising. And then we do have the waves here, and it does a very good job with the waves.

The only problem that I would say MAGVIT suffers from is of course quality. If there is going to be an artificial intelligence tool that many programs use, it would be great if there was one that could quickly turn any low-resolution video into a high-quality one. That way, when these AI models output these videos, we can get them into a higher quality pretty quickly. Now, this example wasn't as

### Frame Interpolation [10:30]

good, but it does show that the future applications are going to be interesting. It says that given two images, MAGVIT can turn them into a stop-motion animation by frame interpolation. That's essentially where you have two images, and it's easily represented by this one right here: we have the brown pencil over here, then moved to the right. Once we have these two images, MAGVIT is going to combine them and move the object in a specific direction. So using what it knows about the two images, it can combine them to create a specific video animation, which would be very interesting. I mean, there are many different applications: imagine a video is corrupted, or imagine you've just got a few images of something happening. It would be very interesting if you could get that synthesized into a video. Right here you can see someone putting a pen inside of a cup, which does not look great, but still, it's the application of these that we are looking at. And of course this one was my favorite, where it's someone pouring water into a cup and then removing it from the cup, and I do find that the animation of the water was actually pretty decent.

Now, this was one of the best things that I saw from this, and this is arguably what we should have placed at the start of the video, but I think this will be the main talking point for MAGVIT: given a single image, MAGVIT can turn it into an animation by frame prediction, optionally with action conditions. So essentially, given a single image, it can make a video from that, and what we see here is that this is largely driving dash cam videos. Of course, it's going to be very hard to predict other videos, because absolutely anything can happen, but certain styles of video do have a certain set of patterns that they usually follow. For example, when a car comes to a junction, it usually slows, and the AI knows that a car usually comes across, and right here this is where you are driving and it continues to move down the road. The same goes for this one, and this one right here. It's really interesting as well because even when we look at this one of the car turning, you can see that the car manages to turn onto the straight road and then generate what appears to be a new kind of road, so that goes to show future prediction.

I mean, you have to understand that right now it's very early, and for those of you who think, "Okay, this is early, this is terrible, what is the point of even covering this?", understand that right now this is how good AI is. Remember, GPT-2 was released in 2019, and since then we've made huge, huge gains in terms of artificial intelligence. I mean, in 2022 ChatGPT was released, and you all saw how good that was. So within the next five years, imagine this kind of technology is absolutely perfect, let's say it's on Midjourney level: future prediction from images, image to animation, auto flipping, changing your perspective, being able to remove objects really quickly, being able to create a panoramic video from a single iPhone shot. This kind of stuff is going to be absolutely incredible, and I think this is really cool.

And since this is by Google, one of the things that might not be as cool but is definitely good for us is video compression. What you're seeing is two different videos with different compression rates, and what they're basically saying is that one of them is 600 times smaller than the other, but you don't actually notice a visual difference. This will be interesting because I do know that one of Google's largest expenses is video hosting, so I think that they'll be expediting this as quickly as possible, because if they do manage to get this working, then they can seriously make a ton of money.
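To put the "600 times smaller" figure in perspective, here is a small arithmetic sketch in Python. The 1 GB starting size is a made-up example, not a number from the paper or the video, and the helper name is hypothetical; only the 600x ratio comes from what is said above.

```python
def compressed_size(raw_bytes, ratio):
    """Size in bytes after shrinking `raw_bytes` by a factor of `ratio`."""
    return raw_bytes / ratio


raw = 1_000_000_000  # hypothetical 1 GB video, for illustration only
small = compressed_size(raw, 600)
print(round(small / 1_000_000, 2))  # 1.67 (megabytes)
```

So under that assumption a 1 GB file would shrink to under 2 MB, which is why a ratio like this matters so much to anyone paying for video storage and bandwidth.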