# Major AI BREAKTHROUGH!, Meta AI, VideoGPT, GPT Vision HUGE AI NEWS #17

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=f329TaTe07E
- **Date:** 30.09.2023
- **Duration:** 15:57
- **Views:** 23,955
- **Source:** https://ekstraktznaniy.ru/video/14727

## Description

Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos.

https://twitter.com/_akhaliq/status/1704964635957280847 
https://twitter.com/_akhaliq/status/1706840971177066703 
https://twitter.com/hellokillian/status/1706771425909170600 
https://twitter.com/skirano/status/1706874309124194707 
https://twitter.com/nickfloats/status/1707044819518792147 
https://twitter.com/emollick/status/1707076651320770870 
https://twitter.com/skirano/status/1706823089487491469 
https://www.youtube.com/watch?v=SHPxcRBlXN0 
https://www.unite.ai/generative-ai-in-finance-fingpt-bloomberggpt-beyond/ 
https://huggingface.co/papers/2308.04079 
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/ 
https://www.youtube.com/watch?v=LKCJwLEyruc 
https://www.y

## Transcript

### Segment 1 (00:00 - 05:00)

This week in AI was absolutely incredible. There was a major breakthrough that many people haven't picked up on, and it is about to change many different industries; from that, to ChatGPT, to DALL-E 3, a lot happened this week that will fundamentally shape not only AI but other technological developments you will likely see in the coming years.

First, the thing that has recently changed the game in computer vision: the paper "3D Gaussian Splatting for Real-Time Radiance Field Rendering". The researchers outline a new, more efficient method for the kind of scene reconstruction previously done with NeRFs, using a different technique that is like NeRFs but far more efficient, runs at much higher frame rates, and is just pretty crazy. What's on screen right now is real-time footage reconstructed from ordinary photographs: someone took several pictures of a scene, fed them into this software, and now we have the scene as 3D data. The reason this is game-changing is that, as the project page shows, past methods such as NeRF and Instant-NGP just aren't as high quality. There's a clip from some other YouTubers where they run a real demo and showcase just how crazy this jump is. I can only do so much here, because I'm showing flat images of something that is inherently 3D, but I promise you this is pretty crazy. I've included it in today's video because computer vision is a very important part of the technological landscape, especially for AI, since small, different parts of AI can be intertwined with other technologies
such as robotics. Imagine combining this software with a large language model made specifically for it; it would be very interesting to see what can happen. Many people are already thinking about the possibilities for gaming, because with frame rates this high, photorealistic games become genuinely possible. From the demo clip: "You've got all the details here too. Can you go closer to the edges, though? That's where I think close-up shots might have some trouble." "Yeah, right, you can see the artifacts around there. Let's try. We've been promised by the guys who released the Gaussian splats paper that the quality is higher than NeRFs. Are you ready? One, two, three, move around it. Dude, what? Look at it! I have every single angle, I feel like Superman. I think we already achieved perfection." So overall, based on the demos and examples you've seen, hopefully you understand what this is and why it's pretty crazy. If this comes to video games or other applications, it will be very interesting to see how it's applied, and when I look at this footage it really is nearly indistinguishable from actual photographs.

The next thing we have is Tesla AI. If you don't know, Tesla is the same car company that now also produces robots. They've been working on this for just over two years, maybe more, but we're starting to see rapid development in not only their AI but their humanoid robot. What we're seeing here is Tesla's Optimus, and what's crazy about this robot is that its neural net runs entirely on board, using only vision. That point in itself is something a lot of companies are striving for. Everybody knows GPT-4 and GPT-3.5: their servers are essentially huge, and the running cost was so high that OpenAI not only had to dumb GPT-4 down but also had to limit it to around 25 messages every 3 hours, because the number of people requesting the service, and therefore the compute cost, was too much. If you can get a large language model, or whatever sort of model, running on an actual robot, working with it and converting vision into robot actions, then you have a real winner, and that is what we have with Tesla Optimus. We can see the robot sorting items fully autonomously, and autonomous behavior is the next step in AI control, because it lets the robot act without human intervention: no constant updates, no constant interference, no pre-programming of every task. Many of you might say this is nothing new, but trust me, it actually is. The nuanced difference between robots now and robots before is that older robots that could do things were always programmed to do exactly those things; they weren't thinking, they weren't autonomous, they weren't really doing the task, they were executing a programmed preset of actions. These robots are autonomous: they can react to changes, unlike factory robots that assume a specific thing will always be in a specific place. This is why it is truly innovative, and of course this is just the beginning, so in the next 10, 20, 30, even 40 years I can't imagine what these robots are going to look like, especially since we know that Tesla investors are super long-term holders.
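To make the Gaussian-splatting idea from the top of this segment more concrete: the scene is stored as a large set of translucent Gaussian "blobs", and rendering a pixel amounts to sorting the blobs covering it by depth and alpha-blending them front to back. Below is a toy per-pixel sketch of just that compositing step; it is not the paper's CUDA rasterizer, and all the numbers are made up for illustration.

```python
# Toy sketch of the front-to-back compositing at the heart of Gaussian splatting.
# Each "gaussian" covering a pixel is reduced to (depth, alpha, colour); the real
# method projects anisotropic 3D Gaussians to screen space before this step.

def composite_pixel(gaussians):
    """gaussians: list of (depth, alpha, colour) tuples covering one pixel."""
    colour = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for depth, alpha, c in sorted(gaussians, key=lambda g: g[0]):
        colour += transmittance * alpha * c
        transmittance *= (1.0 - alpha)
        if transmittance < 1e-4:  # early exit once the pixel is effectively opaque
            break
    return colour

# A fully opaque white blob in front hides the darker blob behind it:
print(composite_pixel([(2.0, 1.0, 0.2), (1.0, 1.0, 1.0)]))  # -> 1.0
```

The early-exit on saturated opacity is one reason this style of renderer can hit real-time frame rates where NeRF-style ray marching cannot.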

### Segment 2 (05:00 - 10:00)

But there are also billions and billions of dollars now flowing into the AI industry, and into robotics, because people are predicting these industries will be worth trillions and trillions of dollars. So it will be interesting to see where these robots head, and you can also see that the Tesla bot has really sturdy balance, which matters because robots, as you know, do tend to fall over from time to time. What I find interesting is that Elon Musk was ridiculed for his previous presentations, when the Tesla Optimus wasn't ready, so it's striking how quickly they've managed to combine it with AI based on what we've seen so far, and now they're adding vision on top.

Then Meta produced a whole host of different AI-related things; there's so much that I'm not sure it will all fit in this video. What you're about to see is Meta collaborating with Ray-Ban: they've created something very futuristic, and the price is low enough that you or I could actually buy it. Take a look, I think this is really cool. It's not just a standard product, it's one of the new AI-driven products. We're starting to see a whole wave of not just AI-driven applications and software but physical products with LLMs on board, including vision and conversation, slowly rolled out into our environment. I wonder whether these products will be just a fad or whether they'll actually become part of what we use day to day.

From the keynote: "The next generation of Ray-Ban Meta smart glasses. These are the first smart glasses built and shipping with Meta AI in them. Starting in the US, you're going to get state-of-the-art AI that you can interact with hands-free wherever you go. We're going to issue a free software update to the glasses that makes them multimodal, so the glasses will be able to understand what you're looking at when you ask them questions: if you want to know what building you're standing in front of, or translate a sign in front of you, or if you need help fixing a leaky faucet, you can basically just talk to Meta AI, look at it, and it'll walk you through how to do it step by step. We built one more feature into these smart glasses: you're going to be able to live-stream to your friends and followers from your glasses. Being able to share what you're doing live with your friends and followers, while staying completely in the moment, is the kind of thing you can only do on smart glasses. So, all right: these Ray-Ban Meta smart glasses, we're launching them on October 17th, starting at $299, and I'm really looking forward to seeing what you all think of them."

And in the Meta Connect keynote speech, Mark Zuckerberg gave more information on Meta's AI chatbots. I think the opportunity he's going after is interesting, because it shows he's actually pretty decent at strategy. Zuckerberg knows ChatGPT is already the leader and will probably stay there for quite some time, so he's opted to take aim and fire some shots at Character AI instead, as pointed out in another video by AI Explained. Essentially, Meta now has AI chatbots that carry the personalities of celebrities. By leveraging the platform fame of Instagram and combining it with open-source large language models, potentially Llama 2 depending on what they're running, he's carved out a unique opportunity where users will interact with these celebrities in an AI-chatbot style. This kind of thing has already proven popular, with companies like Character AI being valued at $5 billion, and those are just random characters from around the internet, from platform-created ones to user-generated ones. Personally I wouldn't use this, because I'd rather just ask ChatGPT or figure out the information myself, but I do think it will have a large appeal to a niche audience, or maybe a very large one, and it definitely shows Meta isn't one of those companies to take lightly.

Then we have: from image to live website using GPT-4 Vision and Replit in less than a minute. This is pretty crazy, because GPT-4 Vision has only been around for a couple of days and it's already showing what the possibilities are; this is just the start. People aren't really grasping what vision enables this large language model to do. You have to understand that this model scores as well as some of the smartest minds that have ever lived, at least according to IQ tests, and the crazy thing is that now, combined with vision, it can do many different things. I was watching a video where the possibility of recursive self-improvement was discussed.
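That "look at your own output" loop can be made concrete. Below is a hedged sketch of the generate-render-critique cycle the video is describing; `generate_code`, `render_screenshot`, and `critique` are hypothetical stand-ins for a vision-model API and a headless browser, and the toy stubs exist only to exercise the control flow, not to call any real service.

```python
# Sketch of a self-critique loop: a vision-capable model writes website code
# from a mockup image, the result is rendered to a screenshot, and the model
# judges its own screenshot against the mockup until it approves or gives up.

def refine_until_approved(mockup, generate_code, render_screenshot, critique,
                          max_rounds=5):
    feedback = None
    for round_no in range(max_rounds):
        code = generate_code(mockup, feedback)             # image (+ feedback) -> code
        screenshot = render_screenshot(code)               # e.g. a headless browser
        approved, feedback = critique(mockup, screenshot)  # model judges its own work
        if approved:
            return code, round_no + 1
    return code, max_rounds

# Toy stubs: this fake "model" succeeds once it has received one round of feedback.
def generate_code(mockup, feedback):
    return "v2" if feedback else "v1"

def render_screenshot(code):
    return f"shot-of-{code}"

def critique(mockup, screenshot):
    ok = screenshot == "shot-of-v2"
    return ok, None if ok else "header is misaligned"

code, rounds = refine_until_approved("mockup.png", generate_code,
                                     render_screenshot, critique)
print(code, rounds)  # -> v2 2
```

The interesting property is that the same model sits on both sides of the loop: it produces the artifact and then grades it, which is the minimal form of the recursive self-improvement idea mentioned above.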

### Segment 3 (10:00 - 15:00)

For example, in the demo currently on screen, you get GPT-4 to design a website based on an image; it writes the code, and then it can look at the rendered result and make a judgment call on whether it actually looks like what it was asked to build in the first place. Beyond that, a ton of different applications are going to be built on top of GPT-4 Vision, so rather than people just using it to check whether their egg is well cooked, you can use it to build various applications and software. It's really striking that we've moved from plain large language models to large language models with vision, and now that GPT-4 has vision and you can literally create a website in seconds from a photo, other competitors are going to move into this space too.

If we look at this next example, GPT-4 with vision is able to decipher text written in poor, old handwriting that some people might not be able to read themselves, and it does this pretty well, so the applications are just pretty crazy. We've also talked about healthcare applications: a GPT-4-Vision-style model trained on specific health issues, certain spots, certain rashes, an AI that has seen millions of images, is plausibly going to be more accurate than a doctor who has maybe only seen a few hundred or a few thousand.

One of the last GPT-4 Vision examples I want to show you is this one, and I'm showing it because it's very intriguing; it did scare me at first, because one account on Twitter showcased why it's pretty crazy. We always talk about how AI is "just a bunch of code outputting some basic code", and yet here is a large language model that can clearly understand advanced concepts embedded in images. You can see the image says "I'm glad we all agree", and the model says it portrays the concept of group dynamics and perspectives, and it really does understand exactly what is going on. If this were just a vision classifier, something that looks at an image and labels it, it would have said this is just an image with some characters on screen; instead it clearly understands what's on each panel and describes it in detail. It says the last panel shows that, after some discussion or thought, all have come to a consensus or shared understanding and envision the same shape, and the caption reaffirms "I'm glad we all agree". So while some people say these AIs aren't sentient and aren't going to kill us, which is quite valid, I do think these AIs are smarter than people presume, and we aren't being told the full picture of their capabilities, because something that can understand these concepts and present them in such a clear manner is definitely smarter than we tend to think.

Then, and this is absolutely crazy and I don't know why no one has talked about it, I think this is game-changing, so pay attention: VideoDirectorGPT, consistent multi-scene video generation via LLM-guided planning. The paper says that although recent text-to-video generation methods have seen significant advancements, most of this work focuses on producing short video clips of a single event with a single background; meanwhile, large language models have demonstrated their capability in generating layouts and programs to control downstream visual modules such as image generation models, which raises an important question: can we leverage the knowledge in these LLMs for consistent long video generation? Forget the jargon: what they're asking is whether large language models can be used to perfect text-to-video. Essentially, they take a scene produced by a text-to-video model, use a large language model, I'm guessing with a vision component, to analyze the scene and describe it, producing a multi-sentence, multi-scene plan covering all the scenes, and then use that plan to regenerate the video properly. You can see the baseline "fails to keep mouse consistent throughout all scenes", whereas VideoDirectorGPT keeps the mouse consistent across scenes: essentially an AI extracts a plan from the video data and then ensures the generated video stays correct. I think this is the right direction to move in, because text-to-video is great, but this is where it currently is, and I think text-to-video is going to get a huge accuracy update in the next couple of months. This is why I call it a huge breakthrough even though I haven't seen many people talk about it: compare ModelScope text-to-video with VideoDirectorGPT and the difference is clear. And once these vision transformers mature, maybe even combined with ChatGPT and DALL-E 3, then as long as ChatGPT can create images with DALL-E 3 and continually analyze those images, I'm sure it could create videos that are of

### Segment 4 (15:00 - 15:57)

a high quality. So that is going to be pretty insane, and I cannot wait for more advancements there. Then of course we had Replay, an AI model that can generate stunning videos from text, and I think it is really good: not only is it smooth, the videos look a lot better than some of their counterparts. And as we just discussed with VideoDirectorGPT, along with the other models that are going to be developed, this is going to be insane, because here's what happens once someone releases a research paper: other companies get to work, building on that paper and turning it into applications you can actually use. So I think this week has been absolutely insane. There are of course some things we missed; let us know if you'd like us to do a newsletter where we give you the very best news and explain it with as much clarity as possible, because a lot of the time there's technical jargon getting in the way of understanding exactly what is being done. With that being said, if you enjoyed this video, we'll see you in the next one.
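To make the VideoDirectorGPT discussion from the previous segment concrete: the consistency trick is that one planner keeps a single shared registry of entity descriptions and expands every scene prompt from it, so "the mouse" cannot drift between scenes. The sketch below illustrates only that planning idea; the registry contents and scene templates are invented for illustration, and the real system feeds such plans to downstream video-generation modules.

```python
# Minimal sketch of LLM-guided multi-scene planning with a shared entity
# registry: every scene prompt is expanded from the same canonical entity
# descriptions, giving the downstream text-to-video module a consistent
# character across scenes.

def plan_scenes(entities, scene_templates):
    """entities: name -> canonical visual description (fixed across scenes).
    scene_templates: scene texts containing {name} placeholders."""
    return [t.format(**entities) for t in scene_templates]

entities = {"mouse": "a small grey mouse with a red scarf"}
templates = [
    "{mouse} wakes up in a kitchen",
    "{mouse} climbs onto the table",
]
for prompt in plan_scenes(entities, templates):
    print(prompt)
# Every prompt carries the identical entity description, which is the
# property the paper's baseline comparison ("fails to keep mouse") lacks.
```

Prompting a video model scene by scene, with no shared registry, is the failure mode the segment describes; the planner is what restores cross-scene consistency.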
