Google's 'MultiModal' AI Gato Continues To SHAKE UP The Industry
9:29


TheAIGRID · 28.06.2023 · 8,766 views · 168 likes


Video description
Google's 'MultiModal' AI Gato Continues To SHAKE UP The Industry https://www.deepmind.com/publications/a-generalist-agent Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos. Was there anything we missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience #IntelligentSystems #Automation #TechInnovation

Table of contents (2 segments)

Segment 1 (00:00 - 05:00)

In this video we need to discuss a research paper that was released last year but has been somewhat forgotten since the rise of AI. Up until recently there was not much mention of multimodal AI models, but certain companies and research teams do push the needle on what we know and what we think is possible. One research team that has consistently done so is Google's DeepMind. If you're not familiar with this team, let me gloss over some of their accomplishments: DeepMind is a research division of Google that constantly produces new papers and studies showcasing just how far we can go with artificial intelligence.

DeepMind is mainly noteworthy for two projects among countless other research papers. The first is AlphaFold, which can accurately predict 3D models of protein structures and is accelerating research in nearly every field of biology. The second is AlphaGo, the first computer program to defeat a professional human Go player and the first to defeat a Go world champion, arguably the strongest Go player in history. If you're wondering why this research team is so highly regarded just because a computer was able to beat a human, understand that Go is a board game with simple rules but an incredibly large number of possible moves and configurations. To boil things down: the number of possible positions in Go is estimated to be more than the number of atoms in the universe, which makes it difficult for computers to evaluate the board and choose the best move. AlphaGo found a way to learn how to play Go anyway. At the time, AlphaGo had beaten every challenge it was given, but its true strength wouldn't be known until it played somebody at the top of the world, like Lee Sedol. And what was crazy about DeepMind's AlphaGo is that its moves sometimes appeared unconventional and surprising to human players, which made people think this AI behaved as if it was thinking, creating new strategies that hadn't been considered before.

Of course, that just glosses over DeepMind's history, but this video is about one of DeepMind's papers that, as stated before, was released last year. If you're wondering why we're deciding to cover it, it's because this framework was recently used in a project called RoboCat. The paper is called Gato, and it is essentially, quite simply, a mini AGI, or one of the first glances at what an AGI system could look like in its very early stages. In the abstract, DeepMind states: "Inspired by progress in large-scale language modelling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato."

As many of you know, ChatGPT has taken the world by storm, and in doing so it has somewhat overshadowed some of the other AI models that were released or being researched. Gato is one of the frameworks I really do think is interesting and worth covering now that there's much more of an AI crowd. Essentially, Gato is an AI model that is completely multimodal. For those of you who don't know what that means: it can do more than ChatGPT. You see, ChatGPT is simply a text-based AI that can generate long pieces of coherent text from a single or small user prompt, but with Gato you get varying outputs based on the user's input, which means it can handle many different modalities. In the year since this paper was released there have been a lot more interesting multimodal AIs worked on, such as Microsoft's Visual ChatGPT and Microsoft's JARVIS, which was very interesting because it was also essentially a multimodal AI. But moving on from JARVIS: if we look at Gato, the possibilities here are truly incredible. What makes Gato different from other AIs such as Microsoft's Visual ChatGPT and other generally multimodal AIs handling images, video and text, is that Gato can be applied to the physical world, which means that this
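The key idea in the abstract — one network deciding whether to output text, joint torques, or button presses — works by serializing every modality into one flat token sequence with disjoint ID ranges. The sketch below illustrates that idea only; the vocabulary split, helper names, and uniform binning are illustrative assumptions, not DeepMind's actual implementation (the paper uses mu-law companding for continuous values, among other details):

```python
# Gato-style "everything is a token" serialization (illustrative sketch).
TEXT_VOCAB = 32_000       # text subword IDs occupy [0, 32_000)
IMG_PATCH_BASE = 32_000   # image-patch IDs would occupy [32_000, 33_024)
ACTION_BASE = 33_024      # discrete action bins (torques, buttons) come last

def tokenize_text(s):
    # stand-in for a real subword tokenizer
    return [hash(w) % TEXT_VOCAB for w in s.split()]

def tokenize_action(value, bins=256, low=-1.0, high=1.0):
    # uniform binning of a continuous control value into a single token
    # (the real model uses mu-law encoding; this is a simplification)
    idx = int((value - low) / (high - low) * (bins - 1))
    return ACTION_BASE + max(0, min(bins - 1, idx))

def build_sequence(text, actions):
    # one flat sequence: a single transformer consumes all modalities
    return tokenize_text(text) + [tokenize_action(a) for a in actions]

seq = build_sequence("stack the red block", [0.25, -0.5])
```

Because every modality lands in its own ID range, the model can tell from the token value alone which modality it is emitting, which is what lets one set of weights drive a chatbot and a robot arm.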

Segment 2 (05:00 - 09:00)

kind of AI system, if developed further, can have real-world implications. So we're going to cover some of Gato's most impressive capabilities.

Number one is the ability to caption images. As you may know, GPT-4's announcement did mention that its multimodal features would be released later, sometime in the year; we aren't sure when this is scheduled, but we can predict it should arrive at least by the end of the year, and we do know it includes the ability to describe images. Now remember, this paper was released in 2022. What we can see here is a bunch of images, ten to be exact, with three different candidate captions under each one. For the image shown, it states: "the colorful ceramic toys are on the living room floor", "a living room with three different colored pots on the floor", and "a room with a long red rug, a TV and some pictures". There are other images as well, and I do think this is very interesting, because further on in the paper we get to instances where this AI does things we really didn't think it would be able to do. You can also see that across these images the captions are pretty accurate: a bearded man holding a plate of food; a man holding a banana to take a picture of it; a man holding a slice of cake; a big truck in the middle of the road; a baseball player pitching a ball on top of a baseball field; a surfer riding away from the ocean; a surfer in a wetsuit riding a wave; a white horse with a white and blue bridle. It's interesting to see how this AI handled these problems. It's important to note, although some of you may think this image captioning isn't that great, that in this research paper they specifically decided not to cherry-pick any responses, because they wanted to showcase the true capabilities. Understand that with reinforcement learning from human feedback this kind of AI could get incredibly better: for example, when it outputs all three of these captions, a human could select the best one and give the model a signal telling it this was a good choice, and that's how, over time, AI systems become better and better.

Then of course we have another interesting feature, and to be honest I'm not entirely sure why Gato wasn't released by Google or developed into a full-fledged product: the chat functionality, called "chitchat with Gato". In dialogues with Gato, when it is prompted to be a chatbot, Gato usually replies with a relevant response, but it is often superficial or factually incorrect, which could be improved with further scaling; the same prompt is reused on another page. I find the conversational tone very interesting. It starts with the user saying "explain the basics of black holes to me", and the model replies along the lines of "so I guess I'm trying to answer this simple question: what's a black hole?". The user comments back, "indeed, please go ahead", and it then gives a pretty decent explanation of a black hole. Of course, we do see some instances of hallucination: when asked "what is the capital of France?", it says Marseille, which of course isn't the capital of France.

Now, although you might be thinking this is an AI that isn't particularly impressive at any given task, and wondering why people are talking about it when there are far better AIs in certain categories, you have to understand that this kind of model doesn't use a large number of parameters like ChatGPT: in Gato's case, the model only used around 1.2 billion parameters. So although it might lack complete depth, it can do a wider range of tasks, and it can also do unusual tasks, for example playing Atari video games, which is not something other AIs can do. And remember, the goal of this paper was to create something that could handle pretty much any task needed. While you might think these are just simple experiments and research-driven studies, that is not the case: DeepMind's RoboCat, which was released a couple of days ago, proves that Gato's framework can be used in real-world applications that many people may need. It goes to show that once these AI models have a base, over time we're going to be able to build upon these large multimodal models and implement them in the real world.
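The human-feedback loop described above — a person picking the best of three captions to teach the model — is usually turned into training data by pairing the chosen caption against each rejected one. This is a minimal sketch of that bookkeeping step only; the function name and pair format are assumptions for illustration, not taken from the Gato paper:

```python
# Turn a human caption choice into (chosen, rejected) preference pairs,
# the raw material for reward-model training in RLHF-style pipelines.

def preference_pairs(captions, chosen_idx):
    """Pair the human-chosen caption against each rejected alternative."""
    chosen = captions[chosen_idx]
    return [(chosen, c) for i, c in enumerate(captions) if i != chosen_idx]

captions = [
    "the colorful ceramic toys are on the living room floor",
    "a living room with three different colored pots on the floor",
    "a room with a long red rug, a TV and some pictures",
]
pairs = preference_pairs(captions, chosen_idx=0)  # two (chosen, rejected) pairs
```

A reward model trained on such pairs can then score new captions, and that score is what steers the captioner toward outputs humans prefer.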
