OpenAi's New Q* (Qstar) Breakthrough Explained For Beginners (GPT- 5)

TheAIGRID · 24.11.2023 · 231,273 views · 2,619 likes


Video description
https://theaigrid.com/open-ais-q-q-star-explained-for-beginners/ Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos. Was there anything we missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience #IntelligentSystems #Automation #TechInnovation

Table of contents (4 segments)

Segment 1 (00:00 - 05:00)

This video gets into the exact specifics of how Q-learning works and breaks it down in the easiest way possible, so you can understand why OpenAI's potential breakthrough could be the next evolution in large language models and AI models. So let's waste no time and jump right in.

So what is Q-learning, and where does the name Q* come from? The name likely comes from two sources. First, the Q is probably a reference to Q-learning, which we'll discuss shortly; essentially it's a type of machine learning used in reinforcement learning. Second, the star comes from A* search. The A* search algorithm, introduced in a 1968 research paper, is a pathfinding and graph traversal algorithm widely used in computer science for a variety of problems, especially in games and AI, for finding the shortest path between two points.

In simpler terms: you can think of the name Q* like a nickname for a super smart robot. The Q part is like saying this robot is really good at making decisions and learns from its experience, just like you would learn if you played a video game a bunch of times; the more you play, the better you get at figuring out how to win. For A* search, imagine you're in a maze and you need to find the quickest way out. There's a classic method in computer science, kind of like a set of instructions, that helps you find the shortest path through a maze, and that is exactly what we call A* search. Once you mix this with deep learning, so the computer learns and improves from experience, you get a really smart system. It's not just finding the shortest path through a maze; it can solve much trickier problems by finding the best solutions, just like how you might figure out the best way to beat a video game.

Now let's look at six steps to understanding Q-learning, because there are six key parts, and they're really simple once they're broken down. Before we get into them: overall, Q-learning is basically like training a pet. If the pet does something good, like sitting on command, you give it a treat; if it does something not so good, like chewing on your shoes, you say no or ignore it. That's how basic reinforcement learning works: you reward good decisions and penalize bad ones.

Step one is the environment and the agent. In Q-learning you have an environment, like a video game or a maze, and an agent, the AI or computer program that needs to learn how to navigate that environment.

Step two is states and actions. The environment is made up of different states, and there are different actions the agent can take. For example, the agent may be able to move left or right, and there are different positions it can occupy on the board or in the game, which is fairly simple to understand.

Step three is the Q-table. The Q-table is basically a big cheat sheet that tells the agent which action is best to take in each state. At first this table is filled with guesses, because the agent doesn't know the environment yet; it can't have the correct values because it hasn't explored anything yet.

Step four is learning by doing. The agent starts to explore the environment, and every time it takes an action in a state it gets feedback from the environment: rewards for positive outcomes and penalties for negative ones. This feedback loop helps the agent update the Q-table, learning from experience.

Step five is updating the Q-table. The table is updated using a formula that considers both the current reward and the potential future rewards. Pay attention to this part, because accounting for potential future rewards is one of the key things that separates Q-learning from many other approaches. This way the agent doesn't just learn to maximize immediate rewards; it also considers the long-term consequences of its actions. Think about it like this: if an AI system didn't think about long-term rewards, then every time it got a reward for doing something good it would just keep doing that same thing over and over, a spiral that would never lead it toward better long-term goals. That's why this algorithm is really cool: long-term consequences are built into it.

Step six is that, over time, with enough exploration and learning, the Q-table gets more and more accurate and the

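The six steps above can be sketched in a few lines of Python. This is a minimal illustrative toy, not anything from OpenAI: the one-dimensional corridor environment, the reward values, and the hyperparameters are all invented for the example.

```python
import random

random.seed(0)

# Step 1: environment (a tiny 1-D corridor) and an agent.
# States 0..4; reaching state 4 yields a reward, every other step costs -1.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                  # Step 2: actions = move left / move right

# Step 3: the Q-table starts as guesses (all zeros).
Q = [[0.0, 0.0] for _ in range(N_STATES)]

alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

for episode in range(500):          # Step 4: learning by doing
    s = 0
    while s != GOAL:
        # Explore sometimes, otherwise exploit the "cheat sheet".
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 10.0 if s2 == GOAL else -1.0        # reward / penalty feedback
        # Step 5: update with the current reward PLUS discounted future reward.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Step 6: after enough exploration, the table points the right way everywhere.
policy = ["left" if q[0] > q[1] else "right" for q in Q[:GOAL]]
print(policy)   # every non-goal state should prefer "right"
```

The `gamma * max(Q[s2])` term is what gives the agent its long-term view: without it, the update would only chase the immediate reward `r`.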
Segment 2 (05:00 - 10:00)

agent becomes better at predicting which actions will yield the highest rewards in different states, until eventually it can navigate the environment very effectively, which is why we have this image of an AI that is pretty much unbeatable and able to act in the fastest way possible. Overall, you can think of Q-learning like playing a complex video game: over time you learn the best moves and strategies to get the highest score. Initially you won't know the best actions to take, but as you play more and more, you learn from experience and get better at the game. That's what the AI is doing with Q-learning: learning from experience to make the best decisions in different scenarios.

Then there's the most likely future of LLMs. One thing I wanted to add is that LLMs have real limitations today, and that's why I believe Q* is being explored as a viable option for the future of large language models. Please watch this clip from someone at Google DeepMind, who talks about why LLMs have these limitations and why the kinds of approaches now being explored are going to be the future of large language models:

"These foundation models are world models of a kind, and to do really creative problem solving you need to start searching. If I think about something like AlphaGo and the famous move 37, where did that come from? All the data it had seen of human games, or something like that? No, it didn't. It came from the system identifying a move as being quite unlikely but possible, and then, via a process of search, coming to understand that it was actually a very good move. To get real creativity you need to search through spaces of possibilities and find these hidden gems; that's what creativity is, I think. Current language models don't really do that kind of thing; they are mimicking the data, mimicking all the human ingenuity they have seen in data that comes from the internet and is originally derived from humans. If you want a system that can go truly beyond that, and not just generalize in novel ways: these models can blend things, they can do Harry Potter in the style of a Kanye West rap even though that has never happened, but to do something that's truly creative, something that is not just a blending of existing things, requires searching through a space of possibilities and finding those hidden gems tucked away somewhere in there, and that requires search. So I don't think we'll see systems that truly step beyond their training data until we have powerful search in the process."

In this part of the video I want to talk about some of the limitations of large language models, because there are quite a few, and how Q-learning compares. One of the biggest is data dependency. Traditional LLMs require massive amounts of data for training; they learn from examples in that data, which means their knowledge and abilities are limited to what is present in the training set. There was even a recent paper (I can't find it right now; if I do, I'll leave a link in the description, because it's going to be an entire article on the website) arguing that large language models cannot generalize to information they haven't seen in their training data, which basically means these models are only as good as their training data. We've explored this concept before with Microsoft's Phi-1: it was a very small model trained only on carefully selected coding data, and it was able to do coding better than some much larger language models, excelling at that one task. Basically, if you don't have good data your LLM is going to do horribly, and if you have good data it's going to do really well, though that comes with other limitations.

Then we have static knowledge. Once trained, LLMs have a fixed knowledge base; they can't learn or update their knowledge after training, which means they can become outdated as the world changes. You can see here "knowledge cutoff September 2023," which means it currently can't get any newer data; we're now in November (I'm not sure when you're watching this), and if OpenAI doesn't decide to update it, you're stuck without anything newer. Static knowledge isn't great, because things change every day, every minute, and if these AI algorithms are going to be really good they need to adapt rapidly to a changing world. For traditional LLMs this is a real bottleneck and limitation.

Then we have context understanding: while LLMs are good at understanding and generating human-like text, they sometimes struggle with the deeper context or intent behind a query, especially if it's complex or very specific, and that is something you run into when dealing with LLMs. In addition, we have bias and fairness, which is really prevalent in AI, and the problem is that the bottleneck is the data: when you have data for an LLM and you train it on

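The point in the DeepMind clip, that search can surface a move a purely greedy policy would never take, can be shown with a toy two-step decision problem. The reward structure below is invented purely for illustration; it has nothing to do with Go or any real model.

```python
# A toy two-step decision problem: branch "a" pays off immediately,
# branch "b" looks worse at first but is better in the long run.
# (All reward values here are made up for the example.)
REWARDS = {
    ("a",): 5, ("b",): 0,           # immediate reward of each first move
    ("a", "a"): 0, ("a", "b"): 0,   # branch "a" is a dead end afterwards
    ("b", "a"): 10, ("b", "b"): 1,  # branch "b" hides the real prize
}

def greedy_first_move():
    """Pick whichever first move has the highest immediate reward."""
    return max("ab", key=lambda m: REWARDS[(m,)])

def search_first_move():
    """Look two moves ahead and pick the first move of the best total path."""
    def total(path):
        return REWARDS[(path[0],)] + REWARDS[path]
    best = max([(f, s) for f in "ab" for s in "ab"], key=total)
    return best[0]

print(greedy_first_move())  # "a": the obvious, immediately rewarded move
print(search_first_move())  # "b": search uncovers the unlikely-looking better move
```

The greedy policy mimics the highest immediate payoff and collects 5 total; the searching policy accepts a zero now to collect 10 later, which is the same intuition behind the "future rewards" term in the Q-learning update.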
Segment 3 (10:00 - 15:00)

that specific data set, it's going to be geared to that data set. For example, if you train it on data that only ever shows a certain type of car in orange, it's going to be really hard to get that AI model to think of the car in any other color; think of bias in that way. AI systems can suffer from two kinds of problems here: cognitive biases and incomplete data. If the data isn't complete, it's not going to be representative; like I said, if you don't have all the colors of the car, the data isn't representative of all the colors it could have. Cognitive biases, meanwhile, are things the designers can unknowingly introduce into machine learning algorithms via the model or the training data set. Bias is a really big problem; people are trying to address it by making LLMs as unbiased as possible, but it's not something that is easy to solve. And of course there is the lack of adaptation, which I've already discussed.

Now we get to the pros of Q-learning, or Q*, which could be in GPT-5. First, dynamic learning: Q-learning can continuously learn and adapt based on new data or interactions, which means it can update its knowledge and strategies over time and stay relevant, which is exactly what we said these systems will need. Then there's optimization of decisions: Q-learning is fundamentally about finding the best decisions to achieve a goal, which can lead to more effective and efficient decision-making in a variety of applications, and that's clearly what it should be able to do over time. Then, and this is the main thing about Q-learning, there's specific goal achievement: Q-learning models are goal-oriented, making them suitable for tasks where a clear objective needs to be achieved, unlike general-purpose traditional LLMs. The reason this is going to be really good is that you can apply it to other things that require goals; for example, maybe we could apply it to self-driving AI agents on computers that pursue a complete end goal, where the end goal is a video, an entire article, or even building an entire business. That's where specific goal achievement gives you the next leap in capability in AI systems, which is why this could be really next-level.

Then there's the fact that companies are already working on this. On the 28th of June 2023, Demis Hassabis said that the company is working on a system called Gemini. If you haven't heard of Gemini before, it's Google's next huge large language model, predicted to beat GPT-4 across all benchmarks, and it's going to use a method called tree search, which can explore and remember possible scenarios, and which is quite similar in spirit to Q-learning. So they're moving away from standard methods and thinking about advanced techniques where the system can explore and remember many different possibilities.

If you find that a bit confusing, you should take a look at AlphaGo and how it impacted the future of AI. AlphaGo tackled a problem researchers thought was essentially intractable: the moves in Go can't simply be enumerated, so the system couldn't just memorize every single move; it had to learn to think, because there are more possible Go board configurations than atoms in the universe or grains of sand on the beach, something absolutely crazy when you look at the statistics. This was something researchers thought they would never solve, but the AI managed it. I'd say take a look at the quick trailer, which I'm going to show you; I don't want to spoil it, because it's honestly riveting, but it's something that happened a while back that people forget if they aren't particularly plugged into the AI space: "a long-standing challenge of artificial intelligence; everything we've ever tried in AI just falls over when you try the game of Go; the number of possible configurations of the board is more than the number of atoms in the universe; AlphaGo found a way to learn how to play Go; so far AlphaGo has beaten every challenge we've given it, but we won't know its true strength until we play somebody who is at the top of the world, likely Lee Sedol." Then there's the significance of move 37, which I think was a one-in-10,000 move that nobody expected from an AI, where it seemed to exhibit some creativity; many people weren't expecting this, and there are multiple videos where they talk about move 37.

This brings us to the news that Google has delayed the release of Gemini AI to Q1 of 2024. This might be because it's harder than they thought; maybe they're changing their angle, maybe they just want to perfect it. Currently we don't know the reason for the delay, but what we do know is that Gemini is currently delayed, and if this model

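Segment 1 described A* search, the likely source of the "star" in Q*, only in words, and this segment keeps coming back to search over possibilities. Here is a minimal sketch of A* on a hypothetical 2-D grid maze; the maze layout itself is made up for the example.

```python
import heapq

# 0 = open cell, 1 = wall (a made-up maze for the example)
MAZE = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

def astar(maze, start, goal):
    """Shortest path length via A* with a Manhattan-distance heuristic."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]   # (estimated total cost, cost so far, cell)
    best = {start: 0}                   # cheapest known cost to reach each cell
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(maze) and 0 <= nc < len(maze[0]) and maze[nr][nc] == 0:
                if g + 1 < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = g + 1
                    # heuristic steers the search toward the goal first
                    heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1, (nr, nc)))
    return None  # no path exists

print(astar(MAZE, (0, 0), (3, 3)))  # → 6, the shortest way through this maze
```

The heuristic is what makes this "smart" search rather than blind search: cells that look closer to the goal are expanded first, so the frontier explores and remembers only the promising parts of the maze.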
Segment 4 (15:00 - 15:00)

does come out and it does possess these capabilities, it will be interesting to see how it compares to GPT-4 and whether it's similar to Q-learning or how different it turns out to be. Of course, that leaves one of the main questions: will this be in GPT-5? Many sources have already indicated that Sam Altman has started training the next level of LLMs or AI systems. So will GPT-5 contain this Q*, or is it going to be something that arrives in future models like GPT-6? Either way, it's going to be interesting to see how this all pans out. If this video helped you understand, don't forget to leave a like, subscribe, all that good stuff, and check out the full article in the comment section below.


