# OpenAI Employee ACCIDENTALLY REVEALS Q* Details! (Open AI Q*)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=rYH381pZIBc
- **Date:** 03.04.2024
- **Duration:** 13:37
- **Views:** 56,542

## Description

https://www.reddit.com/r/singularity/comments/1bqqcwk/openai_planning_expert_noam_brown_tweeted_this/
How To Not Be Replaced By AGI https://youtu.be/AiDR2aMye5M
Stay Up To Date With AI Job Market - https://www.youtube.com/@UCSPkiRjFYpz-8DY-aF_1wRg 
AI Tutorials - https://www.youtube.com/@TheAIGRIDAcademy/ 

🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

Links From Today's Video:


Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=rYH381pZIBc) Segment 1 (00:00 - 05:00)

So there was a recent tweet, since deleted, from someone who works at OpenAI, and it has people a little bit rattled. The community is wondering why the tweet was deleted in the first place and whether it relates to OpenAI's infamous Q* model, which they refuse to talk about. This video is going to include some speculation, but there are several key points I want to make that show why this tweet could be rather interesting, and of course everything will be linked in the description.

So essentially we have a tweet here from Noam Brown. He is a prominent figure in the field of artificial intelligence, known for his contributions to developing AI systems capable of playing poker at a superhuman level, and his work has significantly advanced the understanding and capabilities of AI in imperfect-information games, a category that includes not just poker but potentially extends to real-world applications like negotiation, cybersecurity and even strategic decision-making. He's done really well for himself, and he currently works at OpenAI. What's absolutely crazy was this tweet right here: he stated that you don't get superhuman performance by doing better imitation learning on human data. He said something very briefly, and then he deleted the tweet. What he could be referring to here is a variety of different things, but the main speculation many people have come up with is that he's talking about the planning model OpenAI is working on, which is allegedly Q*.

In addition, there are some earlier tweets from Noam Brown that you might want to see, from 2023 when he was talking about joining OpenAI. He says: "I'm thrilled to share that I've joined OpenAI. For years I've researched AI self-play and reasoning in games like poker and Diplomacy. I'll now investigate how to make these methods truly general." And take a look at this: "If successful, we may one day see LLMs that are 1,000x better than GPT-4." That is a crazy statement, but there is a very interesting clip where he talks about why this is not only feasible but very fascinating. He also adds on to the tweet: "In 2016 AlphaGo beat Lee Sedol in a milestone for AI, but key to that was the AI's ability to ponder for 1 minute before each move. How much did that improve it? For AlphaGo Zero, it's the equivalent of scaling pre-training by 100,000x." So pondering really did increase its ability by the equivalent of scaling pre-training 100,000 times, which is incredible. The point here is that he's talking about the ability to get more out of a model in a more efficient way, and this is something that is not really widely discussed.

He also says: "All those prior methods are specific to the game, but if we can discover a general version, the benefits could be huge. Yes, inference may be 1,000x slower and more costly, but what inference cost would we pay for a new cancer drug? Or for a proof of the Riemann hypothesis? Improved capabilities are always risky, but if this research succeeds it could be valuable for safety research as well. Imagine being able to spend $1 million on inference to see what a more capable future model might look like. It would give us a warning that we otherwise would lack." He dives further into this, basically saying that for certain tasks you don't need the model to respond almost instantly; what you prefer is accuracy over speed, and you want to give the model more time to think. This is something we've seen in recent research papers such as Quiet-STaR, where LLMs were given time to have an internal monologue and it improved them by quite a bit, and it is something that could essentially be applied here.

Now, he expands on this concept, and I'm going to show you a clip from an interview where he dives into it a lot further. Remember, this is the guy who is working on planning, and many people suspect he is one of the lead researchers working on Q*.

"...planning in these games. So I mentioned, you know, in Go, if you add planning, it's the equivalent of increasing your model size and training by 100,000x, and the same thing is true in poker: if you add search in poker, it's the equivalent of increasing the model size and training by 100,000x. And I think you can do something similar in language models. It's not really clear how you do it yet, but I think there is an opportunity there. And I think this is really important, because if you look at how much it costs to train these language models today, it's really expensive, and we're going to see it scale up."
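
To make that compute-for-quality trade-off a bit more concrete, here is a minimal best-of-N sketch in Python. It only illustrates the general idea of spending extra inference compute to get a better answer; it is not a description of Q* or of any lab's actual method, and `generate_candidate` and `score_candidate` are hypothetical placeholders standing in for a model call and a verifier or reward model.

```python
import random

# Illustrative sketch only: generate_candidate and score_candidate are
# hypothetical placeholders, not real OpenAI APIs and not how Q* works.
def generate_candidate(prompt: str) -> str:
    """Stand-in for sampling one answer from a language model."""
    return f"answer-{random.randint(0, 999)} to: {prompt}"

def score_candidate(prompt: str, answer: str) -> float:
    """Stand-in for a verifier / reward model that rates an answer."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Spend roughly n times the inference compute and keep the best answer.

    This is the simplest form of "letting the model think longer":
    sample many candidates, then pick one with a selection step.
    """
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score_candidate(prompt, ans))

if __name__ == "__main__":
    print(best_of_n("Prove a small lemma", n=8))
```

Sampling sixteen answers and keeping the best one costs roughly sixteen times the inference compute of a single reply, which is exactly the kind of compute-for-quality exchange Brown is describing.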

### [5:00](https://www.youtube.com/watch?v=rYH381pZIBc&t=300s) Segment 2 (05:00 - 10:00)

"I'm sure the models are going to get bigger and trained for longer, but you're not going to be able to scale them up by 100,000x, at least for the foreseeable future. And in an AI paradigm where scale is the key thing, there's a question of: okay, if you can't scale up the model during pre-training beyond a certain point, then how do you scale it up further? I think the answer is that you scale up the amount of inference compute. Language models these days typically respond very quickly; if you ask a question, they can give you an answer in five seconds, or maybe seconds at most. But for a lot of applications you don't need a response immediately. You can wait a minute, or an hour, or sometimes even a week to get a response. If you ask the model to write a legal contract for you, you don't need that in five seconds; you can wait a minute for a really high-quality answer. And if you wanted it to write a novel, that could take a full week. Obviously the inference cost would be a lot higher in that case, but if it's something like writing the next Harry Potter, then it seems totally worth it to spend a thousand times the inference cost, or to find a new life-saving drug, or to prove the Riemann hypothesis, or any of these things. There are a lot of applications where it would be worth spending orders of magnitude more on inference in order to achieve the equivalent output you would get from a model that's orders of magnitude bigger."

So that is a smaller clip from a larger interview, and it's important to note that the interview was in 2023, so these thoughts and theories aren't all that recent. We're now in 2024, and we don't really know how much his thinking on planning has progressed since then, given the amount of research OpenAI may have done in that time.

For a quick refresher on Q*, these are essentially the details: OpenAI reportedly made a breakthrough before Sam Altman's firing, stoking excitement and concern, and with Q* there were two main concepts. The reporting on the breakthrough also spoke about synthetic data, and I'm going to link this back in a moment, but it says Sutskever's breakthrough allowed OpenAI to overcome limitations on obtaining enough high-quality data to train new models, according to the person with knowledge of it, a major obstacle for developing next-generation models, and the research involved using computer-generated data, rather than real-world data like text or images pulled from the internet, to train new models. So essentially this is talking about synthetic data, which is data generated by the AI itself, and there are a variety of different ways you could do this (a minimal sketch of one common pattern is shown a little further below). That's why, with Noam Brown's tweet saying you don't get superhuman performance by doing better imitation learning on human data, many are now speculating that it links back to Q*, because Q* was of course about planning and about synthetic data, so this could have been a tweet regarding that.

Now, around the Q* breakthrough there are several pieces of information suggesting that OpenAI have likely solved planning/agentic behavior for small-scale models, and this is something we've seen happen a lot recently in the industry. We're seeing a lot of agents, and this is pretty much the year of agentic AI.
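
Picking up the synthetic data point from the refresher above, here is a minimal Python sketch of one common pattern for self-generated training data: the model proposes solutions, an automatic checker filters them, and only verified outputs are kept for further training. This is an assumption about the general technique (rejection-sampling / STaR-style data generation), not an account of OpenAI's actual breakthrough; `model_attempt` and `verify` are hypothetical placeholders.

```python
import random

# Illustrative sketch of a self-generated ("synthetic") data loop in the
# spirit of rejection-sampling / STaR-style training; an assumption about
# the general pattern, not OpenAI's actual Q* recipe.
def model_attempt(problem: str) -> str:
    """Stand-in for the current model proposing a solution."""
    return f"proposed solution {random.randint(0, 9)} for {problem}"

def verify(problem: str, solution: str) -> bool:
    """Stand-in for an automatic checker (unit tests, a math verifier, ...)."""
    return random.random() > 0.7

def collect_synthetic_data(problems, attempts_per_problem=8):
    """Keep only model outputs that pass verification.

    Training on this filtered, self-generated data is one way a model can
    improve beyond plain imitation of human-written examples.
    """
    dataset = []
    for problem in problems:
        for _ in range(attempts_per_problem):
            solution = model_attempt(problem)
            if verify(problem, solution):
                dataset.append({"problem": problem, "solution": solution})
    return dataset

if __name__ == "__main__":
    print(len(collect_synthetic_data(["p1", "p2", "p3"])))
```

Because the kept examples can exceed what plain imitation of human data would yield, loops like this are one way a training signal can move beyond human-written examples, which is why Brown's deleted line about imitation learning caught people's attention.
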
We're about to see a lot more models that are able to plan and reason, but there are also some other important points about how these models are going to be trained and how this kind of data even works. One of the things that backs some of this up is Yann LeCun, and I did cover this in a previous video, but he actually talks about how one of the main challenges in improving LLM reliability is to replace autoregressive token prediction with planning. Pretty much every top lab (FAIR, DeepMind, OpenAI) is working on that, and some have already published ideas and results; it is likely that Q* is OpenAI's attempt at planning, and they pretty much hired Noam Brown (of Libratus/poker fame) to work on that. This is pretty much true: judging by the many research papers and articles I've looked at, all the top labs are working on planning this year (a toy sketch of what layering search on top of step-by-step generation can look like is included at the end of this segment).

What's also fascinating is that if you've been paying attention to the space, maybe you have seen some demos, maybe you haven't, but there have been several recent demos in which we've seen AI systems being able to plan, and that ability is making them orders of magnitude more effective. For example, this is Maisa's KPU, their agentic AI system, which is very effective at planning and doing tasks.
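
As promised above, here is a toy Python sketch of adding a search/planning layer on top of step-by-step generation: instead of committing to a single chain of steps, the system proposes several candidate next steps, scores them, and keeps only the most promising partial solutions. It is an analogy to the game-style search Brown describes, not any lab's actual method; `propose_steps` and `value` are hypothetical placeholders for a model proposing reasoning steps and a learned value function.

```python
import random

# Toy sketch: beam search over reasoning steps instead of committing to a
# single autoregressive path. Placeholders only; not a real lab's method.
def propose_steps(partial_solution: list[str], k: int = 3) -> list[str]:
    """Stand-in for a model proposing k candidate next reasoning steps."""
    return [f"step {len(partial_solution) + 1}, option {i}" for i in range(k)]

def value(partial_solution: list[str]) -> float:
    """Stand-in for a learned value function scoring a partial solution."""
    return random.random()

def plan(depth: int = 4, beam_width: int = 2) -> list[str]:
    """Keep only the most promising partial solutions at each depth."""
    beam: list[list[str]] = [[]]
    for _ in range(depth):
        expansions = [sol + [step] for sol in beam for step in propose_steps(sol)]
        expansions.sort(key=value, reverse=True)
        beam = expansions[:beam_width]
    return beam[0]

if __name__ == "__main__":
    for step in plan():
        print(step)
```

This is the same compute-for-capability exchange as before: many model calls per answer instead of one, in exchange for a better chance of finding a correct multi-step solution.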

### [10:00](https://www.youtube.com/watch?v=rYH381pZIBc&t=600s) Segment 3 (10:00 - 13:00)

You can see that it is able to reason, and this is a system built on top of the GPT-4 stack; it is able to reason very effectively. You can see the different reasoning steps it takes right here, and, just like Noam Brown talked about, with these kinds of AI systems, even though inference is a lot slower and costs a lot more, we do get a higher degree of accuracy. We can see that when these models reason in a multi-step fashion, they're able to reduce hallucinations and perform tasks more effectively. There was also a benchmark I saw for Maisa's KPU; maybe you didn't see it, because it didn't blow up that much, but it was very, very surprising. Not only did we get demos of this, and I did an entire video explaining Maisa's KPU, how the demo worked, and some of the benchmarks, which I'll likely add to the end of this video, but there was also, as you all know, not OpenDevin but Devin, which is of course the world's first AI software engineer. Just like with what we saw with Maisa's KPU, you can see that Devin also has an internal scratchpad where it is essentially allowed to plan. We can see it has the planner right here: after receiving a prompt, it manages to flesh out some kind of plan, and then all it has to do is go through that plan, execute it, and write the code (a minimal plan-then-execute loop in this style is sketched at the end of this section). And this is, once again, built on top of the GPT-4 stack.

Now, why is this all very fascinating? Number one, we have to remember that this is something that is still in its early phases. Number two, we know that OpenAI is already working on this. And number three, we know that if OpenAI are working on something like this and they are already using a model like GPT-5, it is going to be absolutely incredible. I'm not sure whether GPT-5 is going to be natively agentic, which just means it would have these kinds of planning capabilities built into it natively, or whether there are going to be separate versions of the GPT series, which is more than likely considering iterative deployment. But the point is that things are about to get really crazy when we do see Q*-like systems that involve planning, or systems that are able to do multi-step reasoning, multi-step thinking, and multi-step planning to achieve long-term goals. That would be something rather fascinating, and I genuinely can't wait to see how some of this works.

So there are many different concepts to explore here, but let me know what you think about this tweet. Do you think it is a reference to Q*? Why do you think the tweet was deleted? Could it just have been something that isn't that much of a big deal, or was it something to do with Q*, and Noam Brown realized he was accidentally talking about synthetic data and maybe something he shouldn't have? Either way, I don't know, but there is a lot of speculation in the Reddit thread below, which I'll leave a link to, and quite a few comments link back to planning and being able to use an AI system to achieve certain goals much more effectively, something we've seen time and time again in AI. So it will be interesting to see where things go, and if you did enjoy the video, I'll see you all in the next AI update.
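
To close out, here is a minimal Python sketch of the plan-then-execute pattern described above for systems like Devin and Maisa's KPU: draft a plan first, then work through it step by step while keeping a scratchpad of results. The structure is a common agent pattern, not the actual architecture of either product; `make_plan` and `execute_step` are hypothetical placeholders for LLM calls and tool use.

```python
# Minimal plan-then-execute agent loop; placeholders only, not Devin's or
# Maisa's real implementation.
def make_plan(task: str) -> list[str]:
    """Stand-in for an LLM drafting a step-by-step plan for the task."""
    return [
        f"step 1: understand the task: {task}",
        "step 2: draft the change",
        "step 3: run tests and fix failures",
    ]

def execute_step(step: str, scratchpad: list[str]) -> str:
    """Stand-in for carrying out one step (code edits, tool calls, ...)."""
    return f"done: {step}"

def run_agent(task: str) -> list[str]:
    """Plan once, then execute each step while keeping a scratchpad."""
    scratchpad: list[str] = []
    for step in make_plan(task):
        result = execute_step(step, scratchpad)
        scratchpad.append(result)  # later steps can consult earlier results
    return scratchpad

if __name__ == "__main__":
    for line in run_agent("add a caching layer to the API"):
        print(line)
```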

---
*Source: https://ekstraktznaniy.ru/video/14413*