# OpenAI's NEW QStar Was Just LEAKED! (Self Improving AI) - Project STRAWBERRY

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=kDGPGDN5Suo
- **Date:** 14.07.2024
- **Duration:** 23:55
- **Views:** 34,382

## Description

Learn A.I With me - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

00:00 - Introduction to Project Strawberry (formerly Q-star)
00:28 - Discussion of Reuters article and other OpenAI demos
01:50 - Speculation about Project Strawberry and GPT-4
03:31 - Details on Strawberry's purpose for autonomous research
05:11 - Challenges in creating autonomous AI agents
07:44 - OpenAI's focus on improving AI reasoning capabilities
09:51 - Explanation of Strawberry's post-training process
12:12 - Comparison to Stanford's Self-Taught Reasoner (STaR) method
15:55 - Example of STaR in action for common sense reasoning
17:56 - Performance comparison of STaR to larger models
19:30 - OpenAI's goals for Strawberry in long-horizon tasks
20:52 - Theories about the name "Strawberry"
22:51 - Summary of potential components of Q-star/Strawberry

These timestamps provide an overview of the main topics discussed in the video about OpenAI's Project Strawberry and related AI developments.
Links From Today's Video:


Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=kDGPGDN5Suo) Introduction to Project Strawberry (formerly Q-star)

So OpenAI is working on a new reasoning technology under the code name Strawberry. What's most fascinating about this technology is that it was previously called Q*, so we finally get some new details, some new information, on the top-secret project at OpenAI. Let's take a look at what this project is, how it actually works, and some of the research papers that give us at least some hints about what we might be looking at with Project Strawberry. You can see right here

### [0:28](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=28s) Discussion of Reuters article and other OpenAI demos

that this is an article released not too long ago by Reuters. Now, the thing with Reuters is that we can actually trust their judgment when it comes to releasing information, especially about OpenAI, because Reuters is a highly trusted source whose reporting is usually accurate. The crazy thing is that not only did we get this article that I'm about to dive into, we also got some interesting pieces of information regarding other demos at OpenAI. One of the things I spoke about yesterday was human-like reasoning. You can see here an excerpt from an article published yesterday in Bloomberg, which discusses how recent demos have been showing a GPT-4 AI that apparently rises to human-like reasoning. We can see here: "at the same time, company leadership gave a demonstration of a research project involving its GPT-4 AI model that OpenAI thinks shows some new skills that rise to human-like reasoning." Now, the reason I've included this in the video is that, with Project Strawberry in the news, some people are wondering whether yesterday's article is referring to Project Strawberry, which also demonstrates human-like reasoning. But as you can see from this Reuters article, they clearly state that Reuters could not determine if the project demonstrated was Strawberry. So what this

### [1:50](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=110s) Speculation about Project Strawberry and GPT-4

could mean is that OpenAI probably has another model that is specifically focused on reasoning and that isn't as small as Project Strawberry, in the sense that it's probably a large language model. So what this could mean is that there is some kind of agentic framework wrapped around GPT-4, maybe a small iteration of GPT-4, that uses an agentic framework for better/increased reasoning. We know this is a possibility because it's something many people have done with smaller large language models. The reason this is actually fascinating is that it now means there might potentially be a new way for GPT-4 to rise to human-like reasoning. One important detail you might also want to note is that this doesn't actually say anything about GPT-5. I know some people might speculate that this is GPT-5, but Project Strawberry and this GPT-4 AI model doing some human-like reasoning are completely separate from whatever GPT-5 is. So I'm guessing that what we are seeing here is the inner workings of OpenAI's research departments, and these are the areas where they are focusing on how to make these models a lot smarter in terms of reasoning ability. Reasoning ability is essentially the model being able to think about problems, break them down, and understand them in a way that is far superior to prior efforts. You can see here that the actual goal of Strawberry is to perform research: "teams inside OpenAI are working on Strawberry, according to a copy of a recent internal OpenAI document seen by Reuters in May. Reuters could not ascertain the precise date of the document, which details a plan for how OpenAI intends to use Strawberry to perform research."

### [3:31](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=211s) Details on Strawberry's purpose for autonomous research

"The source described the plan to Reuters as a work in progress. The news agency could not establish how close Strawberry is to being publicly available." So one of the key things about Strawberry that we've gotten from this article is that Strawberry is going to be used to perform research. Now, it's not clear exactly what kind of research that is, but further on they do talk about how this is going to be some kind of deep research on the internet. Take a look at this: it says "how Strawberry works is a tightly kept secret even within OpenAI, the person said. The document describes a project that uses Strawberry models with the aim of enabling the company's AI to not just generate answers to queries but to plan ahead enough to navigate the internet autonomously and reliably to perform what OpenAI terms deep research, according to the source." So deep research could mean that this is some kind of AI agent that is able to perform research autonomously. What's interesting here is that this seems to be the next iteration of models that OpenAI might be working on. Previously, if you watched yesterday's video, we saw that there are five levels to AGI / the levels of AI that OpenAI is working on: level two is going to be reasoners, and level three is going to be AI agents. Now, it's actually quite hard to gauge exactly what Strawberry is and how it's being done, but I do think we have enough details to make somewhat of an educated guess. What we can see from this article, especially this part, is that the document describes a project that uses the Strawberry models with the aim of enabling the company's AI to not just generate answers to queries but to plan ahead enough to navigate the internet autonomously and reliably to perform deep research. So previously, if

### [5:11](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=311s) Challenges in creating autonomous AI agents

you've been in the AI space, you'll know that one of the main problems with AI right now is autonomous agents. Some of the top CEOs have previously said that getting to autonomous agents is actually very tricky, for two reasons: number one is a lack of skill, and number two is a lack of reliability. If we don't have models that are truly capable and truly reliable, we can't get them to perform consistent actions over a long enough time frame for the AI agent to actually operate autonomously. "So when you ask a model to complete a sequence of actions, let's say it's, like, three things, say to book a restaurant that you and I can go to on a certain day. The first action would be: check the availability in both of our calendars, so that's one correct function call. Reconcile the correct moment, so that's the second action. Make sure that it's a restaurant that has availability, so that check is another one. And then go and sign in so that you can use the correct tool to book the right restaurant at the right time and put your credit card details down, having obviously also checked that it's a restaurant that we both like, etc. So there are, like, four, five, six different steps just to produce the subcomponents of that one quote-unquote action, right? In order to get that right, you're basically saying the model has to produce perfect function calling for each element, and do so in sequence. It can't just be arbitrary; it has to be in sequence. That's like saying it has to write a four-page document in response to one question and have it come out exactly as that document, not merely approximate or similar to it. We all think these models are magic at the moment: they write beautiful poetry and creative copy and text and give you good answers, and sometimes they're grounded and blah blah. But for each one of those answers there's a wide range of answers it could have picked, right? Tens, hundreds, thousands maybe." Now, what's interesting here is that apparently Strawberry might be the breakthrough that enables the company's AI to plan ahead enough to navigate the internet autonomously and reliably. I included clips there from Dario Amodei and Microsoft's Mustafa Suleyman; hopefully those clips showed you why this is a real problem that needs to be solved.
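
To make the reliability point concrete, here is a minimal sketch (my illustration, not anything from OpenAI or the video's sources) of why long action sequences punish even slightly unreliable models: if each function call succeeds independently with probability p, a k-step task only succeeds end-to-end with probability p^k.

```python
# Illustration only: compounding failure in sequential agent tasks.
# If each function call succeeds independently with probability p,
# a k-step task succeeds end-to-end with probability p**k.

def task_success_probability(p_per_step: float, num_steps: int) -> float:
    """Probability that every step in a sequential task succeeds."""
    return p_per_step ** num_steps

# A model that is 95% reliable per call looks great in isolation,
# but that reliability erodes quickly over a booking-style sequence:
for steps in (1, 3, 5, 10):
    print(f"{steps:>2} steps -> {task_success_probability(0.95, steps):.1%}")
# 1 steps -> 95.0%
# 3 steps -> 85.7%
# 5 steps -> 77.4%
# 10 steps -> 59.9%
```

This is exactly why Suleyman's restaurant example, with its five or six chained function calls, is so hard for today's models.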

### [7:44](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=464s) OpenAI's focus on improving AI reasoning capabilities

Now, depending on how good the Strawberry models are, we know that if reasoning abilities do jump, then it potentially means we could be getting AI agents much sooner, though we also know that AI agents are planned for future models like GPT-6. An OpenAI spokesperson said in a statement: "We want our AI models to see and understand the world more like we do. Continuous research into new AI capabilities is common practice in the industry, with a shared belief that these systems will improve in reasoning over time." The spokesperson did not address questions about Strawberry, for whatever reason. Of course, this is one of the main goals of OpenAI, because better reasoning makes the models a lot more useful: one of the main gripes people have with current systems is that they just aren't smart enough to understand what we want them to do, time and time again. Now, I would argue that oftentimes you do need to provide as much detail as you can to systems like GPT-4, but there is the argument that sometimes these models just truly don't understand what we want them to do. It's clear that this is essentially the main directive for OpenAI right now and where they're focusing their efforts: in order to be the market leader, reasoning is probably going to be the main area of focus. You can also see something quite interesting here; this is a callback to an interview Sam Altman had with Bill Gates. The bit I've highlighted is where Sam Altman talks about how future models are going to have increased reasoning ability; reasoning ability, Altman stated, is the area where they want the largest progress. Of course, GPT-4 can reason right now, but as Sam Altman states, it reasons only in extremely limited ways, and it's also not that reliable. As he says, if you ask GPT-4 most questions 10,000 times, one of those 10,000 answers is probably really good, but it doesn't know which one is which, and you'd like to get the best response of the 10,000 each time, so the increase in reliability will be important. Now, an interesting part of the Strawberry reporting actually showed some information regarding future capabilities.
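
Altman's one-in-10,000 point is essentially a best-of-N sampling problem: drawing more samples only helps if you can tell which one is best. Below is a minimal sketch of that idea; `generate` and `score` are hypothetical stand-ins for a model call and a verifier, not any real OpenAI API.

```python
# Sketch of best-of-N sampling; `generate` and `score` are hypothetical
# stand-ins for a stochastic model call and a verifier/reward model.
import random

def generate(prompt: str) -> str:
    """Stand-in for one stochastic model completion."""
    return f"candidate answer (quality {random.random():.3f})"

def score(answer: str) -> float:
    """Stand-in for a verifier; here it just parses the fake quality."""
    return float(answer.rstrip(")").rsplit(" ", 1)[-1])

def best_of_n(prompt: str, n: int = 10) -> str:
    """Sample n answers and keep the one the scorer likes most."""
    return max((generate(prompt) for _ in range(n)), key=score)

print(best_of_n("Solve this reasoning problem", n=10))
```

The scorer is the hard part: without a reliable verifier, sampling 10,000 answers gives you 10,000 candidates and no way to pick the good one.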

### [9:51](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=591s) Explanation of Strawberry's post-training process

You can see that it says here: "AI researchers interviewed by Reuters generally agree that reasoning, in the context of AI, involves the formation of a model that enables AI to plan ahead, reflect how the physical world functions, and work through challenging multi-step problems reliably. Improving reasoning in AI models is seen as the key to unlocking the ability for the models to do everything from making scientific discoveries to planning and building new software applications." That's why reasoning is at the forefront of OpenAI's directive: it's going to be the thing that unlocks pretty much all future use cases where the model is used extensively. Now, what was also interesting is that it looks like we might actually be getting a separate model. As Sam Altman has discussed in the past on the Lex Fridman podcast, there have been discussions of different model releases, and a different model release isn't always necessarily GPT-5. Remember, GPT-4o was a model released with certain capabilities we truly didn't expect, and I did a whole video on the secret capabilities of GPT-4o, which honestly are quite outstanding: GPT-4o is quite understated in terms of its true capabilities, from being able to do 3D models to photorealistic images; it's a truly multimodal, incredible piece of technology. What this statement says is that "in recent months, the company has been privately signaling to developers and other outside parties that it is on the cusp of releasing technology with significantly more advanced reasoning capabilities." What that could potentially mean is that we could be getting an independent reasoning model. I'm not sure if this is actually true, but it didn't mention anything in regard to GPT-5, and I do think that if this were referring to GPT-5, they would probably have simply stated it; that is something we will have to wait and see about. Either way, we do know they are on the cusp of releasing technology with significantly more advanced reasoning capabilities, and it will be interesting to see, with OpenAI's track record of iterative deployment, how those models are released into the wild. Now, this is where we get into how this model actually works and what the model is in its bare-bones form. It says that "Strawberry includes a specialized way of what is known as post-training OpenAI's generative models, or adapting the base models to hone their performance in specific ways after they have already been trained on reams of data," one of the sources said.
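
For readers who want the mechanics, here is a minimal, generic sketch of the supervised fine-tuning step that post-training typically involves, written in Hugging Face-style PyTorch. This is the industry-standard recipe the article alludes to, not OpenAI's actual Strawberry pipeline, which the reporting says is secret.

```python
# Generic supervised fine-tuning sketch (Hugging Face conventions);
# NOT OpenAI's Strawberry pipeline, which the article says is secret.
import torch
from torch.utils.data import DataLoader

def fine_tune(model, dataset, epochs: int = 1, lr: float = 2e-5):
    """Adapt an already-pre-trained causal LM on curated examples."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in DataLoader(dataset, batch_size=8, shuffle=True):
            # Each batch holds tokenized prompt+answer pairs; the labels
            # teach the model to imitate the curated answers.
            loss = model(input_ids=batch["input_ids"],
                         labels=batch["labels"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```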

### [12:12](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=732s) Comparison to Stanford's Self-Taught Reasoner (STaR) method

The post-training phase of developing a model involves methods like fine-tuning, a process used on nearly all language models today that comes in many flavors, such as humans giving feedback to the model based on its responses and feeding it examples of good and bad answers. So, if you want to know how post-training works in AI: essentially, the AI learns general language skills from a lot of text during pre-training; it learns how to understand and generate language, kind of like a student learning basic reading and writing skills. After pre-training, the AI goes through post-training, also called fine-tuning, during which the AI is given specific tasks to practice, like answering questions, summarizing articles, or translating languages. It's like giving the student extra lessons focused on particular subjects, and just as those lessons help the student do better in certain subjects, fine-tuning helps the AI perform better on specific tasks. Now, here's what's interesting about Strawberry: OpenAI hopes the innovation will improve its AI models' reasoning capabilities dramatically, and Strawberry involves a specialized way of processing an AI model after it has been pre-trained on very large datasets. So it seems that OpenAI is testing a new way to finalize these models. It doesn't seem like it's that similar to standard post-training, but it involves a specialized way of processing an AI model after it's already been pre-trained: basically a new way of quote-unquote post-training the model. Now here's where we get into some of the research papers that were linked to this information. It says here that Strawberry has similarities to a method developed at Stanford in 2022 called Self-Taught Reasoner, or STaR, one of the sources with knowledge of the matter said. STaR enables AI models to bootstrap themselves into higher intelligence levels via iteratively creating their own training data, and in theory could be used to get language models to transcend human-level intelligence, one of its creators, Stanford professor Noah Goodman, said. So essentially, this is, I guess you could say, in theory some kind of self-improving AI, because what they're doing is iteratively creating their own training data and then using that to reason about the world. Now, I did a video on a version of this, which I will talk about later, but if we take a look at the paper, it truly is fascinating in what it shows us, because we haven't seen this explored that much with regard to AI. We can see here that this is called "STaR: Self-Taught Reasoner, Bootstrapping Reasoning With Reasoning", and if you read the abstract it gives you a clear picture of exactly what's going on: "Generating step-by-step chain-of-thought rationales improves language model performance on complex reasoning tasks like mathematics or CommonsenseQA. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales to bootstrap the ability to perform successively more complex reasoning. This technique, the Self-Taught Reasoner (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; and repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30x larger state-of-the-art model on CommonsenseQA." Thus, STaR lets a model improve itself by learning from its own generated reasoning.
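
Taking the abstract literally, the STaR loop can be sketched in a few lines. The callables `generate_rationale`, `answer_of`, and `fine_tune` below are hypothetical stand-ins for a model call, an answer parser, and a trainer; this is a paraphrase of the described loop, not the paper's released code.

```python
# Sketch of one STaR iteration as the abstract describes it:
# generate rationales -> keep correct ones -> rationalize failures
# given the answer as a hint -> fine-tune -> repeat.
def star_iteration(model, dataset, generate_rationale, answer_of, fine_tune):
    """One STaR iteration; the three callables are hypothetical
    stand-ins for a model call, an answer parser, and a trainer."""
    kept = []
    for question, correct_answer in dataset:
        # 1. Attempt the question with few-shot rationale prompting.
        rationale = generate_rationale(model, question)
        if answer_of(rationale) == correct_answer:
            kept.append((question, rationale))
        else:
            # 2. Rationalization: include the correct answer as a hint
            # and ask the model to justify it after the fact.
            hinted = generate_rationale(model, question, hint=correct_answer)
            if answer_of(hinted) == correct_answer:
                kept.append((question, hinted))
    # 3. Fine-tune only on rationales that led to correct answers.
    return fine_tune(model, kept)
```

One detail from the paper worth noting: each outer iteration fine-tunes from the original pre-trained model on the accumulated rationales, rather than stacking fine-tunes on top of each other.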

### [15:55](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=955s) Example of STaR in action for common sense reasoning

So this is incredible, because this is the kind of technology that could generate superintelligence, or at least get to an intelligence that is above human. Having a model that improves itself by learning from its own reasoning is something that would be somewhat recursively self-improving. Now, I do know that recursively self-improving usually means that the model can change all kinds of things about itself, but I do think having a model improve itself by learning from its own generated reasoning is largely the step that the space is going to be moving towards, because we know from history that AI models that only learn from human demonstrations or human information are limited by the human. So having an AI system that could improve itself by learning from its own generated reasoning is absolutely incredible. Essentially, what we have here is a model that's able to learn from its mistakes, and when it gets questions right, it learns why it got them right and learns from that. Now, here's an example of this in action. This is a screenshot from the STaR paper: an overview of STaR and a STaR-generated rationale on CommonsenseQA. The fine-tuning outer loop is indicated with a dashed line; the questions and ground-truth answers are expected to be present in the dataset, while the rationales are generated using STaR. You can see here the architecture of how it works, and this is the question: "What can be used to carry a small dog?" Then of course we have the choices: a swimming pool, a basket, a dog show, a backyard, an own home. And then we can see the rationale being generated using the STaR framework. It says: "The answer must be something that can be used to carry a small dog. Baskets are designed to hold things. Therefore, the answer is basket." So we can see that the model has rationalized, it's got its answer, and it's then able to improve itself. Now, one of the crazy things when I was looking at this paper was not only what this model is able to do, but what it was able to do compared to other models of much larger sizes.
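
For concreteness, a few-shot prompt in the style of that figure looks roughly like the template below. This is a reconstruction from the example quoted above; the paper's exact formatting may differ.

```python
# Reconstruction of a STaR-style few-shot rationale prompt, built from
# the CommonsenseQA example in the figure; the paper's exact template
# may differ.
prompt_template = """\
Q: What can be used to carry a small dog?
Answer Choices:
(a) swimming pool
(b) basket
(c) dog show
(d) backyard
(e) own home
A: The answer must be something that can be used to carry a small dog.
Baskets are designed to hold things. Therefore, the answer is basket (b).

Q: {question}
Answer Choices:
{choices}
A:"""
```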

### [17:56](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=1076s) Performance comparison of STaR to larger models

So, GPT-J, a 6-billion-parameter language model, served as the base model for the STaR method. Initially, GPT-J was prompted with a few examples that included rationales, which are step-by-step explanations. The model was then used to generate rationales for a larger dataset. The generated rationales were filtered to keep only those that led to correct answers, and GPT-J was then fine-tuned on the filtered dataset containing both the questions and the correct rationales. For the questions where the model initially gave the wrong answer, it was given the correct answers and asked to generate rationales for them, and these rationales were also used for further fine-tuning. This process of generating, filtering, and fine-tuning was repeated iteratively, allowing GPT-J to improve its reasoning capabilities over time. Now, what's crazy here is that GPT-J with STaR and rationalization performs comparably to GPT-3 directly fine-tuned: they managed to get an extremely small model to perform comparably to a model around 30 times larger. I'm guessing people are wondering if this method is going to work with larger models like GPT-4 or even GPT-5. One thing I do know now is that all eyes will be on the STaR paper, and many AI researchers are probably going to take a second look at it, because if what they're claiming is true, and it could potentially be used to get to levels of intelligence that transcend what humans can do, this has some truly scary implications.

### [19:30](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=1170s) OpenAI's goals for Strawberry in long-horizon tasks

The article continues, stating that OpenAI is aiming for Strawberry to perform long-horizon tasks, referring to complex tasks that require a model to plan ahead and perform a series of actions over an extended period of time. You can see here it says that OpenAI is creating, training, and evaluating the models on what the company calls a "deep research" dataset. Reuters was unable to determine what is in that dataset or how long "an extended period" would mean. The entire goal of this is that OpenAI wants its models to use these capabilities to conduct research by browsing the web autonomously with the assistance of a CUA, or computer-using agent, one that can take actions based on its findings, according to the document and one of the sources, and OpenAI plans to test its capabilities by doing the work of software and machine learning engineers. Now, this is a lot of information, because it has a lot of implications. One of the major implications is that it seems like OpenAI has a clear path towards being able to autonomously do good internet research. And I don't think it's just internet research, because they also stated that they're going to be testing the capabilities on doing the work of software and machine learning engineers. So it also seems OpenAI might have some internal benchmark for testing how capable these models are at software engineering and machine learning engineering. These, of

### [20:52](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=1252s) Theories about the name "Strawberry"

course, are really difficult tasks, but if you remember, one of the main goals of OpenAI was to automate AI research, and another was to get to AI agents. I'm guessing that what OpenAI has probably realized is that reasoning unlocks all of these capabilities, and once they do unlock higher levels of reasoning, everything else is going to be a lot easier. Now, one of the things many people have questioned is of course the name Strawberry: why would OpenAI switch from Q* to Strawberry? One of the main theories floating around is that it comes from a question that shows how LLMs struggle with reasoning, a simple question that the majority of LLMs today get wrong. The question is: how many R's are in the word "strawberry"? Usually what happens is that these models will state that there are two R's in the word "strawberry". If you didn't get it: if you take a look at the word "strawberry", there's one R here, and then there are two R's right there, so in total there are three. But somehow the model doesn't reason properly and doesn't realize that there are three R's in the word "strawberry". This could be the reason. There's also another theory floating around that it's named after Elon Musk's strawberry-fields metaphor. In a 2017 Vanity Fair profile, Musk used a vivid metaphor involving strawberry fields: he described a scenario where an AI designed for picking strawberries could potentially go rogue and convert the entire Earth into strawberry fields, explaining that "all it wants to really do is pick strawberries, so then it would have all of the world be strawberry fields." I don't think this is the case, since Elon Musk and OpenAI have had a feud in recent years, and I think Sam Altman and Elon Musk's relationship isn't exactly the best. Now, overall, I do think there's a lot to Q* and that it's probably something they are prioritizing heavily at OpenAI, considering that very recently Sam Altman said Q* was something they weren't willing to talk about yet. Now, there were many theories about what Q* was before, and we do have them here.
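
The irony is that the count is trivial in code; models tend to fail here because they see subword tokens rather than individual letters.

```python
# Trivially counting the R's an LLM miscounts: models operate on
# subword tokens (e.g. "straw" + "berry"), not characters, which is
# the usual explanation for this failure.
word = "strawberry"
print(word.count("r"))                               # 3
print([i for i, ch in enumerate(word) if ch == "r"]) # [2, 7, 8]
```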

### [22:51](https://www.youtube.com/watch?v=kDGPGDN5Suo&t=1371s) Summary of potential components of Q-star/Strawberry

One of them was the Q-learning approach, which I spoke about before and which is potentially why they called it Q*. Then we had A* search, which is where the star could come from. And of course now we have STaR, the Self-Taught Reasoner, which is why some people are speculating that this is where the name comes from; some of the things in the paper also suggest the project is potentially related to it. So I'm not sure if maybe it's a combination of these papers/abilities, or something really unique, because OpenAI doesn't show their research. In summary, Q* could be an advanced AI system that combines the strengths of Q-learning for decision-making, A* search for efficient planning, and self-taught reasoning (STaR) for improving its problem-solving abilities through iterative rationale generation and refinement. This combination could create a highly capable system that can plan, act, and learn in a sophisticated manner.
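
For background on the first of those theories, the textbook tabular Q-learning update is shown below. Nothing here is confirmed about OpenAI's Q*; it is simply the standard algorithm the speculation refers to.

```python
# Background only: textbook tabular Q-learning, the standard update
# the "Q" theory refers to; nothing confirmed about OpenAI's Q*.
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

Q = defaultdict(lambda: defaultdict(float))  # Q[state][action] -> value

def q_update(state, action, reward, next_state,
             alpha: float = 0.1, gamma: float = 0.99) -> None:
    """One temporal-difference step toward the best next action's value."""
    best_next = max(Q[next_state].values(), default=0.0)
    Q[state][action] += alpha * (reward + gamma * best_next
                                 - Q[state][action])
```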

---
*Source: https://ekstraktznaniy.ru/video/14190*