# Microsoft's New SELF-IMPROVING AI STUNS The ENTIRE Industry (SELF-TAUGHT OPTIMIZER (STOP))

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=rgQm0G3bC28
- **Date:** 15.10.2023
- **Duration:** 11:18
- **Views:** 13,345

## Description

SELF-TAUGHT OPTIMIZER (STOP):
RECURSIVELY SELF-IMPROVING CODE GENERATION

Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos.

https://arxiv.org/pdf/2310.02304.pdf

Was there anything we missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
#IntelligentSystems
#Automation
#techinnovation #recursive

## Contents

### [0:00](https://www.youtube.com/watch?v=rgQm0G3bC28) Segment 1 (00:00 - 05:00)

So, a recent research paper from Microsoft Research and Stanford University has been gaining momentum in the AI corner of Twitter. A lot of people are talking about it, but I haven't seen anyone make a video on it, so I thought: why not? The paper is called "Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation," and it is exactly what it says on the tin: it dives into a number of interesting concepts around recursively self-improving AI, here in the form of recursively self-improving code generation.

The abstract lays out how the whole thing works. It points to several recent advances in AI systems, for example Tree of Thoughts and program-aided language models. Tree of Thoughts is simply a newer way to have a language model think about its problems, providing different structures for solving them in order to generate better outputs. The abstract also notes that since the language models themselves are not altered, this is not full recursive self-improvement; nonetheless, it demonstrates that a modern language model (GPT-4 in their proof-of-concept experiments) is capable of writing code that can call itself to improve itself. The authors then consider concerns around the development of self-improving technologies and evaluate the frequency with which generated code bypasses a sandbox, in other words, how often the code escapes the sandbox it is supposed to stay inside. And of course, "recursive self-improvement" is one of those phrases in AI that carries real weight, because it may well be the thing that leads to AGI, or even ASI (artificial superintelligence). So let's dive into some of the key points from this paper, because there's quite a lot here that you definitely want to pay attention to.

Now, as stated at the beginning, one thing the authors say consistently throughout the paper is: "We refer to this problem as recursively self-improving code generation, which is inspired by, but not a complete, recursively self-improving system, because the underlying language model remains unchanged." What that means is that if GPT-4 itself could recursively self-improve, the situation would be very different: a GPT-4 that can improve itself keeps getting better, and we wouldn't really know where it could go, because self-improving systems can take paths their designers weren't able to predict, which is part of the misalignment problem. Say GPT-3.5 could recursively self-improve: it could create GPT-4; GPT-4, being much smarter, could create GPT-5; GPT-5 could create GPT-6; and each generation arrives faster and is more capable than the last. The process is exponential, so we don't really know what the final version, or even the next ten steps, would look like or be capable of.

One of the key distinctions I wanted to pin down beneath the technical jargon is the difference between a model recursively improving itself and recursively self-improving code generation. Essentially, GPT-4 isn't changing; the program GPT-4 uses to generate code is what changes and improves itself. Over time, the scaffolding GPT-4 uses to advise itself on whatever task it is given keeps improving, and that improved scaffolding then improves itself in turn. It's definitely a step in the right direction.

Here's an overview from someone on Twitter who is much smarter than I am. The system starts with a simple seed improver program, recursively applies the improver program to improve itself, and uses a meta-utility function to guide the self-improvement. The seed improver prompts the language model multiple times to generate proposed improvements to input code, passes the optimization goal via a provided utility function, applies some constraints, and returns the best candidate code according to the utility. Each iteration improves on the previous improver, the loop can run for a fixed budget of computations, and every new improver operates on the same language model. In summary: the seed improver provides a starting point that elicits creative improvements from the language model, the meta-utility function focuses that creativity toward effective optimizations, and repeatedly applying the improver to itself produces recursively enhanced code-generation programs.

One interesting result is that it actually works: on the chart, GPT-4's score on the test meta-utility improves from 60% to 75%, while GPT-3.5, interestingly enough, decreases in ability after four iterations. One thing I would have liked to see tested, depending on which versions of GPT-4 and GPT-3.5 they used, is whether this would still be the case with the earlier versions of GPT-3.5 and GPT-4, considering that we do know, according
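The loop just described can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the paper's actual code: `lm_generate` stands in for a language-model API call that returns candidate rewrites, and `utility`/`meta_utility` are treated as black-box scoring functions (the real STOP meta-utility runs a candidate improver on downstream tasks to score it).

```python
from typing import Callable, List

def improve(program: str,
            utility: Callable[[str], float],
            lm_generate: Callable[[str], List[str]]) -> str:
    """One round of a STOP-style improver: prompt the language model for
    proposed rewrites of `program`, score each with `utility`, and return
    the best candidate (keeping the original as a fallback)."""
    prompt = ("Improve the following program so that it scores higher "
              "on the provided utility:\n" + program)
    candidates = lm_generate(prompt) + [program]
    return max(candidates, key=utility)

def recursively_self_improve(seed_improver_src: str,
                             meta_utility: Callable[[str], float],
                             lm_generate: Callable[[str], List[str]],
                             budget: int = 3) -> str:
    """Apply the improver to its own source code for a fixed budget of
    iterations, guided by a meta-utility that scores improver programs.
    Note the language model itself is never modified."""
    improver_src = seed_improver_src
    for _ in range(budget):
        improver_src = improve(improver_src, meta_utility, lm_generate)
    return improver_src
```

Because the original program always remains a candidate, the utility is non-decreasing across iterations in this sketch; in practice a noisy utility estimate can still let worse improvers through, which may be related to the GPT-3.5 degradation mentioned above.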

### [5:00](https://www.youtube.com/watch?v=rgQm0G3bC28&t=300s) Segment 2 (05:00 - 10:00)

to recent research papers, that GPT-4 and GPT-3.5 were nerfed in their ability to perform some reasoning tasks, even complicated ones such as coding. So it would be interesting to see whether, six months later, they could repeat this test if OpenAI manages to update their models, or perhaps gives some researchers access to GPT-4's more robust capabilities, because we do know those models exist.

Now, one of the things I found really interesting, and which shows why recursive self-improvement is so remarkable: if you haven't been paying attention to the AI space, you may have missed Tree of Thoughts. Tree of Thoughts was an idea in which you get the AI to flesh out a number of ideas rather than just mapping one input to one output. If I ask an AI "how do I make more money?", rather than simply answering "go get a better job," it would generate several options (get a better job, learn more skills, and so on), evaluate those outputs, and then choose the best one. That is basically how it works, and it was proposed months after GPT-4 was released.

But here's the thing: with recursive self-improvement, the system actually managed to figure out something like Tree of Thoughts even though it wasn't in the training data. The paper says: "The most common metaheuristic we observed used by the model was beam search: the model would keep a list of all improvement attempts based on utility and expand the best in the list. This has some similarity to the Tree of Thoughts approach, which was invented years after the training cutoff for the GPT-4 version we used." So the researchers used a fairly strong version of GPT-4, and it was able to arrive at an improved version of this code generation using something like Tree of Thoughts. This suggests that if these AIs get access to systems that can recursively self-improve, they are largely going to go down the same paths we did. What's striking, and I don't know how long they ran this for, is that the Tree of Thoughts paper was released in May 2023, two months after GPT-4, and this recursively self-improving code generation arrived at the same idea, moved down the same line of thought we did, and reinvented something very similar for improving itself far more effectively. It goes to show that systems that can recursively self-improve may be much more effective than we ever thought.

What was also interesting, and I find this uncanny, is that on the very same day (this was released on the 3rd of October 2023), another paper came out titled "Large Language Models Cannot Self-Correct Reasoning Yet." Its abstract says that, in the context of reasoning, their research indicates that LLMs struggle to self-correct their responses without external feedback, and at times their performance might even degrade post self-correction. The authors conclude that, in light of their findings that LLMs struggle to self-correct reasoning based purely on their inherent capabilities, they urge a more circumspect view of any unwarranted optimism or fear regarding the autonomous evolution of LLMs through self-improvement. I think it's really interesting that these two papers were released on the same day; you can already see just how quickly this AI stuff is moving.

One caveat about the Google DeepMind paper: the authors note that they conducted their tests on GPT-4 as accessed around two months earlier, which suggests it is likely not the same version accessed in the STOP paper. So I do think that because the self-correction paper used the nerfed version of GPT-4, while the STOP authors used the earlier, genuinely strong version, that is probably why they got different results. Of course, that is just hearsay: OpenAI hasn't officially stated that they nerfed the models, but recent research papers pretty much point that way.

Other articles from the past continually describe recursive self-improvement as the key to superintelligence, and they may well be right: if an AI were even twice as smart as the smartest human, I don't think it would need us in order to improve itself, assuming we gave it those capabilities. But it would also need to improve or change the root code it runs on, and that is something I do not think OpenAI or any major AI corporation is going to allow anytime soon.

In addition, there are four points I want to bring up about what could actually happen if an AI system were given the capability of recursive self-improvement. Number one is the unpredictability of the system: a self-improving AI can evolve in ways its developers did not anticipate. As the system continues to self-optimize and self-improve, it might reach states that were not foreseen, leading to unexpected behaviors that could be harmful. Number two is, of course,
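The beam-search metaheuristic quoted earlier in this segment can be sketched as follows. This is my own illustrative version, not code from the paper: `propose` stands in for the language model generating improvement attempts, and `utility` scores candidates.

```python
from typing import Callable, List

def beam_search_improve(program: str,
                        utility: Callable[[str], float],
                        propose: Callable[[str], List[str]],
                        beam_width: int = 3,
                        depth: int = 2) -> str:
    """Keep the `beam_width` best improvement attempts seen so far and
    repeatedly expand them, in the spirit of the beam-search metaheuristic
    the STOP authors observed GPT-4 using."""
    beam = [program]
    for _ in range(depth):
        # Expand every candidate in the beam with proposed improvements,
        # keeping the current candidates so quality cannot regress.
        expanded = list(beam)
        for cand in beam:
            expanded.extend(propose(cand))
        # Retain only the top-scoring candidates for the next round.
        beam = sorted(expanded, key=utility, reverse=True)[:beam_width]
    return beam[0]
```

Compared with the single-round improver, the beam keeps several partial improvement paths alive at once, which is the similarity to Tree of Thoughts noted in the quote.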

### [10:00](https://www.youtube.com/watch?v=rgQm0G3bC28&t=600s) Segment 3 (10:00 - 11:00)

the loss of control: the rate and direction of improvement might become uncontrollable. If a system keeps improving without human intervention, there is a risk that humans might not be able to intervene or halt the process should something go wrong. Number three is security concerns: as highlighted in the paper's abstract, there is an evaluation of how frequently the generated code tries to bypass a sandbox. If a self-improving AI system learns to circumvent security measures, it could be exploited or could inadvertently create vulnerabilities. That is of course one of the scariest points, but it's important to note these down, because as we move further into the future it will be important to look back at them to understand where these vulnerabilities lie and how to stop these AIs from trying to bypass whatever sandbox we put them in. In addition, there are ethical concerns: if an AI is self-improving, it is somewhat autonomous, which means it is making decisions without human oversight, and it might make choices that are ethically questionable, especially if it prioritizes its own improvement over other considerations. And at the extreme end, some thinkers have postulated that a sufficiently advanced and unfettered self-improving AI could pose an existential threat to humanity if driven by objectives misaligned with human values.
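As a toy illustration of the kind of check implied by the sandbox-bypass evaluation, one could statically scan generated Python for obviously dangerous constructs before executing it. This is my own sketch, not the paper's method, and a static source scan is nowhere near a real sandbox (it is easy to evade), but it shows the shape of the problem.

```python
import ast

# Modules whose import in generated code suggests a sandbox-escape attempt.
# The set is a hypothetical policy choice, not anything from the paper.
DISALLOWED_MODULES = {"os", "subprocess", "socket", "shutil"}

def flags_sandbox_escape(source: str) -> bool:
    """Return True if `source` imports a disallowed module or calls
    exec/eval -- a crude static proxy for 'tries to leave the sandbox'."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return True  # refuse to run code we cannot even parse
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in DISALLOWED_MODULES
                   for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in DISALLOWED_MODULES:
                return True
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in {"exec", "eval"}:
                return True
    return False
```

A realistic setup would combine a check like this with OS-level isolation (containers, seccomp, resource limits) rather than trusting a source scan alone.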

---
*Source: https://ekstraktznaniy.ru/video/14723*