# The Singularity is HERE? LLMs Are Now "Self-Evolving"

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=B-4sW4RqboI
- **Date:** 23.11.2024
- **Duration:** 11:09
- **Views:** 36,600
- **Source:** https://ekstraktznaniy.ru/video/13691

## Description

Prepare for AGI with me - https://www.skool.com/postagiprepardness 
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Check out my website - https://theaigrid.com/


Links From Today's Video:
https://www.theinformation.com/articles/what-if-llms-could-continue-learning?rc=0g0zvw 
https://www.theinformation.com/articles/meet-mai-1-microsoft-readies-new-ai-model-to-compete-with-google-openai?rc=0g0zvw 


Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

Music Used

LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s

#LLM #Largelanguagemodel #chatgpt
#AI
#Artificia

## Transcript

### Intro [0:00]

So the LLM space doesn't seem to be dying down at all, because recently there was a new type of LLM called a self-evolving large language model, and this is absolutely revolutionary, because one of the key problems with current LLMs is that they can't update themselves; it's like a time capsule for knowledge embedded into a large language model. So apparently there is something called Writer, a $2 billion valuation startup

### What is this [0:25]

that's developing AI tools for enterprise, and they have now developed this new type of self-evolving LLM that they claim can continue learning and updating its parameters. The implications of this are quite staggering, because it means these models can keep learning, and it allows them to change how they answer questions even after they are deployed. So basically, one of the main

### Problems with LLMs [0:48]

problems with LLMs is that they are really expensive to make, and if we have to keep retraining models, it's going to keep getting more expensive. Take a look at this paper that dives into how much these things are going to cost: you can see it says that the largest training runs will cost more than a billion dollars by 2027, which means that only the most well-funded organizations will be able to finance frontier AI models, which kind of leads to a monopoly on the situation. The craziest thing about all of this is that most companies don't have a billion dollars to run these AI training runs, and of course, if we could solve this issue, it would be an enormous breakthrough for the AI community, because it would reduce the amount of spending needed to fund these AI projects and thus speed up AI development.

And it's not only the cost of these large language models that is the issue; we also have the problem that when we ask current LLMs about things that have happened recently, without using their search capabilities, their knowledge is only accurate up to potentially a year ago. So right now, GPT-4o, the version that I'm using, has a knowledge cutoff in 2023, and that is without search, so if I wanted to ask anything really specific, I would have to get it to browse the web, and sometimes the web doesn't have certain pieces of information. Now I hope you can all understand why having a model that is up to date is really important, because we live in a really fast-moving world, so if this model is still thinking in 2023 terms, it's going to be at a severe disadvantage compared to a model that is able to dynamically update itself based on what recently happened. Now they

### Self-Evolving LLMs [2:29]

basically state that this actually works in a new and interesting way. It says traditional transformers, which are the model architecture that underlies popular LLMs like ChatGPT and Claude, are made up of layers, or special filters, that help the model learn different aspects of the input data, and within each of those layers, Writer's self-evolving LLMs also include a memory pool which stores important information from past interactions, according to Writer's co-founder and CTO. So basically, you can see here they have this memory pool which is able to update itself from past interactions, and it's able to get better at certain responses by storing that somewhere inside of it. Each time the LLM receives new information it hasn't seen before, it's able to update the memory pool with that new information, and it does this throughout all its layers, so that it can refer to the information in future interactions.

And crazily, they also state that they do have some control over what the LLM does and doesn't learn, so you can't just troll the LLM with fake facts. This is actually pretty interesting, because I do remember when I was talking to ChatGPT, oftentimes I would try to see what the model would believe from me; like, I would try to tell it that I was recently appointed the king of England, and sometimes it would say no, that's not possible, and it was pretty funny to see the kinds of responses that I was able to get from the models. But of course, this kind of thing at wide scale is going to be really fascinating, because I'm wondering how the LLMs are going to discern what is true from what is false, and of course, when things are on the internet, people can tend to mess with them.

Now, when we actually refer to the training cost, they state that developing a self-evolving LLM increases training cost by 10 to 20%, but doesn't require additional work once the LLM is trained, unlike other methods to update models with new information, like retrieval-augmented generation or fine-tuning. So they're basically stating that this doesn't require any additional work; it just requires work on the front end, which basically means that once you have this model, it's going to be the kind of model where you won't really need anything else, provided there aren't any major architecture breakthroughs like the Mamba architecture or the ones that other people are using. So it's going to be really interesting to see if this is widely adopted, but there are also some fascinating effects from this, so
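The memory-pool idea described above, per-layer storage that is written only when information is genuinely new and read back on future interactions, can be sketched in a few lines. Writer has not published the actual mechanism, so this is purely a toy illustration: the class, parameter names, and the cosine-similarity novelty check are all assumptions of ours.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class MemoryPoolLayer:
    """Toy stand-in for a transformer layer's memory pool: stores
    (vector, payload) entries from past inputs and retrieves the most
    similar one later. All mechanics here are illustrative."""

    def __init__(self, capacity=1000, novelty_threshold=0.9):
        self.pool = []  # list of (vector, payload) entries
        self.capacity = capacity
        self.novelty_threshold = novelty_threshold

    def write(self, vector, payload):
        # Only store genuinely new information -- a crude stand-in for the
        # "control over what the LLM does and doesn't learn" mentioned above.
        best = max((cosine(vector, v) for v, _ in self.pool), default=0.0)
        if best < self.novelty_threshold and len(self.pool) < self.capacity:
            self.pool.append((vector, payload))
            return True
        return False

    def read(self, vector):
        # Retrieve the payload of the most similar stored entry, if any.
        if not self.pool:
            return None
        return max(self.pool, key=lambda e: cosine(vector, e[0]))[1]

layer = MemoryPoolLayer()
layer.write([1.0, 0.0], "fact A")
layer.write([0.0, 1.0], "fact B")
layer.write([0.99, 0.01], "near-duplicate of fact A")  # rejected: too similar
print(layer.read([0.9, 0.1]))  # -> fact A
```

The novelty threshold is the interesting design knob: too low and the pool fills with duplicates, too high and legitimate updates get rejected, which is one way to picture the "what it does and doesn't learn" control the article mentions.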

### Benchmarks [4:42]

one of the fascinating effects is that it leads to some interesting results on popular benchmarks. It says, specifically, that the model's performance on benchmarks actually improves every time it's tested, since it learns the information on the benchmark over time. For instance, the model was tested on a common math benchmark and got less than 25% of the questions correct, and by the third time it was tested, its accuracy jumped to nearly 75%. Now, I think this is a little bit controversial, because some people would argue that if the model is able to remember what it has seen in the test previously, then the model isn't actually getting smarter; it's just able to recall what has gone on before. When we look at the actual benchmarks, we can see that yes, the score does improve, but I do wonder if that is just based on the model remembering the exact questions, or whether it's remembering the kind of reasoning it used before, so that it dynamically understands the kinds of questions it failed in the past and is able to get future ones right. I think if it's the second scenario, where it's not just remembering previous questions but remembering the actual way to be smarter, then that would mark a huge breakthrough, because this would essentially mean that this is the kind of model that can literally get smarter, and if this is baked into an already capable model, that would mean you could technically get to AGI or ASI with a single kind of system, which has truly stark implications if it's actually doing the reasoning part and not just remembering the answers to those questions. And I'm sure you guys can understand why this is one of the most important things to talk about, because self-evolving LLMs are something that could literally be the cause of an AI catastrophe if there ever
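The benchmark-contamination concern raised above is easy to demonstrate: a "model" that simply memorizes (question, answer) pairs it has seen will score better each time the *same* test is re-run, without getting any smarter. This toy sketch is entirely our own illustration, not Writer's model.

```python
import random

class MemorizingModel:
    """Toy model: guesses randomly on unseen questions, but stores the
    correct answer after each question -- a caricature of a self-evolving
    model learning the benchmark itself."""

    def __init__(self):
        self.memory = {}

    def answer(self, question, true_answer=None):
        if question in self.memory:
            guess = self.memory[question]          # recall, not reasoning
        else:
            guess = random.choice(["A", "B", "C", "D"])
        if true_answer is not None:
            self.memory[question] = true_answer    # "self-evolving" update
        return guess

benchmark = [(f"q{i}", "A") for i in range(100)]
model = MemorizingModel()
for run in range(3):
    score = sum(model.answer(q, a) == a for q, a in benchmark)
    print(f"run {run + 1}: {score}/100")
# Run 1 hovers near chance (~25/100); runs 2 and 3 score 100/100, because
# every question has been memorized -- the jump says nothing about reasoning.
```

This is exactly why a held-out set of never-before-seen questions would be needed to distinguish the two scenarios the video describes.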

### Problems [6:26]

is one. Now, they also talk about how they currently have this in beta with just two customers, and of course they're still ironing out all of the kinks, so that is going to be something that they will need to do. But one of the major problems that they are currently having with this model is that the more information the LLM learns, the worse it gets at refusing to answer new and dangerous questions. You can think of it this way: the new information the model learns over time will begin to override the original data, such as the safety instructions it was trained on, and that's not great news for businesses that want to incorporate that kind of AI into customer-facing products. One of the main concerns we have with today's AI is the safety aspect, because we don't want these models outputting harmful things, and if the model is able to learn new things, or new ways of getting around those kinds of guidelines, then it's going to be really problematic for that model to be out in the public space. So it's quite likely that this is the kind of model that will only be used in a business setting, because it's too obvious that a lot of people would just try to ruin the model by updating it with new and terrible information. Now they actually

### Limits [7:33]

talk about this as well. They state that they have to limit how much new information the model can learn, and they argue that this isn't as big of a concern for businesses, as they're typically just trying to update an LLM with their own private information rather than all of the latest data on the web. He added that if you make the memory pool around 100 billion to 200 billion words, it's enough for the LLM to learn for at least 5 to 6 years for the typical enterprise. So that's pretty crazy when we think about it: this model is able to learn for about 5 to 6 years for the typical business. I do wonder if this is going to be something that more people adopt, as they'd be able to have models that continuously get better, rather than ones where they have to continually find new ways to prompt-engineer and do all these different hacks to get the model to do exactly what they want. Now
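To put the quoted pool size in perspective, a quick back-of-the-envelope calculation turns "100-200 billion words over 5-6 years" into a daily ingestion budget. The figures are from the quote above; the per-day framing is our own.

```python
# Daily ingestion budget implied by the quoted memory-pool sizing:
# a 100-200 billion word pool lasting 5-6 years.
def daily_word_budget(pool_words: float, years: float) -> float:
    return pool_words / (years * 365)

low = daily_word_budget(100e9, 6)   # smallest pool, longest horizon
high = daily_word_budget(200e9, 5)  # largest pool, shortest horizon
print(f"roughly {low / 1e6:.0f}M to {high / 1e6:.0f}M new words per day")
# -> roughly 46M to 110M new words per day
```

Tens of millions of new words per day is far more text than a typical enterprise produces, which is consistent with the argument that the learning cap isn't a big concern for business use.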

### Memory [8:19]

Of course, this is something that has been going on in the industry, because Mustafa Suleyman actually recently spoke about how memory was basically going to be solved, and this is something that we do pay attention to; if you've seen this channel, I actually spoke about Microsoft's recent breakthrough and how they talked about this giant breakthrough coming through:

"I'm really confident 2025 memory is done, permanent memory. I mean, if you think about it, we already have memory on the web; we retrieve from the web all the time, quite accurately now. Copilot has really good citations; it's up to date as of 15 minutes ago; it knows what's happened in the news, on the web, and so on. So we're just kind of compressing that to do it for your personal knowledge graph, and then you can sort of add in your own documents and your email and calendar, stuff like that. So memory is going to completely transform these experiences, because it's sort of frustrating to have a meaningful conversation, or go on an interesting exploration around some creative idea, and then come back three or four or five sessions later and it's like, let's start again, we've completely forgotten what we talked about. So I think that's going to be a big shift as well, because not only does it lower the bar to entry for expressing a creative idea, but those things don't get forgotten too, so you can do this ambiguous cross-reference back to something that you wanted: what was that thing I said like 3 weeks ago? It's sort of like having a second brain, in that it's like an extension of yourself."

So it will be interesting to see how these memory models do change, because I think on one side, yes, you have the model that is able to remember stuff specifically about you, probably stored somewhere like a web page, with the LLM able to use that context every time it gives a response. But I think an actual self-evolving model that manages to improve its own performance and is able to dynamically update its own parameters, that kind of thing is going to be more akin to AGI or ASI, and I do wonder, regardless of any of these enterprise uses, whether some of the large labs, for example the ones working on superintelligence, are going to be working on this, and what kinds of things they are doing behind the scenes to have these kinds of models that are able to dynamically update themselves. Of course, there is the security risk with these kinds of models; we don't really know what they can do, but it is quite fascinating to take a look at how these experiments are changing the field. And

### Outro [10:32]

of course, one of the things that you might not remember is MAI-1: MAI-1 is basically the model, at 500 billion parameters, being developed by Microsoft, and apparently it's going to have some new things in it. So it will be interesting to see what kind of model that is and how they're moving forward with that model at Microsoft, because there are a lot of things going on behind the scenes that happened when Microsoft acquired the company Inflection, and there's an entire team working on a large language model and consumer products that are going to be developed into some amazing things. So let me know what you guys think about this crazy stuff, and I'll see you guys in the next one.
