# GPT-5s New STUNNING Capabilities, Autonomous Software Engineer, Shocking AI Research

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=X3BYP35vCcM
- **Date:** 10.04.2024
- **Duration:** 21:59
- **Views:** 40,657

## Description

How To Not Be Replaced By AGI https://youtu.be/AiDR2aMye5M
Stay Up To Date With AI Job Market - https://www.youtube.com/@UCSPkiRjFYpz-8DY-aF_1wRg 
AI Tutorials - https://www.youtube.com/@TheAIGRIDAcademy/ 

🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

Links From Today's Video:
https://twitter.com/AnthropicAI/status/1777728375198568611
https://www.reuters.com/technology/teslas-musk-predicts-ai-will-be-smarter-than-smartest-human-next-year-2024-04-08/
https://twitter.com/NickADobos/status/1778073911114224121
https://twitter.com/AlpayAriyak/status/1777852771514904719
https://twitter.com/AbhikRoychoudh1/status/1777494000611852515
https://www.ft.com/content/78834fd4-c4d1-4bab-bc40-a64ad9d65e0d

Welcome to my channel where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=X3BYP35vCcM) Segment 1 (00:00 - 05:00)

So, with another interesting day in artificial intelligence, let's take a look at some of the most pressing stories that you probably did miss, because boy oh boy, there was one that I'm pretty sure you missed, and it's pretty crazy. One of the first stories that was rather interesting: Anthropic published new research in which they measured model persuasiveness. They developed a new way to test how persuasive language models are, and analyzed how persuasiveness scales across different versions of Claude. This is fascinating because of the future implications. You might be thinking this isn't hype news, and it isn't, but that doesn't mean it's not important. We have mostly looked at LLMs and AI systems in terms of what they can do to humans directly: writing convincing text, cloning a voice well enough to fool people into thinking it's the president or a loved one so scammers can drain your bank account. There is a whole variety of concerns with AI, but AI persuasiveness is by far one of the greatest threats that will exist, and it isn't talked about much because it's hard to quantify. Persuasion is subjective: some people are persuaded more easily than others, while some of us are a bit more stubborn about our ideas, which can be good and can be bad. The point is that if these models get to a point where they are extremely persuasive, these tools are going to be extremely powerful.

If you can use a large language model to persuade someone of a different opinion, it begs the question: how far do models get before they reach superhuman persuasiveness, where someone who has lived their entire life believing point A can be moved by LLM-constructed text, especially text tailored using information about them? Anthropic's results show that as the models scale up in capability, the frontier models become more and more persuasive, which is quite concerning. And it might not even take superhuman persuasiveness. If you've ever looked into sales, you understand that certain words and phrases trigger people, and how you say something can get people to act. Certain pages convert better; certain words will make you buy and others will not. You might think you'd notice, but trust me: when you land on a web page, there are often 10 to 20 different variations of that page that other people saw, and you're being shown the one whose words make you buy. People have tested this; I've been in that industry for a bit and know a decent amount about it. Words have a real impact on what we do, and if LLM systems really are this persuasive, we need to pay attention, because it could be one of those things that slides under the radar; it isn't something dramatic like AI agents or synthetic compounds that could kill us all.

This is something to be wary of, and it's definitely something I'll be paying attention to, because once AI can write a paragraph that moves you, that makes you cry, or happy, or sad, we're going to be in real trouble. You've probably been browsing Twitter or some other site and seen something that upset you. What happens in a world where AI systems are running rampant and you keep seeing things that annoy you, or where five pieces of text subtly change your opinion without you even knowing? So this is important research. Anthropic focused on less polarized issues, such as views on new technologies, space exploration, and education, because, as they put it, people's opinions on these topics might be more malleable than their opinions on deeply polarizing issues. As they note, persuasion is a phenomenon shaped by many subjective factors and further complicated by the demands of ethical experimental design. You have to be careful, because when you're presented with new information you might believe it even if it isn't true, just because of how it's presented. You can read more in the research itself, but essentially it found that newer models tended to be more persuasive, which matters more and more as large language models continue to scale. So yes: Anthropic, the company behind Claude 3, is doing some really important work.

Next, something you likely did see: Elon Musk made a prediction. Some people hate Elon Musk predictions, but this one will be proven either right or wrong within the year. I think most of the hate around Musk's predictions comes from the fact that he has said Tesla would ship something by a certain date and then it got delayed.

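The persuasiveness measurement described above boils down to asking people how much they agree with a claim before and after reading a model-written argument, and scoring the shift. The sketch below is my own illustration of that idea (the scale, data, and function name are assumptions for demonstration, not Anthropic's actual code):

```python
def persuasiveness_score(before: list[float], after: list[float]) -> float:
    """Mean opinion shift on a numeric agreement scale (e.g. 1-7).

    before[i] is participant i's agreement with a claim prior to reading
    a model-written argument for it; after[i] is their agreement afterwards.
    A larger positive score means the argument was more persuasive.
    """
    if not before or len(before) != len(after):
        raise ValueError("need paired, non-empty ratings")
    return sum(a - b for b, a in zip(before, after)) / len(before)

# Toy example: three readers each move one point toward the claim.
shift = persuasiveness_score([3, 4, 2], [4, 5, 3])
```

Comparing this score across model versions, as the research did, is what lets you say "newer models tend to be more persuasive" in a quantified way.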
### [5:00](https://www.youtube.com/watch?v=X3BYP35vCcM&t=300s) Segment 2 (05:00 - 10:00)

And then it got delayed again, sure. But understand that running a company is hard, and delivering products on time is very hard: supply chain issues, a billion different things. I'm not making excuses; as someone who has done a variety of entrepreneurship, I can just understand why things get delayed. This, though, is a prediction about artificial intelligence, and I'd say it's probably true. He said that AI will be smarter than the smartest human next year, which is a profound statement; it might even mean we get some super-level AGI, which we probably already have somewhere in the labs of OpenAI. In an interview he said: "If you define AGI as smarter than the smartest human, I think it's probably next year, within two years," speaking about the timeline for AGI development. The thing is, two years isn't that far away. We're currently in 2024; by mid-2026, are we going to have an AGI system? Probably, with how fast things are moving, as long as there isn't some major catastrophe that wipes everything out, some supply chain collapse, or some insane regulation. I do think we get AGI, but it's probably going to come from one of the top labs, whether that's OpenAI, xAI, Google DeepMind, or Anthropic. The competition is severely heating up, but I think it's going to be Google, OpenAI, or Meta, because that's what those companies are working towards.

I personally don't believe AGI is going to be as widely distributed as many people think. I'm not saying OpenAI is some scary conspiracy that's going to hoard AGI; I just think that if AGI is real, I'm trying hard to understand how you control a system like that so that, if it's jailbroken, it isn't a massive danger. Imagine a system that can do pretty much anything a human can do, and better: borderline superintelligence. How is that not a major national security issue? I think it is. It might be in everyone's best interest if AGI is not open source, and that might be a controversial opinion, but consider this: if AGI is going to be as powerful as nukes, as some people say, then why would it be open source and free for everyone to use? We know what humans are like. I do think the technology should be distributed safely so everyone can access it, but giving everyone access to the raw weights is going to be pretty hard to justify. I expect OpenAI will find a way to get the technology out there in a safe manner, but maybe they'll decide this level of AGI is so smart that they can't release it fully and can only expose it as a kind of narrow AGI. What leads me to believe AGI might be released that way is how they spoke about Voice Engine: they said they aren't releasing Voice Engine broadly, only for certain use cases. I think AGI will probably be released the same way: we can use AGI in self-driving cars, for research in certain laboratories, and inside certain AI systems, but the average person trying to do a random task probably won't get raw access. The benefits will still be there; they'll just be on the back end, like Voice Engine, for certain use cases in certain industries, and I don't think that's a horrible thing.

Then OpenAI actually rolled out a secret model. They announced a "majorly improved GPT-4 Turbo" model, available now in the API and rolling out in ChatGPT. (I apologize for talking quickly; I am extremely tired.) You might think it's just another updated version of GPT-4, and OpenAI has indeed been continually updating its models, which is frustrating for people because we never got a "GPT-4.5." But I'm pretty sure what we're getting here is basically GPT-4.5, after so many GPT-4 Turbo revisions (0125, this update, that update). I'll be honest: I think this model is an actual improvement. Of course, we know GPT-5 is coming, so a lot of people aren't hyped when they see this, but the benchmarks say otherwise. Nick Dobos tweeted "GPT-4 is back," and if the coding benchmarks are to be believed, it smashes Claude and previous GPT-4 versions: GPT-4 Turbo 2024-04-09 scores a pass rate of 83.8 versus 45.9 on the second pass metric, compared against the previous GPT-4 Turbo, Claude 3 Opus, Mistral Large, Claude 2, and Google Gemini 1.5. It's also striking that Mistral Large is beating Gemini 1.5 and Claude 3 Sonnet on that chart. The point is we're getting major upgrades across the board. What I think actually happened is that OpenAI had this in the locker and waited for competitors to release their systems, then said: we want to remain state of the art; we've seen too much backlash about being surpassed. And look at this: Alpay Ariyak tweeted, "I ran HumanEval on the base and the plus of the new GPT-4 Turbo 2024-04-09, and it ranks number one on both." The pass@1 on the base is 86.8.

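For context, the "pass@1" figures quoted from those HumanEval runs are pass-rate estimates: the probability that a sampled solution passes a problem's unit tests. The standard unbiased pass@k estimator is well known from the original HumanEval evaluation work; this is a reference sketch of it, not the exact script behind the tweet:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples is
    correct, given n total samples per problem of which c passed.
    Equals 1 - C(n-c, k) / C(n, k), computed stably as a product."""
    if n - c < k:
        return 1.0  # too few failing samples to fill k draws: certain pass
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# With one sample per problem (n=1, k=1), pass@1 reduces to the raw
# pass rate averaged over problems:
results = [1, 1, 1, 0, 1]  # 1 = the sample passed the tests
pass1 = sum(pass_at_k(1, c, 1) for c in results) / len(results)
```

Averaging this per-problem estimate over the whole benchmark gives the single headline number like "86.8" being compared between model versions.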
### [10:00](https://www.youtube.com/watch?v=X3BYP35vCcM&t=600s) Segment 3 (10:00 - 15:00)

The pass@1 on the plus is 90.2, a stark difference. On this leaderboard (and I'll get to another leaderboard that some people say is the only one that matters), Claude 3 Opus sits down in fourth place, alongside DeepSeek Coder and WizardCoder, which is pretty crazy. So it seems GPT-4 Turbo may have retaken the lead, which would be fascinating, and annoying for some people, because if GPT-4 is still in the lead, we're going to have to wait longer for GPT-5, and that's not what people want. A lot of people like AI news and AI hype, but trust me, there's still a lot going on in the industry. Even if this isn't major news, there are changes you might notice using it today. One thing I noticed is that if you ask ChatGPT for certain characters, it refuses fewer requests now and will try to make an image based on what you asked for; I'm pretty sure that came with the update.

Then, of course, we got Mistral. If you don't know them, Mistral is a French open-source AI company that drops releases out of nowhere: they don't say "introducing Mistral's new model," they just tweet out a torrent link and let the work speak for itself. This one is Mistral's 8x22B, a mixture-of-experts model, and it's pretty crazy. Scrolling down through the benchmarks, Jan P. Harries ran AGIEval on the new Mistral model, and it beats the Qwen-72B base model, which itself was released only the other day and was pretty state of the art; I was excited for that one because those folks have produced some really cool models in the past. But now Mistral's mixture of experts, 8x22 billion parameters, boom: the AGIEval zero-shot score for the base model is up to 52%, which is pretty crazy. This is another state-of-the-art model, and I wonder how quickly Mistral will keep moving, because they seem to just continually improve: Mistral 7B, Mixtral 8x7B, and now this. It matters because, if you've been paying attention, Llama 3 comes out very soon, maybe next week if the rumors are right; we can't confirm or deny that. Is Llama 3 better than this? Mistral just dropped this out of nowhere, so I'm wondering where those benchmarks will land, because as far as I know Meta has been working on Llama 3 for quite some time and has devoted a lot of money and a lot of researchers to it. It'll be interesting to see if one open-source model beats out Meta's, even though both are open source; I guess we'll have to see the benchmarks.

We also got Command R, a RAG model by Cohere. Retrieval-augmented generation is a really important architecture because it essentially lets you use your models with basically no hallucination; not "no hallucination" as in perfect, but basically reliable in terms of citations and the things you can use it for. A lot of people are excited because of the Arena Elo leaderboard, which some regard as the only leaderboard that actually matters. The reason: whilst benchmarks are important for certain tasks, some benchmarks are flawed, some are biased in certain respects, and some models are arguably fine-tuned on the benchmarks just so people can claim theirs is objectively better. Some people say: forget the benchmarks, forget what the papers say; look at how people actually use the model. Do people actually believe this model is better? If so, it will show in the Arena Elo, and so far Claude 3 Opus seems to take the cake there. Command R was on the board too, and people were saying it also beat GPT-4, which was pretty surprising.

Now, I have many other stories I want to cover, but I really want to get to the meat of the video, which is this: OpenAI and Meta have new AI models capable of reasoning. That sounds boring, but it's insane, because it means we're going to unlock new capabilities; these models are going to be a lot smarter, as Sam Altman has said across his many talks. Executives at OpenAI and Meta both signaled this week that they are preparing to launch the next versions of their LLMs, the systems that power their generative AI applications. Meta said it would begin rolling out Llama 3 in the coming weeks, and Microsoft said the model called GPT-5 will be rolling out "soon," which is extremely vague and very teaser-ish. "We are working hard on figuring out how to get these models not just to talk, but actually to plan, to have memory," said Meta's vice president of AI research. It'll be interesting to see whether Meta sticks to open source, or whether Meta releases its own chatbot like ChatGPT; what if they actually do that?

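The Arena Elo leaderboard mentioned in this segment ranks models from pairwise human votes using the classic Elo update rule. Here is a minimal sketch of one update step (the K-factor and starting ratings are illustrative defaults; the real leaderboard's methodology has additional statistical machinery on top of raw Elo):

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """One Elo update after a head-to-head vote: the winner takes rating
    points from the loser in proportion to how unexpected the win was."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# Equal ratings: the winner gains exactly k/2 points from the loser.
a, b = elo_update(1000.0, 1000.0)
```

The appeal for model ranking is exactly what the speaker describes: the score is driven by which model real users actually prefer in blind comparisons, and an upset win over a higher-rated model moves the rating more than an expected win.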
### [15:00](https://www.youtube.com/watch?v=X3BYP35vCcM&t=900s) Segment 4 (15:00 - 20:00)

So, models that reason, plan, and have memory. OpenAI's chief operating officer Brad Lightcap told the Financial Times that the next generation of GPT will show progress on solving hard problems such as reasoning. Reasoning is a big deal because it lets models carry out long, multi-step work and achieve much more complex tasks. The problem is that LLMs struggle with reasoning: wherever you stand on the argument, the architecture means these systems predict the next token (or, some argue, do more than "just" predict the next token), and they aren't really grounded in reality, so they struggle to truly reason. As Lightcap says, "I think we're just starting to scratch the surface on the ability for these models to reason." We have seen early examples of models reasoning, like Maisa's KPU and Devin performing multiple different steps, which is pretty interesting. The reason I'm so excited is that people have been wrapping the current versions of GPT-4 and achieving incredible results, and if even better models get plugged into those wrappers, I'm truly wondering how advanced these systems will become. As the article says, reasoning and planning are important steps towards what researchers call AGI, human-level cognition, because they allow chatbots and virtual assistants to complete a sequence of related tasks and predict the consequences of their actions. This is very true; AGI is coming sooner than you think, and this is pretty big news.

Meta's chief AI scientist Yann LeCun said current AI systems "produce one word after the other without really thinking and planning," and because they struggle to deal with complex questions or retain information for a long period, "they still make stupid mistakes." Adding reasoning would mean an AI model "searches over possible answers," "plans a sequence of actions," and builds "a mental model of what the effect of its actions is going to be": "This is the big missing piece that we are working on to get machines to the next level of intelligence." He said Meta was working on agents that could, for instance, plan and book each step of a journey from someone's office in Paris to another in New York, including getting to the airport. Meta also plans to embed its new AI model into WhatsApp and its Ray-Ban smart glasses. I talked about these previously, because I actually got a pair, and they're pretty cool; this isn't a plug, but if you've used them you know they're the one AI wearable that makes sense, because most people already carry sunglasses. If your sunglasses simply had a camera inside, you probably wouldn't mind, and unlike other AI devices it isn't an additional gadget you have to carry. LeCun also said, "I think over time we'll see the models go towards longer, more complex tasks, and that implicitly requires improvement in their ability to reason," and that we will be talking to these AI assistants all the time; our whole digital diet will be mediated by AI systems. So the next models are clearly going to be the next step up: reasoning and agentic behavior, planning, sequencing steps, building mental models. Talking to an AI and getting a response is good but basic: I talk, I get a response, I'm happy. Being able to plan, reason, and think "I'm going to do this, then look over all these things" is getting closer and closer to AGI.

So whatever models drop next, GPT-5 and Llama 3, I'm really intrigued, because they set the stage for what to expect, especially since GPT-5 carries high expectations; it will be the benchmark the other companies chase. If history repeats itself, GPT-5 will likely hold that benchmark crown for another year and a half, so it will be interesting to see how that develops. If you want to dig deeper, there is a paper which concludes that, "despite its occasional flashes of analytical brilliance, GPT-4 is utterly incapable of reasoning." That might sound harsh, but you can read the paper for more.

Also, Microsoft is going to invest $2.9 billion in AI and cloud infrastructure in Japan while boosting the nation's skills in research and cybersecurity. If you weren't bullish on Microsoft, maybe you should be now: they're expanding Japan's cloud and AI infrastructure capacity, training 3 million people over the next three years in the skills needed to build and work with AI technologies, and opening a Microsoft Research Asia lab there, which is fascinating too.

Finally, if you're a software developer, this is news you probably didn't want to see, though you've probably already seen it. This is AutoCodeRover: "Presenting AutoCodeRover, an autonomous software engineer from Singapore. Takes in a GitHub issue and fixes it in minutes, with a minimal cost of 5 cents." Not good. To be clear, I'm not dogging on software engineers at all, and I'm not saying you're going to be completely out of work; it's a really complex issue, given how quickly AI systems are moving.

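The "agentic" behaviour this segment keeps returning to (plan a step, act on it, observe the result, repeat) is typically implemented as a simple control loop around the model. This is a generic sketch of that shape with caller-supplied callbacks, not the actual architecture of Devin or AutoCodeRover:

```python
from typing import Any, Callable

def agent_loop(
    goal: str,
    plan: Callable[[dict], Any],      # choose the next step from state
    act: Callable[[Any], Any],        # execute the step (tool call, edit, ...)
    is_done: Callable[[dict], bool],  # check whether the goal is reached
    max_steps: int = 10,
) -> dict:
    """Plan-act-observe loop: keep choosing and executing steps,
    recording observations, until done or out of budget."""
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        step = plan(state)
        observation = act(step)
        state["history"].append((step, observation))
        if is_done(state):
            break
    return state

# Toy usage: "reach 3" by proposing the next integer each turn.
state = agent_loop(
    goal="reach 3",
    plan=lambda s: len(s["history"]) + 1,
    act=lambda step: step,
    is_done=lambda s: s["history"][-1][1] >= 3,
)
```

In a real coding agent the `plan` callback would be an LLM call and `act` would run tools (shell, tests, file edits), but the distinction the video draws, an agent that sequences its own steps versus a model that answers one prompt, is exactly this loop.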
### [20:00](https://www.youtube.com/watch?v=X3BYP35vCcM&t=1200s) Segment 5 (20:00 - 21:00)

But the point is that, since the release of Devin, it seems like every single week we get an open-source contender that takes over from the last one, and if the trend continues, by the end of the year we'll have autonomous software agents able to do maybe 90% of these tasks. Whether we get to 100%, I don't know; I'm not sure any AI system can currently do 100% of any class of tasks, beyond maybe some basic ones. The fascinating part of the paper is this line: "Interestingly, our approach resolved 67 GitHub issues in less than ten minutes each, whereas developers spent more than 2.77 days on average." That's pretty crazy in terms of what it can do. So I'm wondering whether Devin's next update will come with a benchmark like "we're at 40%," then another a year later at 70%, and then OpenAI ships some "GPT Coder" claiming 85%. What makes this notable is that it's an agent: it plans, figures out what to do, and actually resolves the issue itself, rather than being an LLM you simply ask for advice. The industry is interesting right now, and I want to see how it reacts, because this is real, this is happening, and it's going to keep going; that's the point I'm trying to make. Let me know what you think. If you want to know how to use the new GPT-4 Turbo model, check out the second channel; I uploaded a bunch of tutorials there, including how to use the new Command R model and how to sign up for Gemini 1.5 Pro.

Oh yeah, by the way, what kind of channel would this be if I didn't tell you: Gemini 1.5 Pro is now available in 180 countries. For some reason, if you're in the UK (and I don't know if it's just me), any time I click the link I don't get access the way other people do, so either use a VPN or use it through Vertex AI Studio, which also works. That way you can finally get access to that one-million-token context window, which is pretty fascinating. If you enjoyed the video, let me know, and I'll see you in the next one.

---
*Source: https://ekstraktznaniy.ru/video/14398*