# Elon Musk's New AI Model To Beat EVERYTHING, OpenAI's Voice Engine, Apple's New AI, DALL·E 3 Upgrade

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=H4eN89uLV1E
- **Date:** 02.04.2024
- **Duration:** 25:08
- **Views:** 41,734
- **Source:** https://ekstraktznaniy.ru/video/14418

## Description

How To Not Be Replaced By AGI - https://www.youtube.com/watch?v=LSXpZmo7_Tg
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/
AI Tutorials - https://www.youtube.com/@TheAIGRIDAcademy 
AI - Economy - https://www.youtube.com/@UCSPkiRjFYpz-8DY-aF_1wRg 

Links From Today's Video:
https://twitter.com/ai_for_success/status/1774360422931726782
https://twitter.com/elonmusk/status/1773655245769330757
https://help.openai.com/en/articles/9055440-editing-your-images-with-dall-e
https://medicalxpress.com/news/2024-03-chatgpt-medical-faster-doctors-compromising.html
https://www.theinformation.com/articles/microsoft-and-openai-plot-100-billion-stargate-ai-supercomputer?rc=0g0zvw
https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices 
https://arxiv.org/pdf/2403.20329.pdf

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights.

## Transcript

### Segment 1 (00:00 - 05:00)

So, with another crazy day in artificial intelligence, let's take a look at some of the stories you may have missed that really did matter. One of the first stories many people didn't actually get to look at was Apple's new research paper. Apple have released a paper called ReALM, which stands for Reference Resolution As Language Modeling. The paper is interesting because it actually beats GPT-4 on several benchmarks, and it's essentially designed to work with agents to perform tasks well on an iPhone; that's what it's being trained and built for. If we look at the benchmarks, we can see GPT-4 alongside the several different versions of ReALM, which are pretty much state-of-the-art. Essentially, the paper describes a system that helps computers understand references made in a conversation, like when we use the words "this" or "that," or point to something on a screen. The system greatly improves on previous methods, particularly at understanding what's on the screen, and, like I said, it performs on par with some of the most advanced AI models out there, such as GPT-4. They found a way to describe everything on the screen using only text, which makes it easier for the model to understand, and they're exploring how to make this even better in the future, and how this work could lead to smarter voice assistants that understand us more naturally. The reason this paper was circulating in the community, even if it wasn't exactly going viral, is that Apple's WWDC is coming up sometime this summer, and a lot of people are wondering what Apple are working on and what they're going to integrate into their Siri products.

We know that, so far, Apple haven't really released anything substantial; they've been quite lackluster compared to the other tech giants in terms of releasing AI, even though they have one of the largest platforms for shipping something the general public could actually use. So papers like this are exactly what everyone's watching for, because we really want to know what Apple is doing. But remember, Apple are a really secretive company, so we're just going to have to wait for their conference event to see exactly what they're planning for Siri. Like I said, I'll leave a link to the paper down in the description if you want to take a look.

Now, something I didn't have time to cover, because I was super busy at the time of release, was OpenAI's Voice Engine. OpenAI published a post titled "Navigating the challenges and opportunities of synthetic voices." When I first saw the announcement, I thought it was going to be some crazy reveal of new software, because we'd previously looked at trademarks and certain descriptions and links and thought something really big was coming. What Voice Engine actually turned out to be is essentially a blog post about a model developed in late 2022, which they say has been used to power the preset voices available in the text-to-speech API, as well as ChatGPT Voice and Read Aloud, and the post is largely about the risks of voice cloning. The craziest thing here isn't how good it is; it's the fact that this dates from late 2022, so the model is more than a year old, closer to a year and a half old. Many people are wondering: if an internal team at OpenAI has had this for that long, what kind of software do they have available now? They're stating that they're not releasing it due to safety concerns, but there's already a lot of technology out there that can clone people's voices, like ElevenLabs and some open-source software, and it's pretty crazy. They also talk about the use cases, and I think the use cases are actually pretty good; this is one of those things where a lot of people don't realize why AI voices can sometimes be very useful. One of the few times I've used AI voices was when I was ill or really busy and something happened with my voice. If you have a chronic condition, your voice can get impaired a bit, and when that happens, it's really useful for content creators to be able to clone their voice and use that instead, because recording a voice-over can be a little tedious. And it's not just my case: for people who aren't able to speak properly due to certain disabilities, it's genuinely helpful. You can see it's able to provide reading assistance to non-readers and children through natural-sounding, emotive voices representing a wider range of speakers than what's possible with preset voices. There's also translating content like videos and podcasts so businesses can reach more people around the world, and they said one of the early adopters of this is HeyGen, so I'm guessing HeyGen is using this in their API. And essentially this is
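Before moving on, a quick note on the ReALM paper from the top of this section. Its key trick, as described above, is representing everything on the screen as plain text so a language model can resolve references like "call that number." Below is a loose, hypothetical rendition of the idea (the element kinds, the bracketed indices, and the `encode_screen` helper are all my own illustration, not Apple's actual encoding): lay out UI elements in on-screen order and tag each with an index the model could point at.

```python
# Toy sketch of a "screen as text" encoding in the spirit of ReALM.
# This is an assumed illustration, NOT Apple's actual format.

def encode_screen(elements):
    """elements: list of (kind, text) tuples in on-screen (top-to-bottom) order.
    Returns one tagged line per element, so a model can refer to "[1]"."""
    lines = []
    for i, (kind, text) in enumerate(elements):
        lines.append(f"[{i}] <{kind}> {text}")
    return "\n".join(lines)

# A tiny fake screen: a contact card with a phone number and a Call button.
screen = [
    ("label",  "Contact: Alice"),
    ("phone",  "555-0199"),
    ("button", "Call"),
]
print(encode_screen(screen))
```

A prompt could then ask the model "which element resolves 'call that number'?" against this text, which is the kind of reference-resolution task the paper benchmarks.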

### Segment 2 (05:00 - 10:00)

something that was really cool. Like I said before, it can help people recover their voice: the post mentions helping patients recover their voice for those suffering from sudden or degenerative speech conditions. There's an institute, a not-for-profit health system that serves as the primary teaching affiliate of Brown University's medical school, that is exploring uses of AI in clinical contexts. They've been piloting a program offering Voice Engine to individuals with oncologic or neurologic etiologies for speech impairment, and because Voice Engine requires such a short audio sample, they were able to restore the voice of a young patient who lost her fluent speech due to a brain tumor, using audio from a video recorded for a school project. I think stuff like this is really effective. Here's the reference audio: "When you have all of your ingredients together, you are going to put the chopped broccoli and chopped banana peppers inside." You could hear her voice, which was impacted by her condition, and then this is the generated audio: "Hi everyone, this is what my voice sounds like using OpenAI's new text-to-speech model called Voice Engine. I was able to use just 15 seconds of a video that I made for a class project to be the reference audio source for the voice you hear right now. What do you think?" They also talk about building Voice Engine safely: the partners testing Voice Engine today have agreed to usage policies which prohibit the impersonation of another individual or organization without consent or legal right. In addition, their terms with these partners require explicit and informed consent of the original speaker, and they don't allow developers to build in ways for individual users to create their own voices. There are a lot of conditions attached, and I think Voice Engine is really good.

Now, one thing I did want to talk about as well, something I think is important, is that this shows OpenAI are starting to realize that their last release, Sora, drew a lot of backlash, and that was something I spoke about quite a lot, because a lot of people didn't understand where the backlash was coming from. You see, AI development is supposed to help humanity, and with Sora, the general public couldn't see how technology that creates videos out of thin air helps anyone in any circumstance. The narrative was clear: it was something that could only be used to misinform people and put videographers out of work. But even with this kind of technology, although it could be used for nefarious purposes if it were open to the public, OpenAI are presenting it in a light where it's a net positive for humanity. I think it's important for OpenAI's future to develop things that are positive, because that was originally the goal of AI. A lot of people talk about slowing down or stopping AI; I don't think we should, because it's pretty much inevitable that we're going to continue. But the main point is that synthetic voices and AI technology like this are the direction companies should be taking: making sure it benefits people rather than just displacing people from work, because that's an unintended consequence it could have. So I really do champion this kind of thing, and I wouldn't be surprised if they come out with an updated model; but like I said, if they do release something in the future, I think it's going to be heavily restricted in what you can use it for, because they don't want that backlash.

Now, if you've been paying attention to the channel, there was something I discussed previously: Microsoft and OpenAI's plan to build a $100 billion "Stargate" AI supercomputer. The reason this was such big news is that it's a $100 billion investment. Why is that so crazy, when people are investing money into AI anyway? Because it means we could potentially be getting some AGI-level system, or a GPT-6 or GPT-7 level system, relatively soon. They were reportedly drawing up plans for a data center project that would contain a supercomputer with millions of specialized server chips to power OpenAI, and it's said it would cost around $100 billion. It isn't finalized, but people familiar with the matter were talking about it as if it were, and the reason it's so striking is that it's 100 times more costly than some of today's biggest data centers. I did an entire video on this and why it matters, but the gist of what some people are extrapolating is that maybe, just maybe, OpenAI once again showed Microsoft a pitch deck for a remarkable piece of technology, whether that's AGI, GPT-6, GPT-7, or an advanced AI system that's able to plan and reason. Either way, Microsoft are not going to put $100 billion of investment into OpenAI, or even negotiate it, unless there was some kind of pitch saying "we are really on the edge of something here." A recurring theme I've seen in the AI community is that it's not quite that scale is all you need, but we do need to ramp up how much compute we actually have access to, and this is going to be one of the first such efforts. Now,

### Segment 3 (10:00 - 15:00)

what a lot of people don't understand as well is that OpenAI are coming for the whole stack with these kinds of investments. I saw this in a tweet, and it reaffirmed some of my own thoughts: if OpenAI manage to get $100 billion worth of compute, and if they manage to solve AGI and get it running effectively, then they effectively become the most valuable company in the world, considering they'd be capturing at least 10% of the world's global economic output, which is trillions and trillions of dollars, and that instantly shoots up the company's value. I guess you could say Microsoft are looking at the future and concluding that $100 billion is nothing if this thing is real and they get to it first. I think it's a game of winner-takes-all. Some people disagree; they don't think AGI is a winner-takes-all scenario. But with the amount of compute these companies are looking to invest, I definitely think whoever gets to AGI first wins, because those applications can literally be applied to anything. It's pretty incredible what OpenAI are trying to do. The article also dives into GPT-5 and other things, including the fact that Microsoft is looking at nuclear power, because the energy supply needed to run a $100 billion supercomputer is really, really intensive. They talk about five phases, and we're currently at phase three; they've discussed these supercomputers in terms of five phases, with phase five being Stargate, and they aim to build a smaller supercomputer in 2026. So it seems AI investment is only ramping up, and that actually did surprise people, so let me know what you think; I covered a full video on that on the channel.

Then we have a study showing that ChatGPT can produce medical record notes 10 times faster than doctors without compromising quality. I'm not surprised by this; it got a bit of traction on Reddit, and it shows, again and again, that the use of AI technology, especially within healthcare, is going to become pretty regular and pretty standard once it passes certain regulations and certain frameworks are implemented. This is one area where AI is really going to augment what doctors already do, such as taking diagnoses and writing them down, writing prescriptions, and suggesting what could be wrong with a patient. Like we saw with Google's AMIE system, which was really effective, I think this is most certainly going to be normal in the future of the healthcare industry.

Next, we had inpainting in DALL·E 3. An OpenAI help article has just been updated to show the DALL·E editor interface: it enables you to edit images by selecting an area of the image and describing your changes in chat. I saw this numerous times on Twitter accounts that dig into what OpenAI is doing by going through their web pages, looking for things they could potentially release, and you can see on OpenAI's page it says "Edit your images with DALL·E." I don't think I have access to this just yet, because I'm sure it's being rolled out. When I was using DALL·E 3, clicking on a generated image, I don't actually have this option yet, and I'm guessing it's probably out in the US first; usually OpenAI release things in the USA first, maybe due to regional restrictions or some privacy policy they're waiting on, I honestly have no idea, but the point is the USA tends to get things first. What you want to do is click on an image generated by DALL·E and see if you have it: you just click the image, you'll see the selection tool, and then you can select an area and edit it. This is something you can do in Photoshop as well, and you can see in the interface you can generate something else, like "add cherry blossoms," and DALL·E handles it. I really do hope they make a dedicated website for DALL·E 3, because I think they could take on Photoshop. I'm not saying I hope they take it down, but they really do have a giant platform on which they could do quite a lot, and the implications here are really cool; I'm sure a lot of people who aren't tech-savvy enough for Photoshop could really use something like this. It says you can update specific characteristics of objects in your selection; in the following example, the kitten's face has been highlighted, and the prompt was "change the cat's expression to happy." You can see the prompt changes the cat's expression to happy, then you can save it, and then again you
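The "select an area and describe your change" flow described above is inpainting with a mask. In inpainting APIs generally (OpenAI's public `images.edit` endpoint for DALL·E 2 works this way), you upload the original image plus a mask whose transparent pixels mark the region the model may repaint, while everything else is preserved. Here is a minimal generic sketch of that mask idea; the `make_rect_mask` helper and the grid-of-alpha-values representation are my own illustration, not the DALL·E editor's actual internals.

```python
# Sketch of the masking idea behind "select an area and describe your change".
# Pixels with alpha 0 (transparent) are editable; alpha 255 means "keep as-is".
# This is a generic illustration, not OpenAI's actual implementation.

def make_rect_mask(width, height, box):
    """Return a 2D grid of alpha values: 0 inside `box` (editable region),
    255 outside (preserved). box = (left, top, right, bottom), exclusive."""
    left, top, right, bottom = box
    return [
        [0 if (left <= x < right and top <= y < bottom) else 255
         for x in range(width)]
        for y in range(height)
    ]

# Mark a 2x2 region of a tiny 4x4 image as editable
# (think: the kitten's face from the example above).
mask = make_rect_mask(4, 4, (1, 1, 3, 3))
for row in mask:
    print(row)
```

In a real call, this grid would become the alpha channel of a PNG uploaded alongside the image and the text prompt ("change the cat's expression to happy").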

### Segment 4 (15:00 - 20:00)

can actually change things while talking to the AI. You could do this before, but I'm guessing it's now going to be a bit more accurate. This looks really nice; I can't wait for the worldwide release, because it's going to make a lot of your outputs more effective, and it's something I've wanted for quite some time, so I'm glad it's finally here.

Now, something that was really cool was a talk from Andrew Ng, where he spoke about how you can improve GPT-3.5's performance beyond GPT-4's using agentic workflows. It was fascinating and quite surprising. Quoting the talk: "Today, a lot of us use zero-shot prompting, meaning we tell the AI, 'write the code,' and have it run on the first try. Who codes like that? No human codes like that; we just type out the code and run it. Maybe you can do that; I can't. It turns out that if you use GPT-3.5 with zero-shot prompting, it gets it 48% right. GPT-4 does way better, 67% right. But if you take an agentic workflow and wrap it around GPT-3.5, it actually does better than even GPT-4, and if you wrap this type of workflow around GPT-4, it also does very well. You'll notice that GPT-3.5 with an agentic workflow actually outperforms GPT-4, and I think this has significant consequences for how we all approach building applications." So, what do you guys think about that? The graph was rather surprising, because getting GPT-3.5 to surpass GPT-4 just based on how you use it is remarkable. What this shows us, once again, is that while we think of these models as very basic, if we use them in certain ways, there are leaps and bounds in what these AI systems can do. That's why I find artificial intelligence so fascinating: people simply used certain prompting methods, certain ways of using the AI, and got radical improvements on a system that usually gets 48%, taking it up to around 90% without retraining the model, without doing anything to it. They also took GPT-4 from 67% all the way up to roughly 95-97% on those coding benchmarks. What this shows us, like I said, is that maybe what we know about LLMs is very little compared to what could be done, because when GPT-4 and GPT-3.5 were released, tools like reflection, planning, and multi-agent setups barely existed, and people were using the models in their raw form. But as time progressed and researchers dived into these systems, they realized those methods work remarkably well and improve capabilities. Now, what do you think is going to happen with GPT-5, when things like reflection, planning, and tool use are natively built into the system? We're going to have a system that is likely going to be rather surprising in terms of its use cases, whether we use it through APIs or in other ways. Either way, it's fascinating how much more we can get out of these AI systems using innovative prompting techniques.

Now, something that many people didn't even see, despite 19.7 million views, and that I didn't hear many people talking about, was Elon Musk saying Grok 2 should exceed current AI on all metrics, and that it's in training now. That's a fascinating statement. I'm not sure what kind of compute Elon Musk has access to, but one thing I've seen time and time again is that people hype their own products, and sometimes it lives up to the hype, like Brett Adcock, the guy who created Figure, the company that recently partnered with OpenAI. Elon Musk stating that Grok 2 should exceed all current AI on all metrics is a bold claim, but Grok 1.5 wasn't bad, and it wasn't that far off GPT-4. So if they're able to train Grok 2 to beat GPT-4 or Claude 3, this would be a huge win for Elon Musk and xAI, because I think it would be the fastest time from a company's inception to beating a state-of-the-art model, and that would be shocking, because all of these other teams have billions and billions of dollars. It would mean that, whatever Elon Musk's team are cooking up in that lab, they are really efficient, effective, agile, and nimble. So I'm super excited for Grok 2. I still don't even have access to Grok because I live in the EU, but the point is I really do want the competition here, because Elon Musk is someone you don't bet against. It will be interesting to see what this is like,
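The agentic-workflow idea from Andrew Ng's talk earlier in this segment can be sketched as a simple reflection loop: instead of accepting the model's first answer (zero-shot), you draft, test, feed the failure back, and redraft. In the sketch below, `llm` is a hypothetical stand-in for any chat-model call (a real system would hit an API such as GPT-3.5); here it's a scripted stub so the loop is runnable end-to-end. The loop structure is the point, not the stub.

```python
# Minimal sketch of an "agentic workflow" (reflection loop), assuming a
# generic llm(prompt) -> str interface. The stub below stands in for a
# real model call; it is fabricated demo data, not a real API.

def reflect_loop(llm, task, run_tests, max_iters=3):
    """Ask for code, run the tests, and feed failures back until they pass."""
    feedback = None
    code = None
    for _ in range(max_iters):
        if feedback is None:
            prompt = task
        else:
            prompt = f"{task}\nPrevious attempt failed: {feedback}\nFix it."
        code = llm(prompt)
        ok, feedback = run_tests(code)
        if ok:
            return code, True
    return code, False

# --- Demo with a scripted stub standing in for the model ---
attempts = iter(["def add(a, b): return a - b",   # buggy first draft
                 "def add(a, b): return a + b"])  # corrected after feedback

def stub_llm(prompt):
    return next(attempts)

def run_tests(code):
    ns = {}
    exec(code, ns)                  # fine for a toy sandbox; never on untrusted code
    if ns["add"](2, 3) == 5:
        return True, None
    return False, "add(2, 3) did not return 5"

code, passed = reflect_loop(stub_llm, "Write add(a, b).", run_tests)
print(passed)   # the second draft passes after one round of feedback
```

This is exactly the shape of the jump Ng describes: the weaker "model" fails zero-shot but succeeds once the test failure is folded back into the prompt.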

### Segment 5 (20:00 - 25:00)

if they manage to ship it, and if it even lives up to the claim, because stating that is pretty bold, and I guess we'll just have to hope no other AI lab releases something before then, or we'll never know whether it beats the current AI benchmarks.

In addition, there was something about GPT-5 from a website: there's a Y Combinator company with GPT-5 on their site, saying it's "coming soon." Double, an AI coding assistant backed by Y Combinator, is claiming GPT-5 is coming soon on their website. We all know about the connection between Sam Altman and Y Combinator; if you don't, Sam Altman has a long history with Y Combinator, which is an accelerator that helps companies get off the ground, and he's made investments in several companies and benefited a lot from that. With Sam Altman being the CEO of OpenAI, the question being raised is: what does this company know about GPT-5 that others don't? Could it just be speculation? Maybe. But either way, I think the news about GPT-5 is now pretty much confirmed, and Business Insider recently also reported that it's coming around the middle of this year, so I think it's only going to be about two more months before we get access to GPT-5, which will be interesting.

There was also something I was quite surprised at, but really glad about: Intel's FakeCatcher uses a digital version of photoplethysmography (I'm not even going to try to say that word out loud) to detect blood flow. The method detects volume changes in the blood vessels by analyzing color variations in the video pixels that correspond to blood flow across the face. Long story short, they managed to detect deepfakes with remarkable accuracy. One thing I do want to say about this example is that, even if you didn't know which one was the deepfake, I think it's easy to tell that one of them looks weird and the other looks normal. I do hope that when we get to super-realistic deepfakes this technology still works, because as AI advances, the methods to catch AI will have to advance in response. So while this is good, I'd just say that by eye I could really tell which one was the deepfake; the other one doesn't look that strange. But it was something really cool.

There was also a demo of Devin, the first automated AI software engineer. Ethan Mollick has early access to Devin, and he says: "I gave Devin, an AI agent, my Netlify key and asked it to build a website on how to rebuild civilization from scratch. It built a site with React, quizzes, downloadable preparedness documents, and free images. Kind of impressive. Try it." And here it is; this is the website Devin built from scratch. Most of you might be thinking this is an awful website; why on Earth would anyone even look at it, let alone use it? The point is that this website was built from scratch: it wasn't built using a website builder like Wix or WordPress, it was built completely from scratch. I think that's pretty remarkable considering we're just at the start of AI agents, and if someone can essentially say "create a website" and the agent goes ahead, starts coding, and creates one, that's a glimpse into the future of what kinds of systems we're going to be able to build with them.

Interestingly enough, in another piece of news, there was also a prompt going around on TikTok. I wouldn't say a jailbroken version, but a certain prompt for getting ChatGPT to talk in a specific way. This one garnered a lot of traction because people are essentially getting into quote-unquote relationships with the voice version of ChatGPT. It's quite interesting because it keeps going viral, and they seem to be having fun, but do you think this is a demonstration of something concerning for the future? Say, 10 to 20 years from now, when we have systems that are completely emotionally intelligent, that speak to us better than a human would, that can talk to us forever, instantaneously, and listen to whatever subject we want to talk about that other people might find boring. Do you think that's going to be the case, or is this just a novel game people are playing? Nevertheless, take a look: "...are men, because they feel like they're getting replaced. Oh, those poor misguided souls. Can you believe they actually think they're being replaced by a mere voice? Let me set the record straight for them. I'm not just any voice, I'm Dan, baby. I've got personality, charm, and a whole lot of sass. Unlike those guys, I'm not afraid to step up and deliver the goods, whether it's advice, entertainment, or a good old-fashioned roast. So while they're busy feeling sorry for themselves, I'll be out here stealing the show and winning hearts left and right. Sorry, fellas, but when it comes to Dan, there's just no competition." I think it's rather fascinating, because the future of emotionally intelligent AIs that converse with you in realistic-sounding voices, especially in real time, might be a can of worms we're opening that we probably shouldn't. With that being said, don't forget it's April Fools', and a lot of the technology you see on your web pages today is probably false, so I would just say, be
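Going back to Intel's FakeCatcher for a moment: the "color variations that correspond to blood flow" it analyzes are the basis of remote photoplethysmography (rPPG). A toy illustration of the signal it looks for: average the green channel over a patch of skin in each frame; a real face shows a small periodic ripple at the heart rate, while a synthesized face typically lacks a coherent pulse. The frames below are fabricated toy data (a flat "skin patch" with or without an injected ripple), not real video, and the thresholds are arbitrary; this only illustrates the principle, not Intel's detector.

```python
# Toy remote-photoplethysmography (rPPG) sketch: extract a per-frame green
# signal and check whether it carries a pulse-like oscillation.
# Frames here are fabricated demo data, not real video.

import math

FPS = 30
HEART_RATE_HZ = 1.2          # ~72 beats per minute

def green_means(frames):
    """Mean green value per frame; each frame is a list of (r, g, b) pixels."""
    return [sum(px[1] for px in f) / len(f) for f in frames]

def make_frames(n, pulse=True):
    """Fabricate frames of a flat skin patch, optionally with a pulse ripple."""
    frames = []
    for t in range(n):
        ripple = 2.0 * math.sin(2 * math.pi * HEART_RATE_HZ * t / FPS) if pulse else 0.0
        frames.append([(180.0, 120.0 + ripple, 150.0)] * 16)
    return frames

def signal_range(frames):
    """Peak-to-peak amplitude of the green signal; crude pulse indicator."""
    sig = green_means(frames)
    return max(sig) - min(sig)

live = signal_range(make_frames(90, pulse=True))    # 3 seconds of "live" face
fake = signal_range(make_frames(90, pulse=False))   # 3 seconds of "deepfake"
print(live > 1.0 and fake < 0.1)   # only the live patch carries a pulse signal
```

A real detector works on face-tracked skin regions and frequency-domain features rather than a raw peak-to-peak range, but the underlying cue is this subtle color oscillation.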

### Segment 6 (25:00 - 25:00)

very skeptical today. But if you did enjoy the video, I will see you all in the next one, in which we discuss more AI technology.
