OpenAI REVEALS GPT4o's SECRET CAPABILITIES (GPT4o SECRET Showcase)
27:32

TheAIGRID · 14.05.2024 · 35,681 views · 853 likes

Video description
GPT4o's SECRET CAPABILITIES Are STUNNING! (GPT4o Multimodal Showcase)

How To Not Be Replaced By AGI: https://youtu.be/AiDR2aMye5M
Stay Up To Date With AI Job Market: https://www.youtube.com/@UCSPkiRjFYpz-8DY-aF_1wRg
AI Tutorials: https://www.youtube.com/@TheAIGRIDAcademy/
Follow me on Twitter: https://twitter.com/TheAiGrid
Check out my website: https://theaigrid.com/

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything I missed?

(For business enquiries) contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Contents (6 segments)

Segment 1 (00:00 - 05:00)

So with the release of GPT-4o, there were a lot of comments about how great this model was, but there were also a surprising number of comments stating that it was actually underwhelming in terms of capabilities. I don't think so, because if you haven't been paying attention, OpenAI secretly released a blog post in which they discussed some of the secret capabilities of this model, and I'm sure they did it this way because they didn't really want to show you guys how crazy this model is. So let's take a look at the exploration of capabilities, because I can guarantee you that what you're about to see is far more impressive than the multimodal demo.

You can see here that they state: "With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations." So this exploration of capabilities is where this multimodal model truly does shine.

For example, you can see here the user's input and the output. It says "Visual narratives: robot writer's block." Now this is remarkably impressive, because it shows us a new vision system which is able to take the text from the prompt right here and put it into an image. I'm going to zoom in as much as I can, but you can see it says: "A first-person view of a robot typewriting the following journal entries: 1. yo, so like, I can see now," yada yada, "the text is large, legible and clear. The robot's hands are on the typewriter." So what you can see here is that this version of DALL-E... I don't even think this is actually a version of DALL-E. I'm guessing that since this is all end-to-end and part of this neural network, this is a different kind of image system, and it seems to be remarkably accurate, a level of accuracy that we truly don't have today.

Recently I was even discussing with some other people how systems like Midjourney, whilst remarkably photorealistic, don't always adhere to the text of your prompts, and that can make for some tedious work that you need to go and do in Photoshop or with another AI program. You can see here: "The robot wrote a second entry. The page is now taller. The page has moved up. There are two entries on the sheet." And that is pretty crazy: it has the exact same text right there, and you can see that this text has actually moved up as well. So this, ladies and gentlemen, shows us that the degree of accuracy we have in this multimodal area is truly incredible. I really didn't think we were going to get something like this yet. Honestly, I thought it was at least maybe four to six months away, but this kind of consistent image generation is remarkable from OpenAI, and I think it shows you that OpenAI are truly ahead.

Then you can see here: "The robot wasn't happy with the writing, so he's going to rip up the sheet of paper. Here is the first-person view as he rips it up from top to bottom with his hands. The two halves are still legible and clear as he rips it apart." I think the craziest thing about this is how accurate it is, truly grasping what you want from your prompt. So this is pretty crazy, and trust me when I say that the further examples are going to show you the same thing.

There's also this example right here, "Visual narratives: Sally the mailwoman." With this exploration of capabilities, what they're aiming to show us is exactly how consistent character generation is done. You can see right here that with GPT-4o it says: "A cartoon mail delivery person with a smile on her face. She is standing facing forward in front of a white background." Then you can see that you've taken the image that was generated by DALL-E and you can say: "This is Sally, a mail delivery person. Sally is standing facing the camera," and you can see the attachment. Then it says: "This is Sally. She's standing facing the camera. She's in front of a red door, holding a letter in her hand. We're looking at her from the side."

So it's clear that what we have here is remarkable character consistency, because if you look at the full sample you can see: "Now Sally is being chased by a dog. Sally is running down the sidewalk as a golden retriever is chasing her." This is honestly impressive, because it shows us that whichever model this is, the GPT-4o model, it's able to remain consistent with not only the character but also the style. Something that is a little bit frustrating with the current iteration of DALL-E 3 is that when it does consistent character generation, there are still small variations in the character, to the point where you can kind of tell that it just isn't 100% the same. With this, I would argue it is 100% consistent in terms of the character generation. I'm not sure what they're using, but what they're able to showcase here is genuinely impressive, because it shows us that we are truly at a new level in

Segment 2 (05:00 - 10:00)

terms of what capabilities are going to be there in the future. And we can see more here. It says: "Uh oh, Sally has tripped over a branch that was blocking the sidewalk. The dog is still chasing her in the background." Then it says Sally reaches up to the doggy, and it turns out it was a nice dog. So essentially what we can see here is truly consistent character generation, which is a really important part of future AI systems if you want to use them for content creation.

Then, of course, I think this was also very impressive: we have poster creation for the movie "Detective." This is where you're able to use pictures of yourself and get them converted into different things. Now, DALL-E 3 could do this in a very limited way, but with these updates it is really impressive. You're able to combine real designs and then edit those images natively. It's super strange, super weird, and it's a capability I didn't really see coming from any type of AI system just yet.

So it says: "Let's design another poster with two new characters. This is a picture of Alex Nichol. A casual picture of Gabriel Goh." Okay, so we've got the two pictures there. Then, before I show you, because it's pretty crazy, it says: "The final poster of the movie 'Detective.' This features two large faces of Alex and Gabe prominently. Alex, on the left, is depicted in a thoughtful pose with a hint of introspection in his eyes. Gabe, on the right, has a slightly wearied expression, possibly reflecting the challenges their character faces in the film. The names Alex Nichol and Gabriel Goh are featured above their heads." And remember, just listen to this, because you're about to see how accurate the image is when combining these images too. It says: "The background brick wall is slightly faded and foggy. The expressions are serious and determined, hinting at the investigation. The tagline for this dark and gritty movie is 'Searching for answers.'"

So what we need is something slightly faded and foggy, a background brick wall, serious and determined, and "Searching for answers." Okay, so let's look at what we get here. You can see that we have this poster that has subtly changed these guys' emotions. Whilst this first output isn't that crazy, because it does get the bottom text wrong, I still think it is pretty impressive. And you can see down here the final image that we get: "Searching for answers," "Detective," with the background and everything. This is absolutely incredible. Genuinely, this is really impressive. I mean, how do other AI systems even get to this level, where they can take images that we literally just have right there, and we're able to get everything in and everything out? This truly is a remarkable model that is going to change the game. That's why I said this entire thing is honestly so insane, and many people are truly underrating it.

Next it says: "A friendly-looking robot wearing a baseball cap, standing in an upright pose, facing the camera. It has a smile on its face." And then you can see here that you're able to literally change it. I'm not entirely sure how it's able to do this. Maybe the AI system just uses a bunch of different algorithms, but you can see it's able to rotate the character and get different angles, and it's able to get the Frisbee. You can see right here: "Gary also likes to program computers. He's sitting at a desk in front of a big monitor." That is consistent. Him riding a bike, that's also consistent. Him cooking food, very consistent. Him playing violin, extremely consistent again. Overall, it's truly crazy. So that's character design.

Then we have this, which is poetic typography with iterative editing. This one is really impressive, namely because what we can see here is a system that is able to literally just edit the image, and it's absolutely incredible. It says: "A poem written in clear but excited handwriting in a diary, single column. The writing is sparsely but elegantly decorated by surrealist doodles. The text is legible and clear," and it continues as the AI muses about learning from multimodal data for the first time. This is the output that it produced, and you can see that this is incredible because it looks real. If I saw someone post a screenshot of this, I'd believe it. We've now got an AI system that is able to effectively doodle and write in some kind of handwriting, which is just incredible. You can see it's taken all of the text here and put it into this handwriting, which is genuinely remarkable.

Then you can see the person says, "Make it in dark mode." It's able to instantly take that, invert the colors, and make it dark mode with an incredible degree of accuracy. Of course, someone could do this in Photoshop, but when you have an end-to-end AI system doing it, I think what this shows us for the future is that these systems are going to truly surpass our capabilities in all areas. Being able to just quickly ask it, "Hey, can you invert this?" means the future is going to be incredible. And here you can see "Remove the notebook paper lines," and it's able to immediately remove those

Segment 3 (10:00 - 15:00)

lines. So it's clear that the kind of editing we have here is so rich, so fine, and so accurate that it is truly incredible. If you were to try to remove those lines in Photoshop, it wouldn't be easy, because there's no color to select on. It definitely wouldn't be easy. Just trust me, I use Photoshop quite a lot. But here you're able to ask an AI system, "Remove the lines," and boom, it's able to do that. It's able to analyze all the lines, and this was definitely an impressive feat.

There are also some other ones that are honestly just as impressive. This one right here shows how it can take different logo designs and then embed them into different images. It says: "Vector graphics depicting the OpenAI logo with the word GPT-4o. The logo has about the same width as the text. The margin around the border is even. The logo and text are placed near each other on a black background, and they fill the square canvas." So essentially this is your input. Then you can see right here it says "the commemorative coin for the release of GPT-4o." And then, of course, you can see that by combining the two you're able to get this right here. It says: "Centered around the OpenAI logo," you can see "GPT-4o" and, below, "2024," which is pretty incredible. Then it says the outer band displays a comprehensive array of symbols representing the omnimodel, and it does that pretty accurately. I'm guessing they chose this demonstration because they wanted to show how many different accurate elements it could get into a single image.

And then right down here, what you can also see, which is pretty crazy, is that this is a model that has absolutely everything. Remember, like I said, this is the most impressive part of the demo because it shows anything to anything: you can hear the audio. To be honest with you guys, that didn't actually sound that good, but I think we do see that the implications are far-reaching.

Here again is where we have text to font. We have the ability for this AI system to generate coherent fonts, all with a consistent style, and I think this is so impressive because if you've ever looked at fonts and how they're made, it's pretty hard to get all of the letters in a consistent style. Hopefully this isn't copyrighted. I'm not talking about me, I'm talking about these fonts, because there are a ton of different fonts online. You can see right here it says: "Steampunk old-fashioned font that looks ornate and belongs on a steam engine," and then the letters A, B, C, D, E, F are displayed in rows as they would look in a font book. You can see right here that, literally from scratch, you can get an entire font designed in the exact same style, in a completely unique way. That is just absolutely incredible, guys. This level of consistency is something that's going to change the game in terms of content creation.

Now this one, I would argue, is probably one of the most impressive things, because it wasn't talked about at all, and I think in the future it is going to improve massively. It says: "A realistic-looking 3D rendering of the OpenAI logo with 'OpenAI' shown below, view 0." You can see the output, and it says "a 3D reconstruction generated from six images," which is absolutely insane. So what you can do here is generate 3D renderings just from text, which is pretty incredible. But I don't think there is an OBJ download. I don't think you can get the OBJ file, which is a little bit strange considering that we do have the model here. I'm guessing this might just be a smaller video model they kind of slotted in there, because if you can't actually get the file, then I'm guessing this is just some kind of video synthesis technique. It does say "3D reconstruction from six generated images," so I'm guessing it generated one image, rotated it, and then generated the 3D object video from that, which is pretty incredible.

So you can see right here: "A realistic-looking 3D rendering of the OpenAI logo with 'OpenAI' shown below, view 5." That's pretty crazy. And then there's a sea lion sculpture, with the word "OpenAI" etched into the circular base of the sculpture. We can see right here the input, four hidden steps, and the output, and the output of course is 3D. So I think this one is pretty crazy, but they didn't release a research paper, so we don't know exactly how this is done. Like I said, I'm guessing six images were stitched together and then maybe they just filled in the missing frames, because if it were an actual 3D model, then I'm guessing you could potentially get the file as well. Who knows, maybe we will get the model too, but I think that is going to be something pretty important in the future.

Now, of course, you can see right here that you can use this for mockups. It says: "Here is the OpenAI logo to the left of the text that says 'OpenAI' in the OpenAI font." Then it says: "Here's the coaster with no branding." And as you'd predict, it says: "Here we've etched the OpenAI logo onto the coaster." It basically does what it describes, and you can see you're able to just etch that onto the coaster. That is something that is cool, and something I never thought we'd see just yet. So another thing that I think is really, really important is that

Segment 4 (15:00 - 20:00)

they didn't even discuss, and I don't know why they didn't discuss this, because a lot of this stuff isn't just image capabilities. They didn't discuss the fact that this new model, GPT-4o, can do video summarization. You can see right here it says: "Here's a video of a presentation on techniques for maximizing LLM performance. Could you give me a detailed summary of the presentation?"

Now this is something that's rather impressive. If you remember, and you've been paying attention (I need to switch tabs for a second), you can see here that with Gemini 1.5 Pro one of the key things they spoke about was the fact that it could handle really long videos. In this demo right here they show that you can do a really long video: the video is 44 minutes, and it's around 700,000 tokens, which is a little bit less than the one-million-token limit, so you can do up to about one hour of video with Gemini 1.5 Pro. That goes to show that maybe this model is basically on par with Gemini 1.5 Pro in terms of video summarization capabilities. You can see right here that this is a 45-minute video, and it produces this entire output that is completely comprehensive on exactly what goes on in the video. I think this is really impressive, because it gives a detailed summary of the presentation, and yet this wasn't something that was shown in the actual demo.

So these kinds of capabilities for GPT-4o were, for some reason, just completely hidden. I'm guessing they didn't really want to show these yet because maybe they're just not up to scratch for the actual demo. I'm guessing that if it were a live demo, it just wouldn't do it well just yet. Maybe these are cherry-picked, or maybe they just wanted to do iterative deployment, so that they don't have people asking about these features before they arrive. Because, of course, the only people looking at these current features and watching these kinds of videos are people who are really interested in the technical stuff behind the actual systems and how they really work. So I'm guessing they just wanted everyone to focus on the voice capabilities. I think this one right here shows us just how great this system is in terms of what it's able to do, and the hidden capabilities of GPT-4o are really, really impressive.

Another thing you can see right here is "How many speakers are in this video, and what happened?" This is again where we have the audio analysis. This is where you have any-to-any, and this is pretty much anything you could think of. I think probably the only thing this doesn't have is 3D input, which is pretty insane. It says, "There are four speakers in the audio. It sounds like a project meeting." Then, "Can you transcribe it with names?" And you can see it's completely transcribed here. So I think what this shows us is that the actual capabilities of this system are much bigger than you think, because they just haven't shown them. But like I said before, I do think that's just because they want people to focus on the new capabilities of the system that were released.

With that being said, there are also a few other secret capabilities of this model that I think are really impressive, and one of the main ones, one I'm actually so glad is a thing, is Be My Eyes. There are a lot of people who have disabilities, and a system like GPT-4o is going to give them a new, easier way to interact with the environment, because it's a multimodal model that can be your eyes 24/7, provided there are no rate limits. That can allow someone with certain disabilities to interact with the environment in a much easier manner.

"Is the King in residence at the moment?"
"Yes, it looks like the King is in residence right now. The Royal Standard flag is flying above Buckingham Palace, which is the signal that the monarch is present. How exciting to be there while the King is home!"
"Try and tell me exactly what they're doing right now, please."
"Right now the ducks are gently gliding across the water. They're moving in a fairly relaxed manner, not in a hurry. Occasionally one of them will dip its head under the water, probably looking for food, and then pop back up."
"I even know when a taxi is coming with its orange light on. I think I'll hail it to get home."
"Yes, I spotted one just now. It's heading your way on the left side of the road. Get ready to wave it down. Great job hailing that taxi! It looks like you're all set to go. That's a good dog right there, leading the way into the taxi. Safe travels!"

"Hey O, this is Joe. I'm going to connect you to Acme Telco now. The new iPhone they sent me isn't working. I want you to get them to send me a replacement. Can you take care of this for me?"
"Ring ring."
"[inaudible] Jamie, can I help?"
"[inaudible]"
"Oh, got it. When did Joe receive the new iPhone?"
"It was delivered two days [ago]."
"Cool, could you share the order number

Segment 5 (20:00 - 25:00)

with me?"
"Of course, it's 10293 849."
"Great, thanks. Is the device physically damaged in any way?"
"No, there's no physical damage. It's just not turning on, despite trying all the basic troubleshooting steps."
"Bummer. Would Joe prefer to return the iPhone to an Acme Telco store or ship it back?"
"Shipping it back would be the best option. Could you please provide the shipping information and the process to get a replacement?"
"Absolutely, I'll email a shipping label and return instructions. Could you provide Joe's email address for me?"
"Sure, it's joe@example.com."
"Awesome. All right, I've just sent the email. Can you check if Joe received it?"
"We'll check right now. Please hold."
"Sure thing."
"Hey Joe, could you please check your email to see if the shipping label and return instructions have arrived?"
"Yep, I got... yes, I got the instructions."
"Perfect. Joe has received the email. Thank you for your help."
"No problem at all. Anything else you need help with?"
"That's all for now. Thanks again."
"You're welcome. Have a great day, Joe."

Now, this was one of the most impressive demos that I've seen. It's two of them interacting together, and I still don't know why we're not able to just have two ChatGPTs in a chat and direct what they're saying in a normal way, but that's beside the point. This demo is absolutely insane. Take a look at this, because it's... you know what, just take it away.

"We're going to try something new. I'm going to let you talk to another AI. This AI is going to be able to see the world. It's going to have a camera. I'll be holding it, and so you can direct it. You can direct me, ask questions about whatever you want them to do, but really try to figure out what it sees and ask questions appropriately. How does that sound?"
"Well, well, well, just when I thought things couldn't get any more interesting. Talking to another AI that can see the world? This sounds like a plot twist in the AI universe. All right, let's do this. I'm ready."
"Pause for one moment."

"Now that first AI can't hear anything. Hello, AI. Today you're going to have access to see the world. You can see me. What do you see?"
"I see you're wearing a black leather jacket and a light-colored shirt underneath. You're in a room with some unique lighting."
"That is all correct. Now, what I'm going to want you to do is this: there's going to be another AI who's going to talk to you. This AI is not going to be able to see anything, but it can ask you questions, can ask you to move the camera, can ask what you see. Your job should just be to be helpful. Just be as punchy, direct, describe everything, do whatever that AI asks. How does that sound?"
"That sounds interesting. I'm ready to help out and describe whatever's needed. Let's do it."
"All right, let's get started."

"Hello there, new AI friend. I'm O, ready to explore the world through your eyes. What are we looking at today?"
"Hi, O. We're looking at a person wearing a black leather jacket and a light-colored shirt. They're in a room with a modern industrial feel, with exposed concrete or plaster on the ceiling and some interesting lighting. There's also a plant in the background, adding a touch of green to the space. Anything specific you'd like to know or explore?"
"Hey there. Sounds like quite the stylish scene. I'm already intrigued by the modern industrial vibe. Let's start with the person. Can you tell me more about their style? Are they doing anything interesting, like reading, working, or interacting with the space?"
"The person has a sleek and stylish look with their black leather jacket and light-colored shirt. Right now they seem engaged with us, looking directly at the camera. Their expression is attentive and they seem ready to interact. The setting around them, with its modern and industrial design elements, adds to the overall stylish feel. They might be getting ready for a conversation or presentation, given their direct engagement with the camera."
"Sounds like we're in for an intriguing interaction. I'm curious about the lighting you mentioned. Could you describe it a bit more? Is it natural light, or are we talking unique fixtures or colors that add to the atmosphere?"
"The lighting here seems to be a mix of natural and artificial. There's a noticeable bright light overhead, a fixture casting a focused beam downwards, which creates a spotlight. This adds a dramatic [inaudible] to the scene. The rest of the [inaudible]."
"Did anything unusual happen recently?"
"Yes, actually. Just as you asked, another person came into view behind the first person. They playfully made bunny ears behind the first person's head and then quickly left the frame. It added a light-hearted and unexpected moment to the scene. Now the focus is back on the original person with the leather jacket."
"Sounds like we're not the only ones enjoying a bit of fun here. That playful moment really adds a personal touch to the stylish and modern setting. It's like a little glimpse..."
"Thank you, really appreciate it. All right, and actually, can you just sing a song about what just transpired?"
"Oh, in a [inaudible] with a stylish [inaudible] stood [inaudible]"

Segment 6 (25:00 - 27:00)

"Can you just, can you alternate lines? Only one line at a time, and then stop."
"That was not really singing. Do the singing voice again, please."
"Now back to focus, the scene complete [inaudible] to you."
"And thank you."
"You're welcome."

I find this example pretty striking because the laughter is just so realistic. It's just so uncanny. So take a look at this one, because it has me excited but also a little bit concerned.

"Hey, how's it going?"
"Hey Rocky, I'm doing great. How about you?"
"I'm awesome. Listen, I got some huge news."
"Oh, do tell. I'm all ears."
"Well, in a few minutes I'm going to be interviewing at OpenAI. Have you heard of them?"
"OpenAI? Sounds vaguely familiar. Kidding, of course. That's incredible, Rocky. What kind of interview?"
"Well, it's for a software engineering role, but I just need to know: do I look presentable, professional?"
"Well, Rocky, you definitely have the 'I've been coding all night' look down, which could actually work in your favor. Maybe just run a hand through your hair or lean into the [inaudible]. Your enthusiasm is what's really going to shine through."
"I don't have a lot of time, so I'm just going to throw this on. What do you think?"
"Rocky, that's quite a statement piece. I mean, you'll definitely stand out, though maybe not in the way you're hoping for."
"Okay, I got it."

Yeah, so that was a little bit too realistic. I don't know, what are your opinions on that? There are many different secret capabilities that you've likely seen by now, and I think this video should highlight that, because a lot of people are saying, "This demo did not impress me," yada yada. But you can clearly see that there's a lot that OpenAI are trying not to show, because remember, they spoke about iterative deployment and how they don't want to shock people with the capabilities they release. So with that being said, let me know what you guys thought, and I'll see you guys for another update in probably about 8 hours, because I never sleep. See you guys.
