# OpenAI'S GPT-4 Finally Gets IMAGES (Now RELEASED!)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=x6bZOjzfCqo
- **Date:** 27.06.2023
- **Duration:** 16:13
- **Views:** 122,117

## Description

GPT-4 Paper - https://arxiv.org/pdf/2303.08774.pdf 
Reddit Thread - https://www.reddit.com/r/bing/comments/145v4ci/bing_allows_visual_inputs_now/
https://twitter.com/AiBreakfast/status/1672165808921890816

Welcome to our channel where we bring you the latest breakthroughs in AI. From deep learning to robotics, we cover it all. Our videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on our latest videos.

Was there anything we missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
#IntelligentSystems
#Automation
#TechInnovation

## Contents

### [0:00](https://www.youtube.com/watch?v=x6bZOjzfCqo) Intro

So as many of you may know, yesterday we made a video talking about how GPT-4 finally added images. For some reason, some of you said it wasn't in the video, but we did actually include that segment, and today we're going to talk a lot more about it. One thing you have to understand is that what we are about to discuss is the full version of GPT-4/Bing that was announced earlier this year, when we had the

### [0:27](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=27s) GPT4 with Images

GPT-4 developer livestream three months ago, on the 14th of March 2023. If you aren't familiar with that livestream, in it we got a first-hand demo of what GPT-4 with images is like. Although many of you may know and use ChatGPT, you have to understand that image input was one of the larger features that was meant to be added, and recently we've started to see GPT-4 with images being slowly rolled out to some users across the internet. What we've seen so far, in terms of user demonstrations of the image capabilities, is quite shocking. It's important to note that this is definitely being rolled out very slowly, but over time it's likely that we will get the full version. So let's first move on to the exact announcement: what we actually do have is visual inputs with Bing. Bear in mind this is not the same as image creation, which

### [1:24](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=84s) What is Image Creation

is where you ask Bing to make something for you, and it then uses DALL-E, or whatever AI software it decides to use, to create an image. That's not what we're talking about here. We are talking about something that Microsoft rolled out within the last two weeks to around two to five percent of users, and this is what we're looking at. So let's take a look at some more examples of this and exactly how it works. Also, if you are using Bing right now and want to test whether you have this feature, all you need to do is open up your internet browser, then open

### [1:56](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=116s) How to Test

up the Bing chat. Once the Bing chat is open, look at your text box: if you don't have this special icon here, that means you don't currently have access to GPT-4 with visual input. If you do have access, all you need to do is click that button and you'll see this box, where you can either add an image, upload one from a device, or take a photo. This is something many people didn't even realize existed. Right now we aren't sure exactly how this is going to be rolled out, but we do know that it seems to be rolling out slowly. And if you're wondering whether this is actually GPT-4: yes, GPT-4 is currently what powers Bing chat. So now that you know how to use visual inputs with GPT-4, if you actually do have access, let's take a look at some user examples. If you're like me, in the 99% of users who unfortunately don't have access yet, this will show you what users have been doing and what you're able to do with this. So one of the first

### [2:56](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=176s) User Examples

interesting capabilities that we have seen from user posts in the community came from a Twitter user by the name of Ethan Mollick. He decided to use an image from Reddit to test GPT-4's image understanding, and from what we see here, it definitely looks pretty accurate. The image in question is a simple picture of someone trying to solve a computer issue. They asked a simple question: "What does the cord I am holding do?" Bing responded: "You have a nice CPU cooler with a Dragon Ball Z sticker on it. The cord you are holding looks like a fan connector. It is used to power the fan on the cooler and control its speed." Now, that response might seem very mundane and pretty basic, but it's important to understand that this AI was able to recognize not only what the cord was but also what the sticker was, a detail that many people may have missed. To be honest with you, I genuinely didn't even realize that was a Dragon Ball Z sticker until GPT-4 pointed it out.

That is just one of the many examples we are currently seeing of GPT-4 with images in use, and later in the video we'll discuss some of the more useful applications, because honestly there are quite a lot. The next example is one we referenced in yesterday's video on the many different things that happened this week in artificial intelligence. Essentially, if you don't know, this is an image of a CAPTCHA. A CAPTCHA is something that is used to separate computer programs from real humans: it's an image that computers would genuinely struggle to solve, where only a human eye should be able to recognize the shapes, images, or objects. In this one we have two words, pretty distorted, with lines across them; the two words are "overlooks" and "inquiry." What's interesting is that not only does GPT-4 realize this is a CAPTCHA, it actually manages to solve it quickly. It says: "The image you sent me is of two words written in a black cursive font. The words are overlooks and inquiry. Is this a CAPTCHA test? If so, I'm afraid I can't help you with that. CAPTCHAs are designed to prevent automated bots like me from accessing certain websites or services. I'm sorry for the inconvenience." It just goes to show that what we have here still isn't fine-tuned, which is likely the reason it's being rolled out to such a small number of users. And you can see right here that it's able to identify that this is a CAPTCHA, which I find to be another small detail showing just how good this is, because a typical AI system wouldn't realize what it's looking at; it might just read the words, or fail to read them. Understanding context is definitely a higher-level skill. I also managed to find a Reddit user, 99m9, who had these visual inputs and posted examples on a subreddit,
so they actually showcase the different capabilities of Bing's visual input, and honestly they are quite shocking. Some of the examples I'm about to show you weren't showcased in GPT-4's developer livestream demo, so this will give you a sense of all the possible capabilities you get with an AI that can look at images and tell you things. One of the first simple ones we see here is: "Act as the best teacher in the world. Teach me this please, I really struggle with this topic." The attached image is of a nephron, and Bing gets it absolutely correct: it describes what a nephron is, that it's the basic unit of the kidney that filters blood and produces urine, and it is able to simply describe everything about the image. It shows that if you have an image and don't know exactly what you're looking at, Bing's GPT-4 can instantly recognize whatever it is and provide a simplified explanation that can help you. Another thing I found really interesting was this right here. It seems very difficult, but of course if you're someone familiar with these kinds of images, or an AI trained on millions of them, it's second nature. This one says: "What is this? Be very detailed and explain your reasoning." Bing answers: "Based on the image description and the search results, I think this is the cross-section of a tissue," which does seem right based on similar references in Google Images. It then goes into more detail describing what this cross-section might be and its potential applications, giving a really detailed explanation. Imagine you found this image on someone's hard drive and wanted to know what on Earth it was; this is going to give you the exact details. What was also interesting was that the user then prompted Bing further: "State what tissue you think it is. Act as
the most experienced pathologist and identify whether you think this person has a disease; if they have a disease, describe how you obtained this image step by step." GPT-4 then goes on to state: "I think this is the cross-section of a muscle tissue. I cannot tell for sure if this person is healthy or diseased based on this image alone, but some possible signs of disease would be muscle tissue inflammation, degeneration, atrophy, hypertrophy, or tumors." That is definitely pretty incredible; the applications for this are truly going to be unbelievable. We've seen what people have been able to do with just text, and images are going to be a whole different ball game. Now we have another example, and this one is particularly important because it showcases a change, a deviation, in GPT-4's original response. We know that GPT-4's image capability has been worked on since the GPT-4 developer livestream three months ago, but I want you to pay attention, because this will give us an idea of how GPT-4's image handling may have been tweaked slightly. So this person states:

### [9:26](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=566s) Panel By Panel

"What's so funny about this image? Please describe it panel by panel in detail." Looking at this image first-hand, I've already seen it, so we know the context, and it's pretty self-explanatory: it's funny because it's a VGA connector plugged into a modern phone, something that is not supposed to be done and looks out of order. However, GPT-4 doesn't recognize this. Instead, it states what it sees, but not the original answer: it says the first panel shows a phone with a charging cable plugged into it, and gives a really decent description. You might be thinking this seems very boring, so why are we talking about it? Well, it also stated that the connector looks like it has a face on it with a surprised expression, and I'm guessing that's maybe what GPT-4 got from this, but it's completely wrong. The joke is that VGA cables aren't actually charging cables; they're meant to be plugged into old-school monitors. But take a look at the difference between GPT-4's response now and GPT-4's response before. GPT-4 before actually stated that the humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port, and this is from GPT-4's original paper, which we see here; we know this because it was shown to us. So the visual inputs have changed slightly, though we don't know whether this is just a conversational style, or perhaps Bing being in its creative mode, because that can have an impact. Overall, panel by panel, it does seem to have the same level of descriptive accuracy, or even better. Now, this next one was a case where I found the capabilities of GPT-4 pretty surprising, because the user put in an image; I'm not sure where they got

### [11:22](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=682s) Image Example

it from, it might have been their own, and said: "Describe this image in detail and explain the theory behind it." The image shows a bruise on a person's arm, and GPT-4 gets it: it says the image shows a bruise on a person's arm, and that a bruise is a skin discoloration caused by a blood vessel break, then goes on to describe exactly how that occurs. What's also interesting is that it then gives a step-by-step set of instructions on how to reduce the swelling and inflammation and how to quickly heal the bruise, which goes to show this has more applications than we think. I do think that if this is fine-tuned for certain medical capabilities, it's going to impact the medical field most of all, because a lot of the time doctors visually identify some kind of condition, and if an artificial intelligence can be trained on millions of images of a specific condition, it could outperform a doctor who sees 30,000 or 60,000 patients over a career, because the AI will have been trained on millions. Now, if you do want more examples of just

### [12:38](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=758s) More Examples

how crazy GPT-4's visual inputs are, I can give you these. For example, we have the GPT-4 visual input example "extreme ironing": "What is unusual about this image?" "The unusual thing about this image is that a man is ironing clothes on an ironing board attached to the roof of a moving taxi." It will be interesting to see, once GPT-4's visual input completely rolls out, or rolls out to more users, whether these original images still produce the same kind of output, because we know these models are trained and tweaked all the time. Then of course we have another GPT-4 visual input example, this one from the original paper. It says "Can you explain this meme?" and GPT-4 essentially explains the meme: the image looks exactly like a map of the Earth's countries, but is of course just chicken nuggets. It's actually interesting that GPT-4 is not only able to realize that what's here is chicken nuggets on a simple baking tray, but it's also able to easily identify the joke, the humor, and what makes it funny, which goes to show this is not just simple image identification; it combines recognizing what's in an image with understanding the context, a complete difference from a lot of the early models we saw. Now, what is interesting is: will we get GPT-4 images rolled out in Bing before it's actually added to ChatGPT? We know the user base of ChatGPT is already past 100 million users and is going to keep growing, but I do think they are testing this out slowly because they know there might be certain issues which they don't want on a full-scale platform. What's also interesting is that I am wondering how they're going to roll this out completely, because what we did see on

### [14:30](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=870s) Discord

the developer livestream was them using GPT-4 through Discord. Now, Discord is sometimes used for AI applications like Midjourney, but I do not think that's what we'll be getting for the final version of GPT-4. At the same time, we don't know, because nothing has been confirmed about how this will be rolled out just yet, and when we did see images, we didn't see any actual image input inside ChatGPT either, so it will be interesting to see how that's done. Now, if you do want a final demonstration of just how good GPT-4's visual input really is, stick around.

### [15:01](https://www.youtube.com/watch?v=x6bZOjzfCqo&t=901s) Final Demonstration

Take a look at this last clip, where this guy literally managed to write down an idea for a website, take a picture of it, give it to GPT-4, and have it coded within seconds: "So I have here a nice hand-drawn mock-up of a joke website, definitely worthy of being put up on my refrigerator. So I'm just going to take out my phone, literally take a photo, and here we go: actual working JavaScript, filled in the jokes. For comparison, this was the original of our mock-up. And so there you go, going from hand-drawn, beautiful art, if I do say so myself, to a working website. And this is all just potential, right? You can see lots of different applications. We ourselves are still figuring out new ways to use this, so we're going to work with our partner and scale it from there, but please be patient, because it's going to take us some time to really make this available for everyone."
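The generated code from that livestream demo wasn't published, but the page it produced was a simple list of jokes with punchlines hidden until a button is clicked. A minimal sketch in plain JavaScript of that kind of page logic might look like the following; the jokes, function names, and element ids here are placeholders I made up for illustration, not the demo's actual output:

```javascript
// Placeholder joke data, standing in for whatever GPT-4 filled in.
const jokes = [
  {
    setup: "Why did the web developer leave the restaurant?",
    punchline: "Because of the table layout.",
  },
  {
    setup: "Why do programmers prefer dark mode?",
    punchline: "Because light attracts bugs.",
  },
];

// Builds the HTML for one joke card. The punchline paragraph starts
// hidden; the button's onclick reveals it. Kept as a pure function so
// it can be inspected without a browser.
function renderJoke(joke, index) {
  return `<div class="joke">
  <p>${joke.setup}</p>
  <button onclick="revealPunchline(${index})">Reveal punchline</button>
  <p id="punchline-${index}" style="display:none">${joke.punchline}</p>
</div>`;
}

// In a browser, this flips the hidden punchline paragraph to visible.
function revealPunchline(index) {
  const el = document.getElementById(`punchline-${index}`);
  el.style.display = "block";
}
```

In a real page, the output of `renderJoke` for each joke would be injected into the document body, and clicking a button would call `revealPunchline`; the point of the demo was that GPT-4 produced this whole setup-plus-reveal structure from nothing but a photo of a sketch.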

---
*Source: https://ekstraktznaniy.ru/video/14802*