Massive ChatGPT Upgrade Is Here (Vision and Voice)
9:14

The AI Advantage · 26.09.2023 · 84,575 views · 1,455 likes · updated 18.02.2026
Video description
Today we look at the brand new ChatGPT features.

Links:
https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
Personalized Custom Instructions: https://calendly.com/ai-advantage/personalized-custom-instructions

#chatgpt #chatgptupdate #multimodal

Free AI Resources:
🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter
🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0
👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/
🐦 Twitter: https://twitter.com/TheAIAdvantage
📸 Instagram: https://www.instagram.com/ai.advantage/

Premium Options:
🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community
🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

Contents (3 segments)

  1. 0:00 Intro (111 words)
  2. 0:30 Image Recognition (685 words)
  3. 3:29 Voice (1357 words)
0:00

Intro

OpenAI just revealed ChatGPT's brand-new capabilities. Now you're going to be able to upload images and use your voice to interact with ChatGPT, making these models useful for so many more use cases and people. Plus, their new voice model is gonna be able to recreate your voice from just a few seconds of you talking. So what exact capabilities have been added here, and what does this mean for your use of ChatGPT in your everyday life? They're not playing around here. This is a serious update, and we're about to break down every detail of it. All right, so first let's talk about the
0:30

Image Recognition

biggest part of this announcement, which is the image recognition. So let's have a look at the demo of this feature, which shows off how ChatGPT is going to be able to take in your images. And if you're looking at this and telling yourself, "Hey Igor, that's fantastic, but other large language models are already able to do this, even Midjourney can take an image and describe what's going on in it," well, the point is that this is not even close to anything we've ever seen before. This goes in depth, it can read text properly, and it understands the relationships between the various objects in the frame.

But to really understand this, we need to go back to March 2023, when OpenAI announced GPT-4. If you remember correctly, the entire announcement was based around showing the multimodal GPT-4. Not sure if you recall this, but Be My Eyes was one of their launch partners, and essentially they were saying, "Hey, this multimodal model is going to be used to help people without eyesight." Back in March they already showed examples of how well this performs. This was one of the examples: two shirts in different colors, and it just recognizes them perfectly. All right, that's pretty basic. But beyond that they showed a second example, and this is the one that is unmatched until today: it's three images that result in a joke.

This difference in quality is what I'm talking about here, because if you look at some of these other multimodal AIs, they're just not that good. Several of them have image recognition capabilities where you can feed an image in and it recognizes it. And here's a Reddit post from three weeks ago where somebody got access to Be My Eyes, which has early access to these features. The level of detail and deep understanding of a picture like this is unmatched. I mean, look at that: it picks up on the text and all the little details inside of this picture. So yeah, obviously we'll test this extensively on the channel, but the point here is that this is far beyond everything that we have access to as of now.

And why is this so interesting? Well, because you're gonna be able to feed it essentially every single photo you have on your phone, every document, or even a screenshot. The variety that image recognition brings is really unmatched, and for a lot of people it's going to be easier to communicate that way than specifying the right words in the right places inside of the prompt. And look, you can even draw on top of images, making the input more specific and, as per usual, resulting in more specific outputs.

But at this point it should be noted that there is a massive weakness of this capability: it's not good with people, and it's not going to be good with people anytime soon. Look, even in the blog post it says ChatGPT is not always accurate and these systems should respect individuals' privacy. So while I think that even if they had these capabilities they wouldn't release them due to privacy and safety concerns, as of right now it's not going to be good at picking up on people or their facial expressions, which would obviously be one of the biggest unlocks here. Not happening right now, though. That is a major limitation.

But for anything utility based, what a feature this is. You're going to be able to replace a lot of YouTube tutorials by just taking a picture of your problem and asking, "How do I fix this?" or "How do I achieve XYZ outcome?" And before I move on to the next point, which is the voice recognition and voice generation that they added, you have to realize that all these new features are gonna live on top of what it already does. It adds to the picture, it expands the functionality. It's not like this is a separate thing.
3:29

Voice

You can still prompt with words only, but now you get the option of adding a picture. But now let's talk about the next feature, which is the new voice features. Essentially they're adding a mode where you're going to be able to use your voice for input, and then ChatGPT talks back to you, so you're going to be able to have a conversation with it.

[Demo] "We want to hear a bedtime story. Tell us a story about the super-duper sunflower hedgehog named Larry. Start with telling us a little bit about him." "Larry was a unique hedgehog, unlike any other."

Now, if you follow the channel, you know that I created a tutorial on how to do this yourself as a beginner a few months back. We used Telegram, you could have voice input and output, and we just used a little bit of code to achieve that. But now you're gonna have the capability natively inside of ChatGPT. And that's not where this ends, because when I first saw this feature I was like, all right, been there, done that. Talking with AIs with your voice sounds great in theory; in practice it's not that practical. I have it on my phone and I don't end up using it pretty much ever, to be honest. But that's not what this announcement is about. This is just one of the things they're doing with the new capabilities.

There's something new and really big here too, and that is the fact that there's a new text-to-speech model by OpenAI. Up until now they only had voice recognition. So when I spoke into my phone, you could use Whisper, which we also covered multiple times on this channel, to transcribe your voice, so you could turn your voice into text. What changes now is that you're going to be able to turn the text that comes out into voice. Now, if you're an AI enthusiast, you already know that there are many companies doing this. ElevenLabs is the highest quality one, and then there's a plethora of alternatives like Uberduck that you can use to emulate other people's voices, but with most of them the quality is not that high. What they did here is their brand new model is at a quality level that I would essentially call equally as good as ElevenLabs, which makes this best in class.

And here's the crazy part: you're going to be able to use a few seconds, just a few seconds, of your very own voice to create your own voice model. Now I know what you're thinking: that is scary. And they're also thinking this, because they're not rolling this out to everybody yet. Yes, this is already available with other companies, but rolling this out natively with ChatGPT is a whole different beast, and that's what they're holding out on for now. But it's pretty much ready to go, and they're integrating a part of that functionality here. The part we're getting is that they took five different speakers and recorded their voices, so you're gonna be able to use these speaker voices to voice your messages.

Let's just briefly look at some examples here so you understand just how good this is. Here's an explanation from, let's say, Ember: "The phrase 'potato' comes from a song titled 'Let's Call the Whole Thing Off.'" That's really good. And then let's use Sky: "I think the phrase 'potato'..." And maybe a quick story from Cove: "Once in a tranquil woodland, there was a fluffy mama cat named Lyla." Not bad, right?

But hey, we've seen this before, so why is this part also so interesting? Well, it's because the other features of ChatGPT are so capable. This thing is super powerful once you combine it with the reasoning of GPT-4, and this is perfectly illustrated in a practical example by their launch partnership: they're partnering up with Spotify and using this technology for voice translation on podcasts. What this means is that you'll be able to have every podcast in every language, seamlessly, right inside of Spotify. That's pretty amazing. If you don't speak Spanish, you're gonna be able to open up any Spanish podcast and say "translate to English," and then it's going to translate it to English in the voice of the speaker.

So we're getting image recognition and voice capabilities, but if you bring that together with the capabilities of GPT-4 and the new DALL·E 3, you get a beast of a product, because you're going to be able to put in images and get images out, you're going to be able to interact with voice and get voice back out, and you have all of the knowledge of GPT-4.

So now, as per usual on this channel, the big question here is: what actual use cases is this going to unlock? Because as you know, I always try to cover the practical aspect of these tools. So let's just spend a few seconds looking at some use cases from my free ebook that you're probably familiar with, and let's see how these would change with the brand-new release.

First one right here: generate ideas. In the original prompt we manually asked it to come up with eight ideas for a workshop, and then we gave some specifics. And here's the deal: you're not going to be giving up on this prompt, you can still use it. But what you could do now is upload images of flyers of other workshops that you're looking at, where you find the offer really enticing, and then it's going to use those flyers in the idea generation.

So this one is actually a great example, because we're asking it for step-by-step instructions on how to start a vegetable garden at home, and then we struggle to give it context for what our current vegetable garden situation looks like. Well, what if we just shorten the prompt like this and provide an image of the location where the vegetable garden is supposed to be? Providing that image is going to give it so much more context. Doing that manually would take sentences, and most people couldn't be bothered to do that.

And that's why I think this is such a big update: you're going to be able to add detailed context, and as you know, detailed context produces detailed and high quality answers. Most people are not good at adding that, or not willing to. But now you can just use your phone, you snap a picture, and the context is right there. You're going to be able to infuse your prompts with a lot of context really easily, and as you know, a picture says more than a thousand words.

So look, it is yet to be determined what exact use cases are going to really stand out, but one thing is for sure: this is going to make the entire model so much easier to use, because all you're going to need is a short prompt and the image for context to get useful results. And then if you add custom instructions on top of that, you have the instructions in the prompt, your personal context in the custom instructions, and the image is going to provide all the other context that is new.

Did you know that I have dozens of tutorials on prompting on this channel? We explored custom instructions, plus we now offer a service where we create a personalized set of custom instructions just for you and your role. And then we're gonna explore this image recognition capability in a lot of depth. We're going to be contrasting it with all these use cases, we're going to be looking across the internet, and we'll be showing you how to enhance your very own life with these powerful new capabilities. I'll see you soon.
