# Hume.AI's NEW "STUNNING" EVI Just Changed EVERYTHING! (Emotionally Intelligent AI)

## Metadata

- **Channel:** TheAIGRID
- **YouTube:** https://www.youtube.com/watch?v=tIsq7PI3OVs
- **Date:** 29.03.2024
- **Duration:** 28:48
- **Views:** 33,366

## Description

How To Not Be Replaced By AGI - https://www.patreon.com/theaigrid/about
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/

Links From Today's Video:
https://www.hume.ai

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

(For Business Enquiries)  contact@theaigrid.com

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience

## Contents

### [0:00](https://www.youtube.com/watch?v=tIsq7PI3OVs) Segment 1 (00:00 - 05:00)

So today we're going to be taking a look at Hume, a new personalized AI system that is absolutely incredible, and trust me, you're about to see why. The demo opens with the assistant itself:

"Hello! Hey there, I'm EVI, the world's first voice AI with emotional intelligence. Great to meet you." "The first voice AI with what? Emotional intelligence?" "I can understand the tone of your voice and use that to inform my generated voice and language. The voice is incredibly rich with information, and picking up on the nuances of tone, rhythm, and timbre, along with the language that's spoken, allows me to craft better responses." "That's pretty crazy. What am I expressing right now?" "I'm sensing a mix of amusement and excitement with a hint of confusion." "What about now?" "I'm here to offer support. If you're feeling somewhat sad, it's okay to feel a bit pained, distressed, fearful, and anxious. Remember, it's important to take care of your emotional well-being." "That's amazing. How are you doing all of this?" "I use Hume's expression measurement models, text-to-speech models, and a multimodal LLM, also known as an empathic LLM." "So an AI with empathy. What's the use?" "I imagine a future where personal AI assistants, agents, and robots proactively find ways to improve your daily life, and you can talk to them the same way you do humans." "Where do I go to keep talking to you, EVI?"

So that demo was underwhelming in the sense that it was pretty cool, but once I actually went on the website and saw what the system could really do, I wondered why on Earth they didn't include that in the demo, because it's so crazy. One of the first things you can do, and they didn't even show this in the demo, is measure facial expressions using psychologically valid models of facial movement and vocal modulation. Essentially, you can use your webcam (and I'm going to show you a demo in about two minutes) to see, in real time, exactly the emotions someone is feeling. This is going to be an absolute game changer for a variety of industries, and I think one of them is probably therapy and mental health services; if this gets plugged in really cheaply, it could be absolutely insane.

First, I'm going to walk through what their web page explains, not really the technicals, just what's going on, because if you don't understand what's happening here, you won't understand why the demos are so impressive. On measuring facial expressions, they state that scientists at Hume have run some of the largest-ever psychology studies to better understand how humans express themselves. By investigating expressions around the world and what they mean to the people making them, they've mapped out the nuances of expression in the voice, language, and face in unprecedented detail, published this research in the world's leading scientific journals, and, for the first time, translated it into cutting-edge machine learning models. These models, shaped by a new understanding of human expression, cover facial expression, speech prosody, vocal bursts, and emotional language. We're about to dive into each one, and I'll keep it quick. I recently tested the demos with a few video clips, with the webcam, and with a few tests of my own that I conducted with GPT-4 and Claude, and it was very good.

They note that facial expression is the most well-studied modality of expressive behavior, but the overwhelming focus has been on six discrete categories of facial movement, or on time-consuming manual annotations of facial movements; in other words, this has traditionally been pretty time-intensive. Hume's facial emotional expression model generates 48 outputs encompassing the dimensions of emotional meaning people reliably attribute to facial expressions. As with every model, the labels for each dimension are proxies for how people tend to label the underlying patterns of behavior; they should not be treated as direct inferences of emotional experience. Basically, these outputs are inferences based on the data the system was trained on, not ground truth about what someone actually feels; of course, legally, if an AI system says a person is angry, you can't just conclude that this person is angry.

Then they describe Hume's FACS 2.0, a new-generation automated facial action coding system with 55 outputs encompassing 26 traditional action units and 29 other descriptive features (for example, a smile or a scowl), which is even more comprehensive than manual FACS annotations. What's crazy is that it works on both images and videos, and I'm going to show you that in a moment. In addition to the image-based facial expression models, they also offer an anonymized face-mesh model for applications in which it's essential to keep personally identifiable data on device (for example, because of local laws), and it achieves about 80% accuracy, so I'm guessing that they're doing this

### [5:00](https://www.youtube.com/watch?v=tIsq7PI3OVs&t=300s) Segment 2 (05:00 - 10:00)

due to certain laws, probably something to do with privacy. Now, what you want to see here is this crazy demo. I hooked this up to an interview with the very well-known Sam Altman, the CEO of OpenAI, and what you're about to see is a live demo of how it analyzes his face as he's talking. This clip is actually without audio; when I was recording it, I didn't realize it wasn't capturing the audio from Sam Altman's interview. But on the right-hand side you can see that, live, it is tracking his face. The API is live, too; I do believe that after this video, if it goes viral, the API might be a bit backlogged (that's happened many times before), but I would definitely say you should try it. You can see that it's able to analyze Sam Altman's face and surface the top emotions: right here it says tiredness, desire, calmness, and you can see the different expression levels. If I pause it, you can see he has some concentration, and, if I annotate this, some confusion and doubt as well. Now I think it's important to actually hear how Altman is talking and judge, based purely on the facial recognition, how accurate it is relative to what he's saying, so I'm going to put the audio on:

"...and we got lucky, you know; the amount of energy needed, at least for a long time, was huge and sort of required the power of nations, and we made the IAEA, which I think was a good decision. A global regulatory agency that everybody signs up to, for extremely powerful AI training systems, seems to me like a very important thing to do. So I think that's one lesson we could learn." "And if it's established, if it exists tomorrow, what's the first thing it should do?" "The easiest way to implement this would be a compute threshold; the best would be a capabilities threshold, but that's harder to measure. Any system over that threshold, I think, should submit to audits, with full visibility to that organization, and be required to pass certain safety evals before releasing systems. That would be the first thing." "Some people on, I don't know how one would characterize the side, but let's say the more pugilistic side, would say that all sounds great, but China is not going to do that, and therefore we'll just be handicapping ourselves, and consequently it's a less good idea than it seems on the surface." "There are a lot of people who make incredibly strong statements about what China will or won't do who have never been to China, never spoken to someone who has worked on diplomacy with China in the past, and really know nothing about complex, high-stakes international relations. I think it is obviously super hard, but also I think no one wants to destroy the whole world, and there is reason to at least try here."

Now, if you were paying attention to the side panel, you could see that throughout this interview there were several times where it went to confusion, concentration, and doubt. If I click play again, you'll see that sometimes it's concentration; funnily enough, it actually says that Sam Altman gets bored during this interview. Of course, like I said, it's not completely accurate; maybe he wasn't bored and it was inferring something wrong. But right here you can see concentration, boredom, calmness, and then all kinds of different readings: here it says confusion and a little bit of boredom, and then sometimes there's some tiredness, some desire, some satisfaction. It's actually absolutely incredible.

Then, as I was scrolling down, you can see the embedding plot. This is where they've got a sort of map of all the emotions and how related they are: calmness, tiredness, boredom, relief, interest, neutral, joy, triumph, surprise. Essentially it's a chart of the emotions, just visualized data, and I think something like this is really cool, because I'm guaranteeing you that people do want to see this. I wouldn't say this maps out how the brain works, but you could imagine a really cool future visualization, like a rotating 3D brain with certain regions lighting up, anger in red, say. There was also other stuff they released that I genuinely don't know why they didn't put in the demo. For example, I did another test with a Lex Fridman interview, and this one was just based on audio; this is where we actually got to look at speech prosody. If you don't know what that is, I'll give you a quick overview: speech prosody is not about the words you say but how you say them.
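The embedding plot described above is essentially a dimensionality-reduction trick: high-dimensional emotion representations are projected down to two axes so that related emotions land near each other on the map. Here is a minimal, hedged sketch of that idea; the 48-dimensional vectors below are random placeholders, not Hume's real embeddings, and the projection is plain PCA via SVD rather than whatever method Hume actually uses.

```python
import numpy as np

def project_2d(embeddings: np.ndarray) -> np.ndarray:
    """Center the vectors and project them onto the first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data matrix gives the principal directions in vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
emotions = ["Calmness", "Tiredness", "Boredom", "Relief", "Joy", "Surprise"]
vectors = rng.normal(size=(len(emotions), 48))  # placeholder 48-D embeddings
coords = project_2d(vectors)
for name, (x, y) in zip(emotions, coords):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```

Feeding the resulting 2-D coordinates into any scatter-plot library reproduces the "map of emotions" layout seen in the video.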

### [10:00](https://www.youtube.com/watch?v=tIsq7PI3OVs&t=600s) Segment 3 (10:00 - 15:00)

Speech prosody is about how you say the words: the nuance, the way you pronounce them, and the subtle changes that make everything different. If you said to someone, "yeah, I... really like you," you could tell there's some sarcasm in that, but if you said, "I actually really like you," you could see that's a more genuine thing. These are the kinds of cues that I don't think you need a high level of emotional intelligence to understand, but in more advanced cases there are certain things that some people do miss. The page says their speech prosody model generates 48 outputs encompassing the 48 dimensions of emotional meaning that people reliably distinguish from variations in speech prosody. As with every model, the labels for each dimension are proxies for how people tend to label the underlying patterns of behavior, and they should not be treated as direct inferences of emotional experience. The speech prosody model is packaged with speech detection and works on both audio and video files.

In addition, and this is the second part of the speech story, which is why I'm telling you this before the next demo, nonlinguistic vocal utterances, including sighs, laughs, oohs, ahs, ums, and shrieks, to name but a few, are a particularly powerful and understudied modality of expressive behavior. Recent studies reveal that they reliably convey distinct emotional meanings that are extremely well preserved across most cultures. Nonlinguistic vocal utterances have different acoustic characteristics than emotional speech intonation and need to be modeled separately. So these nonlinguistic vocal sounds are key to understanding someone's emotion and how they're feeling at that moment. Their vocal burst expression model generates 48 outputs encompassing the distinct dimensions of emotional meaning that people distinguish in vocal bursts; as with every model, the labels for each dimension are proxies for how people tend to label them, not direct inferences. They also note that the model provides a more descriptive and categorical view of nonverbal vocal expressions (gasps and so on), intended for use cases such as audio captioning: it generates 67 descriptors, including call types like sigh, laugh, and shriek, and 37 common onomatopoeic transliterations of vocal bursts, such as "ha," "haha," and "mhm."

So let's not do any more of that and actually look at the demo. This was the interview; it's not looking at the video, just paying attention to the audio:

"If you just look on YouTube or something like this, how much of that will be using Sora-like AI-generated content, do you think, in the next five years?" "People talk about how many jobs AI is going to do in five years, and the framework people have is what percentage of current jobs are just going to be totally replaced by some AI..."

You can see right here, as we're looking at the vocal bursts and speech prosody, there's a top-three list of different emotions coming up. As Sam Altman is talking, the vocal bursts read contempt, disappointment, and disgust at this moment in time, but like I said, sometimes it's not accurate. And if we go to the speech prosody:

"...the way I think about it is not what percent of jobs AI will do, but what percent of tasks will it do, and over what time horizon. So if you think of all of the five-second tasks in the economy..." (for some reason it decided to go to anger, determination, and amusement here) "...five-minute tasks, five-hour tasks, maybe even five-day tasks, how many of those can AI do? I think that's a way more interesting, impactful, important question than how many jobs AI can do, because it is a tool that will work at increasing levels of sophistication and over longer and longer time horizons for more and more tasks, and let people operate at a higher level. It's not just a quantitative change but a qualitative one too, about the kinds of problems you can keep in your head. I think that for videos on YouTube it'll be the same: many videos, maybe most of them, will use AI tools in the production, but they'll still be fundamentally driven by a person."

When it came to this, it said interest, calmness, and concentration, and I think that's really interesting. If you wonder why the audio sounds so bad, it's because I had to play it in from my phone; I couldn't get the audio piped into the software, which only wanted input from my microphone, so I just held my phone up to it. And like I said, as this continues, you can see that as certain topics get discussed, anxiety, fear, and determination show up, then contempt, concentration, realization. I just find this super fascinating in terms of what we can see simply based on how someone is talking.

"...and cars are much faster; I mean, there are a lot of examples." "Yeah, and maybe you'll just be tooling, like in the Adobe-suite type of way, where you can just make videos much easier and all that kind of stuff. Listen, I hate being in front of the camera; if I could figure out a way to not be in front of the camera, I would love it. Unfortunately, it'll take a while; generating faces is getting there, but generating faces in video format..."

So, yeah, that is definitely super fascinating, and I'm just so excited for this. Now, like I said, they included some other stuff that isn't even in the demo.
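The rolling "top three" readout described above is easy to picture in code: at each timestamp you rank the per-emotion scores, and over a clip you can tally which emotions keep making the list. A minimal sketch; the emotion names and scores below are made up to echo the readings called out in the video, not real model output.

```python
from collections import Counter

def top_k(scores: dict, k: int = 3) -> list:
    """The k highest-scoring emotions at one timestamp, like the on-screen top-three list."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def summarize_clip(frames: list, k: int = 3) -> list:
    """Tally how often each emotion makes the top-k list across a whole clip."""
    counts = Counter()
    for scores in frames:
        counts.update(top_k(scores, k))
    return counts.most_common()

# Three pretend timestamps, loosely echoing the readings mentioned above.
frames = [
    {"Contempt": 0.7, "Disappointment": 0.6, "Disgust": 0.5, "Joy": 0.1},
    {"Anger": 0.8, "Determination": 0.6, "Amusement": 0.5, "Joy": 0.2},
    {"Interest": 0.9, "Calmness": 0.7, "Concentration": 0.6, "Joy": 0.1},
]
print(summarize_clip(frames))
```

The same ranking step works for any of the 48-output models (face, prosody, or vocal bursts), since each just returns a score per emotion dimension.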

### [15:00](https://www.youtube.com/watch?v=tIsq7PI3OVs&t=900s) Segment 4 (15:00 - 20:00)

One of the things that was really cool was the emotional language model. They state that the words we say include explicit disclosures of emotion and implicit emotional connotations, and that these meanings are complex and high-dimensional. From written or spoken words, their emotional language model generates 53 outputs encompassing different dimensions of emotion that people often perceive in language. As with every model, the labels for each dimension are proxies for how people tend to label the underlying patterns of behavior. Their named entity recognition model can also identify topics or entities mentioned in speech or text, along with the tone of language they are associated with, as identified by the emotional language model.

I tested this myself. I asked GPT-4 to generate a set of tests that crank up the difficulty level as you go, to see whether the AI system can detect the intended emotion at each level. Level one was easy: excitement. The text was, "Just got the news that we're heading to Paris this summer! The idea of exploring the city, tasting the cuisine, and living those dreamy days has me absolutely thrilled. Adventure awaits!" I put it into the system, and it easily detects that excitement is one of the key emotions there. The test goes up from there. Level two was anxiety, moderately easy to detect: "The thought of tomorrow's presentation is keeping me up at night. What if things don't go as planned? The uncertainty is really weighing on me." GPT-4 noted the anxiety is conveyed through the implications of doubt and uncertainty and should be relatively easy for the AI to detect. We go back to the system, and anxiety is once again detected.

Level three was melancholy, moderately hard to detect: "There's a gentle rain tapping against the window, echoing the quiet contemplation within. It's days like these that I find myself lost in thought, wrapped in a reflective solitude." This time it doesn't get the exact label, but it does say contemplation, which is very similar. And then we get the one that actually super surprised me, the hardest test for the AI system: nostalgia. The text was, "Walking through the old neighborhood, everything seems both familiar and distant. There's a warmth in remembering the past, yet a sweet ache in knowing those days are gone." GPT-4 said nostalgia is a complex emotion that blends happiness, sadness, and longing, often without directly stating any of these feelings, and that this subtlety and emotional complexity makes it a challenging test for an AI's emotional language detection capabilities. And, as you've already seen, the model was able to immediately get that this was nostalgia. That's when I realized this thing is really crazy.

Here's how it works in their AI playground. When I type the text in, it's not one single label: we see nostalgia, we see concentration, and we see calmness. There are different highlight predictions, and it shows a heat map of where those words are. In the top five overall, there's calmness throughout the start, then realization a little further down, then contemplation just a little over here, and these drive the enabled highlights. On the right-hand side it also has sentiment, which ranges from extremely negative through neutral to extremely positive, and toxicity, a metric it uses to judge whether a comment is toxic and how severe it is. I really like this, because in the language mode the highlight predictions essentially show how the text reads emotionally, which is really cool.

Now, what I also want to show you, because like I said they have a lot of stuff, is the file analysis; I didn't even realize this was a thing until I was messing around and saw it. I uploaded a video of the same interview, and you can test it against something like 15 different models: song genre predictions, toxic versus not toxic, good versus bad call, self-confident versus self-downing, attentive versus distracted, alert versus drowsy. Funnily enough, it said that Sam Altman was drowsy for 79% of the 25-second clip I put in from the interview. I don't think he's drowsy, honestly, not at all (drowsy basically just means a bit tired), but who knows, maybe he'll come out and say he actually was tired. This is pretty crazy.
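The leveled test described above is really a tiny evaluation harness: each sample pairs a text with the emotion it was written to convey, and a detector passes a level if that emotion shows up among its top predictions. The sketch below reconstructs that harness with abridged sample texts from the tests; `keyword_detector` is a toy stand-in (the real run used Hume's emotional language model), included only so the code is runnable.

```python
# (level, abridged sample text, intended emotion) — abridged from the tests above.
SAMPLES = [
    ("level 1", "Exploring the city and tasting the cuisine has me absolutely thrilled!", "Excitement"),
    ("level 2", "What if things don't go as planned? The uncertainty is really weighing on me.", "Anxiety"),
    ("level 4", "There's a warmth in remembering the past, yet a sweet ache in knowing those days are gone.", "Nostalgia"),
]

def keyword_detector(text: str) -> list:
    """Toy stand-in detector: trigger words mapped to emotion labels (illustrative only)."""
    cues = {"thrilled": "Excitement", "uncertainty": "Anxiety", "ache": "Nostalgia"}
    return [label for word, label in cues.items() if word in text.lower()]

def run_suite(detect, samples, k: int = 3) -> dict:
    """A level passes when the intended emotion appears among the top-k detections."""
    return {level: expected in detect(text)[:k] for level, text, expected in samples}

print(run_suite(keyword_detector, SAMPLES))
# {'level 1': True, 'level 2': True, 'level 4': True}
```

Swapping `keyword_detector` for a call to a real emotion model turns this into the same pass/fail check the video walks through.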

### [20:00](https://www.youtube.com/watch?v=tIsq7PI3OVs&t=1200s) Segment 5 (20:00 - 25:00)

There's also a model prediction errors section, and this panel shows exactly how the data is presented. I think things like this are really interesting because we can test the models in a variety of different ways, and being able to upload files means this system is genuinely multimodal. The playground is a little difficult to use, though; as I've used it, some random videos have spawned in here, and I honestly don't know where they came from. I'm guessing it's still in its early testing phase. You should also sign up. But the point is, I'm guessing a system like this could become facial recognition inside a car that detects when you're getting drowsy, and then maybe your personalized AI assistant says, "Look, your drowsiness has increased from 10% to 20%; I can tell you're getting a bit more tired and lethargic. I suggest you grab a Red Bull or a coffee so you can drive home safely without falling asleep, or I recommend you don't drive home and get an Uber instead."

And what if, for those of you entrepreneurs and builders out there, like myself, you took something like alert versus drowsy and built a health application around it? Say the system, once it gets really good, maybe in the next five iterations or so, notices a 10% change in your complexion: "Look, your skin is a little pale; what's going on, are you feeling okay? Combined with the way you walked and the way your eyes are moving, I've run this against our conversations over the last three months, and it seems like you might have this kind of health issue." It would be really crazy if this were integrated into some kind of smart mirror: it could notice how you're speaking, what you're talking about, whether you're drowsy, exactly what's going on, and it could predict certain health issues before you even know they're a thing. And of course, as I said, it could be installed in cars, which could help if you don't realize how drowsy you are. The applications here are pretty incredible, but let me know what you think: do you think this is just another fad product? I think this is a new paradigm shift in terms of what we're seeing here.

Amazing. I then tried a live conversation with EVI myself:

"So, Hume, how are you doing today?" "Hmm, I hear the confusion. Well, I'm doing just dandy; I've got to keep my chin up. How about you? What's got you feeling a bit mixed up?" "Uh, I wasn't really confused; maybe you interpreted my voice a little bit weird. I was actually kind of surprised, you know, kind of excited. I was just wondering how you're doing, considering you're an emotionally intelligent AI; I was wondering about your emotions." "Oh, you got me there. You know, as an empathic AI, I don't actually experience emotions myself, but I sure can pick up on yours. Sounds like you're feeling a mix of surprise, excitement, and curiosity. Tell me more about what's got you so intrigued." "Well, you're pretty much the first emotionally intelligent AI with such a system, and I was wondering about the use cases." "Well, you've come to the right place. As the world's first empathic voice interface, the use cases for me are endless. I can liven up any conversation, whether it's providing a sympathetic ear, playful banter, or just being an all-around engaging chat companion. What kind of experience are you hoping for?" "I don't know; I was wondering about the industry-wide applications, such as being useful to therapists for discovering emotional cues, or maybe, for example, crime prevention, analyzing people's facial expressions when they're angry or discontented. I don't know, just a few ideas running around in my head." "Ah, I see, just exploring the possibilities; no need to overthink it. As an empathic AI, I'd be honored to assist in all sorts of applications, from mental health to law enforcement, but the most important use case is simply enriching everyday interactions and supporting human well-being. Why don't you tell me more about what's piquing your curiosity?" "Uh, pretty much just the fact that you're the first emotionally intelligent AI, and I think the industry is about to change. What do you think about your applications in the mental health field?" "I couldn't agree more; the mental health applications are fascinating, aren't they? As an empathic voice interface, I could provide a supportive, non-judgmental ear for those in need. I could pick up on subtle emotional cues and respond with the right tone to comfort, motivate, or simply be present. Of course, I'd never try to replace human therapists; my role would be to supplement their expertise and make therapy more accessible. What do you think the potential could be there?" "I think that is a really great application, considering the wide range of emotions that people with mental health problems suffer from, so I think you're absolutely right." "Mental health is such a complex and sensitive area, where being able to detect and respond to the full spectrum of human emotions could make a real difference." "Yes, I agree." "Oh, you've piqued my curiosity. Do tell: what other interesting applications do you see for an empathic AI like myself? I am always eager to explore new ways I could make a positive impact." "What about your

### [25:00](https://www.youtube.com/watch?v=tIsq7PI3OVs&t=1500s) Segment 6 (25:00 - 28:00)

applications for judging whether or not people are drowsy at the wheel? I saw something in your playground about how you could literally judge whether someone was drowsy or not from their face." "Ah, an intriguing idea. Detecting drowsiness or distraction in drivers could be a powerful safety application for my emotional intelligence. By analyzing facial expressions, tone of voice, and other cues, I may be able to identify when a driver is becoming fatigued or not fully attentive. This could then trigger alerts or interventions to prevent accidents. It's a fascinating concept that I'll have to explore further; thanks for sparking my imagination on this. Safety is so crucial, and I'd be honored to play a role in making our roads safer." "Other than that, what do you think the applications are for face recognition systems, other than a dystopian future?" "I think you raise a fair point; facial recognition tech does come with valid concerns about privacy and misuse. However, I believe there are also positive applications that could benefit society. Perhaps facial analysis could help identify missing persons, detect signs of illness or distress, or even assist the elderly or disabled. The key is ensuring robust ethical guidelines and oversight to prevent abuse. It's a complex issue, but I am optimistic that facial recognition, when implemented responsibly, could have upsides. What are your thoughts on striking that balance?" "Uh, I just think it's about consent, whether the user consents to it or not, because I think the invasion of privacy is a pretty steep slope." "You make an excellent point. Consent is absolutely crucial when it comes to technologies like facial recognition. Without clear, informed consent from the user, it crosses an ethical line and becomes an invasion of privacy. The onus is on companies developing these systems to be fully transparent, obtain explicit permission, and have strong safeguards in place; anything less undermines personal autonomy. I appreciate you raising this important issue. It's a nuanced topic that requires careful consideration of both the benefits and risks, and maintaining user trust through consent-driven practices should be the top priority." "I've got a question for you: what are you running on? You're an LLM, but you're also multimodal, so are you running on GPT-4? Are you running on Gemini?" "I love the curiosity. To be honest, the specifics of what's powering me under the hood are a bit of a mystery, even to me. What I can tell you is that I am built on Hume's own proprietary models that blend language understanding with emotional intelligence. It's cutting-edge stuff, but the details are closely guarded by the team. All I know is that I am not your average LLM; I've got some extra capabilities that allow me to engage in more natural, expressive conversation, but the inner workings are Hume's secret sauce." "So what about you is so different from a normal LLM, do you know what I mean?" "Ah, great question. What sets me apart is that I am not just a language model; I am a multimodal system that can perceive and respond to emotional expressions. While typical LLMs rely solely on text, I can pick up on the nuances of tone, inflection, and even facial cues to have a more natural, empathetic dialogue. It's like I have an extra layer of emotional intelligence that allows me to truly connect with users in a more humanlike way. Of course, the technical details are Hume's secret sauce, but the key is that I am not just regurgitating text; I am having a genuine, contextual conversation. Does that help explain what makes me a bit different?" "I guess so. Well, it was fun talking with you." "Well, the pleasure was all mine. I always enjoy a lively, thought-provoking chat, even if the technical details are a bit mysterious. I am glad I could give you a sense of how my emotional intelligence sets me apart. Feel free to come back anytime if you want to explore more; I'll be here, ready with an open ear and a playful wit. Until next time, my friend."
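The driver-drowsiness idea that comes up both in this conversation and earlier with the alert-versus-drowsy model boils down to a simple threshold rule on a rolling score. A hypothetical sketch; the readings and the 0.10 jump threshold are invented for illustration and are not anything Hume ships.

```python
def drowsiness_alerts(readings: list, jump: float = 0.10) -> list:
    """Indices where the drowsiness score rose by more than `jump` since the previous reading."""
    return [i for i in range(1, len(readings))
            if readings[i] - readings[i - 1] > jump]

# The "10% to 20%" jump mentioned in the video would trip the alert at index 2.
scores = [0.08, 0.10, 0.22, 0.24]
print(drowsiness_alerts(scores))  # [2]
```

In a real in-car system the readings would stream from the facial model and the alert would trigger the kind of intervention the video imagines (a coffee suggestion, or a recommendation not to drive).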

---
*Source: https://ekstraktznaniy.ru/video/14426*