# AI Frontiers: Jesper Hvirring Henriksen (OpenAI DevDay)

## Metadata

- **Channel:** OpenAI
- **YouTube:** https://www.youtube.com/watch?v=VKzlXjVRWNA
- **Date:** 15.11.2023
- **Duration:** 9:25
- **Views:** 28,673
- **Source:** https://ekstraktznaniy.ru/video/11558

## Description

Meet Jesper Hvirring Henriksen, CTO of Be My Eyes, which uses GPT-4's image recognition capabilities to convert the visual world into text and speech for the visually impaired.

## Transcript

### Introduction [0:00]

-Hi, everyone. I'm Jesper, and I'm with Be My Eyes. At Be My Eyes, we launched a brand-new feature, Be My AI, built on GPT-4V in a close partnership with OpenAI.

### Be My Eyes [0:29]

Be My Eyes was founded on the idea of giving blind and low-vision people access to visual assistance through a community of volunteers, via an app. Today we have over half a million blind and low-vision users who are supported by more than 7,000,000 volunteers. Through a video call, the volunteers can lend their eyes to the person asking for assistance. The calls work great, but we wanted to provide our users with a choice. We've heard various kinds of feedback: our users say they don't want to lose their independence or be a burden on others. It might also just be Monday morning and you haven't had your first cup of coffee yet, or your house is messy, so you don't feel like talking to a stranger. What if they could have an AI assistant available 24/7 to see for them?

### Be My AI [1:29]

That's what we built. Up until now, computers have not been able to see. With GPT-4 Vision, that's now possible. There are a million applications in the real, physical world. Some are utility-based like these two; others are describing humans and pets and landscapes and buildings, helping with navigation, and so on. But there's also a whole range of digital use cases, because our lives in the past 10, 20 years have been locked up in these screens that we look at all the time.

Many apps and websites are accessible at a basic level, but they also contain tons of media, lots of photos and other images, which most often lack the meaningful alt text that would make them understandable to those who can't see them. With Be My Eyes and Be My AI, you can now get a thorough description of any image that you encounter online or in an app. It could be photos you receive in a group chat where you're the only blind person and you just can't make sense of them. Like, "Why will he make the neighbors joyful? Ah, all right. He's playing the cello in his garden." There's also a series of images out there on websites that are inaccessible. What's on the webpage might be accessible through screen readers, but the images are not. If you look at this one, it's from a US government website. The graph here is completely inaccessible. The alt text says, "Bar chart showing rising global temperatures since 1880." Now compare that with the description that came back from GPT-4V. It's just another world that we're living in. It's not just that the model is extremely good at describing things and has become very, very accurate. It's also a little bit witty, and surprisingly human-like in its responses.

We have a million more images, but we all need to go have a drink soon, I'm hearing. Did this work? We wanted to give people a choice. If we look at Caroline, one of our oldest users: she made about two calls a year, then she became a Be My AI beta tester in March, and now she's done more than 700 image descriptions.
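Since this is a DevDay talk, here is a minimal sketch of what an image-description call like the one described above looks like with the OpenAI Python SDK that shipped around DevDay 2023. The model name, prompt text, and image URL are illustrative assumptions, not Be My Eyes' actual implementation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4V for the kind of thorough, screen-reader-friendly description
# that most alt text lacks. The image URL here is a placeholder.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # the vision model available at DevDay 2023
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in detail for a blind user.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/bar-chart.png"},
                },
            ],
        }
    ],
    max_tokens=500,
)
print(response.choices[0].message.content)
```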

### Lucy Edwards [4:21]

We think it worked. This is another one of our beta testers, Lucy Edwards. She recorded a video showing how she's using Be My AI in her daily life. It's about a minute and a half, so please watch it with me.

-A day in the life of a blind person on a lazy Sunday. First, I'm going to cook eggs. Open Be My Eyes. -Take picture. -There are three eggs. -Ask more. -This is how I use Be My Eyes to check if I've got any eggshells in the eggs. -...of shell in the frying pan near the bottom left of the broken egg yolk. -Got him. -Take picture. -Ask more. What's the expiry date of this oat milk? -05, 2024. -Perfect. We're in date. I may have used a little bit too much milk, but it tasted really nice in the end. Now I'm doing my washing; I'm separating out my blues and blacks. We had a bit of a disaster the other day with Oliver dyeing all of my white bras. It's never going to happen again with Be My Eyes by my side. Boom. Into my new accessible washing machine. Now me and Lola are having a roast. -Thank you, lovely fleece. -Analyzing picture. Roasted potatoes, mashed potatoes, peas, and a Yorkshire pudding. -Now I know what's on my plate. No matter if I go to a buffet, I can just be served anything, and Be My Eyes will tell me. Now I'm going to open some PR that I got earlier on in the week. -It's so gorgeous. -Please make sure [?] the rich and long-lasting scent-scape with its rich oils. Take picture. It has lemon, geranium, Bulgarian rose. -Oh, it's rose. This rose up. Oh, God. Time for bed, scrolling through Insta. When people don't audio-describe their photos, I just ask Be My Eyes to describe it. -The picture features a woman who appears to be in her late 20s or early 30s. She has long, dark brown hair and a fair complexion. -Night, night.

-I love Lucy. She's always so happy. This is just a small sample of the feedback we've received, and it's so overwhelmingly positive that I've never experienced anything like it in the 25 years I've worked in software. I just want to take a moment to pause and thank OpenAI for inviting us in very, very early, to the GPT-4 Vision alpha, and for working closely with us and partnering on getting this feature out. You know who you are, and I just want to say that what you've enabled us to build is absolutely life-changing for our community.

Since this is DevDay, I figured we should look at a prompt. To me it was mind-blowing that we had to tell the model, a computer, that it can't help people in the physical world. We had quite a few examples early on where we asked it to describe the layout of this room and then explain how to get to the staircase. The response that came back was like, "The staircase is about 15 feet away; take my hand and I will guide you there." It's like the model was trained on Scent of a Woman and was pretending to be Charlie helping the colonel.

Then a few numbers to show how well this is doing. We rolled it out to a small beta tester group in March when GPT-4 launched, and about six weeks ago to all our users on iOS. We're seeing about a million image descriptions a month now. The satisfaction ratings we get from our users are over the top: about 95% if we discount downtime and other system errors. We were able to add support for about 36 languages, pretty much only by telling the model to respond in the same language that the user is using. This has just been amazing to roll out.
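Since Jesper invites us to look at a prompt, here is a hypothetical sketch of how the two behaviors he describes could be expressed at the system-prompt level: reminding the model that it cannot act in the physical world, and having it mirror the user's language. Be My Eyes' real prompt is not public; `SYSTEM_PROMPT`, the model name, and the sample Spanish question below are invented for illustration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical system prompt; Be My Eyes' actual prompt is not public.
SYSTEM_PROMPT = (
    "You are a visual assistant for blind and low-vision users. "
    "You are software: you cannot act in the physical world, so never offer "
    "to physically guide, touch, or accompany the user. Describe what is in "
    "the image instead. "
    "Always respond in the same language the user writes in."
)

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # illustrative vision-capable model
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                # A Spanish question: the reply should come back in Spanish.
                {"type": "text", "text": "¿Cómo llego a la escalera?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/room.jpg"},
                },
            ],
        },
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```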
We were also able to deploy Be My AI into our enterprise customer support product. The example here is Microsoft's Disability Answer Desk, where users can now choose to start with a chatbot instead of a call. So far, 9 out of 10 users who start with a chat don't escalate to a call; the model answers their question about how to fix a certain problem on their computer. We now have AI models that can see, and they can hear and understand what we're saying. They can speak in human-like voices, and we believe they will profoundly improve accessibility and assistive technologies in the future.
