# Python Speech Recognition in 5 Minutes

## Metadata

- **Channel:** AssemblyAI
- **YouTube:** https://www.youtube.com/watch?v=qPHVIIXu0hM
- **Date:** 06.08.2021
- **Duration:** 5:31
- **Views:** 2,363

## Description

Python Speech Recognition in 5 Minutes using AssemblyAI's API. You can get your free API key by signing up at AssemblyAI.com

You can find the original blog post on how to do speech recognition in Python in under 25 lines here - https://www.assemblyai.com/blog/python-speech-recognition-in-under-25-lines-of-code

## Contents

### [0:00](https://www.youtube.com/watch?v=qPHVIIXu0hM) Segment 1 (00:00 - 05:00)

What's up, y'all. Today I'm going to show you how to use Python to turn an MP3 file into text like this in under 25 lines of code. We're going to be following this blog, so let's get right into it.

All right, so you'll need an AssemblyAI API key, which you can get by going to assemblyai.com and signing up in the upper right-hand corner. You'll need an MP3 file and Jupyter Notebook. To install Jupyter Notebook you can `pip install notebook`, or go through VS Code, and then you can open it up with `jupyter notebook`. Before we get started, I just want to point out there's going to be an auth key line, but I'm going to skip it because I have already imported my auth key.

So let's get started. We're going to import `requests`, since we're going to be sending HTTP requests. Then we'll make our headers: we'll need an authorization, and this will be the auth key that we got from AssemblyAI, and then we'll need the content type, which will be application/json.

Then we'll create a generator function that will yield the bytes of our MP3 file. So: with open, file name, read as bytes; while true, data is going to be equal to file.read, and we're going to give it a pretty big chunk size because we're reading an audio file; if not data, break; and then we're going to yield our data in each pass of the while loop.

All right, now we're going to send a request to AssemblyAI's upload endpoint. Their upload endpoint is an endpoint you can upload an MP3 file to, and it will temporarily host it for you. We'll be doing this to upload an MP3 file and then send that link as the audio URL in the transcription request. So for this upload response, we're going to send a POST request to https://api.assemblyai.com/v2/upload; headers is going to equal our headers, and the data is going to equal read file, and this is where you need the MP3 file. So this is my MP3 file. All right, now let's look at our upload response. Cool, so this tells us where our file was uploaded.

Now we're going to make that transcript request. As I was saying, well, I should have just kept that, but whatever. So, audio URL, and that's going to be this. Okay, so what we're going to do is send this transcript request to the transcript endpoint. I'm going to send a POST request, and this is the endpoint of AssemblyAI's API that you send the audio file to so that you can get a transcription of it: headers, and json equals transcript request. Now let's take a look at our response.

Cool, so you can see in our response we got the ID, and we're going to need this in a moment. The language model, the acoustic model; we can see that it's queued; and then text and words. This is what we're going to be looking at later: we're going to get the text, and then the words are going to show us the confidence score of each word and when it was said.

So let's copy this ID, because we're about to send a GET request to the transcript endpoint for that ID so that we can get this JSON response back again. So polling response is equal to requests.get, https, AssemblyAI, v2/transcript/, there we go, and we'll need to send the headers. It's taking a while to send. Okay, there we go: `polling_response.json()`, and we'll see what it looks like.
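The steps described in this segment (headers, a chunked file reader, the upload request, the transcript request, and polling by ID) could be sketched roughly as below. This is a minimal sketch under several assumptions rather than the code typed in the video: the function names are hypothetical, the chunk size is a guess at "pretty big", the `upload_url` key and the `completed` / `error` status values are taken as assumptions about the API's JSON, and the polling loop is an addition (the video sends a single GET by hand). The HTTP client is passed in as `http` so the sketch can be exercised without network access; in practice that would be the `requests` module or a `requests.Session()`.

```python
import time

API_BASE = "https://api.assemblyai.com/v2"


def make_headers(auth_key):
    # The two headers mentioned in the video: the API key and a JSON content type.
    return {"authorization": auth_key, "content-type": "application/json"}


def read_file(filename, chunk_size=5_242_880):
    # Generator that yields the file's bytes chunk by chunk.
    # The ~5 MB default is an assumption; the video only says "pretty big".
    with open(filename, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data


def upload(http, filename, headers):
    # POST the raw bytes to the upload endpoint, which hosts the file temporarily.
    response = http.post(f"{API_BASE}/upload", headers=headers, data=read_file(filename))
    response.raise_for_status()
    return response.json()["upload_url"]  # key name is an assumption


def request_transcript(http, audio_url, headers):
    # Ask for a transcription of the hosted audio and return the transcript ID.
    transcript_request = {"audio_url": audio_url}
    response = http.post(f"{API_BASE}/transcript", headers=headers, json=transcript_request)
    response.raise_for_status()
    return response.json()["id"]


def poll_transcript(http, transcript_id, headers, interval=3.0, max_polls=200):
    # Repeatedly GET the transcript by ID until it reaches a final status.
    # The loop, interval, and status names are assumptions, not from the video.
    for _ in range(max_polls):
        response = http.get(f"{API_BASE}/transcript/{transcript_id}", headers=headers)
        response.raise_for_status()
        result = response.json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"transcript {transcript_id} did not finish in time")
```

In a notebook, after `import requests`, this might be driven as `headers = make_headers(auth_key)`, then `audio_url = upload(requests, "audio.mp3", headers)`, `transcript_id = request_transcript(requests, audio_url, headers)`, and `result = poll_transcript(requests, transcript_id, headers)`, where `"audio.mp3"` stands in for your own file.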

### [5:00](https://www.youtube.com/watch?v=qPHVIIXu0hM&t=300s) Segment 2 (05:00 - 05:31)

We're already done transcribing. Usually you can expect it to take about 30% of the length of your MP3, so this is pretty quick, because our MP3 file is like five minutes, I think. So yeah, you can see the text and then the words: the text, the confidence of it, and when it was said, basically. So yeah, if you liked that, like and subscribe, and I'll see you later.
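Reading the text and the per-word details out of a finished response might look like the sketch below. The helper name `summarize_words` and the sample payload are made up for illustration; the field names `text`, `words`, `start`, `end`, and `confidence`, and the assumption that timestamps are in milliseconds, are not confirmed by the video.

```python
def summarize_words(result):
    # Return (word, start_seconds, end_seconds, confidence) for each word.
    # Assumes start/end are milliseconds; "words" may be missing or None
    # while the transcript is still being processed.
    return [
        (w["text"], w["start"] / 1000, w["end"] / 1000, w["confidence"])
        for w in result.get("words") or []
    ]


# Hypothetical payload shaped like a completed transcript response.
sample = {
    "status": "completed",
    "text": "hello world",
    "words": [
        {"text": "hello", "start": 250, "end": 650, "confidence": 0.98},
        {"text": "world", "start": 700, "end": 1100, "confidence": 0.91},
    ],
}

print(sample["text"])
for word, start, end, confidence in summarize_words(sample):
    print(f"{word}: {start:.2f}s to {end:.2f}s, confidence {confidence:.2f}")
```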

---
*Source: https://ekstraktznaniy.ru/video/13373*