Transcribe Twilio Phone Calls in Real-Time with AssemblyAI | JavaScript WebSockets Tutorial
22:42

AssemblyAI · 23.02.2022 · 19,794 views · 246 likes

Video description
Learn how to transcribe Twilio phone calls in real time with AssemblyAI using JavaScript and WebSockets. We use Twilio Media Streams and AssemblyAI's Real-Time Streaming Transcription for this.

Get your Free Token for AssemblyAI Speech-To-Text API 👇
https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_pat_16

References:
https://www.assemblyai.com/blog/transcribe-twilio-phone-calls-in-real-time-with-assemblyai/
https://www.twilio.com/blog/live-transcribing-phone-calls-using-twilio-media-streams-and-google-speech-text

Docs:
https://www.twilio.com/docs/voice/twiml/stream#websocket-messages
https://www.assemblyai.com/docs/speech-to-text/streaming

Code:
https://github.com/AssemblyAI/youtube-tutorials

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

Timestamps:
00:00 Demo
00:44 WebSockets + Server setup
02:56 Twilio Voice Streams
10:50 AssemblyAI Transcription
19:33 Create Website

#JavaScript

Contents (5 segments)

Demo

Hello, are you there? Listen, I have an important message for you. My flight will arrive tomorrow at 2 p.m., and I need someone who can pick me up at the airport. Can you do this for me? Please call me back. Bye.

Hi everyone, I'm Patrick, and in this video you will learn how to transcribe your phone calls in real time. For this we have to do two steps: first we use Twilio to stream the phone call to our end, and then we send it to AssemblyAI and transcribe it. All steps happen over WebSockets, so this can be done in real time, and then, for example, we can display the transcript on our website, like here. So let's get started. As I said, we use WebSockets to stream the data, so as a first step we want to set up

WebSockets + Server setup

the WebSocket, and we also need a web server to handle the requests. Here we use JavaScript, so we create one file, index.js, and require the packages we need: ws for the WebSocket and express for the web server. Of course we need to install these, so in the terminal we run npm install ws express. Then we can set up the server and the WebSocket: we create an app with Express, we create a server, and we create a WebSocket server on top of it. Next we handle the WebSocket connection with wss.on("connection", ...). This is the connection event, and for now we simply log that we have a new connection. We also want to set up a GET route for our home page, and for now it simply returns hello world, so the web page will display hello world. Then, of course, we need to start the server: we log a message and call server.listen on port 8080. Now we can save and run this file, and we should see that it is listening at port 8080. If we go to localhost port 8080, we see hello world, so this is working fine. We can also already try the WebSocket connection from the browser console. Down here we see the log from our server file, and in the browser we now create a connection with var connection = new WebSocket("ws://localhost:8080"). When I hit Enter, the server console shows new connection initiated, so this is working fine too. As the next step, we want to set up the Twilio media

Twilio Voice Streams

stream. For this we need TwiML's <Stream> instruction, which you find in the documentation; I will put the link in the description. We need to do two things. The first is to set up the TwiML response. TwiML is the Twilio Markup Language, a set of instructions you can use to tell Twilio what to do when you receive an incoming call, SMS, or fax. In the case of a stream, it looks something like this: we return a <Response> that starts a stream to our WebSocket endpoint. The second thing is to handle the incoming stream messages, which are documented under WebSocket messages from Twilio. I recommend that you go through this a little by yourself, but the four important events are connected, start, media, and stop, and these are what we have to take care of. So let's do this in our code. In our WebSocket connection handler we now add ws.on("message", ...), so whenever a message arrives we handle the four different events. First we parse the message: this is the Twilio data, and it is in JSON format. Then we switch over message.event and handle the four types, case "connected", case "start", case "media", and case "stop"; for now each case simply logs something and then breaks. The way the stream gets started is this: we call our Twilio phone number, Twilio sends a POST request to an endpoint that we handle here, and in response to this POST request we send back the TwiML that tells Twilio to start a stream. We still have to configure Twilio to send that POST request to our endpoint, and we will do this in a moment, but first let's implement this POST endpoint in our
case. Here we write app.post, since we need a POST endpoint, and the handler is asynchronous. In it we need to return the TwiML response. We could use a Node.js package to build up the response, or we can simply return it directly as a string, which looks something like this. First we set the content type, and then we call res.send with a <Response>. In our case we want a <Stream>, and its URL is a WebSocket URL pointing at this endpoint, built from our host name, so it will actually resolve to this address. We could add more instructions here as well. For example, <Say> produces an automatic voice that gives the caller an instruction: we say "Start speaking to see your audio transcribed in the console", and then we wait 30 seconds so that the caller can speak. That is what we send back from our POST endpoint. Now we need to tell Twilio to send a POST request to this endpoint when a call comes in. Of course, this is currently hosted on our localhost, but we need a public URL, and for this we use ngrok. This is a very cool tool that easily gives you a public URL linking to your localhost. You have to install it, and after that you can use it in your console with a simple command: ngrok http followed by the port, so ngrok http 8080. When you hit Enter you see a forwarding URL. This is now a public URL that points to your localhost, and this is the endpoint we need. We simply use a forward slash, since we have no particular route, so we can use this URL directly. Then we can go to our Twilio account. Of course you need to sign up, which you can do for free, and you get a free trial with a few dollars of credit (I'm also still on the free trial). The first thing you need to do is get a phone number, so
this will be your phone number for testing. Then, in your dashboard, click on Active Numbers. On the right you already see a webhook with a POST request pointing to an endpoint; that is from when I tested this before, but now we want to use the new endpoint we just created. Back in the terminal, this forwarding URL is the one we need, so we copy it. Then we click on the phone number and scroll down to the settings for accepting incoming voice calls, where, when a call comes in, we want to send a POST request to this endpoint. I pasted in the new address, which points to our localhost, and now we click Save. This is basically everything we need to do in the Twilio dashboard. We could already call the phone number we got (up here is my phone number, in my case). The call sends a POST request to this address, which points to localhost; our POST handler sends back the TwiML response, so Twilio knows to start the stream, and the stream arrives at the WebSocket that we handle here. So this should already work. As the last step, of course, we need to handle the stream: this is where we set up the AssemblyAI API, send the audio to AssemblyAI, and transcribe it, also in

AssemblyAI Transcription

real time. To do this, up here we declare a variable that will hold our WebSocket to AssemblyAI, and we also create an array to keep track of the audio chunks. Then, in our POST request handler, we set up the WebSocket to AssemblyAI, so this happens only once, when a call arrives. We create the connection with assembly = new WebSocket(...), passing in the endpoint, and of course we also need to pass in our API key. For this I recommend going to the AssemblyAI docs, where you find the real-time streaming transcription endpoint and all further instructions. You also need an AssemblyAI API key: you can start for free and sign in (in my case I already have an account), and then you can grab your API key, copy it, and paste it in here. Now we have the AssemblyAI WebSocket, and our Twilio WebSocket connection handler is where we use it. The first thing we do there is check whether the AssemblyAI WebSocket exists; otherwise we log an error and return. Then, in the connected event, we first make sure that any AssemblyAI errors get logged, and we create a dictionary (a map) to keep track of the different texts. Next we create the onmessage callback, which executes whenever AssemblyAI sends a message. Here we want to extract the transcript and then either log it or send it to a website and display it. First we parse the data, which is again a JSON response; again, the documentation describes it well. The way this works is that AssemblyAI sends partial results as soon as it starts picking up the stream, and once a segment is complete it goes over it again and may make small corrections. Of course we only want to display each result once, so we keep track of the different results: we store the text in a dictionary under the key
audio_start, which is also a field we get from AssemblyAI: it is the start time of the audio sample relative to the session start, in milliseconds, and we store the corresponding text under it. Then we sort the keys in ascending order and build our final string by iterating over all the keys and appending each text to our message. This way we have the texts in the correct order and each one only once. For now we simply log the result; later we will see how to stream it to the website. Note that this is an event listener, so we set it up only once, in the connected event, and later, whenever AssemblyAI streams a message to us, it is handled there. That is the first part. Next, in the stop event, we want to send an end signal to AssemblyAI, so we call assembly.send with the terminate_session key set to true. Now comes the important part, the media event: when we get an audio stream from Twilio, we want to send it to the AssemblyAI API. Looking at the Twilio documentation again, there is a media message, and from it we get the media payload, which is the raw audio data encoded in base64.
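The ordering step described above can be sketched as a small pure function. This is a minimal sketch, not the tutorial's exact code: it assumes each parsed AssemblyAI message is an object with a numeric audio_start (milliseconds) and a text field, the helper name buildTranscript is hypothetical, and messages without an audio_start are simply ignored.

```javascript
// Hypothetical helper (not part of any library): merges partial and
// corrected results into one transcript, keyed by audio_start.
// Assumes each result looks like { audio_start: <ms>, text: <string> }.
function buildTranscript(texts, result) {
  // Ignore messages that carry no audio_start.
  if (typeof result.audio_start === "number") {
    // A later result for the same audio_start (a corrected version)
    // overwrites the earlier partial one, so each segment appears once.
    texts[result.audio_start] = result.text;
  }

  // Object keys are strings, so compare them as numbers.
  const keys = Object.keys(texts).sort((a, b) => Number(a) - Number(b));

  // Join the non-empty texts in chronological order.
  return keys
    .map((key) => texts[key])
    .filter((text) => Boolean(text))
    .join(" ");
}

// Usage: a partial result, a second segment, then a correction of the first.
const texts = {};
buildTranscript(texts, { audio_start: 0, text: "my flight will" });
buildTranscript(texts, { audio_start: 2500, text: "can you pick me up" });
const transcript = buildTranscript(texts, {
  audio_start: 0,
  text: "My flight will arrive tomorrow at 2 p.m.",
});
console.log(transcript);
```

Keying by audio_start is what lets a corrected result replace its partial version instead of being appended a second time.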
There is some more relevant information here. For example, we find that the audio encoding is μ-law (audio/x-mulaw), and this is a little tricky, because the AssemblyAI API needs a different encoding: if we look at its docs, we see that it needs PCM. So we need to access the payload and convert it, and there is a very handy Node package for this, wavefile. We require it here, and of course we also need to install it, so we run npm install wavefile. Back down in the media case, we access the payload, convert it to the correct encoding, and send it to AssemblyAI. We write const twilioData = message.media.payload, which is the raw audio data. (I also removed the logging here, because our event logger logs something later anyway.) Now we convert it: we create a new WaveFile and call fromScratch with the correct parameters, and then we use the handy method fromMuLaw, which decodes an 8-bit μ-law WAV file into a 16-bit WAV file. Then we want only the raw audio data without the WAV header, so we get the file's data as a base64 string, split it and keep only the second part, and create a new buffer from that base64 data.
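If you would rather not depend on wavefile, the μ-law to 16-bit conversion can also be done by hand. The sketch below swaps in the standard G.711 μ-law expansion in place of wavefile's fromMuLaw; it assumes the payload is base64-encoded 8-bit μ-law at 8000 Hz mono, and the function names are hypothetical. Because it writes raw little-endian samples directly, there is no WAV header to strip afterwards.

```javascript
// Decodes one G.711 mu-law byte into a signed 16-bit linear sample.
function muLawToLinear(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // Rebuild the magnitude, then remove the encoder's bias of 0x84 (132).
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Hypothetical helper: converts a base64 mu-law payload (one Twilio media
// message) into raw 16-bit little-endian PCM with no WAV header.
function decodeMuLawPayload(base64Payload) {
  const muLaw = Buffer.from(base64Payload, "base64");
  const pcm = Buffer.alloc(muLaw.length * 2); // 2 bytes per output sample
  for (let i = 0; i < muLaw.length; i++) {
    pcm.writeInt16LE(muLawToLinear(muLaw[i]), i * 2);
  }
  return pcm;
}

// Usage: 160 mu-law bytes are 20 ms of 8 kHz audio; 0xff decodes to 0.
const payload = Buffer.alloc(160, 0xff).toString("base64");
const pcm = decodeMuLawPayload(payload);
console.log(pcm.length); // 320
```

Each input byte becomes two output bytes, so a 20 ms frame grows from 160 to 320 bytes.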
Then we slice that WAV buffer, which cuts off the header we don't need, so we are left with only the audio data. Twilio sends audio in 20-millisecond chunks, but the AssemblyAI API needs at least 100 milliseconds, which is why we keep track of the chunks. We push each converted chunk onto the array and check whether we have enough: once the buffered audio reaches 100 milliseconds, we put the chunks together into one buffer, encode it as base64, and send it to AssemblyAI with JSON.stringify, using the key audio_data for the encoded audio. Then we empty the chunks array again. Getting the audio into the correct format can be a little tricky, but these are the steps you need. You can also find the code on GitHub; I will upload it and put the link in the description. This is basically everything we need, so now it should already be working. One more important thing I want to mention: in the AssemblyAI endpoint we can also specify the sample rate, which in our case is only 8000 Hz.
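The 100 ms rule becomes a byte count once the audio is 16-bit PCM at 8000 Hz: 8000 samples per second × 2 bytes = 16 bytes per millisecond, so 100 ms is 1600 bytes, or five 20 ms frames of 320 bytes. The sketch below buffers frames until that threshold is reached and then emits one message with the audio_data key mentioned above. It is a minimal sketch under those assumptions; AudioChunker is a hypothetical name, and the send callback stands in for assembly.send. Counting bytes instead of frames is a design choice here, so the threshold still holds if frame sizes vary.

```javascript
// Assumed audio format after decoding: 16-bit PCM, mono, 8000 Hz. The sample
// rate must match the sample rate specified in the AssemblyAI endpoint.
const SAMPLE_RATE = 8000;
const BYTES_PER_SAMPLE = 2;
const MIN_CHUNK_MS = 100; // minimum duration the AssemblyAI API needs
const MIN_CHUNK_BYTES = (SAMPLE_RATE * BYTES_PER_SAMPLE * MIN_CHUNK_MS) / 1000; // 1600

// Hypothetical helper: collects ~20 ms PCM frames and calls `send` with one
// JSON message once at least 100 ms of audio has accumulated.
class AudioChunker {
  constructor(send) {
    this.send = send; // e.g. (message) => assembly.send(message)
    this.chunks = [];
    this.bufferedBytes = 0;
  }

  push(pcmFrame) {
    this.chunks.push(pcmFrame);
    this.bufferedBytes += pcmFrame.length;
    if (this.bufferedBytes >= MIN_CHUNK_BYTES) {
      const audio = Buffer.concat(this.chunks);
      this.send(JSON.stringify({ audio_data: audio.toString("base64") }));
      // Start collecting the next batch from scratch.
      this.chunks = [];
      this.bufferedBytes = 0;
    }
  }
}

// Usage: five 20 ms frames of 320 bytes each add up to exactly 100 ms.
const sent = [];
const chunker = new AudioChunker((message) => sent.push(message));
for (let i = 0; i < 5; i++) {
  chunker.push(Buffer.alloc(320));
}
console.log(sent.length); // 1
```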

Create Website

So now let's do one more step: instead of returning hello world, we want to display the transcript on the website. In the GET handler we now send a file with res.sendFile, and this will be our index.html file. Up here we also need path, so we write const path = require("path"). And of course we need an index.html file, which I already prepared. It is simple HTML with an h1 tag and an h3 tag, and in it we paste a script. In the script we add an event listener in which we also create a WebSocket to port 8080 and set up its onmessage event handler: we extract the data, which carries the event ID interim transcription, and we set the element's innerHTML to data.text. So basically, this puts the new text on the website. Now, in our index.js, whenever a new text arrives (where we log the data), we also want to send it to this WebSocket. Here we write wss.clients.forEach(client => ...), so we could stream this to more clients, but in this case we only have one, the WebSocket client in our page. For each client, if client.readyState equals WebSocket.OPEN, we send a message: we stringify an object whose event is the same event we use in the page and whose text key holds the message, which is the text the page extracts. This is all we need. Now we save this, stop the server, and run it again, then go to localhost:8080 and reload, and we should see our index.html. Now let's try calling the phone number we see here: "Test, is it working?" Yes, the text is coming in. Super cool! I hope you enjoyed this tutorial, I hope you subscribe to our channel, and I hope to see you next time. Bye!
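The broadcast step can be isolated into a small function so it can be tried without a live call. This is a minimal sketch: broadcastTranscript is a hypothetical name, the event name interim-transcription is an assumption (it only has to match what the page's script checks), and OPEN is set to 1, the value WebSocket.OPEN has in the ws package and in browsers. In the server you would pass wss.clients, which ws exposes as a Set of connected sockets.

```javascript
// Same numeric value as WebSocket.OPEN in the ws package and in browsers.
const OPEN = 1;

// Hypothetical helper: sends the current transcript to every connected
// browser client. `clients` is any iterable of objects with `readyState`
// and `send`, such as `wss.clients` from the ws package.
function broadcastTranscript(clients, text) {
  // Assumed event name; it must match the check in the page's script.
  const payload = JSON.stringify({ event: "interim-transcription", text });
  let delivered = 0;
  for (const client of clients) {
    // Skip sockets that are still connecting, closing, or closed.
    if (client.readyState === OPEN) {
      client.send(payload);
      delivered++;
    }
  }
  return delivered;
}

// Usage with two fake clients: one open, one already closed (readyState 3).
const received = [];
const fakeClients = new Set([
  { readyState: OPEN, send: (data) => received.push(data) },
  { readyState: 3, send: () => { throw new Error("closed socket"); } },
]);
const count = broadcastTranscript(fakeClients, "hello are you there");
console.log(count, received);
```

The readyState check matters because a browser tab that was just closed can still be in the client set for a moment, and sending to it would fail.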
