🎙️Build An AI Voice Agent With DeepSeek R1 (Python)

AssemblyAI · 11.02.2025 · 35,654 views · 771 likes


Video description
🔑 Get your AssemblyAI API key here: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_smit_28
Github repo: https://github.com/smithakolan/DeepSeek-R1-AI-Voice-Agent
AssemblyAI Streaming docs: https://www.assemblyai.com/docs/getting-started/transcribe-streaming-audio?utm_source=youtube&utm_medium=referral&utm_campaign=yt_smit_28

💬 Build a Real-Time AI Voice Agent with DeepSeek R1 & ElevenLabs! 🎙️🚀 In this video, I’ll show you how to create a real-time AI voice agent using DeepSeek R1 (7B model), AssemblyAI for speech-to-text, and ElevenLabs for text-to-speech. This setup allows for seamless AI conversations, where the voice agent listens, thinks, and responds in natural-sounding speech!

🔹 What You’ll Learn:
✅ How to transcribe speech in real-time using AssemblyAI
✅ How to use DeepSeek R1 via Ollama for AI-generated responses
✅ How to convert text responses into realistic AI voices with ElevenLabs
✅ How to build a low-latency AI voice assistant for real-time interactions

Timestamps:
00:00 - Intro
01:00 - Demo
01:49 - Installing AssemblyAI, Ollama for DeepSeek R1 and Elevenlabs
03:38 - Building the AI voice agent in python
12:35 - Demo

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

#MachineLearning #DeepLearning

Contents (5 segments)

Intro

In this video, I'll show you how to build an AI voice agent in Python using DeepSeek's R1 model. Since the R1 model has reasoning capabilities, you'll see in our demo that the model actually explains its chain of thought on how it came to an answer or conclusion. This is extremely useful when it comes to solving difficult problems, whether it's a coding challenge or a math problem, because the model is actually able to understand how it came to a solution and also go back and check whether it made any mistakes. This actually makes it smarter than regular large language models. "I'm traveling to Paris next week. What are some things that I should see?" "Okay, so the user is planning a trip to Paris and wants to know what to see. Hmm, I need to provide a good list of must-see attractions without being too long. First off, Paris has iconic landmarks like the Eiffel Tower and the Louvre Museum; those are pretty much must-visits for anyone there. Then there's Versailles with its grand gardens." This AI voice

Demo

agent is super easy to build. The link to the GitHub repository will be in the description box below, so let's get started. The first thing we want to do is sign up for a free AssemblyAI API key; you can find the link in the GitHub repository and also in the description box below for this YouTube video. We'll be using AssemblyAI for real-time speech-to-text in order to transcribe what we're saying in real time. After you've created your AssemblyAI API key, the next thing you want to do is download Ollama. You can do that by going onto their website; once you're on Ollama's website, all you have to do is download it. Once you've downloaded Ollama onto your local machine, we can then move on to the next step, which is signing up for ElevenLabs. We will be using ElevenLabs for text-to-speech, so that's why we need that. Next, we want to install PortAudio.

Installing AssemblyAI, Ollama for DeepSeek R1 and ElevenLabs

Now, PortAudio already comes pre-installed on Windows, but if you have Linux or Mac you would need to install it, so these are the commands for installing it on both of those systems. What I've done is I've actually created a virtual Python environment, so what I'm going to do is just paste this into the terminal. After that, we want to go ahead and install a few different Python libraries: assemblyai, of course, since we're going to be using it for real-time speech-to-text; ollama, because we actually want to call and select different types of models and run them; and then elevenlabs for, of course, text-to-speech. You can run all of these commands in your terminal. Lastly, if you're on Mac, you also want to install mpv for audio streaming, so that we can play the audio output that ElevenLabs gives us. You can do that by running brew install mpv in the terminal as well. Once you have done all of this, you can download, or pull, the DeepSeek R1 model from Ollama. In order to do this, you have to run the following commands. Keep in mind that for this tutorial I will be running the DeepSeek R1 7B model, but if you want to try something else, feel free to do that. So now I have successfully downloaded the DeepSeek R1 model, and now we can get started writing our code.
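Collected in one place, the setup steps above look roughly like this (package names and the model tag come from the video; the Homebrew lines apply to macOS only, and the repo may pin different versions):

```shell
# macOS only: PortAudio for microphone capture, mpv for streaming audio playback
brew install portaudio mpv

# Python libraries: real-time speech-to-text, local LLM client, text-to-speech
pip install assemblyai ollama elevenlabs

# Pull the DeepSeek R1 7B model locally (swap the tag to try a different size)
ollama pull deepseek-r1:7b
```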

Building the AI voice agent in Python

The first thing we're going to do is import all the libraries that we have downloaded, so for example assemblyai, elevenlabs, and ollama. Next, we will create a class called AIVoiceAgent. The first thing we want to do is set up our AssemblyAI API key, so right here is where you would include your own AssemblyAI API key, which you can find in your user dashboard once you log into AssemblyAI's website. Next, we'll also define our ElevenLabs API key; similarly, you would replace this entire thing with your ElevenLabs API key as a string. After that, I'm going to create an empty transcriber object and set it to None; what this will eventually contain is our real-time transcriber. After that, I'm also creating a transcript object, which will contain multiple transcripts as we continue talking with the AI agent. Essentially, the initial part of the transcript contains a prompt to DeepSeek R1 saying: you are a language model called R1 created by DeepSeek; answer the questions being asked in less than 300 characters. As we're chatting with this AI voice agent, we're going to append what we're saying to this full transcript, as well as what the AI agent is saying, so that our AI agent, or large language model, always has the full context of what has been said by it and by us.

Next, we're going to define a start_transcription method, which will create a real-time transcriber object using AssemblyAI and communicate with AssemblyAI's API. The first thing we do is define a real-time transcriber object, and in it we're going to set a couple of parameters. First off, we're setting the sample rate to 16,000; this is the default value. After that, we're also setting a couple of different methods: on_data, on_error, on_open, and on_close. Essentially, all of these are methods which define how our real-time transcriber should behave when each of these events happens, so we'll be defining them in a short while. Next, within this start_transcription method, once we have defined the transcriber, we're also connecting it, and then we are creating a microphone stream from AssemblyAI's API.

After the start_transcription method, we're also going to define a stop_transcription method. All this does is set the transcriber object that we initially created back to None, back to its original state. We will be calling this stop_transcription method any time we want our real-time transcriber to stop listening. After that, we're going to define the series of methods I talked about earlier, the four methods which are required for the transcriber object. We're going to start off with on_open: all we're doing here is returning; we're not doing anything, but the default behavior would be printing out the real-time transcriber session ID. The on_data method contains a very important part of the logic behind our AI voice agent. As you're speaking, this method is constantly receiving partial transcripts as well as final transcripts from AssemblyAI's API. First off, what are partial transcripts? Partial transcripts are just words as you're talking, so they're individual words. Meanwhile, a final transcript is oftentimes an entire sentence, or essentially whatever you have said without taking a break of longer than 700 milliseconds. The moment you take that 700-millisecond break, when you pause, that is going to be considered a final transcript, and that's going to be sent to this method as well. This method behaves very differently based on what type of transcript it has received. If it receives a partial transcript, basically a single word as you're talking, it's not going to do anything; it's just going to print it. But the moment it receives a final transcript, which signals that the user has stopped talking and has taken a long pause, it also signifies that the question has been completely said and can now be sent to the large language model.
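The partial-versus-final dispatch described above can be sketched as follows. This is a simplified stand-in: the real AssemblyAI SDK delivers distinct transcript event types (final versus partial) rather than a single object with an `is_final` flag, so `TranscriptEvent` here is purely illustrative.

```python
from dataclasses import dataclass

# Illustrative stand-in for AssemblyAI's transcript events; the real SDK
# distinguishes final from partial transcripts by event type, not a flag.
@dataclass
class TranscriptEvent:
    text: str
    is_final: bool

def on_data(event: TranscriptEvent, questions: list) -> None:
    """Dispatch logic from the tutorial: ignore empty events, echo
    partials, and hand a final transcript (a >700 ms pause) onward."""
    if not event.text:
        return
    if event.is_final:
        # The user paused, so the question is complete: send it to the LLM.
        questions.append(event.text)
    else:
        # Partial transcript: just show the words as they arrive.
        print(event.text, end="\r")

# Example: two partial updates, then the final sentence.
questions = []
on_data(TranscriptEvent("what should", is_final=False), questions)
on_data(TranscriptEvent("what should I see in Paris", is_final=True), questions)
print(questions)  # ['what should I see in Paris']
```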
At this point, what we do is call the generate_ai_response method, which we'll be defining later on. This generate_ai_response method is where we send the full transcript of whatever you have said, that entire question, to DeepSeek's R1 model. Before we define the generate_ai_response method, we also want to define the final two methods required for our real-time transcriber, which are on_error and on_close. Next, we're going to start defining our generate_ai_response method. The first thing we want to do in this method, when it's called, is actually stop transcription: we want our real-time transcriber to stop listening momentarily while we send our real-time transcript to the large language model. Now, back to our full transcript, which we defined at the very top: we append our current real-time transcript, basically the question that you, the user, have asked, and then we just print that out. At this point, we're going to send this transcript to DeepSeek R1 using Ollama. In order to communicate with DeepSeek R1, we're going to make use of Ollama's chat method, and to that we have to pass a couple of parameters: first off, the model name, in this case the DeepSeek R1 7B model; then the messages, where we pass the full transcript that we defined; and for stream, we set that to True. Now we take the Ollama stream, which contains the response from the DeepSeek R1 model, and pass that to ElevenLabs. What ElevenLabs is going to do is generate an audio stream by converting this text into speech. Once we have that audio stream, we start playing it out; that is what this method right here does, and this is the point where you hear the text-to-speech. At the end of all this, we're also going to append what DeepSeek R1 said back to the full transcript, which contains the full context of
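The conversation-history bookkeeping in that method can be sketched like this. Note the hedge: in the real code the reply comes from `ollama.chat(model="deepseek-r1:7b", messages=..., stream=True)` and is streamed into ElevenLabs for playback; here `chat_fn` is an injected stand-in so the append-ask-append pattern is visible on its own.

```python
def generate_ai_response(full_transcript, user_text, chat_fn):
    """Append the user's final transcript, get a reply (chat_fn stands in
    for the Ollama chat call), and append the reply so the model always
    sees the full conversation on the next turn."""
    full_transcript.append({"role": "user", "content": user_text})
    reply = chat_fn(full_transcript)  # real code streams tokens to ElevenLabs here
    full_transcript.append({"role": "assistant", "content": reply})
    return reply

# Example with a canned "model" instead of a live Ollama server:
history = [{"role": "system",
            "content": "You are a language model called R1 created by DeepSeek. "
                       "Answer the questions being asked in less than 300 characters."}]
fake_chat = lambda msgs: "The Eiffel Tower and the Louvre are must-sees."
reply = generate_ai_response(history, "What should I see in Paris?", fake_chat)
print(len(history))  # 3: system prompt, user question, assistant reply
```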
this conversation, and then we start transcription again so that the real-time transcriber can start listening to what we're saying next. At the end of all this, we simply initialize the AIVoiceAgent class that we created, so that we can actually start this process, and then we start off by starting transcription. That's where the loop starts, and now you can actually hit run and start demoing it.
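Putting the pieces together, here is a minimal skeleton of the class described above. The AssemblyAI-specific calls are left as comments, and the key strings and the `object()` placeholder are illustrative only; the real transcriber setup lives in the linked GitHub repo.

```python
class AIVoiceAgent:
    def __init__(self):
        # Replace with your own keys from the AssemblyAI and ElevenLabs dashboards.
        self.assemblyai_api_key = "YOUR_ASSEMBLYAI_API_KEY"
        self.elevenlabs_api_key = "YOUR_ELEVENLABS_API_KEY"
        # Will hold the real-time transcriber once start_transcription() runs.
        self.transcriber = None
        # Running conversation history, seeded with the system prompt.
        self.full_transcript = [
            {"role": "system",
             "content": "You are a language model called R1 created by DeepSeek. "
                        "Answer the questions being asked in less than 300 characters."},
        ]

    def start_transcription(self):
        # Real code: create a real-time transcriber (sample_rate=16_000,
        # on_data/on_error/on_open/on_close callbacks), connect it, and
        # stream microphone audio into it.
        self.transcriber = object()  # placeholder for the real transcriber

    def stop_transcription(self):
        # Reset the transcriber to its original state while the LLM answers.
        self.transcriber = None

agent = AIVoiceAgent()
agent.start_transcription()
```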

Demo

"I'm traveling to Paris next week. What are some things that I should see?" "Okay, so the user is planning a trip to Paris and wants to know what to see. Hmm, I need to provide a good list of must-see attractions without being too long. First off, Paris has iconic landmarks like the Eiffel Tower and the Louvre Museum; those are pretty much must-visits for anyone there. Then there's Versailles with its grand gardens." If you found this tutorial helpful, click like and leave a comment with your thoughts or any improvements you have, and to check out more videos like this, click on this video right here.
