Build an AI agent with LiveKit for real-time Speech-to-Text 🤖 | Full Python tutorial
Duration: 10:07

AssemblyAI · 10.03.2025 · 37,854 views · 457 likes


Video description
NOTE: A new version of LiveKit Agents has been released since the recording of this video. Our updated blog has the new code blocks: https://www.assemblyai.com/blog/livekit-realtime-speech-to-text

🔑 Get an AssemblyAI API Key: https://www.assemblyai.com/dashboard/signup?utm_source=youtube&utm_medium=referral&utm_campaign=yt_ry_9
🧑‍💻 GitHub repo: https://github.com/oconnoob/realtime-stt-livekit-assemblyai
📃 Blog post: https://www.assemblyai.com/blog/livekit-realtime-speech-to-text/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_ry_9
🟦 LiveKit Docs: https://docs.livekit.io/home/

Learn how to add an AI agent for real-time Speech-to-Text to your applications in this comprehensive tutorial! I'll walk you through creating a LiveKit agent that instantly transcribes audio streams using AssemblyAI's Streaming Speech-to-Text API. Perfect for developers looking to enhance their real-time communication apps with AI capabilities.

🧠 What You'll Build:
✅ A complete LiveKit server setup connected to a web application
✅ A Python agent that processes audio streams in real-time
✅ Instant transcription delivery to all participants
✅ A working demo you can test and modify

🛠️ Technologies Covered:
LiveKit Cloud & Agents
Python async programming
AssemblyAI's Streaming API
WebRTC fundamentals

This tutorial takes you from basic concepts to a fully functioning application. By the end, you'll understand how to implement real-time transcription in your own projects and have the code to prove it! Whether you're building a video conferencing app, creating accessibility features, or exploring AI integration, this guide has everything you need to get started.

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

Timestamps:
00:00 - Intro
00:37 - How LiveKit works
01:20 - Step 1: Set up the LiveKit server
03:04 - Step 2: Set up the frontend application
03:58 - Step 3: Build the AI Agent
08:44 - Application demo!
09:43 - Build a chatbot in Python with Claude 3.5 Sonnet

#MachineLearning #DeepLearning #LiveKit #AssemblyAI #SpeechToText #AITranscription #WebRTC #PythonTutorial #RealTimeAI

Table of contents (7 segments)

Intro

AI agents are one of the hottest use cases for AI right now. This is partly because they're a great example of composing currently existing technologies, using tools like speech-to-text to pull many modalities together into a central reasoning and action engine powered by LLMs. The flexibility and generality of AI agents means they have potential use cases in many industries, but they have particular potential to transform real-time experiences. Being able to delegate tasks to an AI agent that can understand and act on your commands in real time is a game changer, whether you're in a business meeting asking it to schedule a follow-up call or you're a live streamer asking an agent to set up a poll in your chat. In this video, we're going to show you how to build an AI agent to perform real-time speech-to-text in your web application. Building performant real-time

How LiveKit works

applications is notoriously difficult. We'll be using LiveKit to make this easier: a platform for building real-time audio and video applications that abstracts away many of these messy details. Additionally, LiveKit provides a flexible agent system which allows you to rapidly build AI agents and incorporate them into your application. Here's how LiveKit works. At the core of a LiveKit application is a LiveKit server. Users, called participants, connect to the server and join a real-time session, which is called a room. Participants can then publish streams of data, called tracks, to the room, as well as subscribe to tracks published by other users. These tracks are commonly audio or video streams, but any arbitrary data stream will do. By centralizing all of the data streams in one place and forwarding them only to whoever needs them, LiveKit can provide low-latency, high-quality real-time communication. In order

Step 1: Set up the LiveKit server

to build your real-time application, you'll need three essential components. First, you'll need a LiveKit server, which manages the real-time session. Second, you'll need a front-end application that your users will interact with. And third, you'll need an AI agent that will connect to the server and transcribe incoming audio streams.

Let's start by setting up the LiveKit server. LiveKit is open source, which means you can self-host your own LiveKit server; this is a great option if you want full control over your infrastructure. In this tutorial, we'll use LiveKit Cloud, a hosted version of LiveKit that's managed by the LiveKit team. This will make it easy for us to get up and running quickly, and it's free for small applications. Go to livekit.io and sign up for a free account. You'll be met with a page that prompts you to create your first app. Name your app streaming-stt and click continue. After answering a few questions about your use case, you'll be taken to the dashboard for your app. Your dashboard shows information about your LiveKit project, which is essentially a management layer for your LiveKit server. You can find usage information and active sessions, as well as what we're interested in: the server URL and API keys. Go to Settings > Keys and you'll see the default API key that was initialized when you created your project.

In a terminal, create a project directory and navigate into it. Inside your project directory, create a .env file to store the credentials for your application and add the following text. Back on the Keys page in your LiveKit dashboard, click the default API key for your app. This will display a popup modal where you can copy each of the values and paste them into your .env file; note that you'll need to click to reveal your secret key. Your .env file should now look something like this. Note that your .env file contains your application credentials, so make sure to keep it secure and never commit it to source control. Now that our server
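The file contents aren't reproduced in this transcript, so here is a sketch of what the .env file typically looks like for a LiveKit project. The variable names follow the convention used by the LiveKit SDKs and python-dotenv; all values are placeholders, not real credentials, and the AssemblyAI line is only added later in Step 3.

```shell
# .env: application credentials (keep this file out of source control!)
# Values come from your LiveKit Cloud dashboard under Settings > Keys.
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxxxxxx
LIVEKIT_API_SECRET=your-secret-value

# Added in Step 3: key from https://www.assemblyai.com/dashboard
ASSEMBLYAI_API_KEY=your-assemblyai-api-key
```

A good habit is to add `.env` to your `.gitignore` immediately, so the credentials can never be committed by accident.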

Step 2: Set up the frontend application

is set up, we can move on to building the front-end application. LiveKit has a range of SDKs that make it easy to build in any environment. In our case, we'll use the LiveKit Agents Playground, a web application that lets you easily test the agent system. Using this web application will allow us to quickly test out the speech-to-text agent that we'll build later. The Agents Playground is open source, so feel free to read through the code for inspiration when you're building your own project. Additionally, we don't even have to set up the Agents Playground ourselves: LiveKit has a hosted version that you can use. Go to agents-playground.livekit.io and you will either be automatically signed in or met with a prompt to connect to LiveKit Cloud. Sign in if prompted, and then click on the streaming-stt project to connect to it. You'll be taken to the Agents Playground, which is connected to the LiveKit server for your streaming-stt project. On the right, you'll see the ID of the room you are connected to, as well as your own participant ID. You can disconnect for now by clicking on the button on the top right. It's time to

Step 3: Build the AI Agent

build our speech-to-text agent. Before we start writing code, you'll need to get an AssemblyAI API key; you can get one by following the link in the description. The free offering currently includes over 400 hours of asynchronous speech-to-text as well as access to Audio Intelligence models, but it doesn't yet include Streaming Speech-to-Text, so you'll need to set up billing before you continue. Once you've done so, you can find the API key on the front page of your dashboard. Copy it and paste it into your .env file.

Now we're ready to start coding. You'll need to have Python installed on your system, so go do that first if you haven't already. Back in your project directory, create a virtual environment, using the appropriate commands for your system as listed here. Next, install the required packages. This command installs the LiveKit Python SDK, the AssemblyAI plugin for LiveKit, and python-dotenv, which we'll use to load our environment variables from the .env file.

Now it's time to build the agent, which is based on an example from the LiveKit examples repository. Create a new file in your project directory called stt_agent.py and add the following code. We start with our imports, load our environment variables, and then instantiate a logger for our agent. Now we can move on to writing the main agent code. We start by defining an entrypoint function, which is executed when the agent is connected to the room. The entrypoint function is an asynchronous function that accepts a job context. To start, we just log a message when the transcriber is connected to the room and instantiate an assemblyai.STT object. This object is responsible for handling the speech-to-text, and it satisfies the LiveKit Agents stt.STT interface.

Next, still within the entrypoint function, we define an inner function that tells the agent what to do when it subscribes to a new track. The decorator indicates which event this function should be bound to: in this case, track subscription. The function creates a new asynchronous task that transcribes the audio track using the transcribe_track function that we'll add next.

Add the following inner function to your entrypoint function. This function first creates an audio stream object from the track, and then creates an AssemblyAI speech stream object using the stream method of our assemblyai.STT object. The speech stream object represents the bidirectional communication stream between your LiveKit agent and AssemblyAI: audio segments are forwarded to AssemblyAI, and transcripts are received back. Next, the function creates an STT segments forwarder object, which is responsible for forwarding the transcripts to the room so they can be displayed on the front end. To transcribe the track, we need to do two things in parallel: first, receive the audio track from the LiveKit server and send it to AssemblyAI for transcription; second, receive the resulting transcripts from AssemblyAI and forward them back to the LiveKit server. We do this using the asyncio.gather function, which runs these two tasks in parallel. We will define these tasks next.

First, we define handle_audio_input. Add the following inner function to the entrypoint function. This function listens for audio frames from the audio stream object and pushes them to the speech stream object. The audio stream object is an asynchronous generator that yields audio frames from the subscribed track, which we forward to AssemblyAI using the push_frame method of the STT stream.

Now add this inner function to the entrypoint function. This function does the converse of the previous one: it listens for speech events from the speech stream object and forwards them to the STT segments forwarder object, which in turn forwards them to the LiveKit server. When it receives a final transcript event, it prints the transcript to the console. You could also add additional logic, for example to print out interim transcripts. You can learn about the difference between interim (or partial) transcripts and final transcripts in this section of our blog on transcribing Twilio calls in real time.

Finally, add the following line to the entrypoint function at its root level to connect to the LiveKit room and automatically subscribe to any published tracks.

So, to summarize: the entrypoint function is executed when the agent connects to the LiveKit room. The agent automatically subscribes to every audio track published to the room. For each of these tracks, the agent creates an asynchronous task which simultaneously (1) pushes audio frames to the AssemblyAI speech-to-text stream, and (2) receives transcription events from the AssemblyAI speech-to-text stream, prints them to the agent server console if they're final transcripts, and then forwards them to the LiveKit room so they can be sent to participants; in our case, this powers the chat feature on the front end. Your entrypoint function should now look like this.

Finally, we define the main loop of our agent, which is responsible for connecting to the LiveKit room and running the entrypoint function. Add the following code to your stt_agent.py file. When the script is run, we use LiveKit's cli.run_app method to run the agent, specifying the entrypoint function as the entry point for the agent. Now it's
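The code itself is only shown on screen in the video, so here is a sketch of the complete stt_agent.py the steps above describe. It follows the older (v0.x) livekit-agents transcription example the video says the agent is based on, assuming packages installed with something like `pip install livekit livekit-agents livekit-plugins-assemblyai python-dotenv`. As the note in the description points out, a newer LiveKit Agents release has since changed this API, so exact module paths, class names, and event payloads may differ in your installed version; treat this as an illustration, not the definitive listing.

```python
# stt_agent.py: sketch of the transcriber agent described in the video,
# based on the older livekit-agents (v0.x) transcription example.
# NOTE: newer LiveKit Agents releases have changed this API.
import asyncio
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, stt
from livekit.agents.transcription import STTSegmentsForwarder
from livekit.plugins import assemblyai

load_dotenv()  # load LIVEKIT_* and ASSEMBLYAI_API_KEY from the .env file

logger = logging.getLogger("transcriber")


async def entrypoint(ctx: JobContext):
    # Executed when the agent is connected to the room
    logger.info("transcriber connected to room")
    stt_impl = assemblyai.STT()  # satisfies the livekit.agents stt.STT interface

    @ctx.room.on("track_subscribed")
    def on_track_subscribed(track: rtc.Track, publication, participant):
        # Spawn one transcription task per subscribed audio track
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            asyncio.create_task(transcribe_track(participant, track))

    async def transcribe_track(participant, track):
        audio_stream = rtc.AudioStream(track)
        stt_stream = stt_impl.stream()  # bidirectional stream to AssemblyAI
        stt_forwarder = STTSegmentsForwarder(
            room=ctx.room, participant=participant, track=track
        )
        # Run audio upload and transcript download in parallel
        await asyncio.gather(
            _handle_audio_input(audio_stream, stt_stream),
            _handle_transcription_output(stt_stream, stt_forwarder),
        )

    async def _handle_audio_input(audio_stream, stt_stream):
        # Forward each audio frame from the room to AssemblyAI
        async for ev in audio_stream:
            stt_stream.push_frame(ev.frame)

    async def _handle_transcription_output(stt_stream, stt_forwarder):
        # Forward each speech event back to the room; print finals to console
        async for ev in stt_stream:
            stt_forwarder.update(ev)
            if ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
                print(ev.alternatives[0].text)

    # Connect to the room and automatically subscribe to published audio tracks
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

For the current API, use the updated code blocks in the blog post linked in the description rather than this sketch.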

Application demo!

time to run the application. Go back to the Agents Playground in your browser and click Connect; remember, the playground is connected to your LiveKit project. Now go into your terminal and start the agent with this command, ensuring the virtual environment you created earlier is active. The agent connects to your LiveKit project using the credentials in your .env file. In the playground, you'll see the "agent connected" status change from false to true after starting your agent. Begin speaking, and you'll see your speech transcribed in real time. After you complete a sentence, it will be punctuated and formatted, and a new line will be started for the next sentence in the chat box on the playground. In the terminal where your agent is running, you'll see that only the final, punctuated and formatted utterances are printed, because that's the behavior we defined in our stt_agent.py file. And that's everything! You've successfully built a real-time speech-to-text agent for your LiveKit application. Remember, you can self-host any part of this application, including the LiveKit server and the LiveKit front end. Check out the LiveKit docs for more information on building LiveKit applications and working with agents.
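The exact command isn't visible in this transcript. With the older livekit-agents CLI the video's code is based on, starting the agent would look something like this, assuming the file is named stt_agent.py and a venv-style virtual environment on macOS/Linux:

```shell
# Activate the virtual environment created earlier
source venv/bin/activate

# Run the agent in development mode; credentials are read from .env
python stt_agent.py dev
```

The `dev` subcommand is provided by cli.run_app; newer LiveKit Agents releases may name their subcommands differently, so check `python stt_agent.py --help` for your installed version.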

Build a chatbot in Python with Claude 3.5 Sonnet

Otherwise, feel free to check out our blog or our YouTube channel for other videos like this one on building a chatbot in Python using Claude 3.5 Sonnet. Here's a quick demo of the application we're going to be building. It first takes in an audio or video file; in this case, I'm using an audio recording of a phone call.
