How to Build a Podcast Summarization Web APP in Python and Streamlit
30:28

AssemblyAI 05.06.2022 6,062 views 112 likes

Video description
In this tutorial, we will build a web app using Streamlit that summarizes podcast episodes into chapters. Get the code here: https://github.com/AssemblyAI-Examples/python-speech-recognition-course Get your Free Token for AssemblyAI Speech-To-Text API 👇 https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_course2 ▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬ 🖥️ Website: https://www.assemblyai.com 🐦 Twitter: https://twitter.com/AssemblyAI 🦾 Discord: https://discord.gg/Cd8MyVJAXd ▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1 🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ #MachineLearning #DeepLearning

Table of contents (6 segments)

<Untitled Chapter 1>

All right, now it's time to build a podcast summarization app, and we're also going to build a web interface for this application. In this project we are again going to use AssemblyAI's API, which offers the chapterization and summarization features, and we are going to get the podcasts from the Listen Notes API. So let's get into it.

Here is what our app is going to look like once we are done with it. We will get an episode ID from the Listen Notes API (I will show you how to do that), and when we click this button it will first give us the title of the podcast, an image, and the name of the episode. Then we will be able to see the different chapters and when they start in this episode, and if we click these expanders we will be able to read a summary of each chapter. It's quite exciting to start building a web front end for our application too, so let's start building it.

In this project, like in the previous ones, we are going to have a main script and a supporting script, api_communication, where we keep all of the supporting functions that we want to use over and over again. We built this before, so this is the exact same one from the third project, the project that we did before, and we will only need to update it and change some things to start doing podcast summarization.

The first thing that I want to update here is that we will not actually need the upload endpoint anymore, so I'm just going to go ahead and delete that one, because the podcasts are received from the Listen Notes API. The audio is going to be somewhere on the internet; we will not download it to our own computer, so we can immediately tell AssemblyAI, "Hey, here is the address of the audio file that I want you to transcribe," and it will be able to do that. There will be no download or upload needed. That's why I also don't need the upload function, and the chunk size is not relevant anymore. All right, so that's good for now, and the next thing that we want to do is to set

Set Up the Listen Notes API Communication

up the Listen Notes API communication. We are going to use AssemblyAI to create the summaries of the podcast, and we will get these podcasts from Listen Notes. If you've never heard of it before, Listen Notes is basically a database of podcasts, I think nearly all of the podcasts, so you can search for any podcast. For example, one of my favorites is 99% Invisible, and you will be able to get all of its information plus the episodes, so you can search for episodes here if you would like to.

What we're going to do with Listen Notes is send it an episode ID, a specific episode ID that we will find on the platform itself. Let's say I want to get the latest episode of 99% Invisible. If I go to the episode page and go down to "Use API to fetch this episode", I will see an ID. This is the ID of this specific episode, and using this ID I will be able to get this episode and send it to AssemblyAI. This is exactly the ID that we need in our application.

To get that, first of course we need the Listen Notes endpoint. Listen Notes has a bunch of different endpoints, but the one that we need is the episode endpoint, to get the episode information. I will just name this the Listen Notes episode endpoint, and it is this one. Of course we also need the header again to authenticate ourselves, and in the header we're going to need to put an API key. All you have to do is go to Listen Notes, create an account, and get an API key for yourself, and we are going to paste it here. Here, as you know, we are importing the API key for AssemblyAI; now I'm also going to import the API key for Listen Notes, and we are going to send it with our requests to Listen Notes. So I'll call this the Listen Notes headers and this the AssemblyAI headers, and for Listen Notes the key is named X-ListenAPI-Key.

All right, the first thing that I want to do is to build a new function that is able to take the episode ID and give us the URL of the podcast's audio file. So I will call
this one get_episode_audio_url, and it is going to take an episode ID, and we're going to send a GET request to Listen Notes. Let's build the URL first: the URL is going to consist of the Listen Notes episode endpoint, a slash, and the episode ID, and we are going to send a GET request to this URL. I will call the response we get "response" for now, and the last thing that we need, of course, is the headers for authentication, and that one is called Listen Notes headers. After we do this, we should be able to get a URL for the episode ID, and the information is going to be sent to us in JSON format, so this way we'll be able to see it.

Let's try this first and see that it works. To do that, I am just going to again say from api_communication import everything. I'll make this a simple Python script for now, and I'm going to call get_episode_audio_url with the episode ID that I found here, to keep things simple, and as a result we will print the response that we get from Listen Notes. Let's run this and see what happens. All right, this is really long, so I will use pretty print to make it more readable: import pretty print here, and instead of print just use pretty print. Okay, let's do it again. All right, that is slightly better.

Let's see what kind of information we are working with. Nice, we get the audio URL here; this is the URL of the audio. Let's see where that takes us. Yeah, this is just the audio of this podcast; you can hear it: "...ferocity that the Roman advance was halted." Nice. All right, so this is exactly what we need, but if you want you can also get some extra information about the podcast to display in some way; we will definitely display some of it. There is a description of the episode, whether there is explicit content or not, the image of this episode, and some extra information about the podcast like the Facebook handle, Google handle, etc. So you get a lot of information, and if you want to make your web application
and your interface even more interesting, more interactive, you can of course include more of this in your application. If we just return data["audio"] from here, we will return only the audio URL, but now that we have all this information, we might as well extract some more of it. Some of the things that we can get are a thumbnail of this episode, the name of the podcast, and the title of this episode, which, like we said, we will display here. So let's do that. This will be the audio URL. We will also get the episode thumbnail. We can get the podcast title, which would be under "podcast", the podcast-specific information, and then "title". And lastly the episode title; I think it is just "title". And we can pass all of this information back: episode thumbnail, episode title, and podcast title.

We don't really need to change much in the rest of the functions, for example transcribe, poll, and get_transcription_result_url, which we already built beforehand. The only thing that we need to change is that now we're not going to do sentiment analysis; we want to use the auto chapters feature of AssemblyAI. So I am just going to rename these to auto chapters. This is just the name of a variable, so it is not that important; you can keep it the same, but for readability it's probably better to change it to auto chapters. But here, in this key, we need to change the name to auto_chapters, because we are sending this request to AssemblyAI and it needs to know that we want auto chapters. What else? We also just updated the name of the headers, so it's not only headers now, it's AssemblyAI headers; same here. In the polling we do not need to change anything; we are only asking whether the transcription is done or not. Again, in get_transcription_result_url we want to change this to auto chapters.

One other thing that I want to change is very small: normally we were waiting for 30 seconds, but now I want to wait for 60 seconds, because podcast episodes tend to be a little bit
longer, so we want to wait longer in between asking AssemblyAI whether the transcription is ready or not. This is another change, but the main work is going to happen in the save_transcript function.

The main change we're going to need to make in the save_transcript function is this: before, we were uploading our audio to AssemblyAI and then getting the result back, but this time we are only going to have an episode ID; we are going to get the URL from Listen Notes, and then we are going to pass that to AssemblyAI to start the transcription. So what I want to do here is, instead of URL and title, just give save_transcript the episode ID, and then I will run get_episode_audio_url from inside save_transcript. As a result, what we get is the audio URL, episode thumbnail, episode title, and podcast title. Again, we are not doing sentiment analysis, we are doing auto chapters, and we need to pass the audio URL to get_transcription_result_url. get_transcription_result_url takes the audio URL as url, and auto chapters, but that is not defined yet, and this is what we want to do, so let's just set it to True here.

The next thing that we want to do is to deal with the response that we get from AssemblyAI. Let's first see what the response from AssemblyAI looks like when we're doing auto chapters, and then let's deal with it. But let's fix some of the problems here first. I will not save it into a file for now, so I can comment these out; this will be auto chapters. The main thing that I want to do is see what the result looks like, so I will pretty print the data. Is the data already in JSON format? In transcribe... yes, it is. So I will just show that, and I'm going to comment these out for now, just so that we have an idea of what the response looks like. To run this, I will just pass the episode ID to save_transcript. Oh, we're still printing this one, so I will stop printing the response from Listen Notes. Let's start it again.
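The steps described so far (look up the episode on Listen Notes, ask AssemblyAI for auto chapters, poll every 60 seconds) can be sketched as follows. This is a minimal sketch, not the code from the video: it uses the standard library's urllib instead of the requests library so that it has no third-party dependencies, and the endpoint URL, the helper names extract_episode_info, build_transcript_request, and poll_until_done, and the fetch_status callback are assumptions made for illustration. The return order follows what is said in the video: audio URL, episode thumbnail, episode title, podcast title.

```python
import json
import time
import urllib.request

# Episode endpoint as I understand the Listen Notes API; verify against its docs.
LISTENNOTES_EPISODE_ENDPOINT = "https://listen-api.listennotes.com/api/v2/episodes"


def extract_episode_info(data):
    """Pick the four values the app needs out of a Listen Notes episode response."""
    audio_url = data["audio"]
    episode_thumbnail = data["thumbnail"]
    episode_title = data["title"]
    podcast_title = data["podcast"]["title"]
    return audio_url, episode_thumbnail, episode_title, podcast_title


def get_episode_audio_url(episode_id, listennotes_api_key):
    """Fetch one episode from Listen Notes (this performs a network request)."""
    url = LISTENNOTES_EPISODE_ENDPOINT + "/" + episode_id
    request = urllib.request.Request(
        url, headers={"X-ListenAPI-Key": listennotes_api_key}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return extract_episode_info(data)


def build_transcript_request(audio_url):
    """JSON body for an AssemblyAI transcript request with auto chapters enabled."""
    return {"audio_url": audio_url, "auto_chapters": True}


def poll_until_done(fetch_status, wait_seconds=60, sleep=time.sleep):
    """Call fetch_status() until it reports 'completed' or 'error'.

    fetch_status is a hypothetical callable returning the transcript JSON as a
    dict; in the real app it would send a GET request for the transcript ID.
    """
    while True:
        data = fetch_status()
        if data["status"] == "completed":
            return data, None
        if data["status"] == "error":
            return data, data.get("error")
        sleep(wait_seconds)
```

Keeping the field extraction separate from the network call makes it possible to check the parsing against a saved response without spending API quota.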
All right, so we got the results. Let's see what they look like. It's a lot of information, so let's scroll to the top. What we wanted was the chapters, basically, so let's see what the chapter information includes. As you can see, this is one chapter and this is another chapter. For each chapter we have the starting point and the ending point; the gist of the chapter, so really quickly what this chapter is about; a headline for this chapter; and a summary, so in a couple of sentences what is happening in this chapter, what the presenter is talking about.

What we want to do is show this information in our application, on our web interface. That's why what we want right now is to extract this information from the response we get from AssemblyAI and save it somewhere, and then we can visualize it in our Streamlit application. So I will undo the commenting here, and also here. I will name this file with the episode ID; it will be the episode ID plus ".txt", and as we always do I'm just going to save the transcript; we don't have to touch this much. But I will start another file. Let's call this the chapters file name, and this one will be the episode ID plus, let's say, "_chapters.txt". All right, so the chapters will be in another file; I'm going to keep all the chapter information somewhere else. In here I'm going to write some of the information I got from AssemblyAI, specifically the chapter information, and I'm also going to include some of the information I got from the Listen Notes API. One mistake here: I do not want it to be a text file, I want a JSON file, so that it will be easier to parse and easier to read later.

The first thing that I want is the chapters, and I'm going to get that from the data variable. It's called chapters; let's check. This section is called "chapters", yeah. So let's start. I will call this episode data. First let's include the chapters; again I will call it chapters. And then inside this episode data what do I want? The episode thumbnail, the episode title, and the podcast title, so that I have all of this information in one place, saved on my file system; I can just read it whenever I want and display it to the user. Finally, dump the episode data to the file, and I'll let the user know that the transcript is saved. This part we don't need anymore. And again, if there's an error, we will just say that there is an error, and we will return True.

Now we've got this far. What we do up till now is get the audio URL from Listen Notes based on the episode ID, send this URL to AssemblyAI, get the auto chapters information, and then save it to a file. Let's see that this works well, and while it's running we will start the Streamlit application. I will just run this again, but in the main script we of course need to call save_transcript. Okay, we're already doing it, so I will just run the application, and let's also start building our Streamlit application now.
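The saving step described above can be sketched like this. It is a minimal sketch under assumptions: the function name save_chapters, the directory parameter, and the exact dictionary keys (episode_thumbnail, episode_title, podcast_title) are illustrative spellings of the names spoken in the video, and transcript_data stands for the parsed JSON response from AssemblyAI, which is expected to contain a "chapters" list.

```python
import json
import os


def save_chapters(episode_id, transcript_data, episode_thumbnail,
                  episode_title, podcast_title, directory="."):
    """Write the chapters plus the Listen Notes metadata to <episode_id>_chapters.json."""
    filename = os.path.join(directory, episode_id + "_chapters.json")
    episode_data = {
        # Each chapter carries summary, headline, gist, start, and end.
        "chapters": transcript_data["chapters"],
        "episode_thumbnail": episode_thumbnail,
        "episode_title": episode_title,
        "podcast_title": podcast_title,
    }
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(episode_data, f, indent=4)
    return filename
```

Storing everything the front end needs in one JSON file means the display code only has to read a single file per episode.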

Streamlit

So if you've never heard of Streamlit before, it is a really easy way to start building web interfaces for your application, specifically for Python. It's very simple to use; it has a very simple API and it's a very simple library. What you have to do is import streamlit as st, and let's say you want to put a title in your application: all you need to do is st.title, and then you can show that it is a title. I will run this separately to show you how it works. To run Streamlit applications, you just need to say streamlit run main.py. Streamlit is installed on your computer like any other Python library, so you just need to use pip, say pip install streamlit, and you will be good to go. Unless you make a mistake and write Streamlit with a capital S, which does not work; it needs to be a lowercase s. So let's do that again. Right, so this is actually an application. The only thing we're showing right now is a title, and what we want it to look like is this, so I will start building the elements of this application.

The first thing that strikes us is that we have a sidebar, we have a title that says "Podcast Summaries", and then we start showing the information we got from the APIs that we've been using. So let's put in a sidebar. Maybe let's fix the title first: we want the title to say "Podcast Summaries". I can even say "Welcome to my application that creates podcast summaries". Maybe that will be too long, but we'll see. And let's create the sidebar. It's quite simple: you call st.sidebar.text_input, and then you can say "Please input an episode ID", and I can also have a button at the end of the sidebar that says "Get podcast summary", maybe with an exclamation point too. Let's run it again. Okay, this is looking more like it. It says "Welcome to my application that creates podcast summaries". I can put an episode ID
here and then I can say "Get podcast summary". You see that it is running, because I forgot to comment out this one, so it's actually running the whole application. I'll just stop it for now, because we don't have any way of displaying whatever we get back from the APIs. Now that we have the application looking more or less like what we want, let's wait for the chapter results to be written to our file, and then we will see what they look like, and then we can start parsing them and showing them to the user in our Streamlit application.

Okay, so the transcription is saved and our auto chapter creation is done. Let's take a look at what it looks like. We have the chapters section, and we have the episode thumbnail, episode title, and podcast title. All good. In the chapters we have chapter numbers, and inside each chapter we have the summary, headline, gist, start, and end. So it looks good; let's start showing this.

The first thing that I want to show, of course, like we showed in the beginning, is the name of the podcast plus the name of the episode, and then the episode thumbnail. How I'm going to show that is again using Streamlit, and that is going to be the header for me: I will include the podcast title, maybe a dash in between, and the episode title. But as you can see, we do not have them yet, so first we need to open the file that includes these things, and that file is named with the episode ID plus "_chapters", and it is JSON again. So the file name would be the episode ID plus "_chapters.json". And where do I get the episode ID from? The text input. The user is going to input an episode ID, and I am going to save it here in this variable, and that way I will have the file name. Then I just need to open this file and load it into a variable; let's call it data, for example. I need to import json, of course. So in this variable data, what do we have? The chapters, so
first let's get the chapters, data["chapters"], and then what we want to get is the podcast title and then the episode title. Let's change the names: episode title. And we also want the thumbnail. What did we call the thumbnail? We can see here: episode thumbnail. All right, episode thumbnail. So we are already showing the podcast title and episode title with st.header, and then we can show the thumbnail with the st.image function. From this point on, the next thing that we want to show is the chapters, of course. One thing we can do is, for example, we

For Loop

can use a for loop. I could say for chap in chapters, and you can just say st.write, or just show the chap. That's one way of doing it, but then you're going to have a lot of text one after another and it's not really nice. What we want, like in the original one I showed you at the beginning, is expanders. So it's

Create Expanders with Streamlit

quite easy to create expanders with Streamlit. Again you just say st.expander, and then you write what kind of information you want to be in your expander. As the title of the expander I will write here what I want in the title, and whatever I want inside the expander I'm going to write inside. I do not need to use a Streamlit call again, because this is going to be inside the expander. And inside the expander what I want is the summary. I think it was called summary; let's just check again in our JSON file. In chapters, the summary is called "summary", yes. So I want the summary to be in there, and as the title of the expander I want the gist of each chapter. So there will be an expander for each chapter, the title of the expander will be the gist of the chapter, and inside the expander we are going to have the summary of the chapter.

Let's run this and see how it looks, but let's first make sure that everything works. I have the title, and then I ask for an episode ID from the user. There is a button that starts this process, and I'll just assign it to a variable called button. This button variable holds the information of whether the button has been pressed or not, and I only want this part, the visualization, the display part, to happen if the button has been pressed. So I'm going to wrap this all in a condition; otherwise it's not going to happen.

But right now, if someone presses the button, nothing really happens, so we also need to add an action to this button. How we're going to do that is with on_click: if this button is clicked, what we want to happen is for the save_transcript function to be run, so I'm going to pass it here in the on_click argument. And we also have arguments, right? Here is how you pass arguments to the function that you call from your button: args is a tuple, which is why you write the variable or the argument that you're
passing to the function as the first element, and the second one is empty. Now when the button is clicked, this one should run, and we should be able to see all the information in our application. So let's run it again and see what happens. Yeah, we need to run the Streamlit application this time: streamlit run main.py. I'll close the old ones so we know which one is which; this is just the example from the beginning. All right, so we want to get a podcast and we want to display it. I will get this one again.
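The interface built in this part of the video can be sketched roughly as below. This is a sketch under assumptions, not the exact script from the video: the wrapper function main, the helper load_episode_data, the JSON key names, and passing save_transcript in as a parameter are choices made here so the example stays self-contained. Streamlit is imported inside main only so that the helper can be used without Streamlit installed; in a real main.py the import would normally sit at the top of the file.

```python
import json


def load_episode_data(filename):
    """Read the JSON file that the save step wrote for one episode."""
    with open(filename, encoding="utf-8") as f:
        return json.load(f)


def main(save_transcript):
    """Build the page; save_transcript is the function run when the button is clicked."""
    import streamlit as st  # third-party: pip install streamlit

    st.title("Podcast Summaries")
    episode_id = st.sidebar.text_input("Please input an episode ID")
    # args is a tuple, hence the trailing comma after the single argument.
    button = st.sidebar.button(
        "Get podcast summary!", on_click=save_transcript, args=(episode_id,)
    )

    # Only display results once the button has been pressed.
    if button:
        data = load_episode_data(episode_id + "_chapters.json")
        st.header(data["podcast_title"] + " - " + data["episode_title"])
        st.image(data["episode_thumbnail"])
        for chapter in data["chapters"]:
            # The gist is the expander title; the summary goes inside it.
            with st.expander(chapter["gist"]):
                st.write(chapter["summary"])
```

To try it, main.py would call main(save_transcript) at module level with the save_transcript function from the supporting script, and the app would be started with streamlit run main.py.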

Podcast Summary

Let's get the podcast summary, and here it is. We have the title, "Welcome to my application that creates podcast summaries"; okay, maybe that's a bit too long, I will shorten it. Then the name of the podcast, the name of the episode (the number of the episode also), "The Missing Middle", and here are my chapters. So apparently there are one, two, three, four, five, six, seven chapters that AssemblyAI's API was able to find, and for each chapter we have the gist of the chapter as the title of the expander and the chapter summary here.

One last thing that I want to add is the start and end point, or just the start point, of the chapter here, because I want to show how long a chapter is, maybe. For that I want to see how it looks in this JSON file. The start looks like this. These numbers might look a bit random to you, but basically they are milliseconds, so I want to turn them into minutes and seconds and, if applicable, hours. There is already a function that can do that; here it is, so we don't need to work on it for a long time. Basically you get the milliseconds, and then you can get out of it how many seconds there are, how many minutes there are, and how many hours there are. You're counting the hours; everything on top of the hours that doesn't add up to an hour is counted as minutes, and everything that does not add up to a minute is counted as seconds. And here is what we will return: the start time is either hours, minutes, and seconds, or, if there are no hours, we don't have to show zero hours, so we just show minutes and seconds. How I'm going to show it is within the expander title, with a dash in between. I'll call get_clean_time, and in there what I want is the chapter start. Let's see what it was called: it's just "start". Okay, all right, let's run it one more time and see what our application looks like. Awesome, okay, this is our application. In the sidebar we can input an episode ID that
we get from Listen Notes and say "Get podcast summary". It will show a nice title with the podcast title and the episode title, show us a thumbnail of the episode, and for each chapter show the gist of the chapter, kind of like a headline, and when the chapter started, and when you expand the expander you get the summary of the chapter. So this is what we set out to do, and we achieved it. I hope you were able to follow along. Again, don't forget to go grab the code from the GitHub repository.
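The millisecond conversion mentioned for the chapter start times can be sketched as follows. The function name get_clean_time comes from the video, but this body is a reconstruction using divmod rather than the original implementation, and the zero-padded MM:SS / HH:MM:SS output format is an assumption.

```python
def get_clean_time(start_ms):
    """Convert milliseconds to 'MM:SS', or 'HH:MM:SS' when there is at least one hour."""
    total_seconds = start_ms // 1000
    minutes, seconds = divmod(total_seconds, 60)  # leftover below a minute is seconds
    hours, minutes = divmod(minutes, 60)          # leftover below an hour is minutes
    if hours > 0:
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"
```

In the expander loop, the title could then be built as chapter["gist"] + " - " + get_clean_time(chapter["start"]), which matches the "gist, dash, start time" layout described in the video.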
