Summarize and analyze videos with @streamlitofficial and AssemblyAI

AssemblyAI · 04.11.2022 · 38:46 · 3,759 views · 111 likes

Video description

Let’s learn how to build a @streamlitofficial web app to analyse the content of YouTube channels. In this app, the user will have the option to upload a TXT file with links to YouTube videos, select a video from the displayed thumbnails and see the analysis of this video in the form of:

⭐️ a summary
⭐️ whether the video includes any sensitive content
⭐️ topics discussed in the video

✍️ Streamlit article: https://blog.streamlit.io/make-a-video-content-analyzer-app-with-streamlit-and-assemblyai/
🚀 Get your AssemblyAI API key: https://www.assemblyai.com?/utm_source=youtube&utm_medium=referral&utm_campaign=streamlit01
👩‍💻 Project GitHub repo: https://github.com/misraturp/Content-Analyzer
🤖 Learn more about the state-of-the-art AssemblyAI models: https://assemblyai.com/docs/

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#MachineLearning #DeepLearning #NLP #Python #Streamlit

Contents (8 segments)

Segment 1 (00:00 - 05:00)

Let's learn how to make a Streamlit app that can analyze YouTube videos given their links. Here's what the app is going to look like once we're done with it. Of course, we have a title for the app and a little explanation. There's a specific format that users need to submit the links in, so we note that here too. If people just want to play around with it and do not have specific links they want to analyze, they can check the "Use default example file" checkbox. Once you check that, or once you upload your own file, you see a list of thumbnails that correspond to the links you uploaded. If you choose one of them, you first get the title of the video and then its audio, so you can listen to it here ("Stop using Notion for everything"). Then we get a short bulleted summary of what is being said in the video, we point out whether any sensitive content was detected, and lastly we list all the topics that were discussed. These three things are what we're going to analyze from the videos.

Just to show you the other functionality: if you do not choose the default example file, you can upload another file, and if you choose one of these videos you again get the summary, the sensitive content (here, apparently, sensitive social issues are being discussed) and the topics discussed in the video.

This project is a collaboration between Streamlit and AssemblyAI, so if you want to see the written version of this tutorial, you can check out the blog post we prepared for Streamlit's website through the link either in the description or in one of these corners.

All right, I have my AssemblyAI sweater and my Streamlit water bottle; I am ready to code, so let's get started. Of course, I'm going to start with importing streamlit as st. Then I want a title for this project; I'll call it the YouTube Content Analyzer. Of course that's not enough; we need a bit of explanation of what's going on in this app. I've already prepared it beforehand: we have a couple of markdown lines where we explain that with this app people can audit YouTube channels and their videos. All they have to do is pass a list of links to the videos of the channel, and then they get a list of thumbnails. They can select one thumbnail by clicking it, and what they get is a summary of the video, the topics discussed in the video, and whether any sensitive topics are discussed and what they are. There is also a little warning for the user that the links they pass need to be in this format and not in that one: youtube.com and not youtu.be. The short form doesn't really work with the libraries that we're using.

Once that's done, I need to use a Streamlit file uploader to get the list of links from people. I'll say "Upload a file that includes the links", and I'll add ".txt" to make sure people understand they need to upload a TXT file. This will be the file that we're going to read. I'll keep running the app so that we can see how it is developing, so let's run this first. All right, what I have is a title and the explanations, and now we can upload a file here, or not.

But if you remember what I showed you at the beginning of this video, we also want to give people the option to use a default file. For that I'm going to have a Streamlit checkbox that says "Use a default file", and let's call its value default_bool. Then I can say: if default_bool is true, we're going to read a file. I'll just say open, and there's a file that I already prepared called links.txt. I'll show you what it looks like: here is links.txt (if this can go away), and it's basically a TXT file with a new YouTube link on every line. If the checkbox is not ticked, we're going to expect the file from the user. Let's rerun. Now if we use the default file, nothing happens yet, because we haven't implemented anything. But if we want to upload a file, then we can select
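
A minimal sketch of the input section described in this segment, not the code shown on screen. The label strings are paraphrased from the narration, `render_input_section` is a name invented here to keep the Streamlit calls in one place, and `is_supported_link` is a hypothetical helper (not part of the video) that illustrates the youtube.com versus youtu.be warning. It assumes each link includes the `https://` scheme.

```python
from urllib.parse import urlparse


def is_supported_link(url: str) -> bool:
    """Hypothetical helper: accept youtube.com links, reject youtu.be short links."""
    host = urlparse(url.strip()).netloc.lower()
    return host in {"youtube.com", "www.youtube.com"}


def render_input_section():
    """Draw the title, explanation, uploader and checkbox; return a file object or None."""
    import streamlit as st  # third-party; imported here so the helper above runs without it

    st.title("YouTube Content Analyzer")
    st.markdown("Upload a TXT file with one YouTube link per line.")
    st.markdown("Links must look like `youtube.com/watch?v=...`, not `youtu.be/...`.")

    file = st.file_uploader("Upload a file that includes the links (.txt)")
    default_bool = st.checkbox("Use a default file")
    if default_bool:
        # links.txt is the example file prepared in the video
        file = open("links.txt")
    return file
```

In the video these calls sit at the top level of the script, which is started with `streamlit run`; wrapping them in a function is only a convenience of this sketch.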

Segment 2 (05:00 - 10:00)

here. Let's say I want to use the White House links; then it's here, it's uploaded.

The next thing we want to check is whether a file has been uploaded. For that I'll say "if file is not None", and that's where the actual functionality of the app is going to happen. If file is None, we're not going to do anything; we're just going to keep showing the user this screen.

Once I have this file, I want to get all the links, which are each on a new line in the TXT file, into a Python list. One way I found to do that is by using pandas DataFrames. I read the file with read_csv into a pandas DataFrame. For that I also need to say header=False, otherwise it assigns a column name, an index column, that I don't want. I call it the DataFrame, but of course I need to import pandas as pd. Then I change the column name of this DataFrame, just so that it's easier to read. Maybe one time you read it, it assigns the column name 0, and another time it assigns something else, and that could break the code. So for that not to happen I'll give the column a name; I'll call it links, or urls. Then I get a URL list, which is the DataFrame's urls column turned into a list, so I use tolist.

Now I have all the links from the TXT file that was uploaded, or from the default one, in the urls_list Python list. What I want to do now is download the audio from YouTube for all the videos we have links to. So for each video, say video_url in url_list (urls_list sounds better to me), I need to download the audio. For that I'm going to create a separate function, because I think it will be neater that way. I'll define it even before the titles and everything. I'll call the function save_audio, and I'm going to pass it a URL.

I'm just going to copy and paste the code here. What I need is "from pytube import YouTube" and also "import os". I'm not explaining this line by line, because this is basically the standard way to download audio from YouTube videos using pytube. This is code that I found online; I did not write it line by line myself. But just to go through it and explain how everything works: you pass it the URL and specify that you only want the audio and not the whole video. After you download it, you can get the name of the video, and using this name you can create the file name you're going to save it under by adding an extension like .mp3, because we're working with audio files. Once that's done, using this yt object you can extract a bunch of things, for example the title of the video (the file name is basically the title plus the extension that we added) and also the thumbnail of the video, the URL to it, which is super useful. There are a bunch of other things you can extract, but these are all the things that we need.

So let me call this from inside my for loop. I'm going to pass it the video URL, and what I get in exchange is the video title, the file name (which is the save location, where this audio is saved) and the thumbnail, which I'll call video_thumbnail to make it more complete. Once I get these things, I want to create different lists: a list of titles (I already have the URLs), a list of save locations and a list of thumbnails. I'll create those three here, titles, locations and thumbnails, just to initialize them, so that when I want to display them to the user I can easily access them. So very simply we're going to say titles
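
The two steps of this segment could be sketched as follows. This is a reconstruction from the narration, not the pasted code itself: `mp3_path` and `collect_videos` are names introduced here, and `read_links` uses `header=None`, the correction the video arrives at shortly afterwards. Note that the download is renamed to `.mp3`, not re-encoded.

```python
import os


def read_links(file):
    """Return the links in a one-link-per-line file as a Python list."""
    import pandas as pd  # third-party, imported lazily

    # header=None: the first line is data, not a column name
    df = pd.read_csv(file, header=None)
    df.columns = ["urls"]
    return df["urls"].tolist()


def mp3_path(downloaded_path: str) -> str:
    """Swap the extension of the downloaded file for .mp3 (rename only, no re-encoding)."""
    base, _ext = os.path.splitext(downloaded_path)
    return base + ".mp3"


def save_audio(url: str):
    """Download one video's audio stream; return (title, save location, thumbnail URL)."""
    from pytube import YouTube  # third-party, imported lazily

    yt = YouTube(url)
    stream = yt.streams.filter(only_audio=True).first()
    out_file = stream.download()
    file_name = mp3_path(out_file)
    os.rename(out_file, file_name)
    return yt.title, file_name, yt.thumbnail_url


def collect_videos(urls_list):
    """Run save_audio for every link and gather the three parallel lists."""
    titles, locations, thumbnails = [], [], []
    for video_url in urls_list:
        video_title, save_location, video_thumbnail = save_audio(video_url)
        titles.append(video_title)
        locations.append(save_location)
        thumbnails.append(video_thumbnail)
    return titles, locations, thumbnails
```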

Segment 3 (10:00 - 15:00)

append the video title, then locations append the save location, and then thumbnails append the video thumbnail. Just to see that everything works as expected, maybe I can do an st.write of the titles, and then we'll see what happens once we upload a file. I'll rerun, or maybe use the default file first. Oh yeah, I made a little mistake when I was reading the CSV file: it's supposed to be header=None and not False. Let's do it again. All right, nice: we have six videos, apparently, and we have their titles, "Apple Notes power user tips", "Stop using Notion for everything", etc. Let me see if it also works if I upload a file myself rather than use the default one. It works this way too, so it looks like everything we've coded so far works with no problem, and we can go further.

Next, instead of showing the names of these videos, I want to show the thumbnails so that people can select one and we can start the analysis on that video. Streamlit does not have any widget or component like that by default, but Vivien, a Streamlit user, made a custom component to display photos in a grid and to select one. That is what's super helpful about custom components: people can make their own widgets for things that Streamlit has not implemented yet. I'll copy and paste the code here and go through it with you. The component is called clickable_images, and what you need to give it is a list of images; here I'm giving it the list of thumbnails. If you want, you can also pass a list of strings that correspond to these images, so that when you hover over an image you see that text; for us it's going to be the video titles. You can set the div styles: how big it's going to be, and how you want to wrap the images once a row reaches the maximum width, for example. One other thing I want to point out is that you can set overflow-y to auto. That means that when you have many images, which you would otherwise have to show in a lot of rows that take a lot of space, you get a scrollable div. I'll show you how that works in a second.

But first I need to import the Streamlit component too. I'll import it here. You'll be able to see how to import it in the GitHub repository for this custom component; I will leave the link for that in the description below. Basically it's from st_clickable_images, and we import clickable_images specifically from that.

All right, that's done, so let's see how it looks. Yeah, I forgot to remove this, so let's do that; we don't really want to show the titles anymore. Maybe I can also show which image is selected, so I can have another markdown widget here that shows which image is selected. Let's rerun. If I select an image, it runs again, but we don't want that, and there is a way to stop it. Right now when I click something, we can see it says thumbnail number 1 is clicked, and if I click another it's going to say thumbnail number 2 is clicked, but what's happening is that it runs the save_audio function from scratch, because, as you know, Streamlit applications run from beginning to end every time you change something. What we want is to save the results of the save_audio function, so that if we give it the same URL it's not run again; it immediately returns the results that were calculated last time. Normally we did that with st.cache, but apparently Streamlit recently decided to divide the functionality of cache into a couple of smaller functions, or decorators I guess these are called. The new one is called experimental_memo. If I add this, save_audio is not going to run every time I select a new thumbnail. As you can see, now it's running save_audio because the "Use default file" option is
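
A sketch of the two ideas in this segment: memoizing the download function and drawing the clickable grid. The caching part uses the standard library's `functools.lru_cache` as a stand-in so it runs without Streamlit; in the app itself the decorator placed on save_audio is `@st.experimental_memo`, which later Streamlit releases replaced with `st.cache_data`. `save_audio_cached`, `download_log`, `render_thumbnails` and the pixel sizes are illustrative assumptions, not values from the video.

```python
from functools import lru_cache

download_log = []  # records every real (non-cached) call, for illustration only


@lru_cache(maxsize=None)
def save_audio_cached(url: str):
    """Stand-in for save_audio: the body runs once per distinct URL."""
    download_log.append(url)
    return f"title of {url}", "audio.mp3"


def render_thumbnails(thumbnails, titles):
    """Show the thumbnail grid; return the clicked index, or -1 if nothing is clicked."""
    import streamlit as st  # third-party, imported lazily
    from st_clickable_images import clickable_images  # custom component named in the video

    selected_video = clickable_images(
        thumbnails,
        titles=titles,  # shown when hovering over an image
        div_style={
            "height": "400px",
            "display": "flex",
            "justify-content": "center",
            "flex-wrap": "wrap",
            "overflow-y": "auto",  # scroll instead of growing when there are many rows
        },
        img_style={"margin": "5px", "height": "150px"},
    )
    st.markdown(
        f"Thumbnail #{selected_video} clicked" if selected_video > -1 else "No image clicked"
    )
    return selected_video
```

Calling `save_audio_cached` twice with the same URL appends to `download_log` only once, which is the behaviour the memo decorator provides across Streamlit reruns.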

Segment 4 (15:00 - 20:00)

selected already. Okay, so if I select this one it says thumbnail 0 is clicked, thumbnail 4 is clicked, thumbnail 2 is clicked. Great, that functionality is working. If I do it with another list of links, it takes a second to download all the audio files, but once it's done it shows me all the thumbnails. As I mentioned, now that we have many thumbnails we don't want them to keep going and take up too much space in our app, so we have the scrollable functionality. After six videos or so this is activated, and you can scroll to look at more thumbnails. If I select one: thumbnail 0 clicked, thumbnail 4 clicked. Perfect, it's working.

Now to the actual part, where we're going to analyze the contents of these videos. Once a video is selected, we have the number of the selected video in the selected_video variable. If nothing is selected it just returns -1, so that's a good thing to know. Then we can have a little condition here: if selected_video is greater than -1, we can start the analysis. The first thing I want to get is the video URL, and I can get that from the URLs list: it's the URL at the selected_video index in that list. Then I want the video title and the save location where the audio is already saved. I get the video title from titles and the save location from locations. I also want to show the title, because it's nice to show people that what they selected is what they wanted. And there is a really useful audio widget in Streamlit, and I'm going to use that to let people play the audio they selected. Let's see this quickly. If I select the default file again and pick the first video, it gives me the title (you can also see the titles when we hover over the thumbnails), and you can listen to it here.

Now we want to start the analysis, so this is the part where we work with AssemblyAI. First we need to upload the MP3 file to AssemblyAI; then we send another request to AssemblyAI to start the analysis of the file; and lastly, after the analysis has started, we need to ask AssemblyAI a number of times whether the analysis is done and whether we can receive the results. These are the three steps, and I'm going to write a separate function for each of them.

The first one will be called upload_to_AssemblyAI, so let's write that one first. I write all the supporting functions at the beginning of my file. I also want its results to be stored in the cache, so this function doesn't need to run every single time. I'll paste the contents of this function to go faster. When uploading a file to AssemblyAI, we use this little helper function, in which we read the file and divide it into chunks so that it's easier to upload. To upload something to AssemblyAI there are a few things we need. First of all we need the requests library, so let's import that. With requests we are going to send a POST request to an AssemblyAI endpoint, so we need the upload endpoint; I'll define it here. Lastly, the headers need to include the authentication information for AssemblyAI. What is the authentication information? Basically the API key. You may be thinking "I don't have an API key", but it's very simple to get one: all you need to do is go to assemblyai.com and create an account by clicking "Get started". I already have an account; it's a free account. You can copy your API key by clicking here, and then use your API key here. I'll show you how to define the headers too
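
A hedged sketch of the selection lookup and the upload step described in this segment. The endpoint URL and the 5 MB chunk size are assumptions based on AssemblyAI's v2 API as commonly documented; the video only says the file is read in chunks. `pick_selected` is a name introduced here, and passing `api_key` as an argument is a choice of this sketch (the video puts the key directly in the headers).

```python
UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"  # assumed v2 upload endpoint


def pick_selected(selected_video, urls_list, titles, locations):
    """Return (url, title, save location) for the clicked thumbnail, or None if none is clicked."""
    if selected_video > -1:
        return urls_list[selected_video], titles[selected_video], locations[selected_video]
    return None


def read_file(path: str, chunk_size: int = 5_242_880):
    """Yield the file at `path` in chunks of at most `chunk_size` bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def upload_to_assemblyai(save_location: str, api_key: str) -> str:
    """POST the audio file to AssemblyAI and return the URL it was uploaded to."""
    import requests  # third-party, imported lazily

    headers = {"authorization": api_key}
    response = requests.post(UPLOAD_ENDPOINT, headers=headers, data=read_file(save_location))
    response.raise_for_status()
    return response.json()["upload_url"]
```

Passing the generator as `data` lets requests stream the file instead of loading it into memory at once.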

Segment 5 (20:00 - 25:00)

here. Either you paste your API key directly under authorization, or, if you're going to upload this application to Streamlit sharing, you can specify an authentication key in the advanced options to hold your API key, and then you only need to call st.secrets with the authentication key. But we're not going to deploy today, so I'm just going to paste my API key as it is here. That's going to be my header. Once we have these things, all we have to do is send a POST request to AssemblyAI, reading the file at the save location, and what we get back is an upload response. Let's see what this upload response looks like; I can write it to the application so it's easier to see. I select the first video... oh, I haven't called the function, of course, so let me call it. We need to pass it the save location, so I need to call it from here, after we select a video. Let's just call it for now. All right, now we see that upload_to_AssemblyAI is running, and here is what we get: an upload URL. This URL is where our audio has been uploaded to, and using it we can start the analysis job. As you can see in the function, we get the upload URL from the response and return it as the audio URL. So let's get this from our upload function.

The next thing we want to do is start the analysis, so I'll call this function start_analysis. Let's define it here too; I'll collapse this one so it doesn't take as much space. Again, I want the results of start_analysis to be stored in the cache so I don't need to run it over and over again. Let me get the contents of this one too; it's also quite short. For this one, as you can see, we need a transcript endpoint, so let me paste the transcript endpoint here, together with the upload endpoint.

Let's go through it now. When you want to start a transcription job with AssemblyAI (I call it a transcription job, but we also call it an analysis job), the reason I call it a transcription job is that when you submit an audio or video file to AssemblyAI, it automatically gets transcribed anyway. Even if you do not specifically ask for it, your audio or video will be transcribed, and on top of that you can specify what else you want to happen. AssemblyAI has a bunch of different models, specifically NLP models and other audio intelligence models, that can run various analyses on your audio or video files.

The ones we're going to use are these. First, IAB categories: this one returns the topics that were detected in the audio. There are nearly 700 topics and subtopics that can be detected with this model. We're also going to use the content safety model, which returns whether any sensitive topics were discussed. These sensitive topics could be drug usage, alcohol, violence or health issues: anything that could be harmful or sensitive to a group of people. And we also get summarization. The summarization model can return several different kinds of summary. We're going to use the bullets summary, which gives us a bulleted list summarizing the audio file. You can use bullets_verbose; with that one you again get a bulleted summary, but a much longer one. You can use gist; with that one you only get a couple of words of summary. You can use headline, which gives you a one-sentence summary of the whole audio or video file. And lastly you can use the paragraph option, with which you get a paragraph as the summary of the audio or video file.

So, as I said, we're using IAB categories (the topic detection), content safety and the summarization model from AssemblyAI. We set these settings in a variable, let's call it data, and pass it with our request, and we again pass the headers to authenticate ourselves. Together with these settings we also pass where our audio file lives, and we send all of this to the transcript endpoint, using our headers to authenticate

Segment 6 (25:00 - 30:00)

ourselves, as a POST request to AssemblyAI. From this request we also get a response, a transcript response, so let's see what it looks like too. We need to pass the audio URL to this function, so let's call it down here. I also need to print the transcript response so we can see it; I just need to make sure I parse it into JSON. Here's what it looks like. This is the response we get from starting the transcription job, or analysis job. We get some details of what we set to true or false: for example, the content safety model is turned on, as is IAB categories, but the model for redacting personally identifiable information is turned off. So this is a kind of summary of what we started. We also get an ID for this job. The job is not going to run instantaneously, of course; we have to wait for all of these models, summarization, transcription, etc., to finish computing, and we use this ID to see whether the transcription job is complete. So we're going to send GET requests to AssemblyAI. But let's not get ahead of ourselves. Here we get the ID from this response (we call it the transcript ID, or the job ID), and by adding this ID to the transcript endpoint that we defined up here we get a polling endpoint. This polling endpoint is the place where we ask AssemblyAI whether the job has been completed.

That brings me to my last step. Here we receive the polling endpoint, and once we have it we want to receive the results. I could call this get_summarization_results, but let's call it get_analysis_results, which is a bit more comprehensive. To that we need to pass the polling endpoint. Let's define this one: we're passing it a polling endpoint, we know that much, and again I want it to be cached. Let me get the contents of this one too. This one is very simple, actually: all we have to do is keep asking AssemblyAI whether the job is completed. For that I start an infinite loop, or a seemingly infinite loop, with a status variable that can break it. The status is first going to be "submitted", because that's the first status you get from AssemblyAI. I send a GET request to the polling endpoint that we got from the previous function, again with the headers to authenticate ourselves, and as a response we get something. Let's print what this response looks like before anything else so we can understand it better; for now I will not say "while True", just run this once. I also need to call this function, of course. All right, let's run it and see. I also want to remove the prints from the other functions so they don't confuse us.

So this is the response we get. We again get the same ID for the job, and information about what we turned off and on, so it's like a review of the job that you started. One difference is that this time we get a status, and the status tells us whether the job is complete. When we first submit it, it is "submitted"; then it can be "queued" or "processing" while it's not done; and there are two outcomes: it will either be "completed" or it will error out. We want to cover all of these possibilities. So let's get back to our while loop. If it's submitted or still processing, that means it's not done, so we can wait 10 seconds before asking again, because if we don't wait we're just going to keep asking, and that's not useful. Let's import sleep so that we can make the app sleep for a little bit. If the status is completed, on the other hand, then it means that the
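
The start-and-poll flow described in Segments 5 and 6 can be sketched as below, under stated assumptions: the endpoint URL and the option names follow AssemblyAI's v2 API as commonly documented and may differ in current versions. `build_analysis_request` and `wait_for_completion` are names introduced here so the loop can be exercised without network access, and unlike the video, which returns the response object, this sketch returns the parsed JSON.

```python
import time

TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"  # assumed v2 endpoint


def build_analysis_request(audio_url: str) -> dict:
    """Settings sent with the job: topics, content safety and a bulleted summary."""
    return {
        "audio_url": audio_url,
        "iab_categories": True,
        "content_safety": True,
        "summarization": True,
        "summary_type": "bullets",
    }


def start_analysis(audio_url: str, api_key: str) -> str:
    """Start the job and return the polling endpoint built from its ID."""
    import requests  # third-party, imported lazily

    headers = {"authorization": api_key, "content-type": "application/json"}
    response = requests.post(
        TRANSCRIPT_ENDPOINT, json=build_analysis_request(audio_url), headers=headers
    )
    response.raise_for_status()
    transcript_id = response.json()["id"]
    return f"{TRANSCRIPT_ENDPOINT}/{transcript_id}"


def wait_for_completion(fetch, wait_seconds: float = 10, sleep=time.sleep):
    """Call fetch() until the status is final; return the payload, or False on failure."""
    while True:
        payload = fetch()
        status = payload["status"]
        if status in ("submitted", "queued", "processing"):
            sleep(wait_seconds)  # not done yet: wait before asking again
        elif status == "completed":
            return payload
        else:
            return False  # "error" or anything unexpected


def get_analysis_results(polling_endpoint: str, api_key: str):
    """Poll AssemblyAI with GET requests until the job finishes."""
    import requests  # third-party, imported lazily

    headers = {"authorization": api_key}
    return wait_for_completion(lambda: requests.get(polling_endpoint, headers=headers).json())
```

Injecting `fetch` and `sleep` is a design choice of this sketch: it keeps the status handling separate from the HTTP call, so the loop can be checked with canned statuses.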

Segment 7 (30:00 - 35:00)

job of this function is done, so we can just return the polling response. If it errors out, or something else happens that we do not expect, we can just return False; we will see how we deal with that. Let me collapse these functions so we can start writing the part where we display the results of the analysis. I'll get the polling response from here. With that, we're nearly done. The last couple of things we've done have been for the background, so right now if we try to do the analysis, nothing much happens, or at least it looks like nothing is happening.

Now that we have the results of the analysis, we want to display them. "Polling response" is a bit confusing, so I'll just call this results. We want to get the summary, the topics and the sensitive topics from these results, so we need to know what to extract. If we quickly take a look at AssemblyAI's documentation, we'll see that for the summary, along with the ID and everything else, we just need to read "summary". So I can say the summary is results.json(), and in there I look for "summary".

For topics, let's see, topic detection. This one is a little more complicated, because for each piece of text we get which topics were discussed in that bit of audio. But maybe we don't want to see it for every single part of the audio file; we want to see, in general, all of the things that were discussed. So I can again look at the summary, but this time under "iab_categories_result" and then "summary". So we'll say results.json(), then "iab_categories_result", and then the "summary" of these results.

And lastly sensitive topics. For that, again results.json(), and let's see what that one is called under content moderation: "content_safety_labels". Again, if anything was found, it gives us the results for everything. Say it's health issues: it gives us the bit of audio where the health issues were mentioned, so the text of it, plus the confidence and the severity. But again, maybe we don't want all of that; we just want a summary of which sensitive topics were discussed. So it's going to be "content_safety_labels" and then "summary".

Now let's display these. For the summary it's quite simple: I give it a header, "Summary of this video", and then simply st.write the summary; there is no interesting way I need to show it. Then let's show the sensitive topics. I'll copy and paste my code here to make it faster. Again I give it a header for the sensitive content, and I check whether it's empty, because sometimes no sensitive topics were discussed. In that case we can show the user a subheader saying all clear, nothing sensitive was discussed in this video. But if something was discussed that we might deem sensitive, we show another subheader saying that mentions of the following sensitive topics were detected. Then I write the sensitive topic summary into a DataFrame to make it easier to deal with, update its columns to show the topic and the confidence (because maybe it's a sensitive topic, but if the confidence is very low it's not going to affect your analysis of the video), and show it at the end.

Let me show where we are so far. If I rerun this, we get the summary of this video, and for this one apparently no sensitive content is detected. Maybe we can use the White House videos instead; I'm sure some sensitive topics were discussed in there. We see that the summary is here, and also that something sensitive was discussed: it says sensitive social issues, and the confidence is 0.78. That's good to know, and it's also nice to see that it is being shown correctly. The last thing I want to show is a list of
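
A sketch of the extraction and display steps narrated above, assuming the completed job has already been parsed into a dict. The three keys are the ones read out in the video; `extract_analysis` and `render_summary_and_safety` are names introduced here, and the header and subheader wording is paraphrased.

```python
def extract_analysis(results: dict):
    """Return (summary, topics, sensitive_topics) from a completed transcript payload."""
    summary = results["summary"]
    topics = results["iab_categories_result"]["summary"]
    sensitive_topics = results["content_safety_labels"]["summary"]
    return summary, topics, sensitive_topics


def render_summary_and_safety(summary, sensitive_topics):
    """Show the summary, then either the sensitive topics table or an all-clear note."""
    import pandas as pd  # third-party, imported lazily
    import streamlit as st  # third-party, imported lazily

    st.header("Summary of this video")
    st.write(summary)

    st.header("Sensitive content")
    if sensitive_topics:
        st.subheader("Mention of the following sensitive topics detected")
        moderation_df = pd.DataFrame(
            list(sensitive_topics.items()), columns=["topic", "confidence"]
        )
        st.dataframe(moderation_df)
    else:
        st.subheader("All clear! No sensitive content detected")
```

An empty `summary` dict under `content_safety_labels` is falsy in Python, which is what the all-clear branch relies on.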

Segment 8 (35:00 - 38:00)

topics, and it's going to be quite similar to the sensitive content: we're going to show it in a DataFrame, but with a bit of a twist. I'll show you what the results look like right now for topic detection. I'll give it the header "Topics discussed", turn the items into a DataFrame that I call topics_df, and show you what that looks like with st.dataframe. This is what it looks like, because we get not only topics but also subtopics, and sub-subtopics as far as they go, together with the confidence. This is a perfectly okay way of showing the topics; people can read "news and politics, war and conflicts" as it is. But I want to separate them using these little symbols (I don't even know what they are called) so that each topic or subtopic has its own column. We don't need to reinvent the wheel; I'll paste the code that I already worked on. First I change the names of the columns, calling one topic and the other confidence. Then I split the topics into subtopics, as far as they go, using the symbol, and once they're split I put them in different columns. Once that's done, all we need to do is show it again with st.dataframe. Let's see. Perfect.

So these are the topics that are discussed. This is the first level, a higher-level topic: travel, business and finance, or news and politics. Then it goes into more detail: business and finance, specifically business, and then specifically executive leadership and management. Or here we have even more nuanced topics: education, then college education, postgraduate education and professional school. We also get the confidences here in case the user wants to take a look at them.

With that we can wrap this up. This is the application we built: we give the user information on how to use the app; there is an option to use a default file or upload their own file; we show the thumbnails to the user; and once a thumbnail is selected, we show the analysis of the video through a summary, whether any sensitive topics were discussed, and all the topics in the video. That's all. Thank you for building with me today; I hope you enjoyed this project. If you build this project or something similar, don't forget to let us know, either through comments here or on Twitter, where you can tag us; you'll find all our social handles in the description below. Don't forget that there is a written tutorial version of this video on the Streamlit blog; I will again leave the link somewhere here so you can take a look. Thanks again for building with me today. I hope you have a great day, and I will see you in the next video.
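
The topic-splitting step could be sketched in plain Python as below. The video does this with pandas on topics_df; this standard-library version is a swapped-in equivalent that shows the same idea, assuming labels are joined with ">" as in the topic summary. `split_topics` and the `topic_level_N` column names are hypothetical.

```python
def split_topics(topics: dict):
    """Split 'A>B>C' labels into one column per level, padding short rows with ''."""
    split = [(label.split(">"), confidence) for label, confidence in topics.items()]
    depth = max((len(levels) for levels, _ in split), default=0)
    rows = [
        levels + [""] * (depth - len(levels)) + [confidence]
        for levels, confidence in split
    ]
    header = [f"topic_level_{i}" for i in range(depth)] + ["confidence"]
    return header, rows
```

The returned header and rows can be handed to `pandas.DataFrame(rows, columns=header)` and then to `st.dataframe` for display.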
