# Stop Reading Long Articles - Use This ChatGPT Trick

## Metadata

- **Channel:** Great Learning
- **YouTube:** https://www.youtube.com/watch?v=uCr0FE99oXA
- **Source:** https://ekstraktznaniy.ru/video/38856

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

So here you can see I've given a prompt: Python code to perform text summarization on the given text. As you can see, the text is actually quite long, and we're going to learn how to perform text summarization on it step by step. So let's understand that. When I run this prompt, I get an output right here. You can see that we have code generated based on the prompt we shared in ChatGPT. The first thing we do is import all the required libraries. Then we take the text on which we're going to perform all the pre-processing tasks. After that, we tokenize the text into sentences, and we tokenize the sentences into words using NLTK's word_tokenize function. We convert the entire text to lowercase using the lower function. Then, as part of pre-processing, we need to remove all the unnecessary characters, and you can see that we're doing that here, along with all the stop words, that is, common words with little meaning. If you really want to understand more about stop words, you can just ask here as a prompt what stop words basically are, and ChatGPT will give you a complete explanation of stop words as well. But in short, those are words that have little meaning and aren't really needed in the text. So we're removing all those words, which we call stop words, using this code. Then we can see the filtered words, and we've applied the complete code for that here. Next, we calculate the frequency of each word; based on the frequency alone, we're going to decide which words should be included in the summary. After that, the next step is to rank the sentences based on the sum of word frequencies. And this is the complete code for setting the summary length, that is, how many sentences you want in your desired summary. So this is the complete code that we're getting as output through ChatGPT. Now you can see it here. This is the full explanation.
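The pipeline described above can be sketched end to end. This is a simplified, standard-library approximation, not the exact ChatGPT-generated code: the sample text, the regex-based sentence/word splitting (stand-ins for NLTK's sent_tokenize and word_tokenize), and the short stop-word set are all illustrative assumptions.

```python
import re
import heapq
from collections import Counter

# Hand-picked stop words for illustration; the real code uses NLTK's
# stopwords.words('english'), which is a much longer list.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "at", "from", "over"}

text = (
    "The quick brown fox jumps over the lazy dog. "
    "The dog barks at the fox. "
    "A cat watches the dog and the fox from a tree."
)

# 1. Split the text into sentences (stand-in for nltk.sent_tokenize).
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# 2. Tokenize into lowercase words and drop stop words.
words = re.findall(r"[a-z]+", text.lower())
filtered_words = [w for w in words if w not in STOP_WORDS]

# 3. Count word frequencies (stand-in for nltk.probability.FreqDist).
word_freq = Counter(filtered_words)

# 4. Score each sentence by summing the frequencies of its words.
sentence_scores = {}
for sentence in sentences:
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in word_freq:
            sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_freq[word]

# 5. Keep the top-scoring sentences and join them into one paragraph.
summary_length = 2
summary = " ".join(heapq.nlargest(summary_length, sentence_scores,
                                  key=sentence_scores.get))
print(summary)
```

Each numbered step matches one stage the video walks through; the later segments break these stages down individually.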
Also, first, as I mentioned, we import the necessary libraries. Then, after downloading the required NLTK resources using the nltk.download function, we assign the text to be summarized to the text variable. Next, we tokenize the text into sentences, and for that we're going to use the sent_tokenize function. After that, we tokenize the sentences into words: each sentence will be tokenized into words using the word_tokenize function, and then we convert them to lowercase. The next step is removing stop words from the words using a list comprehension. We're going to talk more about stop words, but first we're going to run this code in a Colab notebook to see whether we're getting the correct output or not. Next, you can see here that after removing stop words, we calculate the frequency of each word using the frequency distribution function, that is, FreqDist. We rank the sentences based on the sum of word frequencies and store the scores in the sentence-scores dictionary, and we set the desired summary length. Summary length basically means the number of sentences you want, like three sentences as a summary, or four sentences as a summary. Based on that, we'll enter the number there and then get the output. The third-to-last step here is to select the top-ranked sentences using nlargest from heapq and construct the summary by joining the selected sentences with spaces. And then after that, we're going to print the entire summary. So basically what we're doing here is first importing the required libraries, then applying pre-processing to the given text, and after that summarizing the entire text. After removing the stop words and unnecessary characters, we finally get the summary. So let's understand what code we're getting here.
Now let's implement the code we've seen here in ChatGPT in a Colab notebook. When we implement it, we actually want to see whether the output we're getting is correct or not. As you can see, I've copied the entire code here. When I run this code, here's what I'm getting: a summary. And how many sentences am I getting here? Three sentences. Why? The reason is that I've set the summary

### Segment 2 (05:00 - 10:00) [5:00]

length to three here. If I set it to four and run this code, I'll get four sentences, and those four sentences will be joined together to give you a proper summary. So we've understood every part of the text summarization code, and we've also implemented it in the Colab notebook just to confirm whether we're getting the output or not. Now we're going to break down the combined explanation I discussed with you into steps. We'll understand everything step by step, which will give you better clarity about what we've done in text summarization. I've provided the prompt here for that: write the above code step by step with explanations. When you use this prompt, you'll get all the code separated out step by step; you'll get each part of the code and, alongside it, the explanation for each step. First, import the required libraries. We need to include all the libraries, because only after importing them will you be able to pre-process the text and summarize it. First is the NLTK library, which is used for natural language processing tasks, things like tokenization, stop-word removal, and so on. Second is stopwords from nltk.corpus, which provides a list of common words to be removed from the text. Now we'll understand what stop words are, because as you can see here, stopwords basically provides a list of common words. What are those common words? When I ask ChatGPT what stop words are, you can see the output here in the quotes above: stop words refer to a set of commonly used words in a language that often carry little or no meaningful information when it comes to natural language processing tasks. For example, in text summarization, which is what we're doing, these are words like "the", "is", "between", and "to".
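The stop-word removal just described can be shown in isolation. The word list and the small stop-word set here are illustrative; in the generated code, the full list comes from NLTK via stopwords.words('english').

```python
# A hand-picked subset of English stop words; NLTK's
# stopwords.words('english') provides the full list.
stop_words = {"the", "is", "between", "and", "to", "a", "of", "in"}

words = ["the", "company", "reported", "a", "rise", "in", "quarterly", "profit"]

# The same list comprehension the transcript describes: keep only the
# words that are not stop words.
filtered_words = [w for w in words if w not in stop_words]
print(filtered_words)  # ['company', 'reported', 'rise', 'quarterly', 'profit']
```

Notice that every remaining word carries real meaning, which is exactly why this step comes before the frequency counting.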
These are the kinds of English words that don't add much meaning if we want a summary of a particular text, because the purpose of text summarization is to condense the entire text and extract the meaningful information from it. In the code, the stopwords module, which is part of the NLTK library, provides a set of stop words specific to the English language via stopwords.words('english'). You can see here that this function returns a set of English stop words, which have little meaning or importance when it comes to summarization. By removing these stop words from the text before summarization, we can focus on the most important parts of the text. Next is the filtered-words line of code over there, and we're going to see what it means: it gets the words that remain after the stop words are filtered out. When I go back to the code, as you can see here, ChatGPT is giving you an explanation of stop words. After that, the next one is FreqDist. We'll get to the filtered words, but before that, let me explain: the next module we're going to use here is FreqDist, which stands for frequency distribution. We've already discussed that, but now we're going to understand it in more detail. Why is it used? It's used to calculate the frequency distribution of words. Next, we're going to use heapq, which is basically used to select the top-ranked sentences based on their scores. So we've imported all the libraries. What was the next step? The next step is to define the text that will be summarized; we've already talked about that. The text variable stores the input text that we want to summarize. The fourth step is to tokenize the text into sentences, which breaks up the text and splits it into individual sentences.
Because at the beginning we actually had one combined paragraph, we're going to tokenize the paragraph into separate sentences and then store them in the sentences list. After that, once we have all the sentences separately, we're going to break those sentences into words. How? By using the word_tokenize function. That's exactly what we're doing here: we're going to apply the word_tokenize function to those sentences, and we'll

### Segment 3 (10:00 - 15:00) [10:00]

tokenize those sentences into words. After doing that, you can see that at the same time we're converting those words to lowercase using the lower function. The lower function is basically used to convert all the words with capital letters, whether at the beginning or anywhere else, into lowercase. Where will the resulting words be stored? They will be stored in the words list; just as we were storing the sentences in the sentences list, here we're storing the tokenized words in the words list. Once we have all the tokenized words, what are we going to do next? The next step is removing stop words; we've already discussed what a stop word is. The step after that, the code we're seeing here, is for the filtered words. We were talking about those, and I told you we'd discuss them more. So what are filtered words? After identifying the stop words, we get the complete set of stop words and remove them. Once we've removed all the stop words from the list of words, the remaining words that we get as output are what we call the filtered words. The next step is calculating the frequency of each word. I've already talked about this, but now I'm explaining it step by step just to make it clearer. As I mentioned earlier, FreqDist is a function that's used to get the frequency of each word. Based on that, we're going to decide which words we'll use in the summary. Now we rank the sentences based on word frequencies: the higher the frequency, the higher the chance that a sentence will be included in the summary. Here the sentence-scores dictionary is initialized to store the score for each sentence. We're creating a dictionary using curly braces; that's all it is, just a dictionary.
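The frequency step mentioned above can be demonstrated with collections.Counter, on which NLTK's FreqDist is built, so the counting behaviour is the same. The word list here is a made-up example.

```python
from collections import Counter

# Illustrative filtered words; NLTK's FreqDist is built on
# collections.Counter, so Counter shows the same counting behaviour.
filtered_words = ["profit", "company", "profit", "growth", "company", "profit"]
word_freq = Counter(filtered_words)

print(word_freq["profit"])       # 3
print(word_freq.most_common(2))  # [('profit', 3), ('company', 2)]
```

These per-word counts are what the sentence-scoring step sums up next.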
Here we're going to accumulate the score for each sentence. For each sentence in the sentences list, the words are tokenized and converted to lowercase, so we're doing the same thing here. Then, if a word is present in the word-frequency dictionary, its frequency is added to the score of the corresponding sentence in the sentence-scores dictionary. After that, we set the desired summary length, that is, how many sentences you want in the summary. Here the summary length is set to three; it's up to you, as I mentioned earlier, and finally we'll get the final summary based on the length we've decided. Now, as you can see, we're going to run that code step by step. Earlier we saw the output, but we applied the entire code all at once; now we're going to apply it step by step, so you'll see what output you get at the end of each step. We've imported all the libraries and provided the text that we're going to summarize. As you can see, we're converting this entire text into sentences: when I run this code, I get all the sentences, and the whole paragraph is split into sentences. The next step is to break those sentences into words. When I run this code, it breaks the sentences into words, so now we're getting all the sentences in the form of words; basically we're tokenizing the sentences into words. Once we have all the tokenized words, we apply the stop-words step, which removes stop words from these tokenized words, and in the end we get the filtered words. When I run this code, you can see it here: we're getting only the usable, meaningful words. Stop words like "the", "is", "between", "of", and so on have been removed, so we're left with just the filtered words, the meaningful words that are actually needed for our summary.
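The scoring loop and sentence selection described above can be sketched as follows. The word frequencies and sentences here are invented illustrative values, not the ones from the video.

```python
import heapq

# Assumed word frequencies after stop-word removal (illustrative values).
word_freq = {"profit": 3, "company": 2, "growth": 1, "market": 1}

sentences = [
    "profit and growth lifted the company",
    "the market was quiet",
    "company profit doubled",
]

# Sum the frequency of every known word in each sentence, exactly as the
# scoring loop in the transcript describes.
sentence_scores = {}
for sentence in sentences:
    for word in sentence.split():
        if word in word_freq:
            sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_freq[word]

# Keep the top-ranked sentences and join them with spaces.
summary_length = 2
summary = " ".join(heapq.nlargest(summary_length, sentence_scores,
                                  key=sentence_scores.get))
print(summary)
```

heapq.nlargest with key=sentence_scores.get returns the highest-scoring sentences, and join glues them into one summary paragraph, which is the final output the video prints.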
Next, we're going to calculate the frequency. When I run this code, I get the sentence scores. Here you can see that the first sentence is this one, and when you look at it, the score of this sentence is 118. The other one has 87. Similarly, the next sentence has a score of 73, and the fourth one has 70. So here we are getting the score of each sentence. Now that we have these scores, we rank the sentences based on them, we get the final summary, and how will we get

### Segment 4 (15:00 - 17:00) [15:00]

the final summary? First we have to decide how many sentences you want in your desired summary. Here I have set it to three. After that, we select the top-ranked sentences based on their scores, as I already showed you, and print those top-ranked sentences as the summary. When I run this code, I'm getting the top-ranked sentences as a summary. Here I'm using the join function because, obviously, if I'm getting three sentences, I don't want them as separate sentences; I want them as a paragraph. So I'm going to join those three top-ranked sentences, and once they're joined, they'll be shown to you as a summary in the form of a paragraph. Here you can see the first sentence, then the second sentence is this one, and the third one is about the quarterly profit. These three sentences are the top-ranked sentences, which I joined using the join function. That's how you can do text summarization using ChatGPT, and it's very easy to understand. Suppose you get stuck somewhere in your code: you can simply copy that code as a prompt and ask ChatGPT for a complete, detailed explanation, and you'll get a proper and precise one. So while you're understanding the theory, you're also putting that theory into practice, which actually saves you time as well. — If you enjoyed this video, you can enroll in Great Learning Academy for free. Choose from hundreds of courses across multiple domains and earn a certificate of completion along the way. And if you want to take it a step further, you can try Great Learning Academy Pro Plus with a free trial. It provides access to additional premium courses with more advanced content from distinguished faculty, along with features like guided projects, a resume builder, and mock interviews to make your learning journey even richer. Link is in the description.
