# Data Science Full Course - Learn Data Science in 12 Hours | Data Science For Beginners | Edureka

## Metadata

- **Channel:** edureka!
- **YouTube:** https://www.youtube.com/watch?v=4cExalx09xU
- **Date:** 13.03.2026
- **Duration:** 11:00:52
- **Views:** 5,026
- **Source:** https://ekstraktznaniy.ru/video/29636

## Description

🔥Integrated MS+PGP Program in Data Science & AI: https://www.edureka.co/dual-certification-programs/ms-data-science-pgp-gen-ai-ml-birchwood
🔥Edureka's Data Science Training Masters program: https://www.edureka.co/masters-program/data-scientist-certification


This Edureka Data Science Full Course video will help you understand and learn Data Science Algorithms in detail. This Data Science Tutorial is ideal for both beginners as well as professionals who want to master Data Science Algorithms. 
00:00:00 Introduction
00:01:31 What is Data Science?
00:03:26 Who is a Data Scientist?
00:35:36 Top 10 Reasons to Learn Data Science
00:40:12 Data Science Basics 
01:33:28 Data Life Cycle
01:37:03 Statistics and Probability
03:09:33 Hypothesis Testing Statistics
04:02:26 What is Machine Learning?
04:19:21 Linear Regression
04:45:59 Logistic Regression
05:34:29 Decision Tree Algorithm
06:19:50 Random Forest
06:47:02 KNN Algorithm
07:19:27 Naive Bayes Classifier
07:39:45 Support Vector Machine
08:

## Transcript

### Introduction [0:00]

Hello everyone and welcome to the data science full course. Data science has become one of the most important fields in today's data-driven world, helping organizations turn raw data into valuable insights and informed decisions. In this course, you will learn the core concepts of data science, including data analysis, data visualization, machine learning basics, and the tools used by data professionals. You will explore how data is collected, processed, and analyzed to uncover patterns and trends that drive business strategies. And by the end of this course, you will have a strong foundation in data science and a clear understanding of how data can be used to solve real-world problems across different industries. So before we begin, please like, share and subscribe to Edureka's YouTube channel and hit the bell icon to stay updated on the latest content from Edureka. Also check out Edureka's integrated MS plus PGP program in data science and AI. Advance your career with a dual certification program designed to build strong expertise in artificial intelligence and data science. This program allows you to earn two industry-recognized certifications in just 12 months. You will begin with phase one, where you will earn a PGP in

### What is Data Science? [1:31]

generative AI and machine learning in just 6 months, building a solid foundation in modern AI technologies. In phase two, you will pursue an accelerated MS in data science from Birchwood University, further strengthening your expertise in advanced data science concepts. The program combines live online classes with flexible self-paced learning modules. You will also work on real-world AI projects to build a strong, industry-ready portfolio. For more details, check out the course link given in the description box below. Now let us get started by understanding what data science is. So, what is data science? With the development of new technologies, there has been a rapid increase in the amount of data. This has created an opportunity to analyze and derive meaningful information from all this data. This is where data science comes into the picture. Technically, data science is defined as the process of extracting knowledge and insights from complex and large sets of data by using processes like data cleaning and data visualization. Almost all of us use Google Maps. But have you ever wondered how Google knows the traffic conditions between where you are and where you're trying to go? Or how does it determine the fastest way to your destination? The answer is data science. Google Maps collects data every day from a multitude of reliable sources, which primarily include smartphones. It continuously combines the data from drivers, passengers, and pedestrians. Then, making use of machine learning algorithms, Google Maps sends real-time traffic updates by way of colored lines on the traffic layer. This helps you find your optimal route and even determine which areas should be avoided due to road work or accidents. Isn't that amazing? As data science continues

### Who is a Data Scientist? [3:26]

to evolve, the demand for skilled professionals in this domain is also increasing drastically. In order to uncover useful intelligence for their organizations, data scientists must master all the aspects of data science. This is a major reason why data science enthusiasts are rushing to get certified. So the first reason why you should learn data science is because of the impressive salaries. Data science is an area where there is a lot of high demand for professionals but a low supply of skilled labor. So anyone remotely good in the subject area can expect an impressive salary. And let me remind you that data science pays way better than traditional IT careers. As of making this video, there's an average salary of 10.5 lakhs per annum, which can go as high as 26 lakhs per annum for skilled and experienced professionals. Now, the second reason why you should opt for a career in data science is because of the great career opportunities. Data science has consistently been one of the best career options for a while now. As I said before, there is a lot of demand for skilled and experienced professionals, but there is a low supply of them. That is why, if you get yourself trained and become comfortable with handling data, you can expect incredible opportunities. You should know that there are thousands of vacancies opening up every day and thousands of placements are taking place. So, who says that you shouldn't be one of them? The third reason is multiple positions. Data science has a lot of different tasks, and people are recruited to work on particular subject areas. If you generalize these areas, then you can come up with three different positions. The first one is an analyst. The analyst works on tasks like gathering data, cleaning and organizing data, making insights, visualizing information and much more. The second type is an engineer. The main focus of this specialization is to transform the raw data into something viable and readable before it is presented to the organization. But it is not just this. They are also required to design, build, test, blend, manage and optimize data with the help of numerous sources. The last one are the scientists. Most of the time here goes into researching, writing algorithms and writing code to answer questions about the data sets in question. Now the fourth reason why you should learn data science is because you get to work for the best organizations. Yes, data science is one of the fastest growing domains, and with it comes a lot of opportunities to work for your dream company. You can find opportunities in companies like Meta, Amazon, Netflix, Google, etc., all the way to some super cool startups. And need I tell you the fifth reason: less competition. Well, how much better can it get? The reason why I say less competition is because data science is a relatively new domain. So while there are a lot of opportunities, there just aren't enough people. Now you can pretty much guess the next point based on this. It is quick growth. You see, data science is not like a traditional IT job. The opportunities brought by data science are applicable to things from market research and logistics to water supply, telecommunications, and even restaurants. Since there is less competition, you can go into any field and manage a lot of tasks, which results in quick growth in your career. And may I add the ever growing amount of data to this list. The seventh reason we have is that data scientists are decision makers.
As a data science professional, you will often find yourself at the center of decision-making activities, and you'll be in charge of making decisions that will change how a business grows. You'll play an integral role in how your business develops. Our eighth reason is also an extension of this. All that decision making will lead to business optimization. You may develop the necessary skills to support the expansion of your business. You can be an employee or an entrepreneur. Data science can help you optimize the process efficiency and quality of your business and its offerings, be it products or services. The ninth reason is that it is an easy subject to learn online. Now I promise you this is not an ad. The thing is, a systematic way of learning data science is super interactive and fulfilling. You can take up an online course and study it at your own pace whenever you are free. There are a lot of options to study this, be it on Edureka or other platforms. All you have to do is Google it. The last thing that we must and should cover in this list is that data science is a future-oriented subject. Data is becoming an integral part of decision making in the 21st century. And if you learn AI, ML, DL, CV, big data, etc., you're preparing yourself for all these opportunities that are coming your way. While these opportunities are certainly enticing enough, the reasons to learn data science are not limited to this. If you want to learn data science or just want to familiarize yourself with its concepts, then you should definitely visit our channel. We have a lot of videos covering various topics in this subject. So I guess I'll see you over there. The demand for data science professionals is skyrocketing, with reports predicting 11.5 million new data science jobs globally by 2026. In India, the data science education market is expected to grow by 58% to $1.4 billion by 2028, reflecting the increasing demand for skilled professionals. As AI and data-driven decision-making become essential for businesses, hiring in AI and data science roles is set to dominate the job market by 2025, especially in tier 2 cities. And Glassdoor reports that entry-level data scientists in India earn 9 lakh rupees per year, while senior data scientists earn up to 23 lakh rupees per year and lead data scientists earn up to 44 lakh rupees per year. And in the United States, the starting salary for a data scientist is $79,000 per year, while a senior data scientist can earn up to $125,674 per year. And a lead data scientist can earn up to $198,398 per year. With competitive salaries and rapid career growth, now is the ideal time to upskill and enter the thriving field of data science. But why data science? Before beginning the data science road map, it's important to establish a clear learning goal. Data science is one of the fastest growing fields, with an estimated 11.5 million jobs worldwide by 2026. It pays well, with professionals earning 20% to 30% more than those in other industries. As artificial intelligence and analytics continue to transform businesses, data science remains a promising field with limitless opportunities. Data-driven solutions are transforming industries ranging from healthcare to finance, making this an exciting and impactful field in which to work. Let us understand what data scientists do. A data scientist collects and analyzes complex data to inform business decisions. They collect, clean and analyze data before developing machine learning models and deploying them in the real world.
Data scientists also monitor and maintain models, communicate findings to nontechnical stakeholders, and work across teams to achieve organizational goals. But how to become a data scientist? So here's a road map to guide you. Mastering key skills is essential for becoming a successful data scientist. So let's explore each one in detail. First, you can get started with Python. Python is the most popular programming language in data science due to its ease of use, versatility, and extensive ecosystem of libraries. Beginners can learn the fundamentals in about a month or two, making it a viable entry point into the field. Next, start with R. R is a popular programming language for statistical analysis and data visualization. While Python is more versatile, R is better at statistical modeling, hypothesis testing, and creating high-quality plots using packages. It is widely used in academic research and in industries that demand extensive statistical analysis, such as finance and healthcare. Then move to Git, a version control system that enables data scientists to track changes, collaborate effectively and manage code across multiple projects. It allows you to work on different versions of your code without losing progress and makes it simple to return to a previous state if something goes wrong. You can spend around two weeks studying Git. Now let us look at the essential Git concepts for data science. First is the repository, where you save data science projects including scripts, data sets and documentation. Next, commits and version control, where you keep track of changes to facilitate model debugging and refinement. Next, branches and collaboration: work on different features or experiments without affecting the overall project. Followed by GitHub or GitLab integration: share your projects, work with teams, and follow a structured workflow. Moving on to data structures and algorithms. So, understanding data structures and algorithms is crucial for improving problem solving skills, which is essential for tackling complex challenges in data science. Mastering these concepts helps optimize data processing, make models more efficient, and handle large data sets effectively. Tech giants like Google, Amazon, and Facebook frequently test candidates on DSA during job interviews, making it a key skill for landing top data science roles. Spending one or two months learning fundamental structures like arrays, linked lists, trees and graphs, and algorithms like sorting, searching and dynamic programming, will greatly enhance your ability to write efficient code and build scalable solutions. Next, moving on to structured query language. SQL is an essential skill for data scientists because it enables efficient interaction with databases. It is useful for accessing, organizing, and analyzing structured data stored in relational databases such as MySQL, PostgreSQL, and SQL Server. As a data scientist, you will frequently work with large data sets, which require SQL to extract meaningful insights, join multiple tables, filter records, and optimize queries for performance. SQL is relatively simple, and you can master it in a month or two. Mastering joins, subqueries, window functions, and indexing will significantly improve your ability to work with real-world data.
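To make the SQL skills above a little more concrete, here is a minimal sketch of the kind of join-plus-aggregate query a data scientist runs every day. It uses Python's built-in sqlite3 module so it is fully self-contained; the customers and orders tables, their columns, and the threshold are made-up illustrations, not something taken from the video.

```python
# A minimal, self-contained SQL sketch using Python's built-in sqlite3.
# The tables, columns, and numbers below are hypothetical examples.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Two small tables with a handful of sample rows.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Asha", "Bengaluru"), (2, "Ravi", "Mumbai"), (3, "Meera", "Delhi")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 2500.0), (2, 1, 400.0), (3, 2, 1200.0), (4, 3, 90.0)])

# A join, a grouping, and a filter: total spend per customer above a threshold.
query = """
SELECT c.name, c.city, SUM(o.amount) AS total_spend
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id
HAVING SUM(o.amount) > 500
ORDER BY total_spend DESC
"""
for row in cur.execute(query):
    print(row)  # e.g. ('Asha', 'Bengaluru', 2900.0)

conn.close()
```

The same pattern of joining, grouping, and filtering carries over directly to MySQL, PostgreSQL, or SQL Server; only the connection code changes.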
Next is mathematics and statistics. Data science requires a solid foundation in mathematics and statistics, which serve as the backbone of data analysis and machine learning. To understand how models work and correctly interpret data, focus on key areas such as linear algebra, calculus, probability, and statistics. Linear algebra is essential for working with data sets, transformations, and machine learning algorithms. Next, calculus is useful for understanding the optimization techniques used in training models. Probability and statistics are critical for making data-driven decisions, hypothesis testing, and predictive modeling. Mastering these topics over two or three months will significantly improve your ability to analyze data, build models, and draw meaningful conclusions. And next, what you need to learn is data handling and visualization. Data handling and visualization are critical skills in data science, enabling you to clean, process and present data effectively. You will need to master Python libraries like pandas and NumPy for data manipulation, handling missing values and transforming raw data into a structured format. And once the data is prepared, visualization helps uncover patterns and insights. Matplotlib and Seaborn are essential for creating detailed graphs, while tools like Power BI and Tableau can enhance reporting with interactive dashboards. If you already know Python and SQL, you can develop strong data preprocessing and visualization skills within a month or two. Next, machine learning. Machine learning is an essential component of data science. It allows systems to learn from data and make predictions. There are two types of machine learning: supervised and unsupervised. In supervised learning, models learn from labelled data, with each input having a known output, whereas in unsupervised learning, models use unlabelled data to identify patterns. And to implement these algorithms, you will need to be familiar with tools such as TensorFlow, PyTorch and scikit-learn. To establish a solid foundation in machine learning, spend three to four months learning key concepts, training models and optimizing their performance. Next is deep learning. Deep learning is a subset of machine learning that utilizes neural networks with multiple layers to solve complex tasks like image recognition, speech processing, and natural language understanding. Key architectures include convolutional neural networks for image processing and recurrent neural networks for sequential data such as speech or text. Next, mastering tools like TensorFlow and PyTorch is essential for building deep learning models. And to strengthen your expertise in deep learning, spend two to three months understanding neural networks, fine-tuning models, and experimenting with different architectures. And to become a data scientist, you also need to learn big data. Big data focuses on effectively storing, processing, and analyzing large data sets that traditional methods cannot handle. Hadoop and Apache Spark enable distributed computing and real-time processing, which aids in the management of large amounts of data, and spending two to three months mastering these tools will enable you to work with large amounts of data at high speeds, revealing deeper insights that smaller data sets would not allow. Understanding big data is critical for scaling data science solutions in sectors such as finance, healthcare, and e-commerce. Mastering these skills and following this path will help you land a high-paying job in the field of data science.
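Here is a minimal scikit-learn sketch of the supervised versus unsupervised distinction from the machine learning step of this roadmap. The iris dataset, the logistic regression classifier, and k-means clustering are illustrative choices made for this example, not tools the video prescribes.

```python
# Supervised vs. unsupervised learning in a few lines of scikit-learn.
# The dataset and models are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Supervised: the model learns from labelled examples (inputs X with known outputs y).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("supervised accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Unsupervised: the model only sees X and looks for structure on its own (3 clusters here).
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```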
So we have explored the essential skills for a data science career, including programming, statistics, machine learning and big data. While the journey takes time, consistent practice and dedication will help you succeed. So the demand for data scientists is increasing and businesses are constantly seeking qualified candidates. So maintain consistency and focus on real-world projects. Why mathematics in machine learning? Aspiring machine learning engineers often tend to ask me: what is the use of mathematics in machine learning when we have computers to do it all? Well, that is true. Our computers have become capable enough to do the math in split seconds where we would take minutes or even hours to perform the calculations. But in reality, it is not the ability to solve the math; rather, it is the eye for how the math needs to be applied. You need to analyze the data and infer information from it so that you can create a model that learns from the data. Math can help you in so many ways that it becomes mind-boggling that someone could hate this subject. Of course, doing math by hand is something I hate too. But knowing how I use math is enough to explain my love for math. So allow me to extend this love to you guys too, because I won't be teaching you just the mathematics for machine learning but the various applications you can use it for in real life. So what math do we need to learn for machine learning? Here is a pie chart which comprises all the needed math. Linear algebra covers a major part, followed by multivariate calculus. Statistics and probability also play a big role, and you need knowledge of algorithms and much more. This is the requirement that is needed to master machine learning. So now that we have developed this understanding, let's do some math. We shall kick off with linear algebra. Linear algebra is used most widely when it comes to machine learning. It covers so many aspects, making it unavoidable if you want to learn mathematics for machine learning. Linear algebra helps you in optimizing data operations that can be performed on pixels, such as shearing, rotation and much more. You can understand why linear algebra is such an important aspect when it comes to mathematics for machine learning. So let's move over to the first topic in linear algebra: scalars. So what is a scalar? A scalar is basically a value. It represents something, right? So scalars are just values that represent something. Suppose we had a laptop on sale and it is priced at 50,000 rupees. So this 50,000 rupees is the scalar value of that laptop. What are the operations that can be performed on scalars? It is just basic arithmetic. So for example, we have addition, subtraction, multiplication, division; all of those operations can be applied on scalars. Okay, for example, over here we are buying a laptop and the accessories. What is the total price? It'll be the addition of both the prices. So 50,000 for the laptop and 5,000 for the accessories brings it up to 55,000 rupees. What happens if you're buying a laptop at a 50% discount? So it is half the price, right? It is 50,000 divided by 2, which becomes 25,000 rupees for the laptop. So this is just a brief introduction and all that is required from scalars. So once we are clear with this, let's move over to vectors. So vectors can get a bit complicated as they are defined differently by people from different backgrounds. Let me tell you how computer science people interpret vectors: as a list of numbers that represents something. Physicists consider a vector to be a scalar with a direction, and it is independent of the plane.
Mathematicians take vectors to be a combination of both and try to generalize it for everybody. All of these standpoints are absolutely correct, and that's what makes it so confusing for anyone learning about linear algebra in mathematics for machine learning. In machine learning, we usually consider vectors from the standpoint of a computer scientist, where the data is in tabular form consisting of rows and columns, right? And when our data is in the form of pixels or pictures, we consider them as vectors that are bound to the origin and transform them to matrices and perform operations that we shall discuss later. So now that you have a brief idea about vectors, let's jump over to the operations that you need to know when working with vectors. Operations on vectors can be applied only when you know what kind of data you're working with. Suppose you have pixel data and you want to apply rotations but end up doing something wholly different. Your model will not work because it is doing all the wrong operations here. It's important that you make sure that you know what you're working with; only then will you be able to apply the required operations. So the first operation that we have here is vector addition. So let's understand what vector addition does. Vector addition is something that is completely different from the operations that we've been learning for scalars. Okay, it's not simple arithmetic. It is actually the total work that is done by both the vectors, in a quantified form. So for example, let me say that I want to walk forward by 50 m, and that is one vector. Okay. So me walking forward for 50 m is one vector, and from there I go right for 25 m, right? So that is the second vector. So what is the work that is done by both these vectors together? It is me moving forward and then moving right. So let me say, for example, v1 is how far I walked forward, okay, and then v2 is how far I moved right. So what is the addition of this? The addition is basically putting both the vectors tip to tail and then finding the displacement, or the work that has been done. If you look at it over here, it is v1 plus v2's work, in the quantified form of v1 + v2. It is the displacement. So for example, v1 is a distance and v2 is a distance; v1 + v2 is the displacement. So that is basically vector addition: the resultant of the two vectors. I hope you've understood this. So let's move over to the next operation, that is scalar multiplication. So what is scalar multiplication? Whenever a vector is multiplied by a scalar value, it either grows or shrinks. What this means is that you have a particular scalar value, which can be positive or negative, and its magnitude may be greater than one or less than one. If the magnitude is less than one, it'll make the vector shrink; otherwise it'll do the opposite and make it grow, and a negative scalar also flips the direction. So let me just show you an example over here. So let's say I have a vector called v1. Now if I'm trying to multiply this with a positive scalar value, say k, then k times v1 will make the vector grow. Whereas if I am trying to multiply this particular vector with a negative scalar value, say minus k, it will shrink it down. So minus k into v1 gives me a shrunk vector. So as you can see, this was my normal vector, and if I multiplied it with a constant k which was positive it grew, and if it was negative it shrank. So that is basically what scalar multiplication is.
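Here is a small NumPy sketch of the two operations just described: adding two displacement vectors tip to tail and scaling a vector by different constants. The specific numbers (50 m forward, 25 m right, the factors 2 and 0.5) are just the walking example rendered in code.

```python
# Vector addition (resultant displacement) and scalar multiplication with numpy.
import numpy as np

v1 = np.array([0.0, 50.0])   # walk 50 m forward (y direction)
v2 = np.array([25.0, 0.0])   # then 25 m to the right (x direction)

# Vector addition: the resultant of doing v1 and then v2.
resultant = v1 + v2
print("resultant:", resultant)                      # [25. 50.]
print("displacement:", np.linalg.norm(resultant))   # about 55.9 m

# Scalar multiplication: |k| > 1 grows the vector, |k| < 1 shrinks it,
# and a negative k also flips its direction.
print("grown:", 2.0 * v1)     # [  0. 100.]
print("shrunk:", 0.5 * v1)    # [ 0. 25.]
print("flipped:", -1.0 * v1)  # [  0. -50.]
```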
So the next vector operation is projection. So what does projection help us with? For example, let's say I have two vectors, okay, v1 and v2. Now I do not know much about v1, and I know more about v2. So if I can find a way to project the vector v1 onto v2, I'll be able to obtain information about v1. So let me just show you. For example, I have a vector v1 and I have another vector v2. I know all about v2, but I do not know about the v1 vector. So what happens over here is, if I'm able to take the projection of vector v1 onto v2, I will be able to analyze and learn the unknown features that vector v1 has. Okay, and actually, if you take this into deep learning, you can find unknown features of a vector, which can help you modify one image into many images; you can basically simplify and modify that image into something that it is not. Okay, that comes under deep learning, but we are not going to cover that. But this is a very important concept that you need to understand. It is basically, as it's said over here, the shadow that one vector casts on the other vector. So whatever information that the v1 vector has, I'll be able to somehow extract it from the vector v2, because the projection falls onto v2. That basically brings us to the end of all the vector operations that you need to remember for machine learning. Let's move over to matrices. So what is a matrix? A matrix is a composition, a mixture of numbers, symbols and expressions arranged in a rectangular array. It can be rectangular, it can be square; it depends on the order. So what do we use matrices for? We use matrices to convert our equations into the form of arrays. For example, if you've got an equation, you cannot simply put that into your computer saying, okay, solve this and give it to me. You need to convert it into a list or an array so that you'll be able to perform your operations on it. Right? That is the reason matrices are so important to us, and that is why it is much easier for us to convert our equations into lists and arrays and then perform our operations on them. So suppose, for example, you have two equations over here. How do you convert these two equations into matrices? So what does an equation tell you? If you have 2x + 2y = 10, what this is basically trying to convey to you is that you have two variables, x and y. If you keep giving these variables different numbers, you'll be able to find different values for the expression. Let's say, for example, the scalar value is 10 and you have 2x + 2y, which is basically a vector. I need to find the points x and y, meaning that I'm trying to find the direction of the vector, and then, if I'm able to substitute values and find out how 2x + 2y equals 10, I'm finding all the information I need from that vector so that I can work out whatever else I need from it. That is how functions are very important. You need to understand what a function is trying to convey to you. Let's say, for example, you have the equation of a straight line. So what is a straight line? It is y = mx + c. What does this mean? It means that the y-coordinate is equal to some value m times x plus a constant. So I'm able to plot this.
I'll be getting the y-coordinate and the x-coordinate, and I'll be able to get a straight line. So whatever numbers I put into these, I will always be getting a straight line. So that is the reason it is called the equation of a straight line. It is always going to be constant. So you have to understand what a function is trying to convey to you. Once we are done with that, let me just tell you how you convert equations into matrices. It's really simple. Take out all the numbers. So you have 2x + 2y = 10, right? So 2x and 2y have the two coefficients, 2 and 2. So those two become one row. The 4 and 1 from the second equation become the next row. x and y become one column, and 10 and 18 become the answer column. So let me just show it to you. So as you can see, we have a matrix with rows 2 2 and 4 1, multiplied by the column x, y, which is equal to 10 and 18. If you multiply it out accordingly, you'll be getting back the same equations that we had earlier. This is how matrices are used in linear algebra. So once you know how the equations are converted into matrices, let's move over to the matrix operations. So what is the first operation? Simple addition. What does addition help you with? It is basically adding all the directions of two vectors. Why is it two vectors? Because I'm converting vectors into matrices, right? You are simply just going to add the corresponding elements of both the matrices. And you have to remember that if you are adding two matrices, they have to be of the same order. What do I mean by order? It is basically the number of rows by the number of columns. So for example, if I have the first matrix, that is 2 2 and 4 1, I have two rows and I have two columns. So that becomes a 2×2 matrix. That is the order of the matrix. The next one also, 2 3 and 1 4, has two rows and two columns. That makes it an order of 2×2. So if I had three rows and two columns, it would be 3×2. So I hope you have understood what order is. Let's say I had these two matrices and I want to add them. So what do I do? What is 2 + 2? It becomes 4. What is 2 + 3? It becomes 5. What is 4 + 1? It becomes 5. What is 1 + 4? It becomes 5. So it is adding the corresponding elements of the matrices. So I hope you've understood that. So let's move over to the next operation: matrix subtraction. It's the same thing as how you did it for addition. You just subtract the corresponding elements. If you had 2 - 2, it becomes 0. 2 - 3 becomes -1. 4 - 1 becomes 3. 1 - 4 becomes -3. Simple. You can understand this very easily. Moving ahead from matrix subtraction is matrix multiplication. What do you do with matrix multiplication? You are basically multiplying the rows of the first matrix with the columns of the second matrix. And you have to remember that the number of columns in matrix one has to be equal to the number of rows in matrix two. Only then will you be able to perform the matrix multiplication. So for example, if I had 2×2 matrices, the first entry would be a11 into b11 plus a12 into b21, the next would be a11 into b12 plus a12 into b22, and so on accordingly. So let me show you an example of matrix multiplication so that you can understand how it works. Suppose I have the 2×2 matrices that I've already been using for matrix addition and subtraction. Now how am I going to perform the matrix multiplication? Basically, I'm going to multiply the first row with both the columns of matrix two. So I have 2 into 2 + 2 into 1. Then I have 2 into 3 + 2 into 4. The same thing goes for the second row also. So what do I get?
I get 4 + 2, 6 + 8, 8 + 1 and 12 + 4. That accordingly gives me 6, 14, 9 and 16. So this is matrix multiplication. It's really simple to understand. So if you have any doubts, leave them in the comment section. We'll get back to you as soon as possible. So once we are done with matrix multiplication, let's move over to the transpose. So the transpose is a really simple operation. All you do is convert the rows into columns. That's it. But why is it so important? Transpose is really important when you want to change the dimensionality of your data. Suppose all your data is in a row; you can iterate through it there, or if all your data is in a column, you can do it accordingly. So it is really important when you work with transpose because it helps you to change the dimensionality. It helps you flip the dimensions. So for example, if you're working with pixels, right, pictures: if you change the rows or the columns of your picture data, you are basically changing the picture, and then you can analyze it more and get more information from it. So transpose plays a very important role. You need to understand that. So for example, I have the matrix with rows 2 2 and 4 1. What happens is that the first row becomes the first column and the second row becomes the second column. So as you can see here, it becomes 2 4 in the first row and 2 1 in the second. And that is all transpose is. Next we have the determinant of a matrix. By now, all we have been doing is adding and subtracting matrices. So what is all of this? It is basically all the vectors being added, subtracted, multiplied, you know, then flipped in their dimensions. You will understand this when you do all of this practically, because all of this is something that we do not do in our daily lives. But

### Top 10 Reasons to Learn Data Science [35:36]

when you're learning machine learning, when you're performing machine learning, all of these operations come into the picture. So all that you've learned by now is basically adding the values of vectors, subtracting vectors, multiplying two vectors, then transpose, that is the flipping of vectors. And now you're going to learn about the determinant. What is a determinant? All the matrices that we had till now, they are basically directions; they are basically directions and values of all the vectors. Now all these vectors are definitely going to have a scalar value in them, so that you can understand the weight or depth of that particular matrix. What is the determinant helping you with? The determinant helps you understand the weight or, you know, the sensitivity that the matrix can provide on the data set. That is the reason the determinant is really important. It is the scalar value of the matrix, and it can help you find the eigenvalues of the matrix. What do I mean by eigenvalues? I'll teach that to you in the next part, because we are going to understand in depth what eigenvalues and eigenvectors are. That is the reason determinants are so important. So let's say, for example, I had this particular matrix with rows a b c, d e f, and g h i. The determinant would be a times (ei minus fh), minus b times (di minus fg), plus c times (dh minus eg). So what happens now is I'm going to get this particular equation: aei + bfg + cdh - afh - bdi - ceg. This is basically what is going to give me the scalar value of the matrix. So I hope you've understood how and why determinants are important in machine learning. So next we will learn about the inverse of a matrix. So how do I explain the inverse to you? It's a very simple example if you understand it. Suppose I am walking on a road and I walked straight for 50 m, but then I remember that I had to get back something from the place I started. So I walked back 50 m. What is the distance that I traveled? The distance I traveled was 100 m. Why? Because I went 50 m forward and 50 m backward. But what is the work that I have done? It is zero. Why is that? It is because I'm at the same place that I started from, so there is no net displacement. If I move from one place to another place, that is work done; that is some displacement that my body has achieved. But I have not achieved that over here. Why? It's because I have gone straight and come back straight for 50 m, making my net work done zero. The inverse of a matrix works in just the same way. Suppose you have a vector that moves in the forward direction; the inverse of that particular vector comes in the negative direction, and it makes all the work done zero. Sometimes no inverse of a matrix exists. That's because the matrix does not have all the information that is required to obtain its inverse. So I hope you have understood what an inverse is. Let's see how you find the inverse. Okay, for a 2 by 2 matrix, this is how you're going to find it: you divide by the determinant, a and d get switched over, and b and c become minus b and minus c. This gives you the inverse of the matrix. So for orders three and above, what you do is find the determinant of A and then accordingly find all the different smaller determinants (the minors) inside of it. So that is basically how you find the inverse of a matrix.
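Here is a quick NumPy check of the matrix operations covered so far (addition, multiplication, transpose, determinant, inverse), reusing the 2×2 example matrices from the walkthrough above. Treat it as a sketch for experimenting, not part of the original lesson.

```python
# Checking the 2x2 examples from the walkthrough with numpy.
import numpy as np

A = np.array([[2.0, 2.0],
              [4.0, 1.0]])
B = np.array([[2.0, 3.0],
              [1.0, 4.0]])

print(A + B)              # [[4. 5.] [5. 5.]]  -- element-wise addition
print(A @ B)              # [[ 6. 14.] [ 9. 16.]]  -- row-by-column multiplication
print(A.T)                # transpose: rows become columns
print(np.linalg.det(A))   # 2*1 - 2*4 = -6.0, the scalar value of the matrix

A_inv = np.linalg.inv(A)  # exists only because det(A) != 0
print(A @ A_inv)          # the identity matrix (up to floating point error)
```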
Let's move over to how a vector can be used as a matrix. Okay. So by now I've been telling you that vectors can be easily translated into matrices, and I've already shown it to you. And why is it so important? It's because they help you apply operations on the data very easily. And then you have certain well-known operations such as scaling, rotation, shearing and much more. All of this comes under computer graphics. Okay, performing these operations on your image or your vectors becomes really easy when you're working with matrices. So that's the reason matrices are so important. So for example, right now I've shown you that if there was an equation v1 = 3x + 4y, it would become the row 3 and 4 multiplied by the column x and y. Then you

### Data Science Basics [40:12]

would have x + 2y, which would become the row 1 and 2, again multiplied by x and y. That is how a vector as a matrix works. What are some of the well-known operations? Right, I'm assuming that we are working with a 2×2 matrix. Whenever you're scaling, you are basically increasing the size. So how do you increase the size? You use sx and sy, which are the scaling factors you apply to your x and y coordinates. Then you have shearing, which is basically moving or reshaping the particular object that you're working with; so there is m, which is the shearing factor. And then you have rotation. So how do you rotate your particular object, and in which direction? All of that can be done using these particular matrices. So let's move over and understand how matrices can help you solve equations and obtain solutions much more easily. So that is basically vector as a matrix. We have two methods over here: the echelon method and the inverse method. For our tutorial I've been taking the row echelon method because it's more comfortable for me. If you are more comfortable with the column echelon method, you can go ahead with that. But because I am comfortable with the row echelon method, I've used the row echelon method. And next we have the inverse method. So we are basically going to solve the equations. We are going to find the coordinates at which our points give us that particular value. So let's move over to the first method, which is the row echelon method. And I have some equations over here, which go like this: 2x + y - z = 2, x + 3y + 2z = 1, and x + y + z = 2. So I'm going to convert all of this into a particular matrix. So let me show you how the matrix looks. The rows are 2 1 -1 with 2, then 1 3 2 with 1, and then 1 1 1 with 2. So all the information about the equations has been put into the matrix form, and the answers that we're going to get are x = 2, y = -1 and z = 1. Those right-hand values, that is 2, 1 and 2, are what I'm getting from the equations: whenever I put the x, y and z values into them, I'm going to get those particular results. Now let me just tell you why we are solving equations. Okay. So if I am telling you that there is this particular vector that goes like this or comes in this direction or something like that, I want to find the x, y and z coordinates so that I can obtain the information, visualize it, and then learn more about that particular vector. Now say, for example, take just 2x + y - z = 2. What if it was: two boxes plus one candle minus one chocolate gives me a profit of 2 rupees? Okay, so as you can see, these are just simple equations, and these are the coordinates that we get: x = 2, y = -1 and z = 1. But what is the importance of this? Okay, let me just give you a simple example so that you can understand what we are actually trying to find out over here. Okay, suppose I have a factory and I want to get a profit of 2 rupees, and I have boxes, I have candles and I have chocolates. So if I take the first equation, I am saying that if I have two boxes, plus one candle, minus one chocolate, it is going to give me 2 rupees profit. So what are x, y, and z exactly? They are the investment costs. How much do I need to invest in them so that I can get back what I really required? So for boxes I need to invest 2 rupees, for candles I need to invest minus one, and for chocolates I need to invest one, or something like that. Okay. If I invest that much, I'm going to get a profit of 2. So this is like a real-life example which is being converted into an equation, and then we just need to solve it. That is the reason vectors are so important, and that is how they can be converted into matrices and then solved very easily.
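Before walking through the row echelon steps by hand, here is a quick NumPy check of the same system; this is just a numerical verification sketch, not the method the instructor uses next.

```python
# Solving the same 3x3 system numerically as a sanity check:
#   2x +  y -  z = 2
#    x + 3y + 2z = 1
#    x +  y +  z = 2
import numpy as np

A = np.array([[2.0, 1.0, -1.0],
              [1.0, 3.0,  2.0],
              [1.0, 1.0,  1.0]])
b = np.array([2.0, 1.0, 2.0])

x = np.linalg.solve(A, b)
print(x)                      # [ 2. -1.  1.]  -> x = 2, y = -1, z = 1
print(np.allclose(A @ x, b))  # True: the solution satisfies all three equations
```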
So how does the row echelon method work? I'm going to take step one, which is R1 divided by 2. When we divide R1 by two, we get 1, 1/2, -1/2 and 1. So what's step two? We will do R2 - R1. Subtracting R1 from R2, the second row becomes 0, 5/2, 5/2, 0, while the other rows stay the same. Simple enough, right? So what is the next step? It'll be R3 - R1, so the third row becomes 0, 1/2, 3/2, 1. The next step, step four, is R2 times 2/5, so the second row becomes 0, 1, 1, 0. So what's step five? Step five is basically R1 minus 1/2 of R2, so the first row becomes 1, 0, -1, 1. Step six is R3 = R3 minus 1/2 of R2, so the third row becomes 0, 0, 1, 1. Step seven is R1 + R3, so the first row becomes 1, 0, 0, 2. Step eight is R2 = R2 - R3, so the second row becomes 0, 1, 0, -1. So the final rows are 1 0 0 with 2, then 0 1 0 with -1, and 0 0 1 with 1. If you put all of this back into equations, you will find that the first row, 1 0 0 and 2, means x plus nothing else equals 2, so the value of x is 2; the value of y is -1; and the value of z is 1. So this is how the row echelon method works. And what do you actually have to achieve in row echelon form? Basically, the leading entry of each row has to be 1 and has to be preceded only by zeros, with each row's leading 1 sitting further to the right than the one in the row above it. So, row echelon or column echelon, whichever method you use, you have to make sure that whichever element is the first non-zero element of a row, it is a 1 preceded by zeros. So as you can see here, the first row is 1 0 0 2, and then you have 0 1 0 -1 and 0 0 1 1. That is how you perform the row echelon method to solve your matrix. So if you remember, we wanted x = 2, y = -1 and z = 1, and those are the answers we got. Once we are done with the row echelon method, let's move over to the inverse method. So suppose we have the equations 4x + 3y = -13 and -10x - 2y = 5. What does the inverse method tell us? Let me show it to you. This is the matrix that you come up with: it is 4, 3, -10 and -2; then you have x and y, and that is equal to -13 and 5. So what does the inverse method actually tell us? For example, we know that A times A inverse is, in the work analogy, zero work done: it is the identity matrix, the I matrix. So if I have A times A inverse, it is like I do the work and then I undo the work, which equals no work done. So A times A inverse gives me the identity matrix. Now, if A times B equals C, where A holds all the coefficients, B is my x and y, and C is the output values, then multiplying both sides by A inverse gives A inverse times A times B equals A inverse times C. Since A inverse times A cancels to the identity, I am left with B equal to A inverse times C. So that's what I'm going to do. So what is the inverse of this 2×2 matrix? The inverse is found by the formula we saw earlier, and it works out to 1/22 times the matrix -2, -3, 10 and 4. So let me show you how it works. A inverse times A obviously cancels out because, you know, it gives the identity matrix, so I'm just left with x and y equal to 1/22 times the matrix -2, -3, 10, 4 multiplied by -13 and 5. So then I just multiply, and this is what I get: it's basically 1/22 times 11 and -110. So if I take 1/22 of the result I found, I get x equal to 1/2 and y equal to -5. So that is how I use the inverse method to find, or solve, the equations. I hope that was simple to understand.
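The inverse-method result can also be checked numerically. A hedged sketch: the coefficient matrix and right-hand side below follow the worked example above, and np.linalg.inv is used purely as a verification tool.

```python
# Verifying the inverse method on the 2x2 system:
#    4x + 3y = -13
#  -10x - 2y =   5
import numpy as np

A = np.array([[4.0, 3.0],
              [-10.0, -2.0]])
c = np.array([-13.0, 5.0])

print(np.linalg.det(A))              # 22.0, so the inverse exists
solution = np.linalg.inv(A) @ c
print(solution)                      # [ 0.5 -5. ]  -> x = 1/2, y = -5
print(np.allclose(A @ solution, c))  # True
```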
So once we've done all of that, we are now capable enough to understand what an eigenvector is. Eigenvectors are really important when you're working with data. All of that, the vectors and matrices, everything we've been doing by now: think of it as the data you're working with in your machine learning algorithm, right? So what is an eigenvector? It is a vector that does not change its direction even when a transformation is applied to it. So what does an eigenvector really give us? For example, I have some data; if I try to do any transformation on it, it should not change from what it really was initially. It should not become something that is totally irrelevant to me. If I have such data that, even if I apply transformations to it, it does not change what it was supposed to be, that is an eigenvector. What does an eigenvector give me? They are the most sensitive parts of my data set, and I can use them and trust them for my analysis purposes. So that is why eigenvectors are so important. Eigenvectors help you transform your data and, you know, handle it much more carefully. So for example, let me show you what an eigenvector looks like. Okay, so I have a particular rectangle and I have a lot of vectors, but for my example I'm just going to be using two vectors. Okay, I have a v1 and a v2. These vectors are basically trying to explain all the data that is there in the particular rectangle. What happens if I perform the shearing operation? After the operation, my v1 vector has changed its direction completely. It was pointing in some other direction, and after shearing, or after performing some operation, it has completely changed its dimensionality, its direction. All the information that it held initially has now completely been changed. Where is my v2? It has not changed its direction. Yes, it has moved forward, it has been scaled, but it gives me the same information that it was giving me earlier. So that becomes an eigenvector. Think of this now in the form of a matrix. If my matrix is changing so drastically, how will I be able to perform operations on it? It becomes really difficult. Okay. So for example, think of the vector v2 as, you know, a list which is then just multiplied by two. It does not make a difference. Why? It's because even if I multiply by two, it's just going to double the values. But the operation that I want to do on it, the information that it is giving to me, that's the same. So that's the reason eigenvectors are so important. And what are eigenvalues? The scalar factors by which these eigenvectors get stretched when the transformation is applied are the eigenvalues. It's that simple to understand. So that is all you need to know about eigenvectors. All the data which does not get transformed into something else is basically the data that you should be learning with.
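A small NumPy sketch of that idea: under a transformation matrix, an eigenvector only gets scaled by its eigenvalue, while an ordinary vector generally changes direction. The matrix A here is a made-up stretch/shrink transformation chosen only to keep the numbers readable.

```python
# Eigenvectors keep their direction under the transformation; the eigenvalue is the scale factor.
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])               # illustrative transformation: stretch x, shrink y

eigenvalues, eigenvectors = np.linalg.eig(A)
print("eigenvalues:", eigenvalues)        # [2.  0.5]

v = eigenvectors[:, 0]                    # first eigenvector (a column of the result)
print("A @ v:", A @ v)                    # same direction as v, just scaled
print("lambda * v:", eigenvalues[0] * v)  # identical: A v = lambda v

# A non-eigenvector generally changes direction under A.
u = np.array([1.0, 1.0])
print("A @ u:", A @ u)                    # [2.  0.5] -- no longer along [1, 1]
```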
So what are the applications that linear algebra helps us with? You have the first and foremost, which is principal component analysis, PCA. It is used for dimensionality reduction, and it helps increase the quality of data. Then you are working with applying transformations. It helps in encoding. It helps in singular value decomposition, where it is again reducing the dimensionality of the data, and then it is used in natural language processing for latent semantic analysis and for the optimization of your deep learning models. All of this can be achieved through linear algebra. So it is a very important aspect of machine learning. Once we are done with all of this, let's start our coding for PCA. We will now be coding for principal component analysis. Let me show you how it really works. So I have all the programs already prepared for you guys, so we do not extend this big tutorial even further. Let me go to the presentation mode. So what happens over here is I have these important libraries: import numpy as np, then matplotlib's pyplot as plt, and then from sklearn.decomposition I import PCA. So principal component analysis is already available; I'm just going to import it from sklearn. I'm going to get a range of around, you know, 200, 300, however much it is. After I get that particular range, I'm going to get data; I'm going to get all of that data and then put it into my particular list. So what happens over here? I have np.dot applied to randomly generated numbers of shape 2 by 2 and 2 by 200: in that range, whatever numbers you can randomly generate, please give them to me, and then do .T. Why the .T? It's because all of this which I'm going to get is basically arranged as columns, and I do not want it that way; that is the reason I'm using .T, which is used to transpose it. So I'm just going to scatter that on the plot and then show it to you guys. That is what this particular function does. And then I am doing pca = PCA with n_components equal to two, and then I'm going to fit the data into it. And then I have a function to draw a vector. So I'm going to set up the arrow props dictionary style and all of that; I've just given the styling over here, and then I'm going to annotate and draw it accordingly. So this is my draw vector function. And then I'm just going to scatter plot everything that I had. So I have the vector. How am I going to get a vector? It's using this particular formula, with the component scaled by 3 times np.sqrt of the explained variance. So what are the explained variance and the components? Okay, so if you remember, I had fit PCA with the data that I generated, right? So the explained variance is basically going to tell me how much variance there is between my data points, and the components are what I really need to look at and perform my analysis on. So these are the required components for me, and after I do all of that, I'm going to get another vector; I'll show that to you guys. And then what I'm going to do is find one component only over here, and I'm going to fit the data, transform it, and perform the inverse transform, so that I get a particular linear line which I can just look at and find all the important aspects of the data that I wanted from it. Let me run the program and let me show you all the outputs. So as you can see here, this is all the data that I generated using the random generating function. So this is all the random data that I have generated. Then what did I do? I had to find two components, right? So what can I understand from this particular information? This one is a small vector. So this small vector means that this is the only amount of information that it can give.
Whereas this vector is so long. What this means is that the longer the length of the vector, the more information it can pass to me. Simple enough, right? The longer the length of the vector, the more information it can pass to me. So all the information I get through this particular vector is going to give me the best basis for my analysis. I'm going to get all the data points that lie along this vector. So as you can see here, all of these data points are very important for my analysis. So this is how I converted such a big, huge data set into one particular line. And if I follow all these points, I'm going to get almost the same amount of information that the whole data set would give me. As you can see, this was such a huge data set, but it's just come down to this one line. It's so easy. I think if you're working with tons and tons of data, it becomes really difficult to work with it all. That's the reason to use PCA, and it's not just about using PCA: make sure that you are able to reduce the data as much as you can, so that you still get all the information you can from that particular data. You do not need all the data, but you need the information from that data. That is basically how PCA works. I hope it was clear to you guys. So basically that was all we needed to learn from linear algebra, and I hope it was clear to you guys. So now that we know all about vectors, let's move over to multivariate calculus. Multivariate calculus is one of the most important parts of the mathematics in machine learning. It helps us solve the second most important problem that we face in developing machine learning models. The first problem is obviously the pre-processing of the data. The next is the optimization of the model. Multivariate calculus helps us optimize and increase the performance of a model and gives us the most reliable results. So how does something that almost half the class hated help us solve such a problem? So let's break all the ice that's surrounding this. But before that, we need to understand the basics. So let's do some calculus. The first topic in calculus is differentiation. Differentiation is basically breaking down the function into several parts so that you can understand every element and analyze it in depth. It is very helpful in finding the sensitivity of a function to varying inputs. A good function gives you a good output which can be described using a rather easy equation. The same cannot be said for a bad function. So for example, right here I have y = e^x. It is really easy to tell that, okay, if it is e^x, this is going to be the graph for it. But look at y = 1/x: it is such a horrendous graph that I cannot explain it using one simple description. Those are the good functions and the bad functions. We know all of this, but what are we really after? So let's understand that with a really simple example. So let's assume that we have a car moving in a single direction only, and it is already in motion. So if we plot a graph of its speed versus time, it communicates to us how the speed varies as the time keeps increasing, and it halts after a certain point. Now, if we want to know the rate at which the speed varies with respect to the time, it turns out that we are actually finding the acceleration. The acceleration of the car can be plotted as follows.
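The video shows this as a slide; as a stand-in, here is a hedged matplotlib sketch with a made-up speed curve and its numerical derivative, which is the acceleration.

```python
# A made-up speed-vs-time curve and its numerical derivative (the acceleration).
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0.0, 10.0, 200)           # time in seconds
speed = 30.0 * (1.0 - np.exp(-0.5 * t))   # car speeds up, then levels off (illustrative)
accel = np.gradient(speed, t)             # rate of change of speed with respect to time

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, speed)
ax1.set_ylabel("speed (m/s)")
ax2.plot(t, accel)
ax2.set_ylabel("acceleration (m/s^2)")
ax2.set_xlabel("time (s)")
plt.show()
```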
So what this means is that acceleration is actually a derivative of speed, because the speed we have here is just the magnitude and does not hold any more factors, just to simplify everything. So if there was no speed, there would be no acceleration. It's as simple as that. Now that we have the acceleration, we can justify whether the car had a varying or a constant change in its speed, whether it was moving or not, and much more. But in this case, that is all we need. Now think that you want to find the change in acceleration between a certain range in a time span. Okay, we mark two points: x, and some value which is a small portion more than x. We can denote this using x plus delta x. So let's denote this on the graph; you can see it as it's shown over here. Now we know that this range has many values in between. But what if we want to know the rate of change only between one point and the next? We know that this is really kind of impossible, because within a range there are countless numbers, as the function is continuous. Thus we approximate that the limit, or the step between two input values, tends to zero. Remember that this zero that we are assuming is only the smallest possible value that we can make up, and it is not absolute zero. If we ever worked with absolute zero, we would never have had any functions at all. So if you are still confused, this number is basically 0.000...something: the zeros just keep going, and then there are some digits further ahead. But it is never really zero. That is the only way that we can put this. Okay. So now that we have understood what we are really trying to find, let's make it a general equation. So let's derive the differentiation formula. The formula is: f'(x) equals the limit as delta x tends to 0 of [f(x + delta x) - f(x)] / delta x. That is how the formula comes into existence. This helps us find the rate of change from one point to the other. It is such an important concept, as it plays a huge role in the optimization of machine learning models. You need to understand that this is the first-order derivative only. If we differentiate the output of the first differentiation, it becomes the second-order derivative, and so on. But that is all the introduction we need for differentiation. So now that we have the basic idea and formula of the derivative, let's move over and understand some of the important rules that we need in differentiation. So the first rule that we will be talking about is the power rule. Whenever you have a function which has a variable raised to some power, you can basically use this formula and solve the problem much faster. So for example, let's say that I had a function f(x) = 3x². Let me put this into the formula. So I have f'(x) equals the limit as delta x tends to 0 of [3(x + delta x)² - 3x²] / delta x. Expanding (x + delta x)² gives x² + 2x·delta x + delta x², so the numerator becomes 3(2x·delta x + delta x²): the plus x² and minus x² cancel. Then a delta x can be taken out and cancelled against the delta x in the denominator, leaving 3(2x + delta x). So I'm going to put the value delta x = 0 into the equation, which is not exactly zero but is so small that I can ignore it. So I have 3 into 2x, which becomes 6x. So that's how I had to solve the whole entire equation, right? So what about the power rule?
So what about the power rule? The power rule lets you skip all of that. Whenever you have a function with a variable raised to some power, you can use this shortcut and solve the equation much faster: for f(x) = xⁿ, the derivative is n · xⁿ⁻¹. So for 3x², I treat the 3 as a constant, bring the power 2 down, and reduce the power by one: 3 · 2x^(2−1) = 6x. I get exactly the same answer, just in one step instead of the whole limit computation. The next rule we are going to study is the sum rule. If a function is the sum of two terms, the derivative of the function is the sum of the derivatives: (f₁ + f₂)′ = f₁′ + f₂′. A simple example is 3x² + 5x. You could grind through the limit definition again, cancelling the 5x terms, the x² terms, and the leftover Δx just as before, but using the power rule on each term gives 3 · 2x + 5 = 6x + 5 immediately. That is how the sum rule works. Next I have the product rule. If a function is a product of two parts, say f = u · v, the differentiation is very simple: f′ = u′ · v + v′ · u. And the last rule we are going to look at is the chain rule: if f is applied to g(x), then the derivative of f(g(x)) is f′(g(x)) · g′(x). These are simple rules and really easy to understand. If you want to learn more about the common functions, I have a blog which is linked in the description; you can go ahead, click there, and learn more.
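You can verify all four rules symbolically in a couple of lines (my sketch, assuming SymPy is installed):

```python
# Verifying the power, sum, product, and chain rules with SymPy.
import sympy as sp

x = sp.symbols('x')
print(sp.diff(3 * x**2, x))            # power rule: 6*x
print(sp.diff(3 * x**2 + 5 * x, x))    # sum rule: 6*x + 5
print(sp.diff(x * sp.sin(x), x))       # product rule: x*cos(x) + sin(x)
print(sp.diff(sp.sin(x**2), x))        # chain rule: 2*x*cos(x**2)
```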
Okay, so now that we have all the basics we really need from differentiation, we'll move over to partial differentiation. Partial differentiation is an important concept that most of us ignored throughout our academics. What has partial differentiation helped us achieve, you might ask? Let me give you an example so you can understand its importance. Suppose you are a car designer who deals with the exterior of the car only, and you have been given the task of maximizing the performance of the car. So how do you do that? You do not go into the car, take out the engine, or tune it; you are not supposed to do any of that. You are only going to change the exterior of the car. Now, the engine, the tires, and everything else are also variables that contribute to the car's performance, but you are not going to touch them: they are kept constant, and those variables will be handled by somebody else. You only deal with the variable assigned to you, the exterior. This is exactly how partial differentiation comes into play. You change one particular variable, and all the other variables are treated as constants. It's that simple. So what do you do? You change the windshields, change the body if necessary, the air vents, and so on; you change only what is yours to change. That is partial differentiation. So what do the equations look like? Now we have f(x, y, z). In the previous slides we were working with just one variable, x, whether it was x², 3x, or 5x; it was all in x. Partial differentiation is not like that: it handles x, y, z, whatever variables there are, which is what makes it so much more realistic. You differentiate with respect to your variable and keep everything else constant. So, to differentiate with respect to x, y and z become constants; with respect to y, x and z become constants; and with respect to z, x and y become constants. The complete differential is the sum of the three partial derivatives we just found. Let me show you an example. If I have the function f = x² + 3y + 4xz², how do I differentiate it? With respect to x, the 3y term is a constant and drops to zero, so I get 2x + 4z². With respect to y, x² and 4xz² are constants, so I just get 3. With respect to z, x² and 3y are constants, and differentiating 4xz² gives 8xz. So the total differential is 2x + 4z² + 3 + 8xz. I hope you've understood this really simple example.
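The same worked example, checked with SymPy (my sketch, same assumption that SymPy is available):

```python
# Partial derivatives of f(x, y, z) = x^2 + 3y + 4xz^2,
# matching the worked example above.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + 3*y + 4*x*z**2

print(sp.diff(f, x))   # 2*x + 4*z**2  (y, z held constant)
print(sp.diff(f, y))   # 3             (x, z held constant)
print(sp.diff(f, z))   # 8*x*z         (x, y held constant)
```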
So where does multivariate calculus come into play in applications? We have something called the Jacobian. What is the Jacobian? It is basically the first derivatives of a vector-valued function arranged into a matrix. It helps in finding the maximum of a function over the data set, pointing toward where the function grows fastest, and it also helps in linearizing a nonlinear function around a particular point. That is how the Jacobian helps. Differentiating twice instead of once gives us the Hessian, which is used in minimizing errors. In deep learning models, gradient descent uses these derivatives to optimize all the weights. So that is how multivariate calculus helps us in real life. Let me show you how gradient descent works; I already have code ready for you guys. Let me go back to the presentation mode. I have one import, numpy, and then I have the sigmoid function, which is 1 / (1 + e^(−sop)); that is what the function returns. Then I have the error, which is (predicted − target) squared, and the error's derivative, which is 2 · (predicted − target). Then I have the derivative of the activation itself, sigmoid · (1 − sigmoid), which tells us how sensitive the activation is, and the derivative of the weighted input with respect to the weight, which is just x. Then I have the weight-update function: I take the weight, the gradient, and the learning rate, and keep adjusting the weight to find the best value. Here is my setup: x = 0.1 is my input, the target is 0.3, the learning rate is 0.01, and the initial weight is a random number. What I do then is compute the prediction from x and the weight, compute the error, compute all the gradients, pass the gradient in, and update the weight. Now let me show you how increasing the number of steps gets me closer to my target. My initial output was around 0.50-something after just 10 iterations; my input is 0.1 and the target is 0.3, so I have to turn 0.1 into 0.3. Let me change this to 1,000 iterations: still around 0.50. Let me add another zero, 10,000 iterations: as you can see, it has now become about 0.447, which is much better than what we were getting. Another zero, and this takes longer because of the number of steps to perform, but the output that was 0.5 has now come down to about 0.36. Adding yet another zero, you can watch it pass 0.46, 0.44, 0.40: it just keeps reducing, and the output gets very close to what we need. The target was 0.3, and as you can see here, we've already reached 0.300-something. Remember, with 10 steps we would just get 0.50 or 0.51; now, after this many updates, it is at 0.30002-something and still improving. The final output we achieved is about 0.3000577, far better than where we started. So this is how gradient descent works: it keeps repeating and learning, and it uses differentiation throughout. Where is the differentiation here? These derivative functions are what give me the new error, which goes into the gradient-descent update to make my weight better and better, so that from my input of 0.1 I can reach the target of 0.3. I hope you've understood how gradient descent works; it relies on differentiation a lot, and I hope that was easy to follow.
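Here is a hedged reconstruction of that demo, written from the description above (the variable names and step count are my own, not the instructor's exact code):

```python
# One weight, one input, squared error, sigmoid activation:
# gradient descent nudging sigmoid(x * w) from ~0.5 toward the 0.3 target.
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

x, target, lr = 0.1, 0.3, 0.01
w = random.random()                 # random initial weight

for _ in range(1_000_000):          # many tiny steps, as in the demo
    pred = sigmoid(x * w)           # forward pass
    # chain rule: dE/dw = 2*(pred - target) * sigmoid'(x*w) * x
    grad = 2 * (pred - target) * pred * (1 - pred) * x
    w -= lr * grad                  # gradient-descent update

print(sigmoid(x * w))               # creeps toward the 0.3 target
```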
So with that, we have come to the end of everything required from multivariate calculus. Let's move over to the next topic: probability. So what is probability? Probability measures how likely an event is to occur: how sure you are, expressed as a quantity, that a particular event is going to happen. It is the ratio of the desired outcomes to the total outcomes, and with that formula you can compute any simple probability. Always remember that probabilities sum up to one. If an event has probability 0.6, that means there is a 60% chance it will happen and a 0.4 chance of it not happening; 0.6 + 0.4 = 1. Remember that about probabilities. An example I've given here is rolling a die: there are six possibilities, and each outcome is one out of the six, so for example the probability of getting the number two is 1/6. That is basically probability. So what are the terminologies you need to understand when it comes to probability? There are three things: a random experiment, a sample space, and events. Let's understand these one by one. What is a random experiment? It is an experiment or process for which the outcome cannot be predicted with certainty. For example, if I give you a die and tell you to roll it and predict the number, you cannot be sure; that's a random experiment, and you have this uncertainty about what will happen. So what is the sample space? The entire set of possible outcomes of a random experiment is the sample space of that experiment. Really simple: I tell you to roll a die; there are six outcomes, 1 through 6, and that set of six is the sample space. And what is an event? An event is one or more outcomes of the experiment. If you roll the die and get a one, that is an event; roll again and get a six, that is another event. Whatever outcome you get from your random experiment is called an event.
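A quick check of the desired-outcomes-over-total-outcomes idea with the die example (a tiny simulation of my own):

```python
# Estimating P(rolling a 2) = 1/6 by simulating a fair die.
import random

rolls = [random.randint(1, 6) for _ in range(100_000)]
print(rolls.count(2) / len(rolls))   # close to 1/6 = 0.1667
```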
There are two types of events: joint events and disjoint events. Let's understand both. Joint events have common outcomes; they can occur together. For example, a student can score 100 marks in statistics and 100 marks in probability, and a delivered ball can be a no-ball and still be hit for six. Those are joint events. Disjoint events do not have common outcomes: a delivered ball cannot be a six and a hit-wicket at the same time, a single card cannot be a king and a queen together, and a man cannot be alive and dead at the same time. Those are disjoint events; they simply cannot happen together. Now that we've understood the types of events, we need to understand the distributions. Three topics come under probability here: the probability density function, the normal distribution, and the central limit theorem. Let's take these one by one. The first is the PDF, the probability density function. The probability can now be described using an equation: the equation that describes a continuous probability distribution is called the probability density function. If that's not clear, let me simplify it even more: you have a function that describes the probability of something happening, and you can plot it as a graph; that is why it is a continuous probability distribution. As you can see here, the outcome is most likely to fall between the range a and b. Now that you've understood what a PDF is, let's look at its properties. The graph of a PDF is always continuous. The area bounded by the curve of the density function and the x-axis is always equal to one. And the probability that a random variable takes a value between a and b equals the area under the PDF bounded by a and b: any probability over a range is just the area under the curve over that range. That is basically what a probability density function is. Next we have the normal distribution. What is a normal distribution? It is a probability distribution that associates the normal random variable X with a cumulative probability, given by the formula y = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)), where x is the normal random variable, μ is the mean, and σ is the standard deviation; and this is how the normal distribution looks. The graph of the normal distribution depends on two factors: the mean and the standard deviation. The mean determines the location of the center of the graph, and the standard deviation determines the height and width of the graph: if your values deviate a lot, you get a short, wide graph, whereas a small deviation gives you a tall, narrow graph. That is all we need from the normal distribution. Then the central limit theorem. The central limit theorem states that the sampling distribution of the mean of any independent random variable will be normal or nearly normal if the sample size is large enough. What this means is that the means of your samples will cluster around the population mean, and their distribution will look normal, provided the sample size is large enough. As you can see, with n = 1 the graph looks nothing like a bell, but as n increases to 3, then 10, then 50, it becomes nearly identical to a normal distribution. That is what the central limit theorem is.
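You can watch the central limit theorem happen with a few lines of NumPy (my sketch, using a uniform population precisely because it is very non-normal):

```python
# Central limit theorem: means of samples from a uniform population
# concentrate around the population mean as the sample size n grows.
import numpy as np

rng = np.random.default_rng(1)
for n in (1, 3, 10, 50):
    means = rng.uniform(0, 1, size=(10_000, n)).mean(axis=1)
    print(n, round(means.mean(), 3), round(means.std(), 3))
    # the spread of the sample means shrinks as n grows; a histogram
    # of `means` would show the bell shape emerging
```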
Once we are done with that, let's look at the types of probability: marginal probability, joint probability, and conditional probability. Let's understand each of these. What is marginal probability? It is the probability of occurrence of a single event. If I flip a coin, it is going to be either a head or a tail; the probability of that single event is a marginal probability. For example, if I have a deck of cards and I randomly draw one card wanting it to be a heart, the probability is 13/52, since 13 of the 52 cards are hearts, and it can be expressed by this formula. Next, joint probability: it is the measure of two events happening at the same time. With the same deck, suppose I want to draw the ace of hearts: the card has to be an ace and a heart, which gives a probability of 1/52. Now that we've understood joint probability, let's understand conditional probability. A probability that depends on something that has already happened is called a conditional probability: the outcome of the event is based on the occurrence of a previous event or outcome. The conditional probability of an event B is the probability that B will occur given that event A has already occurred. There are two cases. If A and B are dependent events, the conditional probability is given by P(B|A) = P(A and B) / P(A). If A and B are independent events, the expression is simply P(B): just the probability of B occurring on its own. So that is all the introduction required for the types of probability. Next we have Bayes' theorem. What is Bayes' theorem? It shows the relationship between one conditional probability and its inverse. The formula is P(A|B) = P(B|A) · P(A) / P(B). What does this mean? The probability of A occurring given that B has already occurred equals the probability of B given A, times P(A), the prior probability (what we already believe), divided by P(B), the probability of B occurring. Let me give you an example. Suppose you're a doctor running a clinic where you test whether a patient has a liver problem. Of all your previous patients, 10% had liver problems, so P(A) = 0.10, and 15% of your patients were drinkers, so P(B) = 0.15. Among the patients who had a liver problem, 5% were drinkers, which gives P(B|A) = 0.05. Bayes' theorem then says the probability that a new patient who drinks has a liver problem is (0.05 × 0.10) / 0.15 ≈ 3.33%. What this means is that when such a patient walks into the clinic, there is roughly a 3.33% chance that this person has a liver problem. This is basically how Bayes' theorem works.
So what are the applications of probability? Probability helps you optimize your model; classification algorithms require probability; loss can be calculated using probability; and entire models are built on probability. Let me show you how the Naive Bayes classifier works. This is the Naive Bayes classifier notebook, so let me tell you what I have done here. I have imported the datasets module, the metrics, and the required Naive Bayes classifier, specifically the Gaussian Naive Bayes classifier. Gaussian refers to the normal distribution: a Gaussian Naive Bayes classifier assumes each feature follows a normal distribution. I chose it because I had visualized the data and saw that it followed a normal distribution. So I load the iris data set with load_iris, create the model, and fit it. Then I get the model's predictions and compute all the metrics: the precision, the recall, and so on. My model has an accuracy of 96%, and this is the confusion matrix, which shows that for the first class it classified all 50 samples correctly; for the second class, 47 were correct and 3 wrong; and for the third class, again 47 correct and 3 wrong. So this is the Naive Bayes classifier, and as you can see, probability is being used throughout. The Gaussian Naive Bayes implementation already does all of that for us, so we do not need to worry about it too much.
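A hedged reconstruction of that demo, following the description above (evaluated on the training data, as in the walkthrough):

```python
# Gaussian Naive Bayes on the iris data set: fit, predict, and inspect
# accuracy plus the confusion matrix.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)
pred = model.predict(X)

print(accuracy_score(y, pred))        # about 0.96 on the training data
print(confusion_matrix(y, pred))      # most of the 150 flowers on the diagonal
```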
So now that we have finished everything we needed from probability, let's move over to the last topic for today: statistics. What is statistics? Statistics is an area of applied mathematics concerned with data collection, analysis, interpretation, and presentation. What this means is that you analyze and understand the data you've collected and decide how to present it. For example, suppose your company has created a new drug that may cure cancer. How would you conduct a test to confirm the drug's effectiveness? You would gather a large group of people, give them the new drug, interpret the results you get, and then present them: out of, say, 1,000 people, around 900 or 950 were cured. All of those steps come under statistics. So what are the terminologies you need when you work with statistics? There is population and there is sample. What is population? It is the collection or set of individuals or objects whose properties are to be analyzed: all the data you have collected. Out of all that data, you take a subset for your analysis, and that subset is called a sample. A well-chosen sample will contain most of the information about the population parameter. So how do you choose the sample? The sample needs to be chosen such that everything the population is trying to convey can be conveyed through the sample itself. That is population and sample. Now that we've understood the basic terminologies, let's look at some of the sampling techniques we have in statistics. Sampling can be classified into probabilistic and non-probabilistic approaches. Under probabilistic sampling we have random sampling, systematic sampling, and stratified sampling; under non-probabilistic sampling we have snowball, quota, judgment, and convenience sampling. For machine learning we do not need the non-probabilistic approaches, only the probabilistic ones, so we will cover random, systematic, and stratified sampling. What is random sampling? You take a sample randomly out of the population: each member of the population has an equal chance of being selected. What is systematic sampling? In systematic sampling you follow a particular order for how the sample is taken. As you can see here, I have six groups, and from these six groups I take all the even-numbered ones: 2, 4, and 6. I follow a system, an order, for how I take the sample. And what is stratified sampling? First of all, what is a stratum? A stratum is a subset of the population that shares at least one common characteristic, such as gender, which I have taken here. As you can see, I take the whole population and find one property common to all of them, gender: they are either male or female. I break the whole population into the male subset and the female subset based on that, and then I apply random sampling within each subset to take out the samples I need. So that is stratified sampling: break the population into strata, then randomly sample from each.
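All three probabilistic techniques in a few lines of pandas (my sketch; the column names are illustrative, not from the video):

```python
# Random, systematic, and stratified sampling on a toy population.
import pandas as pd

df = pd.DataFrame({'id': range(1, 101),
                   'gender': ['male', 'female'] * 50})

random_sample = df.sample(n=10, random_state=0)               # equal chance for everyone
systematic_sample = df.iloc[::10]                             # every 10th row, a fixed order
stratified_sample = df.groupby('gender').sample(n=5, random_state=0)  # 5 per stratum

print(len(random_sample), len(systematic_sample), len(stratified_sample))
```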

### Data Life Cycle [1:33:28]

So what are the types of statistics we can draw conclusions with? We have descriptive statistics and inferential statistics; let's understand each of these. What is descriptive statistics? Descriptive statistics mainly focuses on the main characteristics of the data and provides a graphical summary of it: it uses the data to provide descriptions of the population, either through numbers, calculations, graphs, or tables. As you can see here, I have a shirt that comes in three sizes, the minimum, the average, and the maximum, based on the weight of the person: if a person weighs 60, he gets the maximum one; if he weighs 40, the average one; and if he weighs 20, the minimum one. So I am describing my data and drawing summaries from it; that is descriptive statistics. What is inferential statistics? Inferential statistics makes inferences and predictions about a population based on a sample taken from it. What this means is that I generalize from a sample, apply probability, and draw a conclusion about the larger data set: it lets me infer population parameters using statistical models built on sample data. I have a whole population who could take either the large, medium, or small size, and according to a sample of their sizes I supply the t-shirts. So those are the two types of statistics: descriptive statistics describes the data, whereas inferential statistics takes information from a sample and analyzes it to reach conclusions. Let's go in depth with descriptive statistics. It is a method used to describe and understand the features of a specific data set by giving short summaries about the samples and measures of the data. Descriptive statistics breaks down into measures of center and measures of variability. The measures of center are the mean, the median, and the mode, whereas the measures of variability are the range, the interquartile range, the variance, and the standard deviation. Let's understand all of these now. For example, here you can see I have a data set of cars containing the variables: the car, the mileage, the cylinder type, the displacement, the horsepower, and the rear axle ratio. Let's find the mean first. What is the mean? The mean is the average of all the samples. For example, to find the mean of the horsepower, I just add the values and divide by the number of values, which gives me 103.625; that is the mean of the horsepower. What is the median? It is the measure of the central value of the sample set. Let me show you how to find it for the miles per gallon. To find the central value, you first arrange the values in ascending order: 21, 21.3, 22.8, 23, and so on. With an even number of values, I take the two middle values: (22.8 + 23) / 2 = 22.9. So 22.9 is the median of my sample.

### Statistics and Probability [1:37:03]

Next, let's move on to the mode. The value that recurs most often in the sample set is called the mode. Let me find the mode for the cylinder type: which is more frequent, the six-cylinder or the four-cylinder? Counting the rows, the six-cylinder type appears five times and the four-cylinder type three times, so six becomes the mode of the data. Okay, so now that we have covered everything required from the measures of center, let's move over to the measures of spread. A measure of spread, also called a measure of dispersion, is used to describe the variability in a sample or population. The first one is the range. The range is a measure of how spread apart the values in a data set are: it is simply the maximum value minus the minimum value. Next, what is the interquartile range? Quartiles tell us about the spread of the data by dividing the ordered data into quarters, just as the median splits it in half. For example, if I have eight ordered values, the three quartiles break them into four groups of two: (1, 2), (3, 4), (5, 6), and (7, 8). Now, how do I find the interquartile range of a larger data set, say 100 ordered values? The first quartile Q1 falls between the 25th and 26th values: (45 + 45) / 2 = 45. The second quartile Q2 falls between the 50th and 51st values: (58 + 59) / 2 = 58.5. And the third quartile Q3 falls between the 75th and 76th values: (71 + 71) / 2 = 71. The interquartile range is a measure of variability based on dividing the data set into quartiles: quartiles divide a rank-ordered data set into four equal parts at Q1, Q2, and Q3, and the interquartile range is Q3 − Q1. If you look at the whole data as 100%, it is broken into four parts of 25% each, with Q1, Q2, and Q3 at the boundaries. Now that we've understood that, let's move over to variance. Variance describes how much a random variable differs from its expected value, and it entails computing the squares of the deviations: s² = (1/n) · Σᵢ₌₁ⁿ (xᵢ − x̄)², where xᵢ is each value and x̄ is the mean. What does the variance actually tell you? It tells you how far the data sits from where it would ideally be, its mean: how much the values differ from it. And what is a deviation? A deviation is the difference of each element from the mean, xᵢ − x̄. The population variance is given by σ² = (1/N) · Σᵢ₌₁ᴺ (xᵢ − μ)², and once we have that, the sample variance is s² = (1/(n−1)) · Σᵢ₌₁ⁿ (xᵢ − x̄)². Those are the variance formulas.
So what is the standard deviation? The standard deviation is a measure of the dispersion of a set of data from its mean: it tells you how much the data disperses away from its center. It is given by σ = √[(1/N) · Σᵢ₌₁ᴺ (xᵢ − μ)²], which is simply the square root of the variance. So how do you find the standard deviation? Let me give you an example. Say Daenerys has 20 dragons, each with a number. How do you work out the standard deviation? The mean is found by adding all the numbers and dividing by 20, which gives μ = 7. Then for each value you compute (xᵢ − μ)²: for 9 it is (9 − 7)² = 2² = 4, for 2 it is (2 − 7)² = 25, and so on, giving results like 4, 25, 4, 9, 25, 0, and so forth. Once you have done that, you put everything into the formula and get a variance of σ² = 8.9, and the standard deviation is just its square root, √8.9 ≈ 2.983. What this means is that, on average, the values differ from the mean by about 2.983: a value within roughly 2.983 of the mean is typical and close to the center of the data. That is how standard deviation works.
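All four measures of spread in one short NumPy sketch (my numbers, chosen only for illustration):

```python
# Range, interquartile range, variance, and standard deviation
# for a small made-up sample.
import numpy as np

data = np.array([45, 58, 59, 62, 65, 71, 71, 83])

print(data.max() - data.min())          # range
q1, q3 = np.percentile(data, [25, 75])
print(q3 - q1)                          # interquartile range: Q3 - Q1
print(data.var(ddof=1))                 # sample variance, 1/(n-1)
print(data.std(ddof=1))                 # sample standard deviation
```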
Now that we've understood all of that, we have a very classic example: information gain. What does information gain help us with? It is an idea that comes into the picture for decision trees. Let's understand the basic concepts we require. The first topic is entropy. What is entropy? Entropy measures the impurity or uncertainty present in the data, and it is given by the formula H(S) = −Σᵢ₌₁ⁿ pᵢ · log₂(pᵢ), where S is the set of all instances in the data set, n is the number of distinct class values, and pᵢ is the probability of class i. So what is information gain? Information gain is how much information a particular feature contributes to the final outcome, and it is given by Gain(S, A) = H(S) − Σⱼ₌₁ᵛ (|Sⱼ| / |S|) · H(Sⱼ), where H(S) is the entropy of the whole data set, Sⱼ is the set of instances where attribute A takes its j-th value, |S| is the total number of instances, v is the number of distinct values of attribute A, and H(Sⱼ) is the entropy of that subset. The second term is often written H(S|A), so Gain(S, A) = H(S) − H(S|A). Let me give you an example of information gain. You can see the data set here: 14 days, with the attributes recorded for each day, and the goal is to forecast whether the match will be played or not according to the weather conditions. We have nine days with yes and five with no. Looking at the outlook attribute, it can be sunny, overcast, or rainy: under sunny we have three nos and two yeses, under overcast four yeses and zero nos, and under rainy three yeses and two nos. So how do we find which attribute contributes the most? First, the entropy of the whole data, with nine yes instances and five no: H(S) = −(9/14) · log₂(9/14) − (5/14) · log₂(5/14) = 0.940. An entropy of 0.940 means the data is quite impure; on its own it cannot give us much information. The first step in information gain is to find the root variable, and to do that we compute the information gain of every attribute: outlook, windy, humidity, and temperature. For windy, we have six instances true and eight false, so the gain is H(S) = 0.940 minus the weighted subset entropies (8/14 for false and 6/14 for true), which works out to 0.048, which is really low. The same process for outlook gives an information gain of 0.247; humidity gives 0.151, which is also really good; and temperature gives 0.029. You can pause and look at the steps if you want to. Out of all the gains, the variable with the highest information gain is used to split the data: outlook has 0.247, windy 0.048, humidity 0.151, and temperature 0.029. So which are we going to choose? Outlook, because it has the highest information gain, and it becomes our root node. That is a simple example of how statistics is used in information gain.
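The outlook computation, line by line (my sketch, using the yes/no counts from the table above):

```python
# Entropy and information gain for the outlook attribute of the
# 14-day play-tennis data set.
import math

def entropy(pos, neg):
    total = pos + neg
    h = 0.0
    for p in (pos / total, neg / total):
        if p > 0:                       # log2(0) is undefined; a pure split adds 0
            h -= p * math.log2(p)
    return h

h_s = entropy(9, 5)                     # 0.940: entropy of the full data set
splits = [(2, 3), (4, 0), (3, 2)]       # sunny, overcast, rainy as (yes, no)
remainder = sum((p + n) / 14 * entropy(p, n) for p, n in splits)

print(round(h_s - remainder, 3))        # gain(outlook) = 0.247
```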
Next, let's move over to the confusion matrix. What is a confusion matrix? A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data. How do you calculate the accuracy from it? It is (true positives + true negatives) divided by (true positives + true negatives + false positives + false negatives). If you are confused about what true positives, true negatives, false positives, and false negatives are, let me explain right now. Say there are two possible predicted classes, yes and no, and the classifier made a total of 165 predictions. Out of those 165 cases, it predicted yes 110 times and no 55 times, while in reality 105 patients had the disease and 60 did not. Putting this into the table, we have n = 165, with predicted no and predicted yes against actual no and actual yes. If the classifier predicted no and the answer was actually no, that is a correct output; predicted yes and actually yes is also correct. But predicted yes when it was actually no is a mistake, and predicted no when it was actually yes is also a mistake. The correct counts here are 50 and 100, and the wrong counts are 10 and 5. So what is a true positive? A case predicted yes that actually has the disease: 100 of them. True negatives were predicted no and did not have the disease: 50. False positives were predicted yes but did not have the disease: 10. And false negatives were predicted no but actually had the disease: 5. So this is basically what a confusion matrix is, and it is helpful in finding out the performance of the classifier you are working with.
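Plugging those counts into the accuracy formula (a two-line check of the example above):

```python
# Accuracy from the 165-prediction example: TP=100, TN=50, FP=10, FN=5.
tp, tn, fp, fn = 100, 50, 10, 5
print((tp + tn) / (tp + tn + fp + fn))   # 0.909..., about 91% accuracy
```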
Next comes inferential statistics. It is a method where we infer and understand what the data is trying to communicate to us, and it breaks down into two parts: point estimation and interval estimation. Let's start with point estimation. Point estimation is concerned with using sample data to compute a single value that serves as an approximate value, the best estimate, of an unknown population parameter. You have the huge population of India, say; out of that you randomly take a sample and find the sample mean, and that sample should be good enough that its mean estimates the mean of the whole population. That is how you use point estimation to approximate a value for the entire population. What are the different methods? We have the method of moments, maximum likelihood, Bayes estimators, and best unbiased estimators. In the method of moments, estimates are found by equating the first k sample moments to the corresponding k population moments. Maximum likelihood uses a model and maximizes a likelihood function. A Bayes estimator minimizes the average risk, taken as an expectation over the random variables. And best unbiased estimators are used to find the best estimator for the parameter in question. That is all we require from point estimation; let's move over to interval estimates. Here an interval, a range of values, is used to estimate the population parameter: alongside the point estimate we get an interval with a lower confidence limit and an upper confidence limit. There are three things you need to remember. First, the confidence interval: it is the measure of your confidence that the interval estimate contains the population mean μ. Statisticians use a confidence interval to describe the amount of uncertainty associated with a sample: technically, a range of values is constructed so that it has a specified probability of including the true value of the parameter. For example, take a supermarket where we have collected a lot of data, and from that data I claim that if item one and item two are kept together, they will be sold together, and I say that with 90% confidence. That confidence level, which might sit anywhere between 90 and 95%, says it is highly likely that this particular thing is going to happen. Second, what is the sampling error? It is the difference between the point estimate and the actual population parameter: when μ is being estimated, the sampling error is μ − x̄, the population mean minus the sample mean. Third, what is the margin of error? For a given level of confidence, it is the greatest possible distance between the point estimate and the value of the parameter it is estimating: the largest gap you allow between what you have predicted and what it actually is, at the confidence level you have chosen. How much error you allow for the predicted model is the margin of error. The level of confidence c is the probability that the interval estimate contains the population parameter: everything that falls between −z_c and +z_c is inside the interval, and anything beyond those critical values is excluded. Finding these interval estimates is our work here. For example, if the level of confidence is 90%, it means you are 90% confident that the interval contains the population mean; that leaves 0.05 in each tail, and the critical z-scores −z_c and +z_c come out to −1.645 and +1.645. Anything beyond ±1.645 is not allowed. Those are the values we are concerned with when working with interval estimates.
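Here is what that looks like numerically (my sketch; the sample values are invented, and I use the z-based interval exactly as described above):

```python
# A 90% confidence interval for a mean: +/-1.645 leaves 5% in each tail.
import numpy as np

sample = np.array([62, 58, 71, 65, 59, 70, 66, 61, 68, 64])
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean

z = 1.645                                        # critical z-score for 90%
margin = z * se                                  # margin of error
print(mean - margin, mean + margin)              # interval estimate for mu
```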
So now that we have understood everything we needed from inferential statistics and descriptive statistics, let's move over to hypothesis testing. What is a hypothesis, first of all? It is a statement you have made up about some event, which may or may not turn out to be true, and hypothesis testing is formally checking whether that hypothesis is accepted or rejected. How is hypothesis testing conducted? You first state the hypothesis, then formulate an analysis plan for how you are going to test it, then you analyze the data you get, and finally you decide whether the hypothesis has failed or held up. Suppose I have four boys here: Nick, John, Bob, and Harry. They were caught doing mischief in class and now have to serve detention for almost two months, cleaning the classrooms. John comes up with an idea: write all their names on chits, put them in a bowl, and whoever's name comes out has to clean the classroom that day. We assume the draw is free of bias, so our hypothesis is that John is not cheating. Let's find out what actually happens. The probability of John not being picked on a given day is 3/4. If he is not picked for three days running, the probabilities multiply: (3/4) × (3/4) × (3/4) ≈ 0.42. And the probability of John not being picked for 12 days straight is (3/4)¹² ≈ 0.032, which is 3.2%, less than the 5% (0.05) threshold. So our hypothesis was that John is not cheating, but the observed outcome is so unlikely under that assumption, 3.2%, below the 5% threshold, that we conclude John is cheating: he probably never even put his name in the chits. This is how hypothesis testing works. So what is the null hypothesis? The hypothesis we formed at the beginning, that John was not cheating: the assumption that the result does not differ from what we expected. And the alternate hypothesis is the one supported when our results disprove the assumption: we assumed John was not cheating, but the results showed that he is. That is hypothesis testing, and we have a threshold value: if the probability we are testing falls below this threshold, the null hypothesis fails. Now we need to know something about p-values and t-values. The p-value and the t-value help us in hypothesis testing. Say we want to study students whose height is greater than 5 feet 7 inches. We take a sample of 100 students and find that the mean height is 5 feet 9 inches. We make a hypothesis that out of the 100 students, at least six will be taller than 5 feet 9 inches. The p-value is the probability we attach to this hypothesis: we are saying that at least six students out of 100 will have a height greater than 5 feet 9 inches. The t-value comes from testing that hypothesis: it measures the difference between what we assumed and what we actually calculated from our results. This is how the p-value and the t-value help in forming and evaluating the null and alternate hypotheses.
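Circling back to John's detention example, the threshold test is two lines of arithmetic (my illustration):

```python
# Probability that John's name never comes up in 12 fair daily draws.
p_not_picked_12 = (3 / 4) ** 12
print(round(p_not_picked_12, 4))   # 0.0317, about 3.2%: below the 5% threshold,
                                   # so we reject the "John is not cheating" hypothesis
```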
Okay, let me show you some code where we find the mean, median, and mode, and see how that becomes helpful to us. Let me go back to the presentation mode. I have my data here, I import statistics as s, and I simply compute the mean, median, and mode of it. Let me run this program: as you can see, the mean is 33.43, the median is 3, and the mode is 2; two has been repeated the most number of times, which is why it is the mode. The variance is 5124, and the standard deviation, its square root, is about 71.6. That is how the mean, median, and mode are calculated. Now let me run the next program so you can understand what we are trying to do. What's happening here is: I take the iris data with load_iris, create a DataFrame out of it, set the species column from the target (applying a small lambda to map the values), and get the description of the data. After that, I take this data, pair-plot it, and show the plot. Let me run the program and show you the figure first. I have four features here: the petal length, the petal width, the sepal length, and the sepal width. If you look at this plot, you can see there is a normal distribution here, a normal distribution here as well, one over here with some exceptions, and a normal distribution here too, again with some exceptions. Looking at this, I can decide whether I want to apply the Gaussian Naive Bayes classifier or something else. This is how statistics helps me understand what my data looks like. Now let me go back over here: as you can see, this is the count of 150, simple enough; then the mean, the standard deviation, and the minimum and maximum length and width you can find for each feature. All of this is done using describe, a function that is readily available in pandas. So that was a very simple example of how I worked out that I should use the Gaussian Naive Bayes classifier for my iris data, which I showed in the probability part.
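A hedged reconstruction of those two demos in one runnable script (the list of numbers is my own placeholder; the iris part follows the description above, assuming pandas, seaborn, and scikit-learn are installed):

```python
# Summary statistics with the statistics module, then the iris data
# as a DataFrame with describe() and a seaborn pair plot.
import statistics as s
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

data = [2, 2, 3, 4, 255]                     # illustrative numbers, not the video's
print(s.mean(data), s.median(data), s.mode(data))
print(s.variance(data), s.stdev(data))

iris = load_iris(as_frame=True)
df = iris.frame                              # features plus a 'target' column
print(df.describe())                         # count, mean, std, min, max per feature
sns.pairplot(df, hue='target', vars=iris.feature_names)
plt.show()                                   # near-normal feature distributions
```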
Right, with that we are done with this portion. Now, what exactly is data science? Data science, in simple terms, is the process of deriving useful insights from data in order to solve real-world problems or to grow a business. Data science was introduced because we are generating an immeasurable amount of data: there's a fact that we're generating more than 2.5 quintillion bytes of data every day, and at this pace it's only going to grow, because right now everything runs on data. The idea behind data science is to take all this data and derive useful, knowledgeable insights from it so that you can grow your business or solve a problem; that's what data science is all about. Data is the key in data science, and since we are producing so much of it, this is the perfect time to learn data science. Data science broadly covers artificial intelligence, machine learning, and natural language processing; all of these processes come under it. That is one of the reasons data science has become so popular: with the amount of data we're generating, we need methods and technologies that can handle it and derive something useful from it. So that's a small introduction to data science, and if you want to learn more, I'll leave a couple of links in the description box. Now let's move ahead to our topic of discussion. First, we'll understand why we need SQL for data science. Like I said, data science is all about deriving useful insights from data: it involves extracting, processing, and analyzing tons and tons of data. What we need are tools that can store and manage this vast amount of data, and this is where SQL comes in. SQL can be used to store, access, and extract massive amounts of data, which makes the whole data science process run more smoothly. As a querying language, SQL can perform a lot of query operations, search operations, extraction, and editing and modifying of your data. So we need a large management system, and along with it a language that can perform all the operations we want on our data; that's where SQL comes in. Now, let's understand what exactly SQL is. SQL stands for Structured Query Language, a querying language aimed at managing relational databases. But what exactly is a relational database? Guys, I'm just going to go through the basics here, because I'm assuming you have a good idea about SQL already; this is a more advanced topic, SQL for data science specifically. So I'll brush up on a couple of things about SQL, the basic commands and a few other topics, and then we'll move on to our demo session. What is a relational database? A relational database is a group of well-defined tables from which data can be accessed, edited, and updated without having to reorganize the tables themselves. That's an important point about a relational database. SQL is the standard API for relational databases: SQL programming can be used to perform multiple actions on data, such as querying, inserting, updating, deleting, and extracting, so it can manipulate and analyze all your data in a way that lets you derive something useful from it. Examples of relational databases that use SQL include MySQL, Oracle, and so on. If you're new to SQL and want to go in depth, I'll leave a couple of links in the description box; I won't be covering basic SQL itself, because that is out of the scope of this session. Like I said, we'll be using MySQL today, so let's understand why I've chosen it. First of all, MySQL is very easy to use: with only basic knowledge of SQL, you can build and interact with MySQL using a few simple SQL statements, and SQL statements read a lot like plain English; I feel it's the most basic and understandable querying language there is. Apart from this, it is also very secure: MySQL has a solid data-security layer that protects sensitive or confidential data from intruders, and passwords are encrypted, which is a good advantage of using MySQL. It is, of course, open source, so it's free to download and use; you can go to the official website and download it in a matter of minutes. It is scalable as well: it can handle almost any practical amount of data, supporting large tables of 50 million rows or more. The default file-size limit is about 4 GB, and you can increase this to a theoretical limit of around 8 TB (though that tier may cost a little money); 4 GB is already a lot for a single data set. Another important point is that MySQL follows a client-server architecture: you have your database server, which is MySQL, and many applications and programs as clients that communicate with the server, querying data, saving changes, updating data, and so on. Not only this, MySQL is compatible with many operating systems: it is easy to run on Windows, Linux, Unix, and so on, and it provides the functionality that clients can run on the same computer as the server or on another computer, communicating via a local network or the internet. Another important point is that there are quite a number of APIs and libraries for developing MySQL applications: for client programming you can use languages like C, C++, Java, Perl, PHP, Python, and so on. All of these are easily compatible and provide APIs so you can integrate with MySQL and build applications. And since Python is one of the best languages for data science, MySQL is perfect for data analysis, storing data, and querying data: you can easily work with MySQL and Python together to build applications and query data.
Back to MySQL's advantages: it is also customizable, like I said, and platform independent. It's not only client applications that can run under a variety of operating systems; MySQL itself runs on a number of them, most importantly macOS, Linux, and Microsoft Windows. There's also speed: MySQL is considered a very fast database program, and this is backed up by a large number of benchmark tests. It's highly productive too, because it supports triggers, stored procedures, and views, which give developers higher productivity. So these are a few reasons to go with MySQL; I feel it's one of the easiest to use and most compatible databases out there. Now let's move on and discuss a couple of MySQL basics, starting with the data types: numeric, character string, bit string, boolean, date and time, and timestamp and interval. Numeric includes integers of various sizes, floating-point numbers of various precisions, and formatted numbers. Character string data types have either a fixed or a varying number of characters; there is also a variable-length string type called CHARACTER LARGE OBJECT, used for columns that hold large text values. Bit string data types are likewise of either fixed or varying length; the variable-length variant is called BINARY LARGE OBJECT, and it is used for columns with very large binary values, such as images. Then we have the boolean data type, which, as the name suggests, holds true or false values; and since SQL has null values, a three-valued logic is used, with unknown as the third value. Date and time work like any other date variables: the DATE type has year, month, and day in the usual date form, and similarly the TIME type has hour, minute, and second components; these formats can be changed based on your requirements. Next, we have the timestamp and interval data types. The TIMESTAMP type includes a minimum of six positions for decimal fractions of seconds, plus an optional WITH TIME ZONE qualifier, in addition to the date and time fields. The INTERVAL type specifies a relative value that can later be used to increment or decrement an absolute value of a date, a time, or a timestamp. So that was a little information about the different data types. Now let's brush up a little on the basics of SQL; like I said, I'm just going to run through a couple of topics, because I'm hoping you all have a good understanding of the querying language. First, we have the CREATE DATABASE command. This is a very general-purpose command: it creates a new database for you, and the syntax is just CREATE DATABASE followed by the name of your database.
Then, once you create it, you initialize and use it with the command USE followed by the name of the database you created. Remember that each command ends with a terminator, the semicolon. Commands are also usually written in capital letters so you can differentiate them from your table names, column names, and so on; it's good practice to capitalize all your commands. Next we have CREATE TABLE. Tables are the most important part of a database, and CREATE TABLE is a simple command that creates a new table, which can contain variables of different data types. The syntax is simple: CREATE TABLE, the name of the table you want to create, and then, within the parentheses, whatever variables you want, each with its respective data type. For example, if your variable is age, its data type would be an integer, and a variable like name could have a character type. It's as simple as that; the language is very understandable. Then there's INSERT INTO. This command inserts new data into your table. What usually happens is that when people insert values into the table, they forget the data type they defined for a particular variable, so remember that inserted values must align with the defined data types. For example, if your variable is age and instead of the numeral 3 you type out the word three, t-h-r-e-e, that's not going to work, because you already defined age as an integer; you'll get an error. So make sure the values you insert match the data types you declared. These are very simple things, and I'm sure you're aware of them. Next is SELECT. This is one of the most important commands for SQL in data science, because the work is mainly about extracting particular kinds of data and useful insights from your database. We'll see how often we use this command in today's demo; it's one of the simplest and most important commands in SQL. SELECT picks a specified table or column and extracts the values from it: SELECT * FROM followed by the table name is the syntax. We'll be using this in the demo, so don't worry if you don't know exactly what it does yet; just remember that SELECT is for extracting data from your table. Next is the UPDATE command, which lets you modify values stored in your table; the WHERE clause identifies the variable or value you want to change. Then there's DELETE, which, as the name suggests, deletes data from your table: DELETE FROM the table name, WHERE some condition. It's as simple as that. Finally, we have DROP TABLE, which deletes a table and all the rows in it, so the table is removed from your database entirely.
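To make those commands concrete, here is a sketch that runs the whole sequence end to end. It uses Python's built-in sqlite3 driver purely so the snippet runs without a server; the statements themselves mirror the MySQL commands just described. CREATE DATABASE and USE are server-side commands with no SQLite equivalent, so they are omitted:

```python
# The basic SQL commands above, run end to end against an in-memory
# SQLite database; the same statements work in MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: each column gets a name and a data type.
cur.execute("""
    CREATE TABLE students (
        name VARCHAR(50),
        age  INTEGER,
        dob  DATE
    )
""")

# INSERT INTO: values must match the declared data types.
cur.execute("INSERT INTO students VALUES ('Asha', 21, '2004-05-01')")

# SELECT: extract rows from the table.
cur.execute("SELECT * FROM students")
print(cur.fetchall())

# UPDATE, with a WHERE clause to pick the row to change.
cur.execute("UPDATE students SET age = 22 WHERE name = 'Asha'")

# DELETE removes matching rows; DROP TABLE removes the table itself.
cur.execute("DELETE FROM students WHERE age > 25")
cur.execute("DROP TABLE students")
conn.close()
```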
Those were a couple of commands I thought you should brush up on. I'm sure most of you already know them, but in case a few of you don't know much about SQL, I'll leave links in the description box; you can go through those videos and come back to this one if you're specifically looking for SQL for data science. Now let's get started with our demo, which is quite interesting. For this demo I'll be using MySQL Workbench. It's quite easy to install and takes only 15 to 20 minutes; I'll leave a link in the description box to a short video showing how to install the whole workbench. Once you've installed MySQL, we'll start by creating a database and importing a data set into the workbench. I'm importing an existing data set because this is a more advanced tutorial: it's not about creating tables and extracting values from them, it's about deriving useful insight from your data. For data science, you usually don't sit down and create tables with basic commands; instead, you explore the data variables, because there's usually a huge data set that needs to be explored and analyzed in order to derive something useful from it. That's what we're going to do today. I've already imported a CSV file; let me show you what it looks like. This CSV file contains details about employees: the name of the employee; the title, which is the job title; the department the employee works in; the annual salary; the hiring date; the start date in the present position; the salary basis; and the employment category, whether the employee is full-time or part-time. Our data set has around 32,000 observations, so it's a very large data set, which I downloaded from the internet; you can find a lot of data sets online and perform this kind of analysis in no time. In case you don't know how to import a data set into MySQL Workbench, I'll quickly show you how it's done. I've already created a database here; creating one is as simple as CREATE DATABASE and a name, say tutorial, and to use it you run USE tutorial, which activates the database. For this tutorial I've already created a database named students, so let's activate that. In it I have a couple of tables, and we'll focus on the table employee_details, which is the one I imported. For those of you who don't know how to import a CSV file into MySQL Workbench: go to Tables, right-click on any table, and choose Table Data Import Wizard; click on it and browse for your CSV file. Just as an example, I'll use a different CSV file here; I'm not importing the one we'll actually use, because it's very large and would take a long time to load into the workbench.
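Incidentally, for a file that large, a programmatic import is often faster than the wizard. Here is a hedged sketch with pandas, where the file name employee_details.csv and the SQLite connection are stand-ins (for MySQL, you would pass an SQLAlchemy engine to to_sql instead of a sqlite3 connection):

```python
# A programmatic alternative to the Table Data Import Wizard:
# read the CSV with pandas and push it into a database table.
import sqlite3
import pandas as pd

conn = sqlite3.connect("tutorial.db")
df = pd.read_csv("employee_details.csv")   # ~32,000 rows in the demo

# Create/replace the table from the DataFrame's columns.
df.to_sql("employee_details", conn, if_exists="replace", index=False)

# Verify the import, like running SELECT * in the workbench.
print(pd.read_sql("SELECT * FROM employee_details LIMIT 5", conn))
```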
To save some time, I'll import a small CSV file and show you the steps. I select the file, students marks, open it, and go to Next. After that, create a new table: you're going to create a new table that corresponds to all the columns in your CSV file. You can either create a table on your own and click Use Existing Table, or let MySQL Workbench create a new table for you; that's what I'll do, because it's simpler. Then click Next. These are all the fields in my CSV file; the wizard collects them and creates a table out of them. The file has name, gender, date of birth, math, physics, chemistry, and so on; these are just records of students and their marks. Click Next and Next again, and the data gets imported; the records are imported, so click Finish and refresh. Here you'll see student marks: that's the CSV file I just imported into MySQL Workbench. So that's how you import a CSV file as a table. If you want to see the table, all you have to do is SELECT * FROM the table name, student marks. Let's run this, and here you can see the entire CSV file we imported as a table. That's how simple it is to bring a CSV file into MySQL Workbench in order to perform data analysis. Now let's clear this up. For this demo, like I said, we'll be using the employee details data set, which is very large, so let's get started; I've already imported it as a table. If you take a look at employee_details, these are the columns in my table: the name of the employee; title, the job title; department; the annual salary; then hiring date, start date, salary basis, and employment category. Let's start by performing some data analysis. First, let's view the entire table we have; for that, all you have to do is SELECT * FROM the table name, employee_details in my case. I run this command and the entire table is loaded: SELECT * extracts the whole table for you. Now let's do something a little more complex and write a query to find the salaries of all the employees. For that, you again use SELECT, followed by the name of the column that holds the salary, which in our case is salary annual, and then FROM the table name, employee_details. Let's run this. The query returns just the salary column from the table: the annual salary of every employee. Next, let's write a query to display the unique designations of the employees, basically the unique job roles. Again we start with SELECT, since we're extracting, and we add a keyword called DISTINCT: because we want only the unique designations, it will keep just the job titles that are distinct.
It won't include repetitive designations: for example, there might be ten employees working as data analysts, and we'll count that as one. So we're extracting the unique designations of the employees; that's why we use DISTINCT. Our job title is stored in the variable title, so that's what I've written here, followed by FROM and the table name. Let's run this command. Here you can see the different job titles in our data set: we have sergeant, police officer, chief contract expediter, civil engineer, concrete laborer, traffic controller, pool motor truck driver, and police officer assigned as detective. Interesting, right? This query gave us the unique job roles present in our data set. Now let's try something else and write a query to display the unique departments with their job titles. Again we use SELECT with the DISTINCT keyword, extracting the unique departments and job titles, FROM the table name. Let's run this code. Under the police department we have different job titles, which is why it repeats here, including police officer. Then there is fleet and facility management, under which we have a chief contract expediter; under the water management department we have a civil engineer; and so on. This query was pretty simple: it gave us the distinct departments along with their job titles. Now let's perform another query: we'll list the employees who work in a particular department, say the fire department. Again, since we're extracting, we use the SELECT keyword: SELECT * FROM the table name, WHERE department IN followed by the name of the department we're looking for, in this case the fire department. Let's run it. Here you can see the details of all the employees in the fire department: their salaries, the dates they were hired, and the employment category, full-time or part-time, with the salary basis showing whether they're paid hourly or daily. These are the details of all the employees working in the fire department; if you check the department column, it's entirely fire. That was also quite simple, and this is how SQL is: the language is very readable, so you can read the statement and understand exactly what it's extracting. Next, let's try something more interesting: a query listing the employees who do not belong to a particular department, say everyone who is not in the police department. It's quite similar to the previous query: SELECT * FROM the table name, WHERE department NOT IN, that's the only difference, followed by the department you want to exclude, this time the police department. So we want only the details of employees who are not in the police department; every other department will appear here, but not police.
So we have fire, water management, law, streets and sanitation, the finance department, fleet and facility management, and so on: we excluded the police department and printed the details of all the employees in every other department. Now let's write a query to list the employees who joined before a particular date. Again we start with SELECT * FROM the table name, add the WHERE clause, and reference original hire date, since that's the variable holding the hiring dates, with a less-than comparison against a particular date; let's choose the 1st of January 2000. Run the code, and you'll find all the employees hired before 2000: looking at the original hire date column, every value is before 2000, some in 1994, 1996, 1984, and 1987. There are only four employees hired before the 1st of January 2000. Now let's try another query: displaying the average salary of all the employees with a particular job title. Let's choose sergeant, so we'll display the average salary of all the employees working as sergeants. Again it starts with SELECT. Since we have to calculate an average, we use the function AVG, which is predefined in SQL, applied to the salary column, salary annual, FROM employee_details, with the job title set to sergeant. Let's run this; it executes without any error, and the average salary comes out to around 106,000 and change for a sergeant. Now let's try another query: displaying the details of one particular employee, picking a name at random. SELECT * FROM employee_details WHERE name equals a particular name; let's look for Ahmed Syad. Paste that name in, run it, and we get the details of this particular employee: his job title, his department, his salary, all the details about him. Next, let's write a query to list the employees whose salary is more than 3,000 after giving them a 25% increment. Let's do a little math: we want the employees whose salary, after a 25% raise, exceeds 3,000. Again, SELECT * FROM the table name WHERE, and here we specify the condition: 1.25 times salary annual, and after the 25% increment, the salary should be at least 3,000. Let's run the code and see what we get. Here are the employees whose salary is more than $3,000 after a 25% raise; basically, these are the ones being promoted. That's quite simple.
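Before going further, here are the filtering and aggregate patterns demonstrated so far, collected into one runnable sketch. The snake_case column names and the literal values are assumptions standing in for the demo's columns, and the script assumes employee_details was imported into tutorial.db as in the earlier sketch:

```python
# The query patterns from this walkthrough, gathered in one place.
# Assumes dates are stored as ISO strings so '<' compares correctly.
import sqlite3

conn = sqlite3.connect("tutorial.db")
queries = {
    "unique job titles":
        "SELECT DISTINCT title FROM employee_details",
    "one department":
        "SELECT * FROM employee_details WHERE department IN ('FIRE')",
    "exclude a department":
        "SELECT * FROM employee_details WHERE department NOT IN ('POLICE')",
    "hired before a date":
        "SELECT * FROM employee_details "
        "WHERE original_hire_date < '2000-01-01'",
    "average salary for a title":
        "SELECT AVG(salary_annual) FROM employee_details "
        "WHERE title = 'SERGEANT'",
    "salary after a 25% raise":
        "SELECT * FROM employee_details WHERE 1.25 * salary_annual > 3000",
}

for label, sql in queries.items():
    cur = conn.cursor()
    cur.execute(sql)
    print(label, "->", cur.fetchmany(3))   # peek at the first few rows
```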
As the increment query shows, it's basic math implemented in one line of code; that's how simple SQL is. A lot of data scientists rely on this kind of tool, because querying even complicated math and complex conditions from your database becomes very easy with SQL; even the complex things take only a line. Now let's try something else: a query listing the employees whose salary is less than a particular number, say less than $3,500. So we write WHERE salary annual is less than 3,500 and run the line of code. We get a set of people below the threshold; most of them are at 2,756. So this is the list of employees whose annual salary is under $3,500; that's also quite simple, just one line of code. Next, we'll write a query to list the names and a particular set of details for employees who joined before a particular date. Let me write it down and then explain what I'm doing. After SELECT, we name the columns we want to extract: name, original hiring date, and salary, FROM the table name, WHERE the hiring date is less than, say, 01-02-2008. What I'm doing here is listing the name, hiring date, and salary of every employee who joined before that particular date. Let's run this line of code and see if it works. Here we have the name, hiring date, and salary of the individuals who joined before that date in 2008. It's as simple as that. Now let's write a query in SQL to list all the employees who joined on one particular date, say a date in 2013. Again, it starts with SELECT * FROM the table name; we add the WHERE clause with original hire date equal to that particular date. Run the line, and we list the employees hired on that day: around six employees were hired on the same date in 2013. So, as you can see, SQL is a very easy language; you just have to understand what you're trying to extract. You only need to know five to ten basic commands, and those basic commands can do wonders: they can extract and manipulate data so that you can perform proper data analysis, data processing, and so on. Now let's try one last query: listing the employees whose annual salary is within a particular range. Again, SELECT * FROM employee_details, add the WHERE clause, and here we say salary annual BETWEEN, another keyword, two numbers: let's say between $25,000 and $50,000. And here you can see that all of these employees have a salary in the range of 25,000 to 50,000.
In the results we see 43,201, 42,000, 44,000, 45,000, with 48,000 as the highest salary. So this is basically how you perform querying using SQL. Now, what is web scraping? The simplest way to describe web scraping is as a technique that makes it possible to extract and manipulate data stored on some website. One simple example would be: extract the price of INR against USD and save that price in a variable. The potential use cases for this technique are virtually limitless, since it always depends on the requirements of the problem statement, case by case. So how does web scraping work? First, understand that web scraping is not a program in itself; it is code written in a programming language of your choice, like Java or Python. The program makes a connection with the target website it will interact with and then downloads the page. After that, the program takes control of some elements and interacts with them to achieve a certain task, whether that's extracting an element's value, clicking a specific button on the website, or anything else; this is usually done by interacting with HTML elements and locators. From there, it depends on the problem statement: you can store the data and save it on your local drive, send it to your email account using other functions, create robots, or manipulate the data as the problem statement requires. Now let's look at the tools and libraries required for web scraping. The first is probably the most important one: Selenium. Selenium is a project that essentially automates browsers; that's it. Whatever you do with that power is entirely up to the problem statement. Whether your requirement is to automate a browser to capture data, feed data, or handle any kind of data manipulation, Selenium has got everything. Next, pandas: pandas is a Python library that provides powerful data analysis and manipulation capabilities. For example, if you have a table on some website, you could manipulate that table using pandas, controlling the rows, columns, and any field, and later even export the table in a raw CSV format. And how can we forget Beautiful Soup? Beautiful Soup is a Python library for pulling data out of HTML and XML files; it makes it easy to parse information from web pages. There are some differences between Beautiful Soup and Selenium, so always remember this key comparison when working on projects: Beautiful Soup is best suited for small projects, whereas Selenium is ideal for complex ones. Lastly, I'd like to repeat that web scraping is a technique; it is not limited to a single programming language. I have personally developed many web scraping projects using Python, but others have done it using UiPath and Java, and some developers prefer PHP for building their web automation projects. It really depends on the developer to choose the language as they see fit. Now let's talk about some legal complications related to the art of web scraping. First, some websites may not allow you to scrape their data, so it's best to refer to the website's terms and fair-usage policies before you start scraping. Also, if you don't find the clauses you need in the website's terms and conditions, it's a good idea to ask the website owner whether you can scrape their site.
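Setting the legal question aside for a moment, here is what a minimal scrape looks like with the libraries just described, assuming the requests and beautifulsoup4 packages are installed; the URL and the CSS selector are placeholders to adapt to your target page:

```python
# A minimal web scraping sketch: download a page, parse it, pull data.
# Assumes: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

url = "https://example.com"          # placeholder target page
response = requests.get(url, timeout=10)
response.raise_for_status()          # fail loudly on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")

# Grab the page title and every link as a small demonstration.
print(soup.title.string if soup.title else "no <title> found")
for link in soup.select("a[href]"):
    print(link.get("href"))
```

When a page builds its content with JavaScript, requests only sees the raw HTML; that is exactly the point where Selenium's real browser takes over.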
Returning to the legal question: in most cases, web scraping is fine. As long as you are experimenting with a website and learning to scrape data for your own projects or learning purposes, you should be okay. However, if you intend to sell the data from the target website or make it publicly accessible for free, it might become a problem, as you may be doing something that isn't legally right. So, when in doubt, remember: web scraping is okay for personal usage, not commercial. Now let us understand what exactly predictive analysis is. Predictive analytics encompasses a variety of statistical techniques, from data mining, predictive modeling, and machine learning, that analyze current and historical facts to make predictions about future or otherwise unknown events; that's the basic definition from Wikipedia. So we use previously collected data to predict an outcome or an event. Typically, historical data is used to build a mathematical model, which in our case we might call a classifier, a predictive model, or a regressor, that captures the important trends; then current data is run through that model to predict what will happen next, or to suggest actions for optimal outcomes. Let's take a look at various applications where predictive analysis is used. First, there's campaign management: say we have a campaign and need to figure out who our target audience is. We can analyze data from previous campaigns we've managed and, based on that, work out suggestions or the course of action to take. For a recent example, think of election campaigns: people gather a lot of data from previous elections, how they played out and which major factors led to a given candidate's win. That's how predictive analysis is used in campaign management. Then there is customer acquisition: we can analyze the whole business and identify the kinds of initiatives and events that would make the business better, so that customer acquisition improves. We have budgeting and forecasting as well: looking at previous data, we can finalize a budget and forecast related indicators; stock prediction using Python, or another language such as R, is one example. Then there's fraud detection: credit card companies, for instance, analyze data from hundreds of thousands of users to detect fraudulent transactions. There are promotions as well: we can analyze the target audience, follow the trends, see the kinds of content they go for, and shape promotions accordingly. And there's pricing: suppose you run a supermarket, somewhat like what Walmart does, with all the pricing data in hand.
What you can do is figure out the price of a product some time ahead, based on recent purchases, the current scenario, and the historical data that shaped the price distribution; you can also plan for demand using predictive analysis. These are just a few applications I can think of right now, a few of the places where predictive analysis is used to predict things. For example, take football: say you have a favorite player and you want to know what price he might go for at other clubs next season. You can use the data at your disposal, and based on the transfers that happened in previous seasons and transfer windows, you can estimate roughly what price your favorite player will go for. So those are a few applications of predictive analysis. Now let's move on to the next topic of the session: the steps involved in predictive analysis. This is a very important concept, so make sure you fully understand the steps that go into a predictive analysis. The first step is data exploration: you gather the data, load it into your program, and then look at it from a perspective that clarifies things. You have to figure out what kind of data you're dealing with: what the columns are, what features the data contains, how many numerical values there are, what data types are present, whether it's a CSV file, and so on. After exploration, you work out how to clean your data. By cleaning, I mean identifying the redundancies that might hinder your model: you check for null values and missing values, decide which columns will actually serve the model, and determine which variables are redundant, the columns you can remove without making a difference to the model. That covers the data cleaning part. Then there is modeling, where you select your predictive model. There are a lot of models to choose from, but in this session I'm going to use the linear regression model, because it's the simplest, most basic one, so beginners can learn it properly too. After modeling, you evaluate the model and check its accuracy, that is, how the model is actually performing. Now let's talk about these steps in a little more detail, starting with data exploration. As I've already told you, data exploration means gathering your data and then examining it in a way that clarifies a lot of things: you'll see the number of columns and rows, get a description of all the data types and the kinds of variables, look at the mean, average, and minimum values, and check for unique values in the columns. All of this comes under data exploration. After this, the second step is data cleaning, which, as I said, is basically getting rid of the redundancies in your data, including the missing values that may hinder your model.
You also have to make sure your model won't overfit or underfit due to noise, and noise is basically irrelevant data, which may appear in the form of null values; you either get rid of them or replace them with the column's average value. Then there are redundancies like outliers, which are not necessarily required in your model, so you can remove those as well. That's data cleaning. Then we have the third step, which is modeling. For data modeling, you first have to understand the relationships between the variables in your data, so you can figure out what kind of model to go for. For example, say you have a target variable, which in our case will be the price of certain goods, and you need to work out the relationships between the variables. If you're going for linear regression, you have to make sure the target has a continuous relationship with the independent variables. If you're going for logistic regression, the input variables can be continuous, but the target variable has to be dichotomous, that is, categorical: if I'm trying to predict something with logistic regression, the answer will be yes or no, one or zero, something like that. In the case of linear regression, by contrast, there must be a continuous relationship between the target variable and the independent variables. The fourth and final step is performance analysis. After you're done building a model, you check the model's accuracy and make sure it's above 70%. If you're a beginner making your first prediction model, anything above a 70% accuracy score is very good; but if you want your model to be genuinely good, the accuracy should be around 0.9, that is, more than 90%. If you get that the first time, well and good, but it depends entirely on your data and on the model selection you do. Now let's take a look at the next topic, where I'm going to perform predictive analysis using Python on a data set. The problem statement: I have a data set with variables such as how many bedrooms a house has and how many square feet it occupies, all of which I'll show you in the data, and using that data I am going to predict the price of a house. So let's take it up in a Jupyter notebook. I have a Jupyter notebook open here, and if you're not familiar with Jupyter, I suggest you check out our Jupyter notebook tutorial on YouTube; you'll be able to learn it properly, though honestly there's not much to learn, it's quite easy, which is also why I'm using it, and we have a cheat sheet as well. First of all, I'm going to import some dependencies, because for the first step, data exploration, I have to get the data.
For that I'm going to use the pandas library, and I'll import a few other libraries as well: seaborn, to check the relationships between the variables, basically for EDA, exploratory data analysis (if you don't know what EDA is, check out the exploratory data analysis tutorial on our YouTube channel), and numpy, just in case. As you can see, I just press Shift+Enter to run a cell; this is why I use a Jupyter notebook, because execution is easy and I can segregate the code into different cells. I'll make the view a little bigger so it's visible to everyone. I can also add a comment to a cell, say 'installing dependencies', and keep that part in a separate cell; that makes the notebook descriptive when you're coding, and it genuinely helps when you're trying to figure out what's wrong in your code. After this, I have to import the data. For that I'm going to use the read_csv function, which goes to the file and reads my data; the name of the file is house.csv. At first we get an error: when you copy a Windows file location as-is, you get a Unicode escape error from the backslashes, but when I change the backslashes to forward slashes, the error goes away. So here's one exercise for you: tell me in the comments section why you think that happened (a hint appears in the sketch below). Moving on, I'll take the first look at my data using the head method, which gives the first five rows: we have id, date, price, bedrooms, bathrooms, sqft_living and sqft_lot, floors, waterfront (which is zero or one), view, grade, sqft_above, and so on; these are the columns inside the data. I'll check the last five rows as well, using the tail method. So you get the first look at your data using data.head() and data.tail(). After that, I check the columns of my data, and let's check the shape as well, so we know what we're dealing with: we have 21,613 entries with 21 columns. It's quite a big data set, and it's one I found on Kaggle; the house data set is very easy to find, and I'm using this house example because it's very common. Go to Kaggle, search for house prediction data sets, and it will show you plenty of data sets you can download. Now that we've checked the shape, I'll use one more method: data.describe(). For all the numerical values, describe gives us the count, mean, standard deviation, minimum, maximum, and the 25%, 50%, and 75% quartiles. For bedrooms, for example, the mean value is about three.
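Here is a sketch of this first-look exploration, which also answers the path question: a plain Windows path such as "C:\Users\..." trips Python's string escape handling (\U begins a Unicode escape sequence), so a raw string or forward slashes avoid the error. The path below is a placeholder:

```python
# First-look data exploration, as in the demo.
import pandas as pd

# r"..." makes this a raw string, so the backslashes are not treated
# as escape sequences; "C:/data/house.csv" would work equally well.
data = pd.read_csv(r"C:\data\house.csv")

print(data.head())          # first five rows
print(data.tail())          # last five rows
print(data.shape)           # (21613, 21) for the Kaggle house data
print(data.describe())      # count, mean, std, min/max, quartiles
print(data.isnull().sum())  # null values per column
```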
Reading that describe output: the most common entry in the bedrooms column is a three-bedroom house, and for bathrooms it's roughly a two-bathroom house; the mean living area is about 2,079 square feet. For maximum values, we even have a 33-bedroom house, a house with eight bathrooms, and a living area of 13,540 square feet; the minimums include a zero-bedroom house (that must be something else entirely) and a living area of 290 square feet. So that's how you use the describe method, and with it the first step, data exploration, is done; I'm now fairly sure what kind of data I'm dealing with. Next, I'll check the relationships between these variables. For that I'll use data visualization, with a few plots from the seaborn library; if you don't know seaborn, we have a YouTube tutorial on the seaborn library as well, covering the different kinds of plots you can use. Data visualization is simply the process of visualizing your data to figure out the relationships between the variables. Before that, I want to check for null or missing values, because I don't want any hindrance in the data set when I'm modeling. So first, run a null-value check and take the sum: we have zero nulls across the board, no missing values in this data set. Usually, if you find null values in a big data set, say 10 missing values out of 21,000 rows, you can simply drop those rows; but if there are more nulls, I suggest replacing them with the mean value. To find the mean, you can look at the describe output: say the bedrooms column had 500 nulls out of 21,000, you could replace them with the mean value shown there, which is three, and do the same for any other column. Since there are no null values in this data set (it's a very clean one that I downloaded from Kaggle), we move on to the next step, which is visualization. And mind you, this step is still part of my data exploration and data cleaning work, not a separate step of the predictive analysis. So I have no null values, but there are a few redundancies I want to get rid of; I'll talk about those later. First, I'm going to use a relational plot (relplot). My basic aim is to predict the price of the house, so I'll put price on one axis and check its relationship with the other variables, starting with bedrooms, passing the data set as the data argument. There are many points, so the picture for price isn't very clear, but we do see the relationship for the most common bedroom counts.
The bulk of the data sits around zero to five bedrooms. Similarly, I can check other variables, like bathrooms: the price does increase with the number of bathrooms, but not uniformly, so there must be other dependencies as well, because the price isn't rising in lockstep with the bathrooms. For bedrooms, the price increases fairly steadily with each bedroom, though not strictly; at ten bedrooms the price is much the same, so bedrooms alone aren't a decisive factor. Let's check the square footage as well, sqft_living: here I see a linear relationship. Most of the prices sit in the lower band, from zero up to around 400,000, but with each increase in square footage the price rises, so this variable definitely has to be in our training set (I'll explain what a training set is shortly). Then there are floors: that plot is fairly descriptive, with most of the values in the one-to-two-floor range. We can check waterfront as well, and here's one trick: add hue set to waterfront to the plot, so the houses that have a waterfront show in orange and the others in blue, and you can see the relationship between them. Similarly, I can plot latitude and longitude. So you can work out the relationships between the variables using visualization. For this data set, to predict the price I think we'll need bedrooms, bathrooms, the square-footage columns, floors, and view; year built and year renovated we can leave out of the training set, zip code we don't really need, and latitude and longitude are not decisive for prediction either (they're useful for visualization, where they paint a geographic picture); sqft_living is genuinely important. These are the redundancies I was talking about in the model. Now we move on to the modeling part. First I'll import a few dependencies from sklearn: LinearRegression from the linear models module, and train_test_split from model_selection. The first thing to do is segregate my data into a training set and a test set. There are a few things I don't actually need inside my training data, so I'll define the training data, train, as the data with a few columns dropped. I have to drop price, because it cannot be in my training set since it's what I'm going to predict; then id, which we don't need; and date as well, since there's not a lot of signal there. So for now, I'll just remove these columns.
For the dependent variable, the prediction target, I take a variable, test, using just one column from the data: price. (At first I got a 'method object is not subscriptable' error; I had made a syntax error and also needed to add the axis argument to drop, and then it was fine.) Now I segregate the data into X_train, X_test, y_train, and y_test using the train_test_split method: we pass train, then test, a test_size of, say, 0.3, and a random_state of, say, 2. With the split made, I take a variable, regressor, and instantiate my linear regression model. Then I use the fit method to fit the model on the training data, X_train and y_train; no errors there. After this, I take a variable, predict, and call the regressor's predict method on X_test. So the modeling part is done; let me recap what I did here. I took the linear model, LinearRegression, and to segregate my data into training and test sets I used train_test_split. Before that, I prepared the data for my model: the feature set has all the columns from the data set except price, id, and date, the three columns I removed because I considered them redundant for this model, and the variable I'm going to predict is price, which I kept on its own. After that, the train_test_split method separates the data into training and test sets; I call the linear regression model, fit it on the training data, and then use it to predict values. Now comes the part where we check the efficiency of the model. For regression models this is very easy: you just call score, providing X_test and y_test. We get a score of about 0.70, which is not bad; with a data set this big, that kind of score is quite typical. But you can do things to improve it: look at the data and remove the columns that don't help, like latitude, longitude, zip code, year renovated, and year built; you could remove waterfront and view as well, or keep only the values you actually need, which I think are bedrooms, bathrooms, and everything related to square footage. Keep those in your training set and you should see higher accuracy. Now that we're nearly done with the session, I want to give you an exercise you can do for practice. I used the linear regression model here; I want you to check out the other classifiers and regression models you could use to predict a value (we have tutorials on all of them on our YouTube channel) and see if you can use the same data to build a prediction model with other methods, like a random forest, a decision tree, or even logistic regression.
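Before you try those variants, here is a reconstruction of the whole demo pipeline as one script; the column names follow the Kaggle house-sales data used in the session, and the seaborn call stands in for the exploratory plots shown earlier:

```python
# The demo's house-price pipeline, reconstructed end to end.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv("house.csv")

# Quick visual check of one relationship, as done with seaborn above.
sns.relplot(x="price", y="sqft_living", hue="waterfront", data=data)
plt.show()

# Features: everything except the target and the redundant columns.
X = data.drop(["price", "id", "date"], axis=1)
y = data["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

predictions = regressor.predict(X_test)
# score() reports R^2 for regressors; roughly 0.70 in the demo.
print("R^2 on the test set:", regressor.score(X_test, y_test))
```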
As for choosing between those alternative models: if you have a continuous target, go for linear regression, but if you find a categorical target, say whether a house has a waterfront or not, you can try logistic regression on that as well. Now, as you know, we are living in a world of humans and machines. Humans have been evolving and learning from past experience for millions of years; the era of machines and robots, on the other hand, has just begun. In today's world, these machines and robots need to be programmed before they actually follow your instructions. But what if machines started to learn on their own? This is where machine learning comes into the picture. Machine learning is at the core of many futuristic technological advancements in our world; today you can see various examples and implementations of machine learning around us, such as Tesla's self-driving cars, Apple's Siri, the Sophia AI robot, and many more. So what exactly is machine learning? Machine learning is a subfield of artificial intelligence that focuses on the design of systems that can learn from, and make decisions and predictions based on, experience, which is data. Machine learning enables computers to act and make data-driven decisions rather than being explicitly programmed to carry out a certain task. These programs are designed to learn and improve over time when exposed to new data. Let's move on and discuss one of the biggest confusions

### What is Machine Learning? [4:02:26]

among people: they think that AI, machine learning, and deep learning are all the same. You know what? They are wrong; let me clarify things for you. Artificial intelligence is the broader concept of machines being able to carry out tasks in a smarter way; it covers anything that enables a computer to behave like a human. Think of the famous Turing test, which determines whether a computer is capable of thinking like a human being. If you're talking to Siri on your phone and you get an answer, you're already very close to it. So that's artificial intelligence. Now to the machine learning part: as I already said, machine learning is a subset, a current application, of AI. It is based on the idea that we should be able to give machines access to data and let them learn for themselves; it's the subset of artificial intelligence that deals with extracting patterns from data sets. This means the machine can not only find the rules for optimal behavior but also adapt to changes in the world. Many of the algorithms involved have been known for decades, centuries even; thanks to advances in computer science and parallel computing, they can now scale up to massive data volumes. That's machine learning. Now, deep learning: deep learning is a subset of machine learning in which similar machine learning algorithms are used to train deep neural networks, so as to achieve better accuracy in cases where the former was not performing up to the mark. I hope you now understand that machine learning, AI, and deep learning are three different things. Moving on, let's see in general how machine learning works. In one common approach, a machine learning algorithm is trained using a labeled or unlabeled training data set to produce a model. New input data is introduced to the algorithm, and it makes a prediction based on the model; the prediction is evaluated for accuracy, and if the accuracy is acceptable, the machine learning algorithm is deployed. If the accuracy is not acceptable, the algorithm is trained again and again with an augmented training data set. This is just a high-level example, as there are many more factors and steps involved. Now let's subcategorize machine learning into three types: supervised learning, unsupervised learning, and reinforcement learning. We'll see what each of them is, how they work, and how each is used in banking, healthcare, retail, and other domains; don't worry, I'll use plenty of examples and implementations of all three to give you a proper understanding. Starting with supervised learning: what is it? Here's a mathematical definition. Supervised learning is where you have input variables X and an output variable Y, and you use an algorithm to learn the mapping function from input to output, that is, Y = f(X). The goal is to approximate the mapping function so well that whenever you have new input data X, you can predict the output variable Y for that data. If that was confusing, let me simplify the definition of supervised learning: we can rephrase the mathematical definition as a machine learning method in which each instance of a training data set is composed of different input attributes and an expected output.
The input attributes of a training data set can be any kind of data: the pixels of an image, the values of a database row, or even an audio frequency histogram. For each input instance, an expected output value is associated. The value can be discrete, representing a category, or it can be a real, continuous value. In either case, the algorithm learns the input pattern that generates the expected output. Once the algorithm is trained, it can be used to predict the correct output of a never-seen input. You can see an image on your screen. In this image, we are feeding raw inputs, images of apples, to the algorithm. As part of the process, we have a supervisor who keeps correcting the machine, keeps training it, telling it "yes, it is an apple" or "no, it is not an apple", and so on. This process keeps repeating until we get a final trained model. Once the model is ready, it can easily predict the correct output for a never-seen input. In this slide, you can see that we are giving an image of a green apple to the machine, and the machine can easily identify it as an apple, giving the correct result. Let me make things clearer with another example. The image in this slide shows a supervised learning process used to produce a model capable of recognizing ducks in images. The training data set is composed of labeled pictures of ducks and non-ducks. The result of the supervised learning process is a predictive model capable of associating the label "duck" or "not duck" to a new image presented to it. Once trained, the resulting predictive model can be deployed to a production environment, say a mobile app, and once deployed it is ready to recognize new pictures. Now, you might be wondering why this category of machine learning is called supervised learning. Well, it is called supervised learning because the process of an algorithm learning from the training data set can be thought of as a teacher supervising the learning process: we know the correct answers, the algorithm iteratively makes predictions on the training data, and it is corrected by the teacher. The learning stops when the algorithm achieves an acceptable level of performance. Now let's look at some popular supervised learning algorithms: linear regression, random forest, and support vector machines. These are just for your information; we'll discuss these algorithms in detail later. Now let's see some popular use cases of supervised learning. First, Cortana. Cortana, or any other speech-automation assistant on your mobile phone, trains using your voice, and once trained, it starts working based on that training. This is an application of supervised learning. Suppose you say "OK Google, call Sam" or "Hey Siri, call Sam": you get an answer, the action is performed, and a call automatically goes to Sam. Next comes the weather app. Based on prior knowledge, like "when it is sunny, the temperature is higher" and "when it is cloudy, humidity is higher", the app predicts the parameters for a given time. This is also supervised learning, as we are feeding the data to the machine along with the expected outputs.
So that, too, is an example of supervised learning. Another example is biometric attendance, where you train the machine, and after a couple of inputs of your biometric identity, be it your thumb, your iris, or even your earlobe, the machine can validate your future inputs and identify you. Next comes the banking sector. In banking, supervised learning is used to predict the creditworthiness of a credit card holder, by building a machine learning model that looks for faulty attributes from data on delinquent and non-delinquent customers. Next comes the healthcare sector, where it is used to predict patient readmission rates, by building a regression model on data about patients' treatments and readmissions to find the variables that best correlate with readmission. Next comes the retail sector, where it is used to analyze which products customers buy together, by building a supervised model that identifies frequent item sets and association rules from transactional data. Now let's learn about the next category of machine learning: unsupervised learning. Mathematically, unsupervised learning is where you only have input data X and no corresponding output variable. The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about it. Let me rephrase this in simple terms. In the unsupervised learning approach, the data instances of a training data set do not have an expected output associated with them. Instead, the unsupervised learning algorithm detects patterns based on the inherent characteristics of the input data. An example of a machine learning task that applies unsupervised learning is clustering. In this task, similar data instances are grouped together in order to identify clusters of data. In this slide, you can see that initially we have different varieties of fruits as input. These fruits, the inputs X, are given to the model. Once the model is trained using an unsupervised learning algorithm, it creates clusters on the basis of its training: it groups the similar fruits together. Let's take another example. The image below shows an unsupervised learning process: the algorithm processes an unlabeled training data set and, based on the characteristics of the data, groups the pictures into three different clusters. Despite its ability to group similar data into clusters, the algorithm is not capable of adding labels to the groups. It only knows which data instances are similar; it cannot identify the meaning of each group. Now you might be wondering why this category is called unsupervised learning. It is called unsupervised because, unlike supervised learning, there are no correct answers and there is no teacher; the algorithms are left on their own to discover and present the interesting structure in the data. Some popular unsupervised learning algorithms are k-means, the Apriori algorithm, and hierarchical clustering. Again, these are just for your information for now. Now let's see some examples of unsupervised learning. Suppose a friend invites you to a party where you meet total strangers. You'll classify them using unsupervised learning, as you don't have any prior knowledge about them.
This classification can be done on the basis of gender, age group, dressing, educational qualification, or whatever criteria you like. Why is this learning different from supervised learning? Because you didn't use any past or prior knowledge about the people; you kept classifying them on the go. As they kept coming, you kept classifying them: this group of people belongs here, that group belongs there, and so on. Let's see one more example. Suppose you have never seen a football match before, and by chance you watch a video on the internet. You can still classify the players on the basis of different criteria: players wearing the same kind of jersey go in one class and players wearing a different jersey in another; or you can classify them by playing style, attackers in one class, defenders in another; or in whatever way you observe things. This, too, is an example of unsupervised learning. Let's move on and see how unsupervised learning is used in the banking, healthcare, and retail sectors. Starting with banking: it is used to segment customers by behavioral characteristics, by surveying prospects and customers to develop multiple segments using clustering. In the healthcare sector, it is used to categorize MRI data into normal or abnormal images; it uses deep learning techniques to build a model that learns from different features of the images to recognize different patterns. Next is the retail sector, where it is used to recommend products to customers based on their past purchases, by building a collaborative filtering model from that purchase history. I assume you now have a proper idea of what unsupervised learning means. If you have even the slightest doubt, don't hesitate to add it to the comment section. So let's discuss the third and last type of machine learning: reinforcement learning. What is reinforcement learning? Well, reinforcement learning is a type of machine learning that allows software agents and machines to automatically determine the ideal behavior within a specific context so as to maximize performance. Reinforcement learning is about the interaction between two elements: the environment and the learning agent. The learning agent leverages two mechanisms, namely exploration and exploitation. When the learning agent acts on a trial-and-error basis, it is termed exploration; when it acts based on the knowledge gained from the environment, it is referred to as exploitation. The environment rewards the agent for correct actions, and this is the reinforcement signal. Leveraging the rewards obtained, the agent improves its knowledge of the environment in order to select the next action. In this image, you can see that the machine is confused about whether something is an apple or not. The machine is then trained using reinforcement learning: if it makes a correct decision, it gets reward points, and if it is wrong, it gets a penalty. Once the training is done, the machine can easily identify which one is an apple. Let's see an example. Here we have an agent that has to judge from the environment which of two animals is a duck. The first thing it does is observe the environment. Next, it selects an action using some policy. It seems the machine has made a wrong decision by choosing a bunny as a duck.
So the machine will get a penalty for it, say minus 50 points for a wrong answer. Now the machine will update its policy, and this will continue until the machine arrives at an optimal policy. From the next time on, the machine will know that a bunny is not a duck. Let's see some use cases of reinforcement learning. But before that, let's see how Pavlov trained his dog using the reinforcement method. Pavlov staged the learning in four parts. Initially, Pavlov gave meat to his dog, and in response to the meat, the dog started salivating. Next, he created a sound with a bell; to this, the dog did not respond at all. In the third part, he tried to condition the dog by ringing the bell and then giving him the food; seeing the food, the dog started salivating. Eventually, a situation came where the dog started salivating just after hearing the bell, even if no food was given, because the dog had been reinforced to expect that whenever the master rang the bell, he would get food. Now let's move on and see how reinforcement learning is applied in the banking, healthcare, and retail sectors. Starting with banking: reinforcement learning is used to create a "next best offer" model for a call center, by building a predictive model that learns over time as users accept or reject offers made by the sales staff. In the healthcare sector, it is used to allocate scarce medical resources to handle different types of ER cases, by building a Markov decision process that learns treatment strategies for each type of case. Last comes the retail sector, where reinforcement learning can be used to reduce excess stock with dynamic pricing, by building a dynamic pricing model that adjusts prices based on customer response to offers. Now let's explore the different types of ML models. Not all data is structured the same way, and different problems require different approaches. For example, predicting stock prices requires models that learn from historical trends; identifying objects in images needs models that recognize patterns in visual data; and chatbots and voice assistants rely on models trained to understand and generate human language. To tackle these challenges, as I discussed previously, ML is divided into different learning models, such as supervised, unsupervised, and reinforcement learning; each has its own strengths and is used depending on the problem at hand. Since we know why different ML models are needed, let's see how they play a crucial role in generative AI. Generative AI is one of the most exciting applications of machine learning. Unlike traditional ML models that make predictions or classifications, generative models create entirely new content. Here's how ML enables AI to generate. First, text: language models like GPT generate human-like text for chatbots, content writing, and coding. Next, images: AI-powered tools like DALL·E can create realistic images from textual descriptions. Next, videos: advanced ML models synthesize lifelike video content, transforming media, marketing, and even filmmaking. These advancements in generative AI are reshaping creativity and automation, proving that machine learning is not just about making decisions; it's about creating new possibilities.
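As a brief aside before the overview of model types: the explore/exploit, reward-and-penalty loop from the duck/bunny example above can be sketched in a few lines. This is a toy epsilon-greedy bandit, a deliberate simplification of reinforcement learning; the reward values, the minus-50 penalty, and epsilon are all invented for illustration:

```python
# A toy sketch of the explore/exploit, reward/penalty loop described above.
import random

rewards = {"duck": 10, "bunny": -50}      # environment: +10 correct, -50 penalty
estimates = {"duck": 0.0, "bunny": 0.0}   # agent's learned value of each action
counts = {"duck": 0, "bunny": 0}
epsilon = 0.2  # probability of exploring instead of exploiting

for step in range(100):
    if random.random() < epsilon:             # exploration: trial and error
        action = random.choice(list(estimates))
    else:                                      # exploitation: use knowledge
        action = max(estimates, key=estimates.get)
    reward = rewards[action]                   # reinforcement signal
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # the agent learns that "bunny" is a bad choice
```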
Now that we have seen how ML models enable AI to create new content, let us briefly go through the different types of machine learning models. The first type is supervised learning. Supervised learning trains a model using labeled data, where each input has a corresponding correct output; this makes it ideal for tasks where historical data can be used to predict future outcomes. For example, spam detection: email services like Gmail use supervised learning to classify emails as spam or not spam by learning from past labeled examples. Another example is price prediction: real estate platforms use regression models to predict house prices based on features like location, size, and amenities. Now let us see some popular algorithms. First, decision trees. These models break the data down into a tree-like structure, where each node represents a decision based on a feature. They are easy to interpret and work well for both classification (for example, deciding whether an email is spam) and regression (for example, predicting house prices). However, they can become overly complex. Next, support vector machines. SVMs are powerful for classification tasks, as they find the optimal boundary, also called a hyperplane, that best separates the different classes in the data. They work well in high-dimensional spaces and in cases where the distinction between categories is clear, such as handwriting recognition, facial recognition, or medical diagnosis. Now that we have seen how labeled data is used, let's explore how unsupervised learning finds patterns without labels. Unsupervised learning works with unlabeled data, identifying hidden patterns and relationships without predefined categories. Here are some popular algorithms. First, k-means clustering. This algorithm partitions data into a predefined number of clusters by grouping similar data points based on their attributes. It works well for tasks like customer segmentation, where businesses group customers based on purchasing behavior. However, it assumes clusters are spherical and may struggle with irregularly shaped data. Next, we have autoencoders. These are specialized neural networks designed to learn efficient data representations by encoding and reconstructing input data. Let us see some examples. The first is customer segmentation, where e-commerce platforms group customers based on their shopping behavior to offer personalized recommendations. The next is market analysis: businesses analyze purchasing trends to find associations, such as which products are frequently bought together. Now that we have covered both labeled and unlabeled learning, let's see how semi-supervised learning combines the best of both worlds. Semi-supervised learning bridges the gap between supervised and unsupervised learning by using a small amount of labeled data along with a large amount of unlabeled data. For example, consider AI-assisted medical diagnosis: labeled medical images, such as X-rays with diagnoses, are scarce, but large amounts of unlabeled images exist. Semi-supervised learning helps the AI learn patterns from both labeled and unlabeled data, improving accuracy in disease detection. All right, now let's explore reinforcement learning, which is inspired by the concept of learning through trial and error; a quick sketch of the k-means idea mentioned above follows first.
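A minimal sketch of k-means with scikit-learn; the 2-D "customer" points are invented purely to illustrate the grouping idea, not taken from any example in the course:

```python
# A minimal sketch of k-means clustering with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: e.g. customers described by [annual_spend, visits_per_month]
X = np.array([[100, 2], [120, 3], [110, 2],       # one natural group
              [900, 20], [950, 22], [880, 19]])   # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster index assigned to each point
print(kmeans.cluster_centers_)   # the two learned cluster centers
# Note: k-means only groups similar points; it cannot name the groups.
```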
Returning to reinforcement learning: models interact with an environment, receive rewards or penalties for their actions, and refine their strategies over time. For example, in gaming, a Mario AI developed using reinforcement learning learns to navigate levels by optimizing its actions through trial and error. Another example is robotics, where robots learn to walk, balance, or perform tasks through reinforcement learning by maximizing positive outcomes. Reinforcement learning uses agents, actions, and rewards to improve decision making, which makes it ideal for tasks requiring continuous learning and adaptation. Now that we have covered all the types of machine learning models, let's go over some key tips to help you choose the right one for your needs. Supervised learning is best when you're classifying emails as spam or not, or diagnosing diseases from patient data. Unsupervised learning is best when you're grouping shoppers by behavior or detecting fraud in banking. Semi-supervised learning is best when you're improving speech recognition with limited labeled data or identifying fake news. And finally, reinforcement learning is best when you're training self-driving cars to navigate or optimizing AI in video games like Mario. So whether it's supervised, unsupervised, semi-supervised, or reinforcement learning, each model plays a crucial role in shaping AI's future. As generative AI continues to evolve, these models are driving innovation in text, image, and video generation. Which machine learning model do you find the most fascinating? Let me know in the comments below. — What is regression? Well, regression analysis is a form of predictive modeling technique which investigates the relationship between a dependent and an independent variable. A regression analysis involves graphing a line over a set of data points that most closely fits the overall shape of the data. A regression shows how changes in the dependent variable on the y-axis relate to changes in the explanatory variable on the x-axis. Now you'd ask: what are the uses of regression? Well, there are three major uses of regression analysis. The first is determining the strength of predictors: regression can identify the strength of the effect that the independent variables have on the dependent variable. For example, you can ask questions like "what is the strength of the relationship between sales and marketing spending" or "what is the relationship between age and income". The second is forecasting an effect: regression can be used to forecast the effect or impact of changes; that is, regression analysis helps us understand how much the dependent variable changes with a change in one or more independent variables. For example, you can ask "how much additional sales income will I get for each $1,000 spent on marketing". The third is trend forecasting: regression analysis can predict trends and future values and can be used to get point estimates. Here you can ask questions like "what will be the price of Bitcoin in the next 6 months". So, the next topic is linear versus logistic regression. By now, I hope you know what regression is, so let's move on and understand its types. There are various kinds of regression, like linear regression, logistic regression, polynomial regression, and others, but for this session we'll be focusing on linear and logistic regression. Before we dive in, a quick toy sketch of the trend-forecasting use just described follows below.
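A minimal sketch of trend forecasting, assuming invented monthly sales figures; np.polyfit is used here simply as a convenient least-squares line fitter:

```python
# A toy sketch of trend forecasting with a fitted line.
import numpy as np

months = np.array([1, 2, 3, 4, 5, 6])
sales = np.array([100, 112, 119, 133, 141, 155])  # steadily increasing

m, c = np.polyfit(months, sales, deg=1)  # fit y = m*x + c by least squares
print(f"trend line: y = {m:.2f}x + {c:.2f}")

# Use the slope to forecast sales for future months.
for x in (7, 8):
    print(f"month {x}: forecast {m * x + c:.1f}")
```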
So let's move on; let me tell you what linear regression is and what logistic regression is, and then we'll compare the two. Starting with linear regression: in simple linear regression, we are interested in a relationship of the form y = mx + c. What we are trying to find is the correlation between the X and Y variables, which means that every value of X has a corresponding value of Y, if the data is continuous. In logistic regression, however, we are not fitting our data to a straight line as in linear regression. Instead, we are mapping Y versus X with a sigmoid function. In logistic regression, what we find out is whether Y is 1 or 0 for a particular value of X; thus we are essentially deciding a true-or-false value for a given value of X. So, as a core concept, in linear regression the data is modeled using a straight line, whereas in logistic regression the data is modeled using a sigmoid function. Linear regression is used with continuous variables; logistic regression, on the other hand, is used with categorical variables. The output or prediction of a linear regression is the value of the variable, while the output of a logistic regression is the probability of occurrence of the event. Now, how will you check the accuracy and goodness of fit? In the case of linear regression, you have various measures like the loss, R-squared, adjusted R-squared, and so on. In the case of logistic regression, you have accuracy, precision, recall, and the F1 score, which is nothing but the harmonic mean of precision and recall. There is also the ROC curve for determining the probability threshold for classification, the confusion matrix, and many more. So, summarizing the difference between linear and logistic regression: the type of function you are mapping to is the main point of difference. Linear regression maps a continuous X to a continuous Y, whereas logistic regression maps a continuous X to a binary Y, so we can use logistic regression to make categorical or true/false decisions from the data. Let's move on ahead. Next is the linear regression selection criteria, or you can say, when to use linear regression. The first consideration is classification versus regression capabilities. Regression models predict a continuous variable, such as the sales made on a day or the temperature of a city. Their reliance on a polynomial, like a straight line, to fit the data set poses a real challenge when it comes to building classification capability. Imagine that you fit a line to the training points you have; now imagine you add some more data points. To fit them, you have to change your existing model, perhaps even change the threshold itself, and this will happen with each new data point you add. Hence linear regression is not good for classification models. Next is data quality. Each missing value removes one data point that could have optimized the regression, and in simple linear regression, outliers can significantly disrupt the outcome. For now, just know that removing outliers will make your model much better. Next is computational complexity: linear regression is often not computationally expensive compared to decision trees or clustering algorithms.
The order of complexity for n training examples and x features usually falls in either O(x²) or O(xn). Next is comprehensibility and transparency: linear regression models are easily comprehensible and transparent in nature; they can be represented with simple mathematical notation and understood very easily by anyone. So these are some of the criteria on which you'd select the linear regression algorithm. Next: where is linear regression used? First, evaluating trends and sales estimates. Linear regression can be used in business to evaluate trends and make estimates or forecasts. For example, if a company's sales have increased steadily every month for the past few years, then conducting a linear analysis of the sales data, with monthly sales on the y-axis and time on the x-axis, will give you a line that captures the upward trend in sales. After creating the trend line, the company could use the slope of the line to forecast sales in future months. Next, analyzing the impact of price changes. Linear regression can be used to analyze the effect of pricing on consumer behavior. For instance, if a company changes the price of a certain product several times, it can record the quantity sold at each price level and then perform a linear regression with quantity sold as the dependent variable and price as the independent variable. This would result in a line that depicts the extent to which customers reduce their consumption of the product as the price increases, and this result would help with future pricing decisions. Next is the assessment of risk in the financial services and insurance domain. Linear regression can be used to analyze risk. For example, a health insurance company might run a linear regression by plotting the number of claims per customer against customer age, and might discover that older customers tend to make more health insurance claims. The results of such an analysis could guide important business decisions. So by now you have a rough idea of what the linear regression algorithm is: what it does, where it is used, and when you should use it. Now let's move on and understand the algorithm in depth. Suppose you have the independent variable on the x-axis and the dependent variable on the y-axis. If the independent variable increases along the x-axis and the dependent variable increases with it, what kind of linear regression line would you get? You'd get a positive linear regression line, as the slope would be positive. Next, suppose the independent variable on the x-axis is increasing while the dependent variable on the y-axis is decreasing; in that case you'll get a negative regression line, as the slope of the line is negative. This line, y = mx + c, is the line of linear regression, showing the relationship between the independent and dependent variables. Okay, so let's add some data points to our graph. These are some observations, or data points, on our graph; let's plot some more. Now all our data points are plotted, and our task is to create the regression line, or the best-fit line. Once our regression line is drawn, the next task is prediction.
Now suppose this is our estimated or predicted value, and this is our actual value. Our main goal is to reduce the error, that is, to reduce the distance between the predicted value and the actual value. The best-fit line is the one with the least error, the least difference between the estimated values and the actual values; in other words, we have to minimize the error. That was a brief intuition for the linear regression algorithm; soon we'll jump to its mathematical implementation. But before that, consider this. Suppose you draw a graph with speed on the x-axis and distance covered on the y-axis, with time remaining constant. If you plot the speed traveled by a vehicle against the distance traveled in a fixed unit of time, you'll get a positive relationship. If the equation of the line is y = mx + c, then here y is the distance traveled in a fixed duration of time, x is the speed of the vehicle, m is the positive slope of the line, and c is the y-intercept. Now suppose instead that the distance is constant, and you plot the speed of the vehicle against the time taken to travel that fixed distance. In that case you'll get a line with a negative relationship: the slope of the line is negative, and the equation changes to y = -mx + c, where y is the time taken to travel the fixed distance, x is the speed of the vehicle, m is the magnitude of the (negative) slope, and c is the y-intercept. Now let's get back to our independent and dependent variables: y is our dependent variable and x is our independent variable. Let's move on to the mathematical implementation. We have x = 1, 2, 3, 4, 5; let's plot these on the x-axis. And we have y = 3, 4, 2, 4, 5 on the y-axis. Plotting the coordinates one by one, we get the points (1, 3), (2, 4), (3, 2), (4, 4), and (5, 5). Moving on, let's calculate the means of x and y and plot them on the graph. The mean of x is (1 + 2 + 3 + 4 + 5) / 5 = 3. Similarly, the mean of y is (3 + 4 + 2 + 4 + 5) / 5 = 18 / 5 = 3.6. Next, we plot the mean point (3, 3.6) on the graph. Our goal is to find, or predict, the best-fit line using the least squares method, and to do that we first need the equation of the line. Suppose our regression line is y = mx + c. Now that we have the equation of the line, all we need to do is find the values of m and c, where m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)². Don't get confused; let me work it out for you. As part of the formula, we first calculate x - x̄. We have x = 1 and x̄ = 3, so 1 - 3 = -2. Next, x = 2 minus the mean 3, which is -1. Similarly, 3 - 3 = 0, 4 - 3 = 1, and 5 - 3 = 2. Note that x - x̄ is nothing but the distance of each point from the vertical line x = 3, and y - ȳ is the distance of each point from the horizontal line y = 3.6. So let's calculate the values of y - ȳ.
Starting with y = 3: 3 - 3.6 = -0.6. Next, 4 - 3.6 = 0.4. Then 2 - 3.6 = -1.6, 4 - 3.6 = 0.4 again, and 5 - 3.6 = 1.4. So now we are done with y - ȳ. Next we'll calculate (x - x̄)²: (-2)² = 4, (-1)² = 1, 0² = 0, 1² = 1, and 2² = 4. So now our table has x - x̄, y - ȳ, and (x - x̄)². Next we need the product (x - x̄)(y - ȳ): (-2) × (-0.6) = 1.2, (-1) × 0.4 = -0.4, 0 × (-1.6) = 0, 1 × 0.4 = 0.4, and 2 × 1.4 = 2.8. Now almost all the parts of our formula are done; what remains is to take the sums of the last two columns. The sum of (x - x̄)² is 10, and the sum of (x - x̄)(y - ȳ) is 1.2 - 0.4 + 0 + 0.4 + 2.8 = 4. So the value of m is 4 / 10 = 0.4. Now let's put m = 0.4 into the line y = mx + c, fill in the mean point, and find the value of c. We have y as 3.6, the mean of y; m as 0.4, which we just calculated; and x as the mean of x, which is 3. So the equation becomes 3.6 = 0.4 × 3 + c, that is, 3.6 = 1.2 + c. So what is the value of c? It is 3.6 - 1.2 = 2.4. So we have m = 0.4 and c = 2.4, and when we write out the equation of the regression line, we get y = 0.4x + 2.4. So this is the regression line, and these are your actual points. Now, for m = 0.4 and c = 2.4, let's predict the value of y for x = 1, 2, 3, 4, and 5. When x = 1, the predicted value of y is 0.4 × 1 + 2.4 = 2.8. Similarly, when x = 2, the predicted value is 0.4 × 2 + 2.4 = 3.2. For x = 3, y is 3.6; for x = 4, y is 4.0; for x = 5, y is 4.4. Let's plot these on the graph: the line passing through all these predicted points and cutting the y-axis at 2.4 is the line of regression. Now your task is to calculate the distance between the actual and predicted values, and your job is to reduce that distance, in other words, to reduce the error between the actual and predicted values. The line with the least error will be the line of linear regression, and it will also be the best-fit line. This is also how things work in a computer: it performs n iterations over different values of m, calculating the equation of the line y = mx + c each time. As the value of m changes, the line changes. After every iteration, it calculates the predicted values according to the line and compares the distance of the actual values to the predicted values; the value of m for which this distance is minimum is selected as the best-fit line. Now that we have calculated the best-fit line, it's time to check the goodness of fit, that is, how well the model is performing. To do that, we have a method called the R-squared method. So what is R-squared? Well, the R-squared value is a statistical measure of how close the data are to the fitted regression line. In general, a model with a high R-squared value is considered a good model.
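For reference, the entire hand calculation above fits in a few lines of NumPy; here is a minimal sketch with the same toy data:

```python
# Least squares by hand for the toy data from the walkthrough above.
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

x_bar, y_bar = x.mean(), y.mean()  # 3.0 and 3.6
m = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()  # 4/10 = 0.4
c = y_bar - m * x_bar              # 3.6 - 1.2 = 2.4

print(m, c)       # 0.4 2.4
print(m * x + c)  # predicted y: [2.8 3.2 3.6 4.  4.4]
```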
But you can also have a low R-squared value for a good model, or a high R-squared value for a model that does not fit at all. R-squared is also known as the coefficient of determination, or the coefficient of multiple determination. Let's move on and see how R-squared is calculated. These are our actual values plotted on the graph, and we had calculated the predicted values of y as 2.8, 3.2, 3.6, 4.0, and 4.4. Remember, we got these from the equation ŷ = 0.4x + 2.4 for each x = 1, 2, 3, 4, and 5. Let's plot them on the graph: these points, and the line passing through them, are nothing but the regression line. Now what you need to do is compare the distance of actual-minus-mean with the distance of predicted-minus-mean; basically, you're comparing how far the actual values are from the mean with how far the predicted values are from the mean. That comparison is nothing but R-squared. Mathematically, you can write it as R² = Σ(ŷ - ȳ)² / Σ(y - ȳ)², where y is the actual value, ŷ is the predicted value, and ȳ is the mean value of y, which is 3.6. So remember, this is our formula. Next, we calculate y - ȳ. We have y = 3 and ȳ = 3.6, so 3 - 3.6 = -0.6. Similarly, for y = 4 we have y - ȳ = 0.4; then 2 - 3.6 = -1.6; 4 - 3.6 = 0.4 again; and 5 - 3.6 = 1.4. Now we take the squares: (-0.6)² = 0.36, (0.4)² = 0.16, (-1.6)² = 2.56, then 0.16, and (1.4)² = 1.96. Next, as part of the formula, we need ŷ - ȳ. These are the ŷ values, and we subtract the mean of y from each: 2.8 - 3.6 = -0.8; then 3.2 - 3.6 = -0.4; 3.6 - 3.6 = 0; 4.0 - 3.6 = 0.4; and 4.4 - 3.6 = 0.8. Now we square these: (-0.8)² = 0.64, (-0.4)² = 0.16, 0² = 0, (0.4)² = 0.16, and (0.8)² = 0.64. Now the formula tells us to take the sums of (ŷ - ȳ)² and (y - ȳ)². Summing (y - ȳ)², you get 5.2, and summing (ŷ - ȳ)², you get 1.6. So the value of R² is 1.6 / 5.2, which comes out to approximately 0.3. Well, that is not a good fit: it suggests that the data points are far away from the regression line. This is how your graph looks when R-squared is 0.3. If you increase the value of R-squared to 0.7, you'll see the actual values lie closer to the regression line; at 0.9 they come closer still; and when the value is approximately 1, the actual values lie on the regression line itself. On the other hand, if you get a very low R-squared value, say 0.02, the actual values are very far from the regression line, or you can say there are too many outliers in your data, and you cannot forecast anything from it.
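And the R-squared computation above, as the same kind of short sketch, following the exact formula from the text:

```python
# Computing R-squared for the toy regression above.
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])
y_pred = 0.4 * x + 2.4   # predictions from the fitted line
y_bar = y.mean()         # 3.6

r_squared = ((y_pred - y_bar) ** 2).sum() / ((y - y_bar) ** 2).sum()
print(round(r_squared, 2))  # ~0.31, the "not a good fit" value from the text
```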
So that was all about the calculation of R-squared. Now you might have a question: are low values of R-squared always bad? Well, in some fields it is entirely expected that R-squared values will be low. For example, any field that attempts to predict human behavior, such as psychology, typically has R-squared values lower than around 50%, from which you can conclude that humans are simply harder to predict than physical processes. Furthermore, if your R-squared value is low but you have statistically significant predictors, you can still draw important conclusions about how changes in the predictor values are associated with changes in the response value. Regardless of the R-squared, the significant coefficients still represent the mean change in the response for one unit of change in the predictor, holding the other predictors in the model constant. Obviously, this type of information can be extremely valuable. So that was all the theoretical concept; now let's move on to the coding part and understand the code in depth. For implementing linear regression using Python, I'll be using Anaconda with Jupyter Notebook installed. This is our Jupyter notebook, and we are using Python 3 on it. We are going to use a data set consisting of the head size and brain weight of different people. So let's import our libraries: with %matplotlib inline set, we import numpy as np, pandas as pd, and matplotlib, and from matplotlib we import pyplot as plt. Next, we import our data, headbrain.csv, and store it in the data variable. Let's hit run and see the output. This asterisk symbol indicates that the cell is still executing. And there's our output: our data set consists of 237 rows and four columns. We have the columns gender, age range, head size in cubic centimeters, and brain weight in grams. So this is our sample data set; this is how it looks. Now that we have imported our data, and as you can see there are 237 values in the training set, we can look for a linear relationship between head size and brain weight. So now we'll collect X and Y: X will consist of the head size values and Y will consist of the brain weight values. Collecting X and Y; let's run it. Done. Next, we need to find the values of b1 and b0, or you can say m and c, so we'll need the means of the X and Y values. First of all, we calculate the means: mean_x = np.mean(X), where mean is a predefined function of numpy, and similarly mean_y = np.mean(Y), which returns the mean of the Y values. Next, we take the total number of values as the length of X. Then we use the formula to calculate the values of b1 and b0, that is, m and c. Let's hit run and see the result. As you can see on the screen, we have got b1 as 0.263 and b0 as 325.57. So now that we have our coefficients, comparing with the equation y = mx + c, you can say that brain weight = 0.263 × head size + 325.57; the value of m here is 0.263 and the value of c is 325.57. So this is our linear model. Now let's plot it and see it graphically. This is how our plot looks. The model is not so bad, but we need to find out how good it is.
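Pieced together from the narration, the notebook code looks roughly like the sketch below. The file headbrain.csv and its column names ("Head Size(cm^3)", "Brain Weight(grams)") are assumptions based on the description, so adjust them to your copy of the file; the scikit-learn cross-check at the end anticipates the step described next.

```python
# A sketch of the notebook described above, reconstructed from the narration.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("headbrain.csv")          # assumed local file
X = data["Head Size(cm^3)"].values           # assumed column name
Y = data["Brain Weight(grams)"].values       # assumed column name

# Least squares coefficients, same formula as the hand calculation earlier.
mean_x, mean_y = np.mean(X), np.mean(Y)
b1 = np.sum((X - mean_x) * (Y - mean_y)) / np.sum((X - mean_x) ** 2)
b0 = mean_y - b1 * mean_x
print(b1, b0)  # the video reports roughly 0.263 and 325.57

# Plot the data and the fitted line.
plt.scatter(X, Y, s=10, label="data")
plt.plot(X, b0 + b1 * X, color="red", label="regression line")
plt.xlabel("Head size (cm^3)")
plt.ylabel("Brain weight (g)")
plt.legend()
plt.show()

# Cross-check with scikit-learn (per the narration, it gives the same R^2).
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

reg = LinearRegression().fit(X.reshape(-1, 1), Y)
print(r2_score(Y, reg.predict(X.reshape(-1, 1))))  # ~0.63 per the video
```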
To find that out, there are many methods, like the root mean squared error and the coefficient of determination, the R-squared method. In this tutorial I've told you about the R-squared method, so let's focus on that and see how good our model is by calculating the R-squared value. Here SS_t is the total sum of squares, SS_r is the sum of squared residuals, and by this formula R-squared is 1 minus the sum of squared residuals divided by the total sum of squares. When you execute it, you'll get an R-squared value of 0.63, which is pretty good. Now that you have implemented a simple linear regression model using the least squares method, let's move on and see how you'd implement the model using the machine learning library scikit-learn. Scikit-learn is a simple machine learning library in Python, and building machine learning models is very easy using it. So suppose this is your Python code: using the scikit-learn library, your code shortens to this length. Run it, and you'll see you get the same R² score. Now, let's understand the what and why of logistic regression. This algorithm is most widely used when the dependent variable, or you can say the output, is in binary format, so you need to predict the outcome of a categorical dependent variable. The outcome should always be discrete or categorical in nature: the value should be binary, meaning you have just two values. It can be either zero or one, yes or no, true or false, high or low. Only these can be the outcomes, so the value you need to predict should be discrete, or categorical, in nature; whereas in linear regression, the value of y, the value you need to predict, lies in a continuous range. That is the difference between linear regression and logistic regression. Now you must be asking: why not linear regression? Well, in linear regression the value of y is in a range, but in our case, in logistic regression, we just have two values: it can be either zero or one, and we should not entertain values below zero or above one. A linear regression line, however, spans the whole range of y. So, in order to implement logistic regression, we need to clip this part: we don't need values below zero or above one. Since the value of y will be only between zero and one, which is the main rule of logistic regression, the linear line has to be clipped at zero and one. Once we clip the graph, it looks something like this: you get a curve that is nothing but three different straight lines. So we need a new way to solve this problem; this has to be formulated into an equation, and hence we come up with logistic regression. Here the outcome is either zero or one, which is the main rule of logistic regression, but the clipped curve cannot be formulated as one simple equation. So our main aim, bringing the values to zero and one, is fulfilled by a different curve: once formulated into an equation, it looks like this. This is nothing but an S-curve, also called the sigmoid curve or sigmoid function. The sigmoid function basically converts any value, from minus infinity to plus infinity, into a value between zero and one, which can then be turned into the binary output that logistic regression wants.
So if you look here, the outputs are either zero or one, and the curve is just a smooth transition between them. But there's a catch here. Let's say I have a data point at 0.8: how can you decide whether its value is zero or one? Here you have the concept of a threshold, which basically divides the line. The threshold value indicates the probability of either winning or losing; by winning I mean the value equals one, and by losing, zero. But how does it do that? Let's say my data point sits at 0.8. I'll check whether this value is more than my threshold value or not: if it is more than the threshold, it should give me the result one, and if it is less, it should give me zero. Here my threshold value is 0.5. So I define that if my value, say 0.8, is more than 0.5, it shall be rounded up to one, and if it is less than 0.5, say a value of 0.2, it shall be rounded down to zero. So you can use the concept of the threshold value to decide your output, and the output will be discrete: either zero or one. I hope you've caught this curve of logistic regression: this is the sigmoid S-curve. To produce this curve, we need an equation, so let me address that part as well and show how the equation is formed to imitate this functionality. We start from the equation of a straight line, y = mx + c. So in this case

### What is Machine Learning? [4:02:26]

I have only one independent variable. But if we have many independent variables, the equation becomes m1x1 + m2x2 + m3x3 and so on up to mnxn. Now let us write it in terms of b and x: the equation becomes y = b1x1 + b2x2 + b3x3 + ... + bnxn + c. Now, the straight-line equation has a range from minus infinity to infinity, but in our case, in the logistic equation, the value we need to predict, y, can range only from 0 to 1. So we need to transform this equation, and to do that we take y divided by 1 - y. Now, if y equals 0, then 0 / (1 - 0) = 0 / 1, which is again 0; and if y equals 1, then we get 1 / (1 - 1) = 1 / 0, which goes to infinity. So the range is now 0 to infinity. But we want the range from minus infinity to infinity, so we take the logarithm of this expression: log(y / (1 - y)) transforms it further to get the range from minus infinity to infinity, and equating it to the linear expression gives your final logistic regression equation. Don't worry, you don't have to write or memorize this formula in Python; you just call the LogisticRegression function and everything is done automatically for you. I don't want to scare you with the maths and the formulas behind it, but it's always good to know how the formula was generated. I hope you're now clear on how logistic regression comes into the picture. Next, let us see the major differences between linear regression and logistic regression. First of all, in linear regression, the value of y, the variable we need to predict, is continuous in nature, whereas in logistic regression we have a categorical variable: the value you need to predict should be discrete, with just two values, either zero or one. For example: whether it is raining or not raining; whether it is humid outside or not; whether it is going to snow or not. These are examples where the values to predict are discrete, where you just predict whether something is happening or not. Next, linear regression solves regression problems: you have an independent variable and a dependent variable, and you calculate the value of y, which lies in a range, using the value of x. Logistic regression, with its discrete values, solves classification problems: it classifies the data and tells you whether an event is happening or not. I hope this is pretty clear by now. Next, in linear regression the graph you saw is a straight line, and you calculate the value of y with respect to x; in logistic regression the curve we got is the S-curve, or sigmoid curve, and using the sigmoid function you can predict your y values. I hope you're now clear on the differences between linear and logistic regression; a small sketch of the sigmoid and the threshold follows below. Moving ahead, let us see the various use cases where logistic regression is implemented in real life. The very first is weather prediction.
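Here is that sigmoid-and-threshold sketch. The linear scores in z are invented for illustration; the sigmoid and the 0.5 cut-off follow the description above:

```python
# A quick sketch of the sigmoid and the 0.5 threshold discussed above.
import numpy as np

def sigmoid(z):
    # Maps any value in (-inf, inf) to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.4, 4.0])  # linear scores b1*x1 + ... + c
probs = sigmoid(z)
labels = (probs >= 0.5).astype(int)         # threshold at 0.5 -> 0 or 1

print(probs)   # approx [0.018 0.269 0.5 0.802 0.982]
print(labels)  # [0 0 1 1 1]
```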
Logistic regression helps you predict the weather: for example, it is used to predict whether it is raining or not, whether it is sunny or cloudy. Keep in mind that both linear regression and logistic regression can be used in weather prediction: linear regression helps you predict what the temperature will be tomorrow, whereas logistic regression only tells you whether it's going to rain or not, whether it's cloudy or not, whether it's going to snow or not; those values are discrete. If you apply linear regression, you'll be predicting things like the temperature tomorrow or the day after. So these are the slight differences between linear regression and logistic regression. Moving ahead, we have classification problems. Logistic regression can also perform multiclass classification: it can help you tell whether something is a bird or not a bird; you can classify different kinds of mammals, say whether it's a dog or not a dog; similarly, you can check whether something is a reptile or not. So logistic regression can perform multiclass classification, and as I already discussed, it is used in classification problems. Next, it also helps you determine illness. Let me take an example: a patient goes for a routine checkup in a hospital. The doctor will perform various tests on the patient and check whether the patient is actually ill or not. What will the features be? The doctor can check the sugar level, the blood pressure, the age of the patient, whether they are young or old, and the patient's previous medical history, and all of these features will be recorded. Finally, the doctor checks the patient data and determines the outcome of the illness and its severity. So using all this data, a doctor can identify whether a patient is ill or not. These are the various use cases in which you can use logistic regression. Now, I guess that's enough of the theory, so let's move ahead and see some practical implementations of logistic regression. Here I'll be implementing two projects. In the first, I have the Titanic data set: we'll predict what factors made people more likely to survive the sinking of the Titanic. In my second project, we'll do data analysis on SUV cars: we have data on who purchases SUVs, and we'll see what factors made people more interested in buying one. These will be the major questions: why you would implement logistic regression, and what output you get from it. Let's start with the very first project, the Titanic data analysis. Some of you might know that there was a ship called the Titanic, which hit an iceberg and sank to the bottom of the ocean. It was a big disaster at the time, because it was the ship's first voyage, and it was supposed to be really strongly built and one of the best ships of its era. And of course there's a movie about it as well, which many of you might have watched. So we have data on the passengers, those who survived and those who did not survive, in this particular tragedy.
So what you have to do is look at this data and analyze which factors would have contributed the most to a person's chances of survival on the ship. Using logistic regression, we can predict whether a person survived or died. Apart from this, we'll also have a look at the various features along the way. First, let us explore the data set. Over here we have the index value; then the first column is the passenger ID. The next column is survived, which has two values, zero and one: zero stands for did not survive and one stands for survived, so this column is categorical, with discrete values. Next we have the passenger class, with three values, 1, 2, and 3: this tells you whether a passenger was traveling in first, second, or third class. Then we have the name of the passenger, and the sex, or gender, of the passenger, male or female. Then we have the age. We have SibSp, which is the number of siblings or spouses aboard the Titanic; here we have values such as 1, 0, and so on. Then we have Parch, which is the number of parents or children aboard the Titanic, again with various values. Then we have the ticket number, the fare, the cabin number, and the embarked column. In the embarked column we have three values, S, C, and Q, where S stands for Southampton, C stands for Cherbourg, and Q stands for Queenstown. These are the features we'll be applying our model to. We'll perform various steps and then implement logistic regression. These are the steps required to implement any algorithm; in our case, we are implementing logistic regression. The very first step is to collect your data, or to import the libraries used for collecting it and taking it forward. The second step is to analyze your data: here I can go through the various fields and analyze the data. I can check: did females and children survive better than males? Did rich passengers survive better than poor passengers? Did money matter, as in, were those who paid more to get onto the ship evacuated first? And what about the workers: did the workers survive, and what was the survival rate if you were a worker on the ship and not just a traveling passenger? All of these are very interesting questions, and we'll go through them one by one. In this stage, you need to analyze and explore your data as much as you can. The third step is to wrangle your data. Data wrangling basically means cleaning it: you can simply remove unnecessary items, or if the data set has null values, you can clean those up before taking it forward. The fourth step is to train and test: you build your model using the training data set and then test it using the test data set; here you perform a split, which divides your data set into training and testing sets. And finally, the fifth step: you check the accuracy, to see how accurate your predictions are. I hope you got these five steps that we're going to follow in implementing logistic regression. So now let's go into these steps in detail. Number one: collect your data, or you can say, import the libraries.
So let me just show you the implementation part as well. I'll open my Jupyter notebook and implement all of these steps side by side. Guys, this is my Jupyter notebook. First, let me rename the notebook to, let's say, Titanic data analysis. Our first step was to import all the libraries and collect the data, so let me import the libraries first. First of all, I'll import pandas, which is used for data analysis: import pandas as pd. Then I'll import NumPy: import numpy as np. NumPy is a library in Python which stands for numerical Python and is widely used for scientific computation. Next we'll import seaborn, a library for statistical plotting: import seaborn as sns. I'll also import matplotlib, which is again for plotting: import matplotlib.pyplot as plt. Now, to render the plots inside the Jupyter notebook, all I have to write is %matplotlib inline. Next, I'll import one more module for basic mathematical functions: import math. So these are the libraries I'll be needing in this Titanic data analysis. Now let me import my dataset. I'll take a variable, let's say titanic_data, and using pandas I will read my CSV, that is, the dataset; I'll write the name of my dataset, titanic.csv. I have already shown you the dataset, so over here let me just print the top 10 rows: I'll call titanic_data.head(10). Now I'll run this; to run a cell you just press Shift+Enter, or you can directly click the run button on the cell. Over here I have the index, and we have the passenger ID, which is again nothing but an index starting from one. Then we have the Survived column, which has categorical, or you can say discrete, values in the form of zero or one. Then we have the passenger class, the name of the passenger, sex, age and so on. So this is the dataset I'll be going forward with. Next, let us print the number of passengers in this original dataset. For that I'll simply type print, say "number of passengers", and use the len function on titanic_data.index to calculate the total length. So the number of passengers in the original dataset is 891; around that many people were traveling on the Titanic. Over here my first step is done: we have collected the data, imported all the libraries, and found the total number of passengers traveling on the Titanic. Now let me go back to the presentation and see what my next step is. We're done with collecting the data; the next step is to analyze it. Over here we'll be creating different plots to check the relationships between variables, as in how one variable affects another. You can simply explore your dataset by making use of various columns and plotting graphs between them: a correlation graph, a distribution graph, it's up to you. So let me go back to my Jupyter notebook and analyze some of the data.
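As a rough sketch of what is being typed in the notebook up to this point (the file name titanic.csv is an assumption; substitute whatever your copy of the dataset is called):

```python
import math                      # basic mathematical functions

import numpy as np               # numerical Python
import pandas as pd              # data analysis
import seaborn as sns            # statistical plotting
import matplotlib.pyplot as plt  # general plotting
%matplotlib inline               # Jupyter magic so plots render inline

# Read the dataset and peek at the first ten rows
titanic_data = pd.read_csv("titanic.csv")
titanic_data.head(10)

# Total number of passengers in the original dataset (891 here)
print("Number of passengers:", len(titanic_data.index))
```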
My second part is to analyze the data. I'll just put this in a second-level heading cell: to do that, I go to the cell, click on Markdown, and run it. First, let us plot a count plot comparing the passengers who survived and who did not. For that I'll be using the seaborn library, which we imported as sns, so I don't have to write the whole name: I simply say sns.countplot, set x equal to Survived, and the data I'll be using is titanic_data, or you can say the variable in which you stored your dataset. Now let me run this. Over here, as you can see, I have the Survived column on the x-axis and the count on the y-axis. Zero stands for did not survive and one stands for the passengers who did survive. You can see that around 550 passengers did not survive and only around 350 passengers survived, so you can conclude that there are far fewer survivors than non-survivors. That was the very first plot. Now let us plot another one comparing by sex: out of all the passengers who survived and who did not, how many were men and how many were women? To do that I'll again say sns.countplot, and I'll add the hue as Sex, since I want to know how many females and how many males survived. Then I'll specify the data, the titanic_data set, and run it. Okay, I made a small mistake over here; after fixing it, you can see I have the Survived column on the x-axis and the count on the y-axis. The blue color stands for the male passengers and orange stands for the female passengers. Among the passengers who did not survive, value zero, the majority were male, and among the people who survived, the majority were female. So this settles the survival rate by gender: it appears that, on average, women were more than three times more likely to survive than men. Next, let us plot another count plot where the hue is the passenger class, so we can see which class each passenger was traveling in, whether class 1, 2 or 3. For that I'll write the same command, sns.countplot, keep Survived on my x-axis, change my hue to the passenger class variable, named Pclass, and use the titanic_data set. So this is my result: blue is first class, orange is second class and green is third class. The passengers who did not survive were mostly from the third class, that is, the lowest class, or you can say the cheapest way to get onto the Titanic, and the people who did survive mostly belonged to the higher classes: one and two rise more among the survivors than the third-class passengers. So we have concluded that the passengers who did not survive were mostly of the third class, and the passengers traveling in first and second class tended to survive more.
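The three count plots just described, as a minimal sketch (column names as in the standard Titanic CSV):

```python
# Survivors vs. non-survivors: far more zeros (did not survive) than ones
sns.countplot(x="Survived", data=titanic_data)
plt.show()

# Same plot split by gender: most survivors are female
sns.countplot(x="Survived", hue="Sex", data=titanic_data)
plt.show()

# Split by ticket class: third class dominates the non-survivors
sns.countplot(x="Survived", hue="Pclass", data=titanic_data)
plt.show()
```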
Next, let us plot a graph for the age distribution. Over here I can use the pandas plotting interface on the data: I select the Age column and plot a histogram, saying .plot.hist(). You can notice that we have more young passengers, or you can say children, between the ages of 0 and 10, then we have the average-aged people, and the further up you go in age, the smaller the population gets. So this is the analysis of the age column: we saw that more young and middle-aged passengers were traveling on the Titanic. Next, let me plot a graph of the fare as well: I take titanic_data, select Fare, and again plot a histogram with .hist(). Here you can see the fares sit between 0 and 100. Now let me set the bin size to make it clearer: over here I'll say bins=20, and I'll increase the figure size as well, giving figsize dimensions of 10 by 5. So this is much clearer now.
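And the two histograms, again as a sketch; dropping missing ages explicitly is an added precaution that keeps older pandas versions from complaining:

```python
# Age distribution: a spike of children, then mostly young and middle-aged adults
titanic_data["Age"].dropna().plot.hist()
plt.show()

# Fare distribution: more bins and a wider figure make the shape clearer
titanic_data["Fare"].plot.hist(bins=20, figsize=(10, 5))
plt.show()
```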

### Linear Regression [4:19:21]

Next, let us analyze the other columns as well. I'll just type titanic_data and ask for its info to see what columns are left. Here we have the passenger ID, which I guess is of no use. We have already seen how many passengers survived and how many did not; we also did the analysis on a gender basis, seeing whether females or males tended to survive more. Then we saw the passenger class, whether a passenger was traveling in first, second or third class. Then we have the name, on which we cannot do any analysis. We saw the sex and the age as well. Then we have SibSp, which stands for the number of siblings or spouses aboard the Titanic, so let us do this one too: I'll say sns.countplot, set x to SibSp, and use the titanic_data. You can see the plot over here: it has the maximum count at zero, so you can conclude that most passengers had neither siblings nor a spouse on board the Titanic. The second-highest value is one, and then we have very small counts for 2, 3, 4 and so on. Similarly, you can do this for Parch, the number of parents or children aboard the Titanic. Then we have the ticket number, and I don't think any analysis is required for the ticket. Then we have the fare, which we have already discussed: the people traveling in first class usually paid the highest fare. Then we have the cabin number and Embarked, and these are the columns we'll be doing data wrangling on. So we have analyzed the data and seen quite a few graphs, from which we can conclude how the variables compare and what relationships they hold. The third step is data wrangling, which basically means cleaning your data. If you have a large dataset, you might have some null values, or you can say NaN values, so it's very important to remove all the unnecessary items present in your dataset, because handling them directly affects your accuracy. I'll just go ahead and clean my data by removing all the NaN values and the unnecessary columns that have null values in the dataset. So next I'll be performing data wrangling. First of all, I'll check whether my dataset has nulls: I'll take titanic_data, the name of my dataset, and call isnull on it. This will tell me which values are null and return a boolean result: it checks for missing data, and the result is in boolean format, true or false, where false means the value is not null and true means it is null. Let me run this. Over here you can see the values as false or true; for example, in the cabin column the very first value is null, so we'll have to do something about that. Since we have a large dataset, the listing does not stop, so instead we can look at the sum: we can print the number of passengers who have null values in each column by saying titanic_data.isnull() and then .sum(). This prints the number of null values per column: the age column has 177 missing values, the cabin column has by far the most, and the Embarked column has just two.
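A sketch of the SibSp plot and the null-value counts; the numbers in the comment are the ones for the standard 891-row dataset:

```python
# Most passengers traveled with no siblings or spouse aboard
sns.countplot(x="SibSp", data=titanic_data)
plt.show()

# Boolean mask of missing values, then the per-column totals
titanic_data.isnull().head()
titanic_data.isnull().sum()   # Age: 177, Cabin: 687, Embarked: 2
```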
Now, if you don't want to read these numbers, you can also plot a heat map and analyze the missing data visually, so let me do that as well: I'll say sns.heatmap, pass in titanic_data.isnull(), set yticklabels to False, and run it. As we have already seen, there are three columns with missing values: age, where almost 20% of the column is missing; cabin, which is quite a large share; and the two values in the Embarked column. Let's add a cmap for color coding: if I do this, the graph becomes more readable, and the yellow stands for true, that is, the null values. So we have concluded that we have missing values in age, a lot of missing values in the cabin column, and a barely visible few in the Embarked column. To deal with these missing values, you can either replace them, putting in some substitute values, or simply drop the column. Let us first pick the age column and plot a box plot of age by class: I'll say sns.boxplot, set x equal to the passenger class, Pclass, set y equal to Age, and set data equal to titanic_data. You can see that the passengers in first and second class tend to be older than those in third class; that might depend on experience, how much you earn, or any number of reasons. So we concluded that passengers traveling in class 1 and class 2 tend to be older than those in class 3, and we found that we have some missing values in the age column. One way is to just drop the column, or you can simply fill in some values; that method is called imputation. Now, to perform the data wrangling, or cleaning, let us first print the head of the dataset: I'll say titanic_data.head(), and let's say I just want the five rows. Here we have Survived, which is again categorical; this is the column on which I can apply logistic regression, so this will be my y value, the value I need to predict. Then we have the passenger class, the name, the ticket number, the fare, and cabin. We have seen that cabin has a lot of null values, or you can say the NaN values are quite visible, so first of all we'll just drop this column. For dropping it, I'll say titanic_data.drop, name the cabin column, mention axis equals 1, and set inplace to True. Now I'll print the head again and check whether the column has been removed from the dataset: titanic_data.head(). As you can see, we don't have the cabin column anymore. You can also drop the NaN values, where NaN means not a number: I'll say titanic_data.dropna with inplace equals True. Now let me plot the heat map again and check whether the values that were showing a lot of nulls before have been removed: sns.heatmap, pass in the dataset with isnull, set yticklabels to False, and this time I don't want color coding.
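Pulling the wrangling commands of this step together into one sketch (the viridis color map is an assumption; any map where 1 shows up bright works):

```python
# Visualise missingness: with a colour map, null cells show up in yellow
sns.heatmap(titanic_data.isnull(), yticklabels=False, cmap="viridis")
plt.show()

# Older passengers tend to sit in the higher classes
sns.boxplot(x="Pclass", y="Age", data=titanic_data)
plt.show()

# Cabin is mostly null, so drop the whole column, then drop any remaining null rows
titanic_data.drop("Cabin", axis=1, inplace=True)
titanic_data.dropna(inplace=True)

# Re-check: the heat map should now be entirely blank, and the sums zero
sns.heatmap(titanic_data.isnull(), yticklabels=False)
plt.show()
titanic_data.isnull().sum()
```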
So again I'll say False. This will help me check whether the null values have been removed from the dataset or not, and as you can see here, there are no nulls left; the map is entirely black. You can verify with the sum as well: I'll just go above, copy that line, and use the sum function again. That tells me the dataset is clean, as in it does not contain any null or NaN value. So we have wrangled, or you can say cleaned, our data. Here we have done just one step of data wrangling, removing a single column; you can do a lot more, for example fill in the missing values with something else, such as the column mean, instead of dropping them. Now, if I look at my dataset with titanic_data.head(), I have a lot of string values, and these have to be converted to categorical variables in order to implement logistic regression. So what we will do is convert these categorical variables into dummy variables, and this can be done using pandas, because logistic regression only takes numeric inputs. Whenever you apply machine learning, you need to make sure there are no string values present, because they won't work as input variables; you can't predict from raw strings. In my case I have the Survived column, and I need to predict how many people survived and how many did not: zero stands for did not survive and one stands for survived. Now let me convert these variables into dummy variables. I'll use pandas and say pd.get_dummies (you can simply press Tab to autocomplete), passing in the Sex column of titanic_data. You can click Shift+Tab to get more information on the function; here we have the type, DataFrame, and the columns. If you run this, you'll see that in the female column zero stands for not female and one stands for female, and similarly in the male column zero stands for not male and one stands for male. Now, we don't require both of these columns, because one column by itself is enough to tell whether a passenger is male or female. Let's say I want to keep only male: if the value of male is one, it is definitely a male and not a female, so you don't need both values. For that I'll just remove the first column, female, by saying drop_first=True. Over here it has given me just one column, male, with values zero and one. Let me assign this to a variable, say sex, and then I can say sex.head() to see the first five rows. So this is how my data looks now. We have done it for sex; then we have the numerical values in age and the siblings/spouses column, we have the fare, and we have Embarked as well. In Embarked the values are S, C and Q, so here too we can apply the get_dummies function: I'll take a variable, say embark, use the pandas library, and enter the column name Embarked. Let me print the head of it: embark.head(). Over here we have C, Q and S. Here too we can drop the first column, because two values are enough: Q tells us the passenger boarded at Queenstown, S at Southampton, and if both values are zero, then the passenger is definitely from Cherbourg, the third value, so you can again drop the first column with drop_first=True.
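A sketch of the dummy-variable step for the two string columns:

```python
# One-hot encode Sex; drop_first=True removes the redundant "female" column
sex = pd.get_dummies(titanic_data["Sex"], drop_first=True)      # single "male" 0/1 column
sex.head()

# Same for Embarked: keep "Q" and "S"; both zero implies Cherbourg
embark = pd.get_dummies(titanic_data["Embarked"], drop_first=True)
embark.head()
```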
Let me just run this; so this is how my output looks. Similarly, you can do it for the passenger class: here also we have three classes, 1, 2 and 3, so I'll just copy the whole statement, take a variable name, say pcl, pass in the column name Pclass, and drop the first column. The values here are 1, 2 or 3, and after removing the first column we are left with just 2 and 3; if both of those values are zero, the passenger was definitely traveling in first class. Now that we have made the values categorical, my next step is to concatenate all these new columns into the dataset, or you can say into titanic_data, using pandas: I'll say pd.concat, pass in sex, embark and pcl, and mention axis equals 1. I'll run this; okay, I need to print the head. Over here you can see that these columns have been added: we have the male column, which tells whether a person is male or female; then we have the embark columns, Q and S, so if the passenger boarded at Queenstown the Q value is one, otherwise zero, and if both values are zero the passenger definitely boarded at Cherbourg; then we have the passenger class as 2 and 3, and if both of those are zero the passenger was traveling in class one. I hope you've got this so far. Now, the original columns are redundant, so we can just drop them: we'll drop Pclass, the Embarked column and the Sex column. I'll type titanic_data.drop and mention the columns I want to drop. I'll even delete the passenger ID, because it's nothing but an index starting from one, and I don't want the name either, so I'll delete Name as well. What else can we drop? We can drop Ticket too. Then I'll mention the axis and set inplace equals True. Okay, my column names start with upper case. So these have been dropped; now let me print my dataset again. This is my final dataset, guys: we have the Survived column with values zero and one. Oh, we forgot to drop Pclass as well; no worries, I'll drop it and run this again. So over here we have Survived, Age, SibSp, Parch, Fare, male, and the columns we just converted. We have performed data wrangling, or you can say cleaned the data, converting gender to the male column, Embarked to Q and S, and the passenger class to 2 and 3. That was all about data wrangling, or just cleaning the data. My next step is training and testing the data: we'll split the dataset into a train subset and a test subset, build a model on the train data, and then predict the output on the test dataset. So let me go back to Jupyter and implement this as well. Over here I need to train my dataset, so I'll put this under heading three.
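And the class dummies, the concatenation, and the final column clean-up as one sketch:

```python
# Dummy-encode the ticket class: keep columns 2 and 3; both zero implies first class
pcl = pd.get_dummies(titanic_data["Pclass"], drop_first=True)

# Bolt the new numeric columns onto the dataset
titanic_data = pd.concat([titanic_data, sex, embark, pcl], axis=1)

# Drop the originals and the columns that carry no predictive signal
titanic_data.drop(["Sex", "Embarked", "PassengerId", "Name", "Ticket", "Pclass"],
                  axis=1, inplace=True)
titanic_data.head()   # Survived, Age, SibSp, Parch, Fare, male, Q, S, 2, 3
```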
Over here you need to define your dependent and independent variables. My y is the output, or you can say the value I need to predict, so I take the Survived column of titanic_data: basically I have to predict this column, whether the passenger survived or not, and as you can see we have a discrete outcome in the form of 0 and 1. Everything else we can take as features, or you can say independent variables: I'll say titanic_data.drop and simply drop Survived, and all the other columns become my independent variables, the features that lead to the survival outcome. Once we have defined the independent and dependent variables, the next step is to split the data into training and testing subsets. For that we'll be using sklearn: I type from sklearn.cross_validation import train_test_split (in newer versions of scikit-learn this module is called model_selection instead). Here, if you click Shift+Tab, you can open the documentation and see the examples: you have X_train, X_test, y_train and y_test, and using train_test_split you pass in your independent and dependent variables and define a test size and a random state. Let me copy that and paste it over here: we have the train and test portions of the independent variable, the train and test portions of the dependent variable, and using the split function we pass in the independent and dependent variables and set a split size, say 0.3. That means the dataset is divided in a 70:30 ratio. Then I can add a random state, say 1. This is not necessary, but if you want the same result as mine, you can set the random state; it makes the split take exactly the same sample every time. Next, I have to train and predict by creating a model. Logistic regression is imported from the linear model module, so I'll type from sklearn.linear_model import LogisticRegression. Next, I create an instance of this logistic regression model: log_model = LogisticRegression(). Now I need to fit my model, so I'll say log_model.fit and pass in X_train and y_train. All right, here it gives me all the details of the logistic regression: the class weight, dual, fit intercept and all those things. Then I need to make predictions: I'll take a variable, say predictions, and call the model, log_model.predict, passing in X_test. So we have created a model, fitted it, and made predictions. Now, to evaluate how the model is performing, you can simply calculate the accuracy, or you can also produce a classification report; don't worry guys, I'll show both methods. I'll say from sklearn.metrics import classification_report, then call classification_report, passing in y_test and the predictions. Guys, this is my classification report: over here I have the precision, the recall, the F1 score, and the support. Here we have precision values of 0.75, 0.72 and 0.73, which is not bad.
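The split-fit-report sequence as a sketch, using the newer import path:

```python
from sklearn.model_selection import train_test_split   # sklearn.cross_validation on old versions
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Dependent variable: Survived; every remaining column is a feature
y = titanic_data["Survived"]
X = titanic_data.drop("Survived", axis=1)

# 70:30 split; a fixed random_state reproduces the same sample every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

log_model = LogisticRegression()
log_model.fit(X_train, y_train)

predictions = log_model.predict(X_test)
print(classification_report(y_test, predictions))
```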
Now, in order to calculate the accuracy, you can also use the concept of the confusion matrix. To print the confusion matrix, I first say from sklearn.metrics import confusion_matrix; the function imports successfully, and then I call confusion_matrix, again passing in the same variables, y_test and predictions. I hope you guys already know the concept of the confusion matrix; can you give me a quick confirmation as to whether you remember it or not? If not, I can quickly summarize it. Okay, Jagriti says yes; Swati is not clear with this, so I'll tell you briefly what the confusion matrix is all about. The confusion matrix is nothing but a 2x2 matrix with four outcomes, and it tells us how accurate your predictions are. The columns are predicted no and predicted yes, and the rows are actual no and actual yes. Let me feed in the values we just calculated: we have 105, 21, 25 and 63. So we have four outcomes. 105 is where the model predicted no and in reality it was also a no: a predicted no and an actual no. Similarly, 63 is a predicted yes where it actually was a yes. To calculate the accuracy, you just add these two values and divide by the sum of all four; these two values are where the model predicted the correct output. 105 is also called the true negative, 21 the false positive, 25 the false negative, and 63 the true positive. Now, you don't have to compute the accuracy manually: in Python you can import the accuracy_score function and get the result from that. So I'll say from sklearn.metrics import accuracy_score and simply print the accuracy, passing in the same variables, y_test and predictions. It tells me the accuracy is 78%, which is quite good. To do it manually, you add 105 + 63, which comes to 168, and divide by the sum of all four numbers, 105 + 63 + 21 + 25 = 214. Dividing the two gives the same accuracy, about 0.78, or 78%. That is how you calculate the accuracy. Now let me go back to the presentation and see what we have covered so far. We split our data into train and test subsets, built our model on the train data, and predicted the output on the test dataset; the fifth step was to check the accuracy, which came to almost 78%, which is quite good. You cannot say that accuracy is bad; the accuracy score tells how accurate the results are, and we got a good accuracy.
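The confusion matrix and the accuracy arithmetic spelled out (the four cell values are the ones quoted in the walkthrough):

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Rows = actual class, columns = predicted class:
# [[TN, FP],      [[105,  21],
#  [FN, TP]]  ->   [ 25,  63]]
print(confusion_matrix(y_test, predictions))

# Accuracy = correct predictions / all predictions
# (105 + 63) / (105 + 21 + 25 + 63) = 168 / 214 ≈ 0.785
print(accuracy_score(y_test, predictions))
```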
Moving ahead, let us look at the second project, the SUV data analysis. Here a car company has released a new SUV in the market, and using previous data about the sales of their SUVs, they want to predict the category of people who might be interested in buying one. Using logistic regression, you need to find what factors made people more interested in buying this SUV. For this, let us look at a dataset where I have a user ID, a gender (male or female), the age, the estimated salary, and then the Purchased column. Purchased is my discrete, or you can say categorical, column: it just has the values 0 and 1, and this is the column we need to predict, whether a person will actually purchase an SUV or not. Based on these factors we will decide that: we know the salary of a person and we know the age, and using these we can predict whether a person will purchase an SUV. So let me go to my Jupyter notebook and implement the logistic regression. Guys, I won't go through all the details of the data cleaning and analysis this time; that part I'll leave to you, so go ahead and practice as much as you can. All right, my second project is SUV predictions. First of all, I have to import all the libraries, so I say import numpy as np, and similarly I do the rest of them. Now let me print the head of this dataset. As we have already seen, we have columns for user ID, gender, age and salary, and we have to predict whether the person will actually purchase an SUV or not. Now let us go straight to the algorithm part, directly starting with logistic regression and how you train a model. To do all of that, we first need to define the independent and dependent variables. I want my X, the independent variable, so I say dataset.iloc: here I specify all the rows (the colon stands for that), and for the columns I want only 2 and 3, followed by .values. This fetches all the rows but only the second and third columns, which are age and estimated salary. These are the factors that will be used to predict the dependent variable, Purchased. So my dependent variable is Purchased and my independent variables are age and salary: I say dataset.iloc, take all the rows, and just the fourth column, my Purchased column, with .values. All right, I just forgot one square bracket over here; fixed. So over here I have defined my independent and dependent variables: the independent variables are age and salary, and the dependent variable is the Purchased column. Now you must be wondering what this iloc function is: iloc is an indexer for a pandas DataFrame used for integer-based indexing, or you can say selection by position. Let me print these variables: if I print the independent variables, I get the age as well as the salary; if I print the dependent variable, I just have values of zero and one, where zero stands for did not purchase. Next, let me divide the dataset into training and test subsets: I'll write from sklearn.cross_validation import train_test_split. I press Shift+Tab, go to the examples, and copy the same line, then adjust it. I want the test size to be, let's say, 0.25, so I have divided the train and test sets in a 75:25 ratio. Let's take the random state as zero; the random state ensures the same result, or you can say the same samples are taken, whenever you run the code. You can also scale your input values for better performance, and this can be done using the StandardScaler: from sklearn.preprocessing import StandardScaler. Now why do we scale? If you look at our dataset, we are dealing with large numbers.
Well, although we are using a very small dataset here, whenever you work in a production environment you'll be working with large datasets containing hundreds of thousands of records, and there scaling the values down definitely affects performance to a large extent. So let me show you how you can scale down these input values; the preprocessing module contains all the methods and functionality required to transform your data. Now let us scale down both the training and the test datasets. I'll first make an instance of it, the StandardScaler. Then for X_train I'll say sc.fit_transform and pass in the X_train variable, and similarly I can do it for the test set, where I pass in X_test. All right, my next step is to import logistic regression, so I'll simply import it: from sklearn.linear_model import LogisticRegression. Over here I'll be using a classifier, so I'll make an instance of it.
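Since the narration continues past the chapter marker below, here is the whole SUV flow in one sketch; the file name suv_data.csv and the exact column order are assumptions based on the description above:

```python
import pandas as pd
from sklearn.model_selection import train_test_split   # sklearn.cross_validation on old versions
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

dataset = pd.read_csv("suv_data.csv")      # columns: User ID, Gender, Age, EstimatedSalary, Purchased

# iloc = integer-position indexing: all rows; columns 2 and 3 are Age and EstimatedSalary
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values              # Purchased: 0 = did not buy, 1 = bought

# 75:25 split, reproducible via random_state
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardise so the large salary values don't dominate age
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)              # reuse the training-set mean and variance

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
print(accuracy_score(y_test, y_pred) * 100)   # ~89 on the instructor's run
```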

### Logistic Regression [4:45:59]

I'll say classifier = LogisticRegression, pass in the random state, which is zero, and now I'll simply fit the model, passing in X_train and y_train. Here it shows me all the details of the logistic regression. Then I have to predict the values: I'll say y_pred equals classifier.predict and pass in X_test. So now we have created the model, scaled down our input values, applied logistic regression, and predicted the values, and now we want to know the accuracy. To know the accuracy, we first need to import accuracy_score: from sklearn.metrics import accuracy_score. Using this function we can calculate the accuracy, or you can do it manually by creating a confusion matrix. I'll just pass in my y_test and my y_pred. All right, over here I get the accuracy as 0.89; we want the accuracy as a percentage, so I just multiply by 100, and if I run this, it gives me 89%. So I hope you guys are clear with whatever I have taught you today: I took my independent variables as age and salary, predicted which people are likely to purchase the SUV, and evaluated the model by checking the accuracy, which came to 89%, which is great. So, what is a decision tree? A decision tree is basically a technique, or a data structure that you build, that helps you make decisions. And it's very common: even if we don't call it a decision tree, we all use it in the real world. Let's say I'm a manager or an architect in the computer science department of a company, and I have some requirement in my team. I need to decide whether I should build my own software to meet that requirement, offshore it to a consulting team, or buy a product that already solves the problem. There are a bunch of different alternatives I may have to address my team's requirements: multiple consulting companies offering to work as an implementation partner, multiple competing products I could buy, or building something in-house. How do you go about finding the best solution among all these options? You basically build a series of decision points. First you may want to see what kind of effort is required for each of these alternatives, what kind of cost you might incur, and what kind of ROI you can expect out of each decision. So you draw up a list of decision points, and based on those you go about choosing the final solution or approach to your problem. Similarly, let's say I'm a plus-two or 10th-class student trying to find my stream of study: should I take MPC, the mathematics background, or biology, or civics, or something else? How would you do that? For every alternative, if I take maths as my major, what are my different choices? I can do engineering, I can become a chartered accountant, and so on. If I do engineering, I can go on to a masters or a PhD; if I become a chartered accountant, I can practice in a good firm, something like that.
So you have a tree of different alternatives, and based on the final outcome, maybe expected income after 20 years, just throwing that out as an example, you choose which of these paths to take. That's a decision tree: a tree of decisions, and by going through this tree of alternatives you finally reach a particular career path. So a decision tree is very, very common, and it looks pretty much like this simple tree here: I check the gender of a particular data point; if it is female, go to the left; if it is male, go to the right. If it is female, check the income of that particular person; if it is less than some threshold, go this way, otherwise the other way, and finally you make a decision: the email is spam, the email is not spam; or this is a human face, this is not a human face. So it's a series of alternatives you explore to reach a particular decision point. That's the decision tree algorithm. You can also think of it as prediction: if gender is female and income is less than something, I predict some outcome. If my email is not from a known sender, and it contains words like Nigeria or some lottery or things like that, then I classify it as spam. That's a decision tree: first, check whether it is from a known sender or not, given my history. If yes, go to the left; if no, go to the right. If no, check whether the email contains the word Nigeria; if yes, go to the left; if no, go to the right. And if yes, see whether it is talking about bank accounts; if yes, classify it as spam. That's the kind of decision tree we are talking about. Similarly, you can think of many different examples. Here is another one, a very real-world application, as one of you was asking: credit risk detection. When a financial institution gives you a loan or credit, they analyze the risk: whether you can pay back that particular loan, or the EMIs, on time. There is a lot of software just for doing risk analysis, and this kind of technique can be one of the techniques in that suite of software. You look at the income of a person; depending on the income, you may check the credit history of that person. Especially in the US, credit history, the credit score, is very important. If your credit history is bad but your income is very high, then you are a moderate risk: since your income is very high, maybe it's okay. But if your income is only average and you have a bad credit history, then suddenly you become a high-risk customer and may not get a loan. That's again a decision tree: based on different characteristics of a particular application, you decide whether the application carries high, medium or low risk.
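The spam branch described a moment ago can be written out as plain nested conditions; a toy sketch, where the sender check and the keyword tests are just the talk's illustration, not a real filter:

```python
def classify_email(known_sender: bool, mentions_nigeria: bool,
                   mentions_bank_account: bool) -> str:
    """Each if/else below is one internal node of the hand-built tree."""
    if known_sender:                 # root node: trusted sender -> left branch
        return "not spam"
    if not mentions_nigeria:         # right branch: test the next feature
        return "not spam"
    if mentions_bank_account:        # third test reaches a leaf
        return "spam"
    return "not spam"

print(classify_email(known_sender=False, mentions_nigeria=True,
                     mentions_bank_account=True))   # -> spam
```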
Okay, so to understand a little more, let's take a concrete, very toy example. In fact, this is the example given by the inventor of decision trees, Ross Quinlan, in the original paper where he proposed the algorithm. What is this example? I have a bunch of data points about whether I should decide to play outside or not, and I go about deciding based on the weather. Let's say I'm going to look at three features, that is, three properties on which I base my decision; those are called the features of my data. What are the features? The outlook: what's the outlook for tomorrow, is it going to be sunny, raining, or overcast, meaning cloudy? The humidity level: high, normal, low, whatever the values are. And the wind speed: strong winds, mild winds, moderate winds, different values. Based on these three attributes of tomorrow's weather, I'm going to decide whether to play outside or not; that's the decision variable, yes or no. This is my training data: for every record I also have the outcome associated with it, just like spam/not spam or fraudulent/normal transaction; here it is play outside, yes or no. When a new data point comes in, meaning I want to decide for tomorrow, I will look at tomorrow's outlook, humidity and wind speed, and based on those I'll predict whether to play outside or not. A new data point gives you all the feature values but obviously not the label value, because that's what we are going to predict. So I have the weather information of the last 14 days, and I also have the decisions we made, whether we actually played or not; that's the actual data, and from it we can predict for the future. This becomes our training data. Now, using the training data, let's see how to build a decision tree. If you look at our previous examples, how does it go? I check different properties, different features: first I checked income, then credit history, then whether the person has debt. Similarly, in our dataset we have the features outlook, humidity, and wind. Let's take outlook to begin with and see its different values: it can be sunny, overcast or rain; those are the three possible outcomes for outlook. What that means is: if tomorrow's outlook is sunny, I go to the left branch; if it is overcast, I go to the middle branch; otherwise I go to the right branch. In my total dataset there are nine yes records and five no records, according to the labels we have. Now we will see how these 14 records, nine yeses and five nos, get distributed across the different alternatives. Very straightforward, right? Out of all 14 records, how many of them are sunny, how many overcast, how many raining? You simply divide the dataset according to the different outcomes of outlook. Now let's take a look at these partitions of the data. If you look at the first partition, look at the yes and no combinations: there are two yes records, the ones marked in green, and three no records.
That's what you see if you go step by step through the data as well: look at the sunny rows, three red and two green, that is, three no records and two yes records. Then, if you look at all the overcast rows, everything is green: in the last 14 days, whenever it was overcast, I played outside. And for rain, again you can see there are three yeses and two nos. But there is an interesting characteristic about this middle partition: it all says play. What that means is that based on our historical data, if the outlook is overcast I don't need to look at the wind speed, the temperature, or the humidity; I don't care about anything else: if the outlook is overcast, I'm going to play. We reached a decision right there. So if you tell me tomorrow is going to be overcast, I can immediately say that, according to the data I have seen so far, the data I learned from, we can play tomorrow. But if you tell me tomorrow is going to be sunny, then, looking at my old data and expecting similar behavior in the future, I'm not sure, because my data is not convincingly telling me to play or not. Maybe I need more variables: tell me the temperature also, or tell me the humidity level, and let me see if that helps. Just based on sunny I cannot make a decision, so I try to go further: I want to split further. Similarly, if you say tomorrow is going to rain, again I'll say that's not good enough for making a decision; tell me the humidity or some other property. That's the idea. If you go further, you can split further: if it is sunny, I will ask about the humidity, whether it is going to be high or normal. If it is high, my data tells me convincingly: it's a pure subset, meaning all no records, no mix. So if tomorrow is going to be sunny and the humidity is high, then according to my data we shouldn't play. But if tomorrow is sunny and the humidity is normal, then again according to my data we can play. I keep emphasizing "according to my data" because the decisions we are making are based entirely on the data we have: if instead of 14 days I give you 30 days of data, maybe the tree structure I get will be completely different; maybe, or maybe not, but it can be different. You always make decisions based on the data you have seen; that's why I keep emphasizing it. Similarly, for rain we have a mix, so we split further: in this case I probably want to check the wind. It's raining, and if rain and wind come together, that's a deadly combination, so I don't want to play; but if the wind is weak, normal winds, then maybe I can play. Again, according to my data from the past. So that's pretty much the decision tree I built. Now you tell me any day's forecast, the outlook, humidity and wind combination, and I can tell you whether to play outside or not. Can you do that? We all can: we simply go over the branches and finally reach a leaf node, the last node, where the actual decision is made: don't play, play, play, don't play. That's pretty much a decision tree for you.
But this is again a very toy example, and while trying these combinations we made some assumptions. Why did I look at outlook to begin with? Why didn't I look at humidity or the wind level? Those are perfectly possible alternatives. And again, once I know the outlook is sunny, why did I check only humidity, and not wind or some other feature such as temperature? So the question arises: this is my final decision tree given my data, but how do I know which attribute to take at any step? There should be some math behind it that drives our choices in terms of which attribute to inspect. Hold on to that question; we'll come back to it and try to answer it in the context of another example, which I include just to give you more variety. What is this example? It's the case of a bank that wants to market its products to different customers and see whether a customer is going to subscribe to a particular product or not. We are going to predict whether the customer will subscribe based on certain features, certain attributes. Subscribed or not is our output column, the result column we want to predict, and we predict its value based on all these different things: some demographics of the client, like marital status, education, housing (does the client own a house or not), loan or not, the type of contact, the outcome of my previous interactions with them, and the job area they work in; a bunch of features about that particular client. Based on those things I want to make a decision. Of course, I need to start with some training data: examples of the features as well as the outcomes observed in the past. With this, I now have to construct a decision tree, and the obvious question is: should I first look at job, marital status, education, housing, and so on? Which one should I start with? And here I have only seven feature columns, 1 through 7, plus one output column; in the real world, say with email data, the features you can construct can number in the thousands. Out of all those columns I have to choose which one to begin with, and then at every step, not just at the root node, I have to decide which column to choose. That's a decision we need to make, and for that there is some math, based on a concept called entropy. A common way to identify the most informative attribute is to use entropy-based methods. Entropy is a mathematical concept, nothing specific to machine learning: it was actually invented in the context of information theory, back when people were trying to invent ways of communicating information, before telephone networks came in, of passing information from one place to another in a concise manner. In that context entropy was first used, and the same concept is used here as well. What does entropy mean? It measures the amount of uncertainty, or randomness, present in the data.
Okay, it is a measure of the amount of uncertainty or randomness. So what does that mean? A common example to explain the entropy concept is politics. Before elections there are lots of exit polls; that's my data. What do exit polls do? They go to a bunch of people and survey whom they are going to vote for, Republican or Democrat, or BJP or Congress, assuming there are two parties. So you get a bunch of data points: Congress, Congress, BJP, BJP, BJP, and so on, a list of preferences provided by different survey participants. Now, given this data of preferences, the entropy tells you how random that data is, what the uncertainty is. If the election is very tight, say 49% of the people said one party and 51% the other, it's a neck-to-neck competition. That is highly random, meaning you cannot predict; the outcome of that event is going to be very uncertain, because it's a 50/50 or 49/51 chance. That's high entropy: if my data is like that, I say the entropy, the uncertainty present in this data, is very high. In the US elections, everybody predicted Clinton was going to win, but Trump won. Whereas if you take our last elections in India, BJP versus Congress, pretty much every Tom, Dick and Harry knew that BJP was going to win, because Congress had done very badly in the previous years and everybody was fed up. If you had done the survey, 90 people out of 100 would have said BJP and 10 Congress. There the amount of uncertainty present in the data is very low; it's heavily biased towards one particular outcome, so in that dataset I would say the entropy is very low. Yeah, I'm coming to how entropy helps in decision making, but you get the idea of entropy: in one case, when it is neck to neck, entropy is very high, and in the other case, when everybody knew BJP was going to win, entropy is very low. Now you tell me: if I give you these two datasets, dataset one being the US election, which is neck to neck, and dataset two the Indian election, and ask you to predict who is going to win, pick a winner, which one is easier? Absolutely: the Indian election, where 90% of the survey says BJP. The prediction is much easier in that case, because a 90% majority is saying BJP is going to win.
So you can see that when the entropy is low, prediction becomes easier; that's the connection between entropy and the ability to predict. If the entropy is high, meaning the 49/51 vote situation, prediction is difficult: it's a highly uncertain scenario, anything can happen, and I cannot predict very well. When entropy is low, prediction is easy; when entropy is high, prediction is difficult. Agreed, everyone? That's the theory; now how does it relate to our dataset? I can treat it as two parties, a yes party and a no party: let's say no is BJP and yes is Congress, and these are my survey participants, two yeses and twelve nos. So out of 14 participants, 12 are saying no, meaning BJP, and two are saying yes, meaning Congress. What do you think the entropy is? Very low, right? If I compute the entropy value according to the equation, it would be very small, so this problem should be very easy to predict: blindly, whatever data you give me, I just predict no. What's the chance of error? 2 out of 14, a very small error. You see the point, everyone? That's the idea of entropy: I compute an entropy based on the outcome values. Every outcome column has a set of possible values: yes/no, or if I'm predicting the risk of diabetes, high, medium or low risk, three outcome possibilities; in our case, yes/no, two outcome possibilities. It's like a two-party or three-party system, and using the existing data I can compute the entropy, the level of uncertainty present in the data, and ask how I can divide this data to obtain smaller-entropy partitions. If I look at our original data, I mean the previous example, there are nine play records and five don't-play records: nine people voting for the yes party, five for the no party. I can do the math and compute the entropy; it would be fairly high, because 9 to 5 is still difficult to predict. So what did we do? We divided the data in such a way that one split is a clean, pure subset. What's the entropy of that middle partition, where everyone says play? 100% of the people voted for one party and zero for the other. How easy or difficult is it to predict? Very easy: if you land there, you just say yes, because the entropy is actually zero; there is no uncertainty at all. So the point of a decision tree is to divide the dataset such that the entropy of the resulting subsets is smaller. Does that make sense, everyone? To begin with, our original data has some measurable entropy, some level of uncertainty; ignoring the equations, I just want you to capture the idea. We divide the data so that each resulting partition has a smaller amount of uncertainty, which means it is easier to predict. We are breaking up a difficult-to-predict problem into many easy-to-predict problems. That's the underlying idea of the math behind decision trees. Does that make sense, everyone? Okay. So that's it.
If you understand that, decision trees are very simple; that's the end of it. All you need is the math to compute the entropy value. Ultimately you need to compute that value, and it depends on the probabilities. Without going into the details, you can just plug in the probability values: if I have nine yeses and five nos in my data, the probability of yes is 9 divided by (9 + 5), so P(yes) = 9/14 and P(no) = 5/14. Those are the two probabilities for my outcome. I simply plug them into the entropy equation and get a value: that's the entropy in the given data set. Then we can do some more math and ask: if you divide based on a certain column, in this example some contact column, what are the entropies of the resulting partitions? There is some math, which I don't want to confuse you with, to figure out the entropy of the resulting partitions, and then you simply take the difference: the entropy before, minus the entropy after partitioning by a particular attribute. And what's our goal? We want to take a difficult-to-classify problem and turn it into easy-to-classify problems, so we want to reduce the entropy as much as possible, which means we want this difference, before minus after, to be as high as possible. If the difference is very high, we are dividing the problem into partitions with very small entropy. You simply compute this difference, which is called the information gain, for all the attributes in your data: if I divide on the P outcome column, this is my information gain; if I divide based on my contact column, that is my information gain. You do that for all the columns and pick the one with the maximum value, in this case P outcome. What that means is that given the data set, the first thing I should check is P outcome, and based on that you divide. Then on each resulting subset you make the same calculation again: compute the information gain over all the columns for that particular subset of data and choose the best one; in this case education probably turned out to be the attribute with the highest information gain. You keep doing that, and whenever you get a pure subset you say there's nothing to divide anymore, I can make a decision. So just from the notion of entropy you can pretty much derive the entire technique. Is that clear? Okay, so that's pretty much decision trees. There are a bunch of pros and cons to it: sometimes it works well, and there are some negative points too. There's a notion of overfitting, meaning you don't always get very good prediction accuracy; there are certain techniques to deal with that as well, which relate to bias in the data and the training set. And if your data is organized in slightly different ways, you may not always get good results. So there are a bunch of pros and cons with this technique.
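As a rough illustration of the math being gestured at here, the following is a minimal Python sketch of the entropy formula H = -Σ pᵢ log₂ pᵢ and the before-minus-after information gain, using the 9-yes/5-no counts from the example; the particular split into two partitions is made up for illustration:

```python
import math

def entropy(counts):
    """Shannon entropy H = -sum(p * log2(p)) over class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Original data: 9 "yes" (play) and 5 "no" (don't play) records.
h_before = entropy([9, 5])            # about 0.940 bits

# Hypothetical split on some attribute into two partitions:
# one with 6 yes / 1 no, another with 3 yes / 4 no.
partitions = [[6, 1], [3, 4]]
n = sum(sum(p) for p in partitions)   # 14 records in total
h_after = sum(sum(p) / n * entropy(p) for p in partitions)

info_gain = h_before - h_after        # pick the attribute maximizing this
print(f"H(before)={h_before:.3f}, H(after)={h_after:.3f}, gain={info_gain:.3f}")
```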
You would be able to understand this once you drill deeper and study the properties of that particular math, but overall it's a very intuitive and easy algorithm to develop, and surprisingly it also performs very well in practice. In a common scenario you can maybe always find a better algorithm, but typically decision trees give you reasonably good performance on your data sets, and that's why they're very commonly used. If there are no questions, I can give you a demo of this algorithm as well. Very good. The data we are going to work with is a diabetes data set, which looks like this; I'm going to show this demo in the R programming language. Question: how important are decision trees for data science? I'd like to rephrase that, because "how important are decision trees for data science" is a somewhat misleading question. You have to think of these techniques, whether decision trees, logistic regression, support vector machines, random forest or Naive Bayes, as different ways of solving a problem, and the best analogy is different tools in a toolbox: a hammer, a screwdriver, a spanner and so on. For a given task, if I want to fix something in my house, I use a certain combination of tools appropriate for that particular problem. If I'm fixing my door I use a couple of tools, and if I'm putting a nail in the wall I use a different set, maybe a drill and a hammer and a few other things. For a given problem, a different set of tools is appropriate. So I cannot say how important a hammer is for building houses; yes, every tool is important, but for a given task some tools are more appropriate and perform better than others. In exactly the same way, all these algorithms are tools in your toolbox, and that's why a lot of people write articles about the data scientist's toolbox. What does that toolbox contain? Different techniques. For a given problem and a given data set, whether I'm predicting diabetes, credit risk or email spam, a couple of these techniques may perform better than the others. Make sense? That's a great question. On what basis can we select an algorithm? What do you think? It's based on the performance of the algorithm, something like accuracy: if I make 100 predictions, how many turn out to be true? If 99% are valid predictions, that's great compared to another algorithm which may give only 75% accuracy. Accuracy is one measure for selecting an algorithm, for deciding whether decision trees or logistic regression are doing better on my data. And accuracy is only one measure; there are a couple of other important measures as well, like precision and recall, with slight differences between them. You have a bunch of different measures, and based on those measures you select your algorithm. What algorithms do we need in data science? That's a great question as well. There are some fundamental techniques that every data scientist must know.
These are the basic techniques for doing anything, and they teach you not just a technique but also different concepts; decision trees, for example, teach you the concept of entropy. There are a bunch of fundamental techniques any data scientist must know, like k-means clustering, decision trees, linear regression and Naive Bayes, and you can easily get this list by opening up any university's data science class and looking at its outline. On top of that there are many complex variations of these techniques, and it's an ocean: the more you learn, the more there is. Even within decision trees you can go into the math and find a lot of complexity; entropy is one intuitive way of choosing an attribute, but people have found twenty different ways to choose it. Instead of entropy, somebody says let's look at something called minimum description length; just like entropy, that's another metric, and based on it you can also build a decision tree. As a data scientist you can choose which makes better sense for your problem. There are layers of complexity to any given thing, but as a basic introduction there is a set of techniques every data scientist should know. Very good. Now let's look at a simple demo. This is the R language, similar to Python if you're familiar with that, so you can do similar things in Python as well; R is more of a statistician- and mathematician-oriented language, while Python is more computer-science-oriented. I have this data set in a CSV file, diabetes.csv, so let's read it and view the data. It's a data set in tabular format with a bunch of columns. It's a real data set, by the way, but a very small one: in total it has a few hundred records. What does it represent? It's a data set about women patients who participated in a diabetes study. For these patients they collected different features: things like the number of times a woman got pregnant, the glucose concentration in her blood, diastolic blood pressure, skin-fold thickness, serum insulin level, body mass index, a diabetes pedigree function reflecting family history of diabetes, age, and finally the column we want to predict: whether this particular patient has a risk of diabetes or not. Given the profile of a patient, we want to predict whether she has a risk of diabetes, and depending on that risk we can probably take some preventive measures: if a patient's risk is very high, we may want to control her diet strictly, start some medication, take preventive actions. That's the advantage of predicting ahead of time, and we are going to predict it based on all the other columns. That's the problem definition, that's the data, and our goal, to be specific, is to build a decision tree that can predict whether a person is going to have a risk of diabetes or not. Of course, we can run the algorithm we just learned using entropy and build a model. Great.
But then how do I know whether this model is good or not? As I think Pravin asked, how can I decide whether this algorithm is good? We said: find the accuracy. But how do we find the accuracy if we used all the available data for training itself? I have to test the model as well, right? It's like our classes: the teacher doesn't give you all the examples in class; they nicely hold back certain examples and give those questions on the exam. If the exam had the same questions you saw in class, everybody would score well, and you wouldn't really be testing learning capability. Instead, you show some examples during training, while learning, and then new examples in the exam, to test how well the mental model learned the concept. With the same idea, for a given data set you always first divide your data into two subsets: a training set and a testing set. Using the training set you build your model, meaning you practice and build your mental model, and once you have the model you evaluate how good it is using the testing set. Since the testing set was derived from the original data, it also carries the real answers; the model predicts something, and you can check how well it did. It's like the exam again: the teacher already has the answer key, matches your answers against it, and gives you a percentage; that's like the accuracy of the mental model you built. So we're going to divide our data set into training and testing. There are a bunch of different ways to do this, one way in Python and another in R; in this example I'm using one particular method. If I look at nrow of the diabetes data, the original data has a total of 768 records, and I randomly split it into training and testing: nrow of the training set shows 529 records, and the test set has 239. So I randomly divided the 768 records into two parts, one containing 529 and the other 239. Then I'm going to build a model using the training data set and test it using the test data set.
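The demo does this split in R; purely as a sketch of the same step in Python (the file name is taken from the demo, and landing on exactly 529/239 depends on the chosen test fraction):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

diabetes = pd.read_csv("diabetes.csv")   # 768 records in the demo
train, test = train_test_split(diabetes, test_size=239/768, random_state=1)
print(len(train), len(test))             # 529 and 239, as in the demo
```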
For training I need some implementation of the algorithm, something like the entropy-based procedure we've seen, and that kind of implementation is already available, written by someone else, as a package: you import it and start using the methods implemented in it. The decision tree package here is called rpart, which stands for recursive partitioning, and inside it there's a function, also called rpart, which can learn a decision tree from a given training data set according to some measure, not exactly entropy but a variant of it. I need to tell it the column on which I'm making predictions, in this case has-risk-diabetes, and the data with which I'm training the model; the function takes a bunch of other parameters too, but at a minimum you give it the data you're learning from and the column you're trying to predict. That's the model, and you can print it out; it looks like this. At the root node you have all 529 records, and first you check the plasma glucose concentration: if it is less than 154.5, you next check whether body mass index is less than or greater than 26, and so on. It's a tree structure, and we can actually plot and visualize it as a tree as well; that's what the other two commands here do. I'm not explaining the individual syntax, just what we're trying to do. So that's the decision tree we constructed. What is it telling us? First, check the glucose concentration: the algorithm figured out that glucose concentration is the most important feature for predicting diabetes here, which is probably intuitive too; we all know glucose level is a very important indicator of diabetes. If it's less than 154.5 you go left; if greater, right. Going right means the glucose level is very high, and then you check age: if age is greater than 53.5, meaning a fairly elderly woman, we can pretty much directly say this person has a high risk of diabetes. That's again intuitive given our general knowledge of diabetes: an elderly person with high glucose concentration very likely has a risk of diabetes, and if the age is below that, we say no, based on the data. And if the glucose concentration is low, we check body mass index: below 26 we say no, because both body mass index and glucose concentration are low, so the person is pretty much healthy, no risk of diabetes. That's the decision tree it created. Now, this is the model we built from the data, and we need to evaluate how good it is. For that we use our test data, the portion of the data set we kept aside: we apply the model to the testing data, get its predictions, and verify how well they match the actual results we already know. There's a particular function called predict for making predictions: you provide the model with which you're predicting and the data set on which you want to make predictions.

### Decision Tree Algorithm [5:34:29]

If you look at the output, it shows a bunch of yeses and nos: these are the predictions the model made. But we also know the actual values of these records, from the has-risk-diabetes column of the test data. So these are my predicted values and those are my actual values. It's like an answer sheet: here is the answer key, yes or no, and there are the answers given by the student, and now we can match them and count how many it got right and wrong; that's the accuracy percentage. For this there's the notion of a confusion matrix, which looks like this: the actual answers along one side, the predicted answers along the other. We got 123 and 47, meaning there are 123 test records for which the model said no and the actual answer is also no, and 47 records where the model said yes and the actual answer is also yes. So 123 + 47 is the number of questions the model got right, and the accuracy is (123 + 47) divided by the total of 239, which works out to 71.13%. That means if I used this model in production I'd get roughly 71% accuracy: out of 100 patients, I'd get about 71 predictions right and make mistakes on the rest. Make sense, everyone? That's accuracy, and similarly there are a bunch of other measures people may be interested in, like confidence intervals, sensitivity and specificity; each has a different flavor and captures a different type of information, and depending on the application one measure may be more important than another. We'll learn that when we study each measure. The point is that for a given model you can compute this kind of accuracy or some other measure, compare algorithms on it, and choose the best one available in your toolbox. So that's roughly the life cycle of building and evaluating a model. Again, all of this is just scratching the surface of model building: at each stage you can go deeper into the complexities and learn the different layers of these algorithms, the evaluation techniques, the data preparation and all that. That's actually pretty much what I have; the remaining slides are essentially the demo itself: read the data, divide it into training and testing, train the model on the training set, validate and make predictions on the test set, and evaluate the model. You either implement the algorithm yourself or use an existing implementation, you may want to visualize the model (depending on the model, of course), and finally you validate how well you're doing; that's the accuracy of the model. And that's it; that would be the end of one model.
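Mirroring the R demo in Python, for illustration only: a decision tree fit, predictions on the held-out set, and the confusion-matrix accuracy. The label column name is an assumption (the demo calls it "has risk diabetes"), and the exact counts depend on the split:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

df = pd.read_csv("diabetes.csv")
X = df.drop(columns=["has_risk_diabetes"])   # assumed label column name
y = df["has_risk_diabetes"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=239/768, random_state=1)

tree = DecisionTreeClassifier(criterion="entropy", random_state=1)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
# With the demo's counts: (123 + 47) / 239 = 0.7113, i.e. 71.13% accuracy.
```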
Sometimes, if the accuracy is bad, the immediate question is: how can I improve it? Maybe you feed in better data, or you tune some parameters of the algorithm; every algorithm comes with certain parameters, and tuning them can give a better fit. Or you may want to add some new data: you go to a diabetes specialist and show your problem setting, saying I'm using these columns to make the prediction. Then the diabetes expert might say: a recent study showed that the number of times you eat in a day also makes a difference. Then maybe you go back and add another column to your data, the number of times a person eats in a day, and maybe you get better results by adding that column. So you may change the data, change the algorithm, or change the algorithm's parameters; there are multiple things you can do to improve performance. Yes, accuracy is the simplest measure everybody can understand, which is why I gave you accuracy first, but there are other measures as well. Look at our demo for a hint: 123 + 47 predictions were right, but the model also made some mistakes, and there are two types of mistakes. Notice there are 24 records where the model said yes but the actual answer is no. What does that mean? The patient is actually fine, but the model said this patient has a risk of diabetes: it's a false alarm, called a false positive, and there are 24 of those. Then there are 45 patients for whom the model said no but the actual answer is yes: those patients indeed have a risk of diabetes, but our model said no, these people are fine, don't worry. If you look at it, there's a different cost for each kind of mistake. In the 24 false-positive cases, what's the worst that can happen? You mistakenly tell somebody they're at risk of diabetes; at most they go for a more detailed test or do some dieting; hopefully they don't start medication yet. At that level it's a less risky mistake: at worst they take a second test, hear that they're fine, and everything is okay. Whereas the other 45 did have the risk of diabetes, but you told them they were fine, and that may be a costly mistake: higher impact, because the disease goes unrecognized; they think all is fine, continue their lifestyle, and the problem may aggravate and cause other issues in the future. The point is that different types of mistakes have different costs in your business. Similarly, what's the cost of calling a borrower low-risk for a loan when they could never have repaid it, versus the other way around, turning away a very good borrower because you marked them high-risk? Different decisions cost you differently, and in those cases accuracy may not be the best measure, because accuracy treats all types of mistakes as equally costly, whether you raise a false alarm or miss an actual diabetic patient. If your application really is like that, then yes, accuracy is a good measure; but typically different mistakes carry different costs, and depending on your application you choose a measure that gives higher priority to the high-cost mistakes and lower priority to the low-cost ones. That's just one flavor, but there are other aspects to this as well.
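Since the demo's confusion matrix gives all four counts (123 true negatives, 47 true positives, 24 false positives, 45 false negatives), a short sketch can make these cost-sensitive measures concrete; "has risk of diabetes" is treated as the positive class:

```python
# Counts taken from the demo's confusion matrix.
tn, tp, fp, fn = 123, 47, 24, 45

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 170/239, about 0.711
precision = tp / (tp + fp)   # of all "at risk" calls, fraction correct: 47/71
recall = tp / (tp + fn)      # of truly at-risk patients, fraction caught: 47/92

# Recall is only about 0.51 here: many at-risk patients are missed,
# which is exactly the costly false-negative mistake discussed above.
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```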
So now let's understand what a random forest is. A random forest is constructed from multiple decision trees, and the final decision is obtained by the majority vote of those trees. Let me make this very simple with an example. Suppose we have three independent decision trees, just three here, and an unknown fruit, and I want these trees to tell me what exactly this fruit is. I pass the fruit to the first decision tree, the second, and the third. A random forest is nothing but a combination of these decision trees, so their results are fed into the random forest algorithm. The first decision tree classifies the fruit as a peach, the second says it is an apple, and the third says it is a peach. The random forest classifier sees two votes for peach and one for apple, so it says the unknown fruit is a peach. This is majority voting among the decision trees, and that is how a random forest classifier arrives at a prediction for an unknown value. This was a classification problem, so it took the majority vote; if it were a regression problem, it would have taken the mean instead. Now let's move on to understanding the decision tree itself, because decision trees are the building blocks of a random forest, and that's why studying them is important: if we understand one decision tree, we can apply the same concept to the random forest. A decision tree has three important kinds of node. The first is the root node: as the name suggests, the entire data set is fed in at the root node. Then there are decision nodes, where decisions are taken and splitting is performed. And then we have leaf nodes, the end points of the tree where no further division takes place; the predictions are made at the leaf nodes. Another thing to note is that decision nodes provide links to the leaf nodes, the decision tree breaks the data set into smaller subsets, splitting is done at the nodes, and at the end of the tree the final decision or prediction is made. Now let's construct a decision tree using a penguin classification example. Let me walk you through the problem. We have three species of penguin; let's get familiar with them: Gentoo, Adelie and Chinstrap. These are penguin species of Antarctica, found on different islands, and we have to classify them correctly. We'll be using a random forest here, but for convenience let's work with a single decision tree first and see how it classifies the species. That's really interesting, so let's also look at the parts of the penguin, because we'll be working with this data set: the head, bill, flippers, belly and claws are the different body parts. We are mainly concerned with the bill, the flippers and the body mass, because those are the main features our data set contains, so make sure you can identify the flippers and the bill of the penguin.
Now let's construct a decision tree and see how it is built. I've taken a subset of the penguins data with only two feature columns, island and body mass, plus the species of the penguin, which is the outcome or target variable. We build a decision tree taking body mass as the first feature, splitting on one condition: is the body mass greater than or equal to 3500? If yes, then based on the other feature, island, we classify further: if the island is Torgersen the species is Adelie, and if the island is Biscoe the species is Gentoo. After the Torgersen and Biscoe leaves no further division takes place, because we get the predictions at these leaf nodes. If instead the body mass is less than 3500, we get the species Chinstrap; no further decision is needed at that node, so the branch ends there. This was a very simple, basic example of a decision tree; if we had a huge data set, the tree would have grown to a great depth, and that depth would have led to overfitting of the data. That is one of the drawbacks of decision trees that random forest overcomes. Now let's go over the important terms, which will also consolidate what we've learned so far; we'll use the same small decision tree from the previous example, and these terms are relevant to random forest as well. First, the root node: the entire training data is fed to the root node. Each node asks a true-or-false question with respect to one of the features, and in response it partitions the data set into subsets; that's what happens here with the condition body mass >= 3500, and when the answer is no, it simply classifies the species. Then, and this is very important, the splitting is done with the help of Gini or entropy methods, which help decide the optimal split; we'll discuss splitting methods very soon. Next, the decision nodes, which provide the links to the leaf nodes; these matter because only the leaf nodes tell us the actual predictions, that is, which class a species belongs to. And finally the leaf nodes: the end points where no further division takes place and where we obtain our predictions.
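Just to make that tiny two-feature tree concrete, here it is written out as nested conditions; this is a hand-coded sketch of the tree described above, not a trained model, and the "unknown" fallback is an addition for islands the subset doesn't cover:

```python
def classify_penguin(body_mass: float, island: str) -> str:
    """Hand-written version of the two-feature decision tree above."""
    if body_mass >= 3500:
        if island == "Torgersen":
            return "Adelie"
        if island == "Biscoe":
            return "Gentoo"
        return "unknown"      # island not covered by this tiny subset
    return "Chinstrap"        # low-body-mass leaf

print(classify_penguin(3800, "Biscoe"))   # Gentoo
print(classify_penguin(3200, "Dream"))    # Chinstrap
```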
Now we come to another important topic: the working of random forest. For that we have to understand a few important concepts: random sampling with replacement, feature selection, and the ensemble technique used in random forest, bootstrap aggregation, also known as bagging. We'll understand these with a very simple example, and then see how feature selection is done for both classification and regression problems, that is, how random forest actually selects features for constructing its decision trees. In random forest, the best split is chosen based on Gini impurity or information gain methods; we'll get to that too. First, let's understand random sampling with replacement. We have a small subset of the same penguin data set, with six rows and four features, meaning four columns, and the arrows indicate that we will create three subsets from it; these subsets will become our decision trees, which we'll construct from them. Let's create the first subset; you can see it is created randomly, and for convenience let me show you all the different subsets. Looking closely at the first subset, we have certain random rows and certain features, island and body mass; in the second subset we have island and flipper length; and in the third, body mass and flipper length. When I talk about which features ended up in each subset, that is feature selection; remember this term. The second concept is random sampling: selecting rows randomly from your data, so I randomly select certain rows from the subset and create further subsets. And what is replacement? It can be seen in the second subset: a Gentoo row is repeated. When we work with repeated rows, and a row can appear again in the second or third subset, that is random sampling with replacement: the random forest can use the same row multiple times across multiple decision trees. That's the basic concept of random sampling with replacement and feature selection in random forest. One more important term: these small subsets are also known as bootstrap data sets, and when we aggregate the results over all of them, that becomes bootstrap aggregation; I'm just filling in the gaps so the concepts become clearer later.
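Here is a small sketch of that sampling step with pandas: rows drawn with replacement, plus a random two-of-three feature subset for one tree. The toy table below is invented to stand in for the six-row slide example:

```python
import pandas as pd

# Toy stand-in for the six-row penguin subset on the slide.
data = pd.DataFrame({
    "island": ["Torgersen", "Biscoe", "Dream", "Biscoe", "Dream", "Torgersen"],
    "body_mass": [3700, 4800, 3300, 5200, 3400, 3600],
    "flipper_length": [193, 217, 190, 221, 192, 191],
    "species": ["Adelie", "Gentoo", "Chinstrap", "Gentoo", "Chinstrap", "Adelie"],
})

# One bootstrap data set: sample 6 rows WITH replacement, so rows may repeat...
boot = data.sample(n=len(data), replace=True, random_state=42)

# ...and a random subset of 2 of the 3 features for this particular tree.
features = pd.Series(["island", "body_mass", "flipper_length"])
chosen = features.sample(n=2, random_state=42).tolist()
print(boot[chosen + ["species"]])
```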
Now let's draw the decision trees for these subsets, starting with the first one. Again we take body mass as the root node and split on the condition body mass >= 3500: if no, the species is Chinstrap; if yes, we partition again on island, where Torgersen gives Adelie and Biscoe gives Gentoo. We construct the other two decision trees the same way. On the second subset we take flipper length and split on whether it is greater than or equal to 190: if yes, the species is Gentoo; if no, we decide based on island, where Torgersen gives Adelie and Dream island gives Chinstrap. That is how the decision tree of the second subset is created and how it takes its decisions, based on the tree depth and the features it selected. Now the third decision tree, for the third subset: we get a tree where, if the body mass is greater than 4000, it is clearly a Gentoo; if not, we partition on the other feature, flipper length, and if that is greater than or equal to 190 the species is Adelie, else Chinstrap. That is how decision tree three decides. Let's keep these decision trees with us; we'll make sense of them in a moment. Before that, let's see how feature selection is done in a random forest: how am I selecting the columns? For classification, by default, the number of features per tree is taken as the square root of the total number of features. Here I have four features, and it's a classification problem, so I take the square root of four, which is two: each decision tree is constructed from two features. If I had 16 features, the square root would be four, so each tree would take four features. And if this had been a regression problem, then by default the features would be selected as the total number of features divided by three. That is how feature selection is done by default in a random forest. Now, to consolidate our learning, we come to the ensemble technique, which is bootstrap aggregation. Random forest uses ensemble techniques, and ensembling just means aggregating the results of the decision trees, taking the majority vote in the case of classification and the mean in the case of regression, and returning that as the output. We've plotted all our decision trees again, and below them is an unknown data point whose species I want to predict. We feed this problem to each decision tree and see what each one predicts. Decision tree one says the species seems to be Chinstrap; decision tree two says, based on its data, the species is Adelie; and decision tree three says Chinstrap. All these results are fed into the random forest classifier: Chinstrap gets two votes and Adelie gets one, so the new data point is classified as Chinstrap. This is how bootstrap aggregation is done, based on majority voting: the decisions taken by the different decision trees are combined, aggregated, and we get an ensembled result from the random forest. That was the simple concept of the ensemble technique used in random forest.
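A tiny sketch tying the two default feature counts and the voting rule together; the sqrt and n/3 rules are the rules of thumb quoted above (library defaults differ between implementations), and the votes are the three predictions from the example:

```python
import math
from collections import Counter

def features_per_tree(n_features: int, task: str) -> int:
    """Rule-of-thumb feature counts quoted in the lesson."""
    if task == "classification":
        return round(math.sqrt(n_features))   # 4 -> 2, 16 -> 4
    return max(1, n_features // 3)            # regression: n / 3

print(features_per_tree(4, "classification"))    # 2
print(features_per_tree(16, "classification"))   # 4

# Majority vote over the three trees' predictions from the example.
votes = ["Chinstrap", "Adelie", "Chinstrap"]
print(Counter(votes).most_common(1)[0][0])       # Chinstrap
```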
Now let's move on to splitting methods. What splitting methods are used in random forest? There are several, such as Gini impurity, information gain, or chi-square. Let's discuss Gini impurity first. Gini impurity is used to estimate the likelihood that a randomly selected example would be incorrectly classified by a specific node, and it is called an impurity metric because it shows how far the node is from a pure division. The impurity ranges from 0 to 1: zero indicates that all of the elements belong to a single class, values approaching one indicate the elements are scattered across many classes, and a value like 0.5 indicates the elements are uniformly distributed across two classes. Moving on to information gain: this is another splitting method random forest can use, and it utilizes entropy. With information gain, the features selected are the ones that provide the most information about a class. So what is entropy? It is a measure of randomness or uncertainty in the data; we'll understand it with a small example, so don't worry. Suppose there's a fruit tray with four different fruits. What do you feel about the entropy here, the randomness of the data? Is it really easy to classify these fruits into their respective classes? It's quite uncertain; the data looks messy. But what if we split the fruit into two trays, where the first tray has peaches and oranges and the second has apples and lemons? Now things become a little more certain: we get lower randomness, and this is called low entropy. So as we move down the tree, from the root node to the leaf nodes, the entropy reduces, and we can also calculate information gain from this entropy: the difference in entropy before and after the split is known as the information gain. Once we move down the tree and start reducing the randomness in the data, the entropy becomes lower, and that is what we want: with low entropy, the predictions are likely to be more accurate, and we can make predictions far more easily than with very messy, high-entropy data. So that was entropy.
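For illustration, here is a minimal sketch of the Gini impurity formula, 1 minus the sum of squared class proportions (the entropy counterpart was sketched earlier in the decision tree section); the counts are arbitrary:

```python
def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([10, 0]))  # 0.0  -> pure node: every element in one class
print(gini([5, 5]))   # 0.5  -> uniform across two classes
print(gini([7, 3]))   # 0.42 -> somewhere in between
```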
Now let's move on to the advantages of random forest; there are several. First, low variance: random forest overcomes the limitations of a single decision tree, and it has low variance because it combines the results of multiple decision trees, each trained on a limited subset of the data, as we saw earlier; each tree builds its own subset and trains with limited depth, so there is less overfitting and lower variance. Next, reduced overfitting: since we work with multiple decision trees of reduced depth, the model fits well and does not try to learn even the noise. We use bootstrap aggregation, or bagging, in random forest, which is why we get reduced overfitting, and that is one of the reasons it is so popular: you don't have to worry as much about overfitting the data. Another advantage is that normalization is not required, because random forest works on a rule-based approach. It also gives really good accuracy, which we'll see in the hands-on: very good precision and recall, and it generalizes well on unseen data compared to other machine learning classifiers such as Naive Bayes, SVM or KNN; random forest often outperforms them. A few more advantages: it is suitable for both classification and regression problems, it works well with both categorical and continuous data, so you can use it with almost any data set, and it performs well on large data sets. It solves most problems, which is why random forest is so widely used in machine learning. Now for some disadvantages. The first is that it requires more training time because of the multiple decision trees: with a huge data set you may be constructing hundreds and hundreds of trees, and that takes a lot of training time. Another is that interpretation becomes really complex with multiple trees: a single decision tree is easy to interpret, but when you combine hundreds of trees into a random forest, interpretation is very difficult, and it becomes quite complex to apprehend what exactly the model is predicting, where the splits occur, which features are selected, and so on. It also requires more memory, since memory utilization is heavy when storing multiple decision trees, and it is computationally expensive, demanding a lot of resources to train and store them all. All right, that was the theory of random forest; now let's move on to a practical demonstration, a hands-on with random forest. Let's import a few basic Python libraries in our Jupyter notebook and run the cell: we import pandas as pd, numpy as np and seaborn as sns. Seaborn is needed here because we want to load the penguins data set, which comes preloaded with seaborn; seaborn ships with multiple data sets, which makes it a good way for beginners to practice. We can see the asterisk next to the cell, which tells us to wait while it runs, so let's let it load. We get our data in an object called df and can see the first five entries; the data frame is shown as a table of rows and columns with species, bill length, bill depth, flipper length, body mass and the sex of the penguin. Our task is to classify these penguins into the correct species.
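The loading step looks roughly like this (seaborn's bundled `penguins` data set is real; the variable name `df` follows the demo):

```python
import pandas as pd
import numpy as np
import seaborn as sns

# Load the penguins data set that ships with seaborn.
df = sns.load_dataset("penguins")
print(df.head())   # species, bill/flipper measurements, body mass, sex
```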
We check the shape of our data and see 344 rows and seven columns, then df.info(), which gives us, along with the non-null counts, the data types of the values: species and island are the object data type, bill length, bill depth, flipper length and body mass are floating point, and sex is object as well. Moving on, we count how many null values there are with df.isnull().sum(): there are about two null values in each of the feature columns, bill length, bill depth, flipper length and body mass, and 11 null values in the sex feature. Since these are very few, we can simply drop them (or you could choose to ignore them), so in this data frame I drop the null values and then check again with the same isnull().sum() call; yes, they have been dropped from our data frame. Now let's do some feature engineering with our data. We have object data types in our data frame, and before feeding the data into the random forest algorithm we have to transform the categorical data, the object columns, into numeric form. Here we use one-hot encoding to convert the categorical data into numeric; there are various ways to do this in Python, like one-hot encoding or the map function, but here we use one-hot encoding. Let's apply it to the sex column first. It has two unique values, male and female, and we use pandas get_dummies to do the one-hot encoding: the unique values are converted into their own columns in the data frame, so male and female become two columns. One thing to note here is the dummy-variable trap: we only have two unique values here, but if I had six or seven unique values and applied one-hot encoding, I would end up with lots of extra features in my data frame, which leads to several complexities. So to keep things simple, I use one-hot encoding only when the number of unique values is low, and since we have just two or three here, that's fine. Now, one of the two columns is redundant, giving us duplicate information, so I drop the first column and keep only male in the data frame. Can we still infer the females? Yes: if the value is one the penguin is a male, and if it is zero the penguin is a female. So only one column is needed; I keep one and drop the other.
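In code, the cleaning and sex-encoding steps just narrated might look like this; `drop_first=True` handles the dummy-trap point by keeping only the male column:

```python
import pandas as pd
import seaborn as sns

df = sns.load_dataset("penguins").dropna()   # drop the few rows with nulls
print(df.isnull().sum())                     # all zeros now

# One-hot encode the two-valued sex column; drop_first keeps only "Male":
# 1 means male, 0 means female, so no information is lost.
sex = pd.get_dummies(df["sex"], drop_first=True)
print(sex.head())
```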
Next we apply one-hot encoding to the island feature. Checking the unique values in island, there are three, Torgersen, Biscoe and Dream, and the data type is object. Again we use pd.get_dummies, apply it to the island feature and look at the head: the unique values are converted into three respective columns, and then we again drop the first column and keep the remaining two. Here too we can infer everything: if Torgersen is one, the island is neither Dream nor Biscoe. That's how you read it from the data frame. Now remember, these two data frames, island and sex, are still independent; they are not yet part of the main data frame. So we concatenate the two into the original data frame: we create a new data frame, new_data, and concatenate df, island and sex with pd.concat, passing axis=1, meaning along the columns. When we run this and look at the head, everything is concatenated into a single data frame, which is what we want before splitting the data into training and test sets. This new data frame has some repeated columns that need to be deleted, so we drop sex and island, which are now redundant because we already have the male column and the Dream and Torgersen columns; we drop them with new_data.drop, passing the column names, axis=1 and inplace=True, and the head of the data frame now shows the remaining columns. Now it's time to create a separate target variable: in a variable called y we store only the species, taken from new_data.species, and y.head() shows the first five species, so the target variable has been created. You can also look at y.values: Adelie, Chinstrap and Gentoo, the three unique penguin species, with object as the data type. Again we need to convert this object type into numeric, so this time we use Python's map function and map Adelie to 0, Chinstrap to 1 and Gentoo to 2; all the values are now mapped to numbers. This is another way to convert a categorical value into a numeric one in Python. Next we drop the target column species from the main data frame, so the features no longer contain the target, and we store this new data in X and perform the split: from sklearn.model_selection we import train_test_split and split the data so the test set is 30% and the training set 70%. We set random_state to zero, which fixes the randomness and gives code reproducibility: if I run this code again, I get the same result; you can set random_state to any number of your choice, and a different number would give a different split.
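Put together, the assembly and split steps might look like the following sketch, continuing from the `df` and `sex` frames above; the variable names follow the narration:

```python
from sklearn.model_selection import train_test_split

# Dummy-encode island the same way and join everything column-wise.
island = pd.get_dummies(df["island"], drop_first=True)   # Dream, Torgersen
new_data = pd.concat([df, island, sex], axis=1)
new_data.drop(["island", "sex"], axis=1, inplace=True)

# Separate the target and map the species names to numbers.
y = new_data["species"].map({"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})
X = new_data.drop("species", axis=1)

# 70/30 split; random_state=0 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)
```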
Now let's print the shapes of X_train, y_train, X_test and y_test. The data has been split into 70% and 30%: X_train has 233 rows and seven features, X_test has 100 rows and seven features, and similarly y_train has 233 values and y_test has 100, the species labels. So the split is exactly 70/30. Next we train the random forest classifier on the training set. How? We import RandomForestClassifier from sklearn.ensemble (we've already covered what an ensemble is) and store it in a classifier variable. The n_estimators parameter is nothing but the number of decision trees, so we create five decision trees here, the criterion is entropy, and random_state is again set to zero. Then we fit X_train and y_train; the model is fitted, with entropy as the criterion. Now let's make some predictions: we create a variable y_pred, predict on X_test, and print the predictions. Then we print the confusion matrix to check the accuracy of the random forest algorithm: from sklearn.metrics we import classification_report, confusion_matrix and accuracy_score, store the confusion matrix of y_test versus the predictions in a variable cm, and print it. We also see the accuracy score, which is 98%; our random forest classifier is giving us very good accuracy, and from the confusion matrix you can see that only two cases were misclassified; all the rest were classified correctly. Next, the classification report for y_test and the predictions: the precision is 96%, the recall, the true prediction rate, is 100%, which is very nice, and the F1 score is also good at 98%. So this is a good result. But what if we change the criterion from entropy to Gini? Let's experiment with that too, along with a different number of trees. Again we import RandomForestClassifier from sklearn.ensemble and fit it, this time with seven trees instead of five, the Gini criterion, and random_state zero. We run it, predict, and check the accuracy score for this classifier with seven trees: we get 99% accuracy after changing the criterion and the number of trees. You can keep experimenting with different numbers of decision trees; let's try 12 and see what happens: the accuracy drops back to 98%, whereas with seven we were getting 99%, so let's keep seven since it gives us really good accuracy. So that's the random forest classifier: how it works with several trees and different criteria to give very good accuracy on our training and test data.
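The training and evaluation loop narrated here corresponds to something like the following, continuing the same walkthrough; the hyperparameter values are the ones quoted in the demo, while the exact scores depend on the split:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (classification_report, confusion_matrix,
                             accuracy_score)

# First run from the demo: five trees with the entropy criterion.
classifier = RandomForestClassifier(n_estimators=5, criterion="entropy",
                                    random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))         # the demo reports 98%
print(classification_report(y_test, y_pred))

# Second experiment: seven trees with the Gini criterion.
clf7 = RandomForestClassifier(n_estimators=7, criterion="gini", random_state=0)
clf7.fit(X_train, y_train)
print(accuracy_score(y_test, clf7.predict(X_test)))   # demo reports 99%
```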
So now, given that we have machine learning, why do we need deep learning? Let's look at the various limitations of machine learning. The first limitation is the high dimensionality of data: the data generated today is huge, with a very large number of inputs and outputs, and because of that machine learning algorithms fail; they cannot deal with high-dimensional data, that is, data with a very large number of inputs and outputs. Another problem is that machine learning is unable to solve crucial AI problems such as natural language processing, image recognition and things like that. And one of the biggest challenges with machine learning models is feature extraction. Let me tell you what features are: in statistics we call them variables, but in artificial intelligence those variables are nothing but the features. Because of this, complex problems such as object recognition or handwriting recognition become a huge challenge for machine learning algorithms to solve. Let me give you an example of feature extraction. Suppose you want to predict whether there will be a match today or not. That depends on various features: whether the weather is sunny, whether it is windy, and so on, and we have provided all those features in our data set, but we forgot one particular feature, humidity. Our machine learning models are not efficient enough to generate that feature automatically, so this is one huge problem, or limitation, of machine learning. Now, having stated the limitation, it wouldn't be fair if I didn't give you the solution, so let's move forward and see how deep learning solves these kinds of problems. As the first line on the slide says, deep learning models are capable of focusing on the right features by themselves, requiring little guidance from the programmer: with a little guidance, deep learning models can generate the features on which the outcome depends, and at the same time they solve the dimensionality problem; if you have a very large number of inputs and outputs, you can use a deep learning algorithm. Now, what exactly is deep learning? It evolved from machine learning, and machine learning is nothing but a subset of artificial intelligence; the idea behind artificial intelligence is to imitate human behavior, and the same idea carries into deep learning: to build learning algorithms that can mimic the brain. Deep learning is implemented with the help of neural networks, and the idea or motivation behind neural networks is neurons, which are nothing but your brain cells. Here is a diagram of a neuron. We have dendrites, which provide input to the neuron; as you can see, there are multiple dendrites, so that many inputs are provided. Then there is the cell body, and inside it a nucleus which performs some function; after that, the output travels through the axon towards the axon terminals, and the neuron fires the output towards the next neuron. Studies tell us that two neurons are never directly connected to each other; there's a gap between them called a synapse. So that is basically how a neuron works. On the right-hand side of the slide you can see an artificial neuron; let me explain that. Similar to the neuron, we have multiple inputs, and these inputs are provided to a processing element, like our cell body, where the summation of the inputs and weights happens.
So what happens as the signal moves on is that each input is multiplied by its weight. In the beginning, these weights are randomly assigned. If I take the example of x1, then x1 multiplied by w1 goes towards the processing element; similarly x2 multiplied by w2, and likewise the other inputs. Then the summation happens, which generates a function of s, that is, f(s). After that comes the concept of an activation function. What is an activation function? It is there in order to provide a threshold: if the output is above the threshold, only then will the neuron fire; otherwise it won't. You can use a step function as the activation function, or you can even use a sigmoid function. So this is how an artificial neuron looks. Multiple neurons connected to each other form an artificial neural network, and whether the activation function is a sigmoid function or a step function depends entirely on your requirement. Once the summation exceeds the threshold, the neuron fires, and after that we check the output. If this actual output is not equal to the desired output (we know the real, desired outputs), we compare the two and find the difference between the actual output and the desired output. On the basis of that difference, we again update our weights, and this process keeps repeating until the actual output matches the desired output. This process of updating weights is nothing but the backpropagation method. So this is neural networks in a nutshell. Basically, deep learning is implemented with the help of deep networks, and deep networks are nothing but neural networks with multiple hidden layers.
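Here is a minimal NumPy sketch of these two ideas: the single artificial neuron just described (weighted sum plus an activation acting as the threshold), and the same idea stacked into the hidden layers discussed just below. All sizes and weight values are illustrative assumptions, not taken from the video:

```python
import numpy as np

def step(s, threshold=0.0):
    return np.where(s > threshold, 1.0, 0.0)   # fires only above the threshold

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# --- a single artificial neuron ---
x = np.array([0.2, 0.9, 0.4])      # inputs x1, x2, x3
w = np.array([0.5, -0.3, 0.8])     # randomly assigned weights w1, w2, w3
s = np.sum(x * w)                   # summation of inputs times weights, f(s)
print(step(s), sigmoid(s))          # output under each activation choice

# --- the same idea stacked into hidden layers ---
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))        # input layer   -> hidden layer 1
W2 = rng.normal(size=(4, 4))        # hidden layer 1 -> hidden layer 2
W3 = rng.normal(size=(4, 2))        # hidden layer 2 -> output layer

h1 = sigmoid(x @ W1)                # every node connected to every input
h2 = sigmoid(h1 @ W2)
output = sigmoid(h2 @ W3)
print(output)  # compared against the desired output; the difference
               # drives the weight updates (backpropagation)
```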

### Random Forest [6:19:50]

Now what are hidden layers? Let me explain. You have inputs that come in here, so this is your input layer. After that some processing happens, and the signal goes to the next set of nodes, the hidden layer nodes; this is nothing but hidden layer one. Notice that every node is interconnected. After that you have one more hidden layer, where some function is applied, and again these nodes are interconnected. After hidden layer two comes the output layer, and at the output layer we again check whether the output is equal to the desired output or not; if it is not, we once again update the weights. So this is how a deep network looks. There can be multiple hidden layers, even hundreds of them. With machine learning that was not the case: we were not able to process multiple hidden layers. Because of deep learning, we can now have many hidden layers at once. Let us understand this with an example. We'll take an image which has four pixels. Among these four pixels, the top two are bright, that is, black in color, whereas the bottom two are white. What we do is divide these pixels and send each one to its own node, so we need four nodes: this pixel goes to this node, this one to the next, and so on for each node I'm highlighting with my cursor. Then we assign them random weights. The white lines represent positive weights, and the black lines represent negative weights. Now, when we look at the next hidden layer, each of its nodes is provided with inputs from this first layer. This node receives an input with a positive weight from this node, and a second input from this other node; since both are positive, we get this kind of node. Similarly for this node as well. When I talk about these two nodes, the first node here gets input from this node as well as from this node, but here we have a negative weight, so the value becomes negative, and we have represented that with black color. Similarly here: one input comes with a negative weight, and the other input also has a negative weight, so accordingly we again get a negative value, and these two nodes become black. Next, if you notice, we provide one input here which is negative through a positive weight, which stays negative, and another which is also negative through a positive weight, so that again comes out negative. That is why we get this kind of structure; if you notice, this is nothing but the inverse of the original image. For this node over here, we get a negative value through a positive weight, which is negative, and a negative value through a negative weight, which is positive, so overall we get something positive. Now, obviously, I want this particular image to be inverted; I want these black strips to come up. So what I'll do is calculate the inverse by providing a negative weight like this.
So over here I've provided a negative weight, so it will come up, and where I provide a positive weight it will stay wherever it is. After that the network detects it, and the output, as you can see, will be a horizontal image: not a solid, not a vertical, not a diagonal, but a horizontal one. And after that we calculate the difference between the actual output and the desired output, and we update the weights accordingly. Now, this is just an example, guys. So this is one example of deep learning: we have images here, and we provide this raw data to the first layer, the input layer. These input-layer nodes then determine the patterns of local contrast, which means they differentiate on the basis of colors, luminosity, and so on. After that, the following layer determines the face features: it forms the nose, eyes, ears, all those things. Then it accumulates those features into the correct face, or you can say it fits those features onto the correct face template, so it actually determines the faces here, as you can see. And then the result is sent to the output layer. Basically, you can add more hidden layers to solve more complex problems. For example, if I want to find a particular kind of face, say a face with large eyes or a light complexion, I can do that by adding more hidden layers, and I can increase the complexity at the same time: if I want to find which image contains a dog, for that too I can add one more hidden layer. So as the number of hidden layers increases, we are able to solve more and more complex problems. This is just a general overview of how a deep network looks: we have patterns of local contrast in the first layer, then we combine those patterns of local contrast to form the face features such as eyes, nose, and ears, then we accumulate those features into the correct face, and then we identify the image. Now I'll give you some applications of deep learning; here are a few. It can be used in self-driving cars; you must have heard about them. A self-driving car captures the images around it, processes that huge amount of data, and then decides what action it should take: should it go left or right, should it stop? Accordingly it decides what action to take, and that reduces the number of accidents that happen every year. Then, when we talk about voice-controlled assistants, I'm pretty sure you have heard about Siri; all iPhone users know about Siri, right? You can tell Siri whatever you want to do, and it will search it and display it for you. Then there is automatic image caption generation: whatever image you upload, the algorithm will generate a caption accordingly. For example, if the image has, say, blue colored eyes, it will display a blue-colored-eyes caption at the bottom of the image. And when I talk about automatic machine translation, we can convert English into Spanish, Spanish into French, and so on; basically, with automatic machine translation you can convert one language into another with the help of deep learning. And these are just a few examples, guys.
There are many other examples of deep learning: it can be used in game playing and in many other things. And let me tell you one very fascinating thing that I mentioned in the beginning as well: with the help of deep learning, MIT is trying to predict the future. So yes, the field is growing exponentially right now, guys. What is TensorFlow? TensorFlow is a popular open-source framework developed by Google for building and training machine learning models. It's like a toolkit for creating artificial intelligence systems. TensorFlow can be used for various tasks, including neural networks, computer vision (where computers learn to see and understand images), and natural language processing (where computers understand and use human language). So now that we have an understanding of TensorFlow, let's get deeper into it. TensorFlow is a versatile machine learning framework that utilizes tensors (multi-dimensional arrays) and computational graphs to perform operations. This architecture makes it adaptable and scalable for various machine learning tasks. TensorFlow caters to users with different levels of expertise by offering both high-level APIs like Keras for simplified model building and low-level APIs for greater customization. Furthermore, its compatibility with CPUs, GPUs, and TPUs ensures its suitability for both small-scale research and large-scale production environments. Now that you are familiar with TensorFlow, let us see its significance in AI and machine learning. TensorFlow is a powerful platform that empowers developers to transform AI and ML ideas into scalable solutions, seamlessly transitioning from research prototypes to real-world applications. It is adaptable to diverse needs, with features like visualization and debugging tools that enhance model understanding and troubleshooting. TensorFlow streamlines the entire machine learning pipeline, enabling efficient handling of large data sets and complex tasks while ensuring scalability and performance. Its versatility allows it to be customized for various applications, from simple experiments to advanced AI systems. Ultimately, TensorFlow has real-world impact by enabling developers to create innovative solutions that address global challenges and drive technological progress. Now that you know the significance of TensorFlow, let's discover why to use it. TensorFlow is one of the leading deep learning frameworks, widely used for machine learning and AI research and production. Its scalability allows TensorFlow to handle massive data sets and complex models efficiently, making it ideal for large-scale AI systems in applications like image recognition and natural language processing. Next, flexibility: flexibility is another key strength, as TensorFlow offers APIs ranging from the high-level Keras for simplicity to low-level APIs for advanced customization, catering to diverse developer needs. Next, its rich ecosystem includes an active community, extensive documentation, pre-trained models, and numerous resources that simplify its adoption and usage. TensorFlow's cross-platform support enables seamless deployment across different operating systems and hardware platforms. Finally, there is its optimized performance: TensorFlow runs efficiently on CPUs, GPUs, and TPUs, ensuring faster training and inference times. These strengths (scalability, flexibility, a rich ecosystem, cross-platform compatibility, and optimized performance) make TensorFlow a preferred choice for AI and ML projects. Now that we know what TensorFlow is.
So let's compare TensorFlow with other frameworks. Here is a comparison between TensorFlow and another deep learning framework, namely PyTorch, highlighting their respective strengths. TensorFlow is known for its flexibility and versatility, enabling developers to build a wide range of models with customizable implementations, while PyTorch is recognized as intuitive and Pythonic, offering a user-friendly approach that appeals to many developers. Next, when it comes to production readiness, TensorFlow excels with robust tools for deploying models in real-world environments, whereas PyTorch is more research-focused, favored for its dynamic computational graphs and ease of experimentation. Next, TensorFlow boasts a large and established ecosystem with an active community, extensive documentation, and a proven track record, while PyTorch is growing rapidly in academic circles, gaining traction as a popular choice for research projects. In terms of performance, TensorFlow is optimized and scalable, capable of handling large-scale models and data sets efficiently, while PyTorch is seen as a competitive alternative, offering its own strengths and appealing to a different subset of developers. Ultimately, the choice between TensorFlow and PyTorch depends on the specific use case and developer preferences, as both frameworks offer compelling features suited to different needs. Next, let's explore the real-world applications of TensorFlow, starting with TensorFlow in computer vision, which showcases its versatility and impact across various fields. TensorFlow is widely used for identifying objects in images, with algorithms like YOLO (You Only Look Once) and SSD (Single Shot Detector) enabling tasks such as detecting pedestrians and obstacles for self-driving cars or identifying suspicious objects in security systems. It plays a crucial role in medical imaging and satellite imagery, where it aids in analyzing X-rays or MRIs to detect anomalies and assists in monitoring deforestation, identifying land-use patterns, and predicting natural disasters. Additionally, TensorFlow powers security systems and user authentication, enabling facial recognition for tasks like face detection and identification. These applications highlight TensorFlow's ability to transform industries like healthcare and security, making it an indispensable tool in computer vision. Next, TensorFlow in natural language processing. TensorFlow is instrumental in tasks like spam detection and sentiment analysis, where it helps identify spam emails and determine the emotional tone or polarity (positive, negative, or neutral) of text such as customer reviews or social media posts. It powers services like Google Translate, enabling accurate translation between numerous languages and facilitating global communication. TensorFlow also enhances security systems and user authentication by analyzing text data to detect suspicious patterns and fraudulent activities, or by improving authentication processes through text-based inputs like passwords and security questions. These applications demonstrate TensorFlow's versatility in advancing communication, enhancing security, and driving innovation in NLP. Then there is TensorFlow in the field of generative AI, showcasing its role in driving innovation. TensorFlow powers GANs and similar models, enabling the creation of realistic images and art, and even the manipulation of existing visuals.
It is instrumental in training large language models like GPT (which stands for Generative Pre-trained Transformer) to translate languages, write creative content, and perform summarization tasks. Additionally, TensorFlow facilitates the creation of deepfake audio and speech generation, producing synthetic media in which a person's likeness or voice can be convincingly replicated. These applications demonstrate TensorFlow's pivotal role in advancing image creation, text generation, and audio synthesis within the field of generative AI. Now, let us explore the industrial use of TensorFlow. There are diverse applications of TensorFlow across various industries, showcasing its transformative impact. In healthcare, TensorFlow is used for predictive analytics to forecast disease outbreaks, identify high-risk patients, and optimize treatment plans, as well as for medical image analysis to detect anomalies in X-rays, MRIs, and CT scans. In autonomous vehicles, TensorFlow powers object detection systems, enabling safe navigation, and it supports decision-making models in areas like supply chain management and risk assessment. In finance, TensorFlow is utilized for algorithmic trading, analyzing market trends, and detecting fraudulent transactions. Retail applications include inventory management, to predict demand and reduce stock-outs, along with personalized recommendations to enhance the customer experience and boost sales. In entertainment, TensorFlow facilitates content creation, such as generating music or art, and it is used in video and audio processing tasks like noise reduction and voice stabilization. Overall, TensorFlow's versatility and advanced capabilities are driving innovation across these industries. Now, let us move ahead and install TensorFlow. To get started with TensorFlow, you first need to install the necessary prerequisites, including Python 3.5 or a higher version. You can use a package manager like pip or conda for the installation. Here's how you can proceed based on your operating system. First, open your terminal or command prompt and run python -m venv myenv to create a virtual environment. Next, activate the environment: for Linux or macOS, run source myenv/bin/activate; for Windows, run myenv\Scripts\activate. Once the environment is activated, you can install TensorFlow by running pip install tensorflow. Now, to ensure that TensorFlow has been installed successfully, verify the installation by running python -c "import tensorflow as tf; print(tf.__version__)", which will display the installed version of TensorFlow, confirming that the installation was successful. Now let us open our VS Code terminal and install TensorFlow. So let us type the command pip install tensorflow. As you can see, TensorFlow is installed. Then, to ensure that TensorFlow has been installed successfully, run the verification command in the terminal. As you can see on the screen, the version is displayed, confirming that the installation was successful.
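For convenience, here are the setup and verification commands described above collected in one place (the environment name myenv stands in for the "my environment" named in the narration):

```
python -m venv myenv                # create a virtual environment
source myenv/bin/activate           # activate it (Linux/macOS)
myenv\Scripts\activate              # activate it (Windows)
pip install tensorflow              # install TensorFlow
python -c "import tensorflow as tf; print(tf.__version__)"   # verify the install
```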
Now let us discuss the TensorFlow ecosystem. The TensorFlow ecosystem provides a comprehensive set of tools for building, training, and deploying machine learning models. At its core is TensorFlow itself, the foundation of the ecosystem. TensorFlow Lite enables running models on mobile and embedded devices, while TensorFlow Extended supports building production-grade ML pipelines, including data validation and model serving. Next, the TensorFlow Model Garden offers pre-trained models and examples for tasks like image classification and NLP, TensorFlow.js allows running ML models in web browsers, and TensorFlow Hub provides a library of pre-trained models for easy integration into projects. Now let us take a look at the key capabilities of TensorFlow, showcasing its strengths as an open-source, community-driven framework that evolves through contributions. At its core, TensorFlow utilizes tensors, multi-dimensional arrays, for efficient data representation and manipulation. Its flexible architecture allows developers to choose between static graphs for optimized performance and eager execution for an interactive development experience. TensorFlow supports a wide range of applications, including natural language processing, generative AI, computer vision, and more, making it highly versatile for various machine learning tasks. Furthermore, it offers cross-platform compatibility, running efficiently on CPUs, GPUs, and TPUs, enabling developers to leverage the best hardware for their needs. So overall, TensorFlow stands out as a robust, adaptable, and versatile framework for machine learning. Now let us head towards the hands-on. There are three main steps in building a churn prediction model using TensorFlow. In step one, a model is created by defining its architecture, including layers and parameters tailored to the specific problem of predicting customer churn. In step two, the model is trained on historical data, where it learns patterns and relationships that help predict customer behavior. Finally, in step three, the trained model is used to make predictions, identifying customers likely to churn based on input data. This process demonstrates how TensorFlow can be effectively utilized for churn prediction in machine learning projects. Now, without any further delay, let's code it. First, let us import the libraries: import tensorflow as tf, then import matplotlib.pyplot as plt, then from tensorflow.keras.models import Sequential, and then from tensorflow.keras.layers import Flatten, Dense. So here we are importing the TensorFlow library, a popular framework for machine learning and deep learning; next, matplotlib's plotting module for data visualization; then the Sequential class from TensorFlow's Keras API, which is used to build models layer by layer; and finally the Keras layers Flatten, which flattens multi-dimensional data into a 1D array for input into dense layers, and Dense, which creates fully connected (dense) layers for the neural network. These imports set up the tools needed to build and train a neural network and then visualize data and results. Now, to get the data and split it for training and testing, let us write (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data(). This code loads the MNIST data set of handwritten digits using TensorFlow.
Here train_images and train_labels are the images and labels for training (60,000 samples), and test_images and test_labels are the images and labels for testing (10,000 samples). Each image is a 28x28 grayscale image, and the labels represent the digits 0 to 9. Now let us scale the pixel values down from the 0-255 range to the 0-1 range. For that, let us type train_images = train_images / 255.0 and test_images = test_images / 255.0. Next, let us inspect the data: print(train_images.shape), then print(test_images.shape), and finally print(train_labels). In this code, print(train_images.shape) displays the shape of the training images data set. Let us run the code to check the output. As you can see, the output says (60000, 28, 28), which means there are 60,000 images, each of size 28x28 pixels. Next we have print(test_images.shape), which displays the shape of the testing images data set; the output here is (10000, 28, 28), which means there are 10,000 images, each of size 28x28 pixels. The next piece of code, print(train_labels), prints the labels (digits 0 to 9) for the training set. Now let us display the first image. For that, let us type plt.imshow(train_images[0], cmap='gray'), and to see the plot, plt.show(). Here plt.imshow(train_images[0]) displays the first image in the data set, and cmap='gray' ensures that the image is shown in grayscale. Let us run this code; as you can see on the screen, here is the output. Now let us move ahead. The next step is to define the neural network model. Let us type my_model = tf.keras.models.Sequential(). Next, my_model.add(Flatten(input_shape=(28, 28))). Next, my_model.add(Dense(128, activation='relu')). And next, my_model.add(Dense(10, activation='softmax')). Here tf.keras.models.Sequential() creates a sequential model where layers are added one after another. The line my_model.add(Flatten(input_shape=(28, 28))) flattens the 28x28 input images into a 1D array of 784 values, making them suitable for the dense layers. The next line, my_model.add(Dense(128, activation='relu')), adds a dense, fully connected layer with 128 neurons, where activation='relu' applies the ReLU activation function to introduce non-linearity. And the last line, my_model.add(Dense(10, activation='softmax')), adds the output layer with 10 neurons corresponding to the 10 digit classes (0 to 9), where activation='softmax' outputs probabilities for each class. Now let us compile the model; this is where the network is actually created.
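Here are the loading, scaling, and inspection steps dictated above, gathered into one runnable sketch:

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# load the MNIST handwritten-digit data set
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.mnist.load_data()

# scale pixel values from 0-255 down to 0-1
train_images = train_images / 255.0
test_images = test_images / 255.0

print(train_images.shape)   # (60000, 28, 28)
print(test_images.shape)    # (10000, 28, 28)
print(train_labels)         # digit labels 0-9

# display the first training image in grayscale
plt.imshow(train_images[0], cmap='gray')
plt.show()
```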
So let us type my_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']). Here optimizer='adam' uses the Adam optimizer, an efficient and widely used algorithm for optimizing neural networks; the loss specifies sparse categorical cross-entropy, the loss function for multi-class classification where the labels are integers; and metrics=['accuracy'] tracks the model's accuracy during training and evaluation. Next, let us train the model. So for that

### KNN Algorithm [6:47:02]

let us type my_model.fit(train_images, train_labels, epochs=3). Here my_model.fit is the function that trains the model with the provided data; train_images and train_labels are the input data and the corresponding labels for training; and epochs=3 means the model will go through the entire training data set three times to learn patterns. Next, let us check the model's accuracy on the test data. For that, let us type val_loss, val_acc = my_model.evaluate(test_images, test_labels), and then a print statement: print('test accuracy of my model', val_acc). Here my_model.evaluate evaluates the model using the test images and their corresponding labels, returning two values: val_loss, the loss on the test set, and val_acc, the accuracy on the test set. The print statement then displays the test accuracy, showing how well the model performs on unseen data. Now let us check the output and run the code. As you can see on the screen, the model learns patterns over three iterations, because we set the epochs to three, and the accuracy comes out to about 0.97, that is, 97%. So let us change the epochs value to 50: initially we had set three, so now let us give 50 and run the code. As you can see, it now makes 50 iterations. However, achieving near-perfect accuracy on the training set can also indicate potential overfitting, where the model might not generalize well to new, unseen data. So, to confirm the model's effectiveness, it is essential to evaluate its performance on a separate validation or test data set; if the accuracy remains high and the loss stays minimal on the test set, the model can be considered robust and effective.
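Here are the model-definition, compilation, training, and evaluation steps dictated above, collected into one sketch (it continues from the loading snippet shown earlier):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# define the network: flatten the 28x28 images, one hidden layer, softmax output
my_model = Sequential()
my_model.add(Flatten(input_shape=(28, 28)))     # 28x28 image -> 784 values
my_model.add(Dense(128, activation='relu'))     # fully connected hidden layer
my_model.add(Dense(10, activation='softmax'))   # one output per digit class

my_model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

my_model.fit(train_images, train_labels, epochs=3)   # three passes over the data

val_loss, val_acc = my_model.evaluate(test_images, test_labels)
print('test accuracy of my model', val_acc)          # about 0.97 in the video
```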
Well, human beings are the most advanced species on Earth; there's no doubt about that. And our success as human beings comes from our ability to communicate and share information. That's where the concept of developing a language comes in. And when we talk about the human language, it is one of the most diverse and complex parts of us, considering that a total of about 6,500 languages exist. Coming to the 21st century, according to industry estimates, only 21% of the available data is present in structured form. Data is being generated as you speak, tweet, and send messages on WhatsApp or in various groups on Facebook, and the majority of this data exists in textual form, which is highly unstructured in nature. Now, in order to produce significant and actionable insights from this data, it is important to get acquainted with the techniques of text analysis and natural language processing. So let's understand what text mining and natural language processing are. Text mining, or text analytics, is the process of deriving meaningful information from natural language text. It usually involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. Natural language processing, on the other hand, refers to the artificial intelligence method of communicating with an intelligent system using natural language. Since text mining refers to the process of deriving high-quality information from text, the overall goal here is essentially to turn text into data for analysis via the application of natural language processing. That is why text mining and NLP go hand in hand. So let's understand some of the applications of text mining and natural language processing. One of the first and most important applications of natural language processing is sentiment analysis; be it Twitter sentiment analysis or Facebook sentiment analysis, it is used heavily. Next we have the implementation of chatbots. You might have used the customer chat services provided by various companies, and the process behind all of that is NLP. Then we have speech recognition, and here we are also talking about voice assistants like Siri, Google Assistant, and Cortana; the process behind all of these is natural language processing. Machine translation is another use case of natural language processing, and the most common example of it is Google Translate, which uses NLP to translate data from one language to another, and that too in real time. Other applications of NLP include spell checking, keyword search, and extracting information from any document or website. And finally, one of the coolest applications of natural language processing is advertisement matching: basically, the recommendation of ads based on your history. Now, NLP is divided into two major components: natural language understanding and natural language generation. Understanding generally refers to mapping the given natural language input into useful representations and analyzing the various aspects of the language, whereas generation is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation. Natural language understanding is usually harder than natural language generation, because it takes a lot of time and effort to understand a particular language, especially if you are not a human being. Now, there are various steps involved in natural language processing: tokenization, stemming, lemmatization, POS tags, named entity recognition, and chunking. Starting with tokenization: tokenization is the process of breaking strings into tokens, which in turn are small structures or units that can be used for further analysis. If we look at the example here, taking this sentence into consideration, it can be divided into seven tokens. This is very useful in the natural language processing workflow. Coming to the second process in natural language processing: stemming. Stemming usually refers to normalizing a word into its base or root form. If you look at the words here, we have affectation, affects, affections, affected, affection, and affecting. All of these words originate from a single root word, and as you might have guessed, it is affect. A stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word. This indiscriminate cutting can be successful on some occasions, but not always. So let's understand the concept of lemmatization. Lemmatization, on the other hand, takes into consideration the morphological analysis of the word. To do so, it is necessary to have a detailed dictionary which the algorithm can look through to link the inflected form back to its original or root word, which is also known as the lemma.
Now, what lemmatization does is group together the different inflected forms of a word under its lemma, and it is somewhat similar to stemming in that it maps several words onto one common root. But the major difference between stemming and lemmatization is that the output of lemmatization is always a proper word. For example, a lemmatizer should map the words gone, going, and went to go; that would not be the output of stemming. Now, once we have the tokens and have reduced them to their root forms, next come the POS tags. Generally speaking, the grammatical type of a word is referred to as its POS tag, or part of speech: verb, noun, adjective, adverb, article, and many more. It indicates how a word functions in meaning as well as grammatically within the sentence. A word can have more than one part of speech based on the context in which it is used. For example, take the sentence "Google something on the internet": here Google is used as a verb, although it is a proper noun. These are some of the limitations, or I should say problems, that occur while processing natural language. To help overcome such challenges, we have named entity recognition, also known as NER. It is the process of detecting named entities such as person names, company names, quantities, and locations. It has three steps: noun phrase identification, phrase classification, and entity disambiguation. If you look at this particular example, "Google CEO Sundar Pichai introduced the new Pixel 3 at New York Central Mall", you can see that Google is identified as an organization, Sundar Pichai as a person, New York as a location, and Central Mall is also identified as an organization. Now, once we have divided the sentences into tokens, done the stemming and lemmatization, added the POS tags, and applied named entity recognition, it's time for us to group things back together and make sense out of them. For that we have chunking. Chunking basically means picking up individual pieces of information and grouping them together into bigger pieces, and these bigger pieces are also known as chunks. In the context of NLP, chunking means grouping words or tokens into chunks. So, as you can see here, we have pink as an adjective, panther as a noun, and the as a determiner, and all of these are chunked together into a noun phrase. This helps in getting insights and meaningful information from the given text. Now, you might be wondering where one executes or runs all of these functions on a given text file. For that, Python came up with NLTK. What is NLTK? NLTK is the Natural Language Toolkit, a library which is heavily used for natural language processing and text analysis. So what is NLP? Natural language processing, or NLP for short, is an automatic way of representing and processing human language. What I'm trying to say is that here we try to develop applications and services that understand human language. Some practical examples of NLP are Google voice search, sentiment analysis, and many more. As I mentioned earlier, we use NLP to extract meaningful information from textual data. So do you think NLP is a magical tool where you pass in text and get the desired output? Well, that isn't the case. As a matter of fact, raw text input has to go through various stages before we can perform operations on the textual data set.
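Before moving to the pipeline, here is a minimal sketch of the steps just described (tokenization, stemming, lemmatization, POS tagging, NER) using NLTK; the example sentence is the one from the NER slide, and the resource names passed to nltk.download can vary slightly across NLTK versions:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

# one-time resource downloads (names may differ slightly between NLTK versions)
for pkg in ['punkt', 'wordnet', 'averaged_perceptron_tagger',
            'maxent_ne_chunker', 'words']:
    nltk.download(pkg)

sentence = "Google CEO Sundar Pichai introduced the new Pixel 3 at New York Central Mall"

tokens = word_tokenize(sentence)                              # tokenization
stems = [PorterStemmer().stem(t) for t in tokens]             # stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]   # lemmatization
tags = nltk.pos_tag(tokens)                                   # POS tagging
entities = nltk.ne_chunk(tags)                                # named entity recognition

print(tokens)
print(stems)
print(lemmas)
print(tags)
print(entities)
```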
As you see here in the pipeline, raw text data undergoes data cleaning, which involves steps like tokenization, stop word removal, lemmatization, and more. The next step is vectorization, where we convert our text data into a numerical format. Finally, based on the requirements, we perform the classification task. All right, now let's see a few of these steps in detail, starting with cleaning our data. As mentioned earlier, the goal here is to convert raw text into clean text data, and this involves steps like tokenization, stop word removal, stemming, and more. Speaking about tokenization: tokenization is essential for splitting a sentence, a paragraph, or an entire text document into smaller units such as individual words or phrases; each of these smaller units is then called a token. Then we have stop word removal. Stop word removal in general refers to filtering out words whose presence in a sentence makes no difference to the analysis of our data. So why do we have to remove them? Well, we remove the stop words so that our model doesn't get more complicated than it needs to be. In the next step, we have something called stemming. Stemming is the process of reducing a word to its root form. What I'm trying to say is that with stemming, we're basically removing the affixes, most often the suffix. For example, consider the word giving: once stemming is performed on giving, it ends up becoming give. Moving ahead, we have vectorization. Text vectorization is the process of converting text into a numerical representation. Here we end up creating something called a bag-of-words model, which is a model that represents a text document by describing the occurrence of words within it. Finally, coming down to the classification task: text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups by using natural language processing. Text classification can automatically analyze text and then assign a set of predefined tags or categories based on the content. Now let's move ahead and understand an open-source tool called NLTK. NLTK stands for Natural Language Toolkit. This toolkit is one of the most powerful NLP libraries, containing packages to make machines understand human language and reply with an appropriate, desired response. So why do we need NLTK? You see, NLTK has many built-in packages to process our textual data at every stage; we can perform tasks like data cleaning, visualization, and vectorization that will help us in classifying our text. So let me now move to my code editor and show you how we can pre-process, or clean, our data using NLTK.
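As an aside before the hands-on, the vectorization step described above can be sketched with scikit-learn's CountVectorizer; the sample sentences here are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the pink panther is pink",
        "the panther hunts at night"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)      # text -> word-count matrix (bag of words)

print(vectorizer.get_feature_names_out())    # the learned vocabulary
print(counts.toarray())                      # occurrence counts per document
```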
All right guys, as you can see here, I'm going to use Google Colab, although you can use any code editor like Jupyter Notebook or Visual Studio Code; I would just prefer to go with this. Now, in the next stage we need a data set, right? So where will I get that? In order to get a data set, what I'm going to do is from sklearn.datasets import fetch_20newsgroups. This fetch_20newsgroups function will give us a data set, so let's quickly see how that looks. To execute this, I just need to press Shift+Enter. Okay, so let's see our textual data now. The text data is nothing but an instance of fetch_20newsgroups, so we'll copy this here and call it. What would this return? It returns a Bunch object, so let's quickly execute this. If you're downloading this for the first time, it will take some time to download the data set, which is fetched from the link shown. Fine, so this is done now. Let's quickly look at how this data looks. To do that, we check type(text_data), and you will see here that this is a Bunch. Let's now import NumPy, because we want to work with the data as a list rather than operate directly on the Bunch: import numpy as np, and let's execute this. Now what we're going to do is store the raw text: raw_text = text_data.data. Let's print raw_text and see how it looks. As you can see, we have a huge amount of data, and all of the entries are separated by commas. Let me quickly show you: first off we have a list, and then we have huge sentences, or you could say paragraphs, which are separated by commas; make sure you don't confuse this with a CSV. Now, we don't want to take the entire data set, because that would be computationally expensive and, apart from that, it would take more time to execute; and to get a better understanding of what we're doing, we're going to take only the first four paragraphs. To do that, all I'm going to do is use the slice operation with a colon and a four: raw_text[:4]. Let's see how this looks now. As you can see, we just have the first four paragraphs; let me zoom out here so you get a better view. So now we have our text data, and we are supposed to start cleaning it, so let's do that. First off, let's convert all the upper case into lower case. Let me give this a heading: stage one, convert into lower text. To hold the results we'll create an array, clean_text_stage_one, which starts empty. Since we'll have to lowercase whatever input we take, whether it's training or testing data, rather than writing the whole code inline every time, let's wrap it in a method, so that next time we want to lowercase text we just call the method. So def to_lower(data), where data is the argument, and then we have a for loop: for words in this data, our raw text, we append the lowercased version: clean_text_stage_one.append(str.lower(words)), where str.lower is a built-in method and we pass in words. So now we have a method; let's call it as to_lower(raw_text). Note that it's str, not st; let me quickly execute that once again. Perfect. So now let's compare our raw text with the cleaned version in clean_text_stage_one.
So let me execute this. As you can see, earlier we had "From" with a capital F and "WHAT car is this", and here everything has been converted to lower case. Okay, in the next stage we have tokenization. As I mentioned earlier, what we do with tokenization is take a sentence or a paragraph and convert it into either individual sentences or individual words. To convert into words we have the word tokenizer, and to convert into sentences we have the sentence tokenizer. Let me name the next block stage two: tokenize, and let's see how our tokens look. Same as before, we create an array, clean_text_2, which starts empty. Then from nltk.tokenize import sent_tokenize; we also need word_tokenize. Here I'm going to show the sentence tokenizer only for the demo's sake; we're going to use the word tokenizer going forward. Before this, we also have to download something called punkt: import nltk and then nltk.download('punkt'). Shift+Enter. All right, now we're going to perform sentence tokenization: for each entry in clean_text_stage_one, we call sent_tokenize on it and append the result. Since this is just for the demo, I'm not going to store it in clean_text_2 or clean_text_3 but in a separate variable, sent, using sent.append(sentence). Looks good; let's see how this looks. As you can see, earlier we had a one-dimensional array where each element represents a paragraph. Within a paragraph, we all know, there are multiple sentences, so now it has become a two-dimensional array: each inner array represents a paragraph, and each element within that array is one sentence. Okay, now we'll do word tokenization and move ahead. We need word tokenization, and this time we'll add the result to clean_text_2, because this is part of the real pipeline. Let me give a comment here. For word tokenization, rather than writing the full for loop, I'm going to show you a simpler, easier way. What we were doing so far is initializing an array, writing a for loop, and appending to it; we can instead use something called a list comprehension, where the for loop is written within the list itself. So clean_text_2 = [word_tokenize(i) for i in clean_text_stage_one]: for each entry in the lowercased text, the words come out tokenized. Let us now see how this clean text looks.
So as you can see here, every word within each sentence has been converted into tokens. Let me scroll up so you can see how it looks. It is still a two-dimensional array: each inner array represents a paragraph, and everything within it has been converted into single-word tokens. Fine. In the next stage, we want to remove punctuation. You see, in our data set we have special characters and punctuation; we don't want these symbols and dots. To remove them, we are going to use something called a regular expression. Let me quickly show you how to implement this. To do that, we import the regular expression module: import re. Now, since this is two-dimensional data, we need two for loops; previously we had one-dimensional data, but now it's a 2D data structure. So we create an empty list, clean_text_3, then a for loop: for words in clean_text_2, we create an inner list, and then another for loop to access the inner words: for w in words. For the regular expression we specify a pattern: s = re.sub(r'[^\w\s]', '', w), that is, a caret followed by \w and \s inside a character class, so that wherever such a non-word, non-space character occurs, we substitute it with an empty string. And then, if the result is not null, not empty, we append the cleaned word: clean.append(s). Finally, we have to add this inner clean list to our main array, so let's define that array as well; yes, it has to be a list. So clean_text_3.append(clean), passing just this clean array. Fine. Let's now see how this looks. We have clean_text_3, and according to our analysis all of these semicolons and special characters should have disappeared by now. As you can see, we don't have any special characters within our data set anymore. It's still a 2D array, but the only difference is that we now have only alphanumeric values. This is great. So in our next stage we're going to remove stop words. I hope you remember what stop words are: as I mentioned earlier, stop words are nothing but those words which are most commonly repeated and carry little meaning. In order to do that, let me show you: import nltk, and we download the stop words with nltk.download('stopwords'). Although we could also find stop word lists on Google, copy those stop words into a list, and make sure that if a stop word is present within our data set we don't include it in our clean text 4 stage. Moving ahead, let me give a title here for stop word removal. Now that we have downloaded the stop words, we do from nltk.corpus import stopwords. Looks great, right? So now we'll have clean_text_4.
Similar to the previous for loop, we have for words in clean_text_3, then we create an empty inner list which will be appended to clean_text_4, and inside it, for word in words, if the word is one of the stop words, we eliminate it. So: if not word in stopwords.words('english'), where we pass the language; if the word is not present in that list, it means the word is not a stop word, so we append it. And then we append this inner list to clean_text_4 with clean_text_4.append. Let me quickly execute this. It will take some time to run because it has to go through a lot of data, so please be patient, and let's see how it looks. Okay, this has finally executed, so let's look at clean_text_4. As you can see, we have removed a number of unnecessary words and kept just the important ones. All right, now moving ahead to our next stage, stemming. I hope you remember what stemming is: stemming is nothing but converting a word into its root form. For example, the word "processing" after stemming becomes "process". So let me quickly show you how to perform stemming here. As I mentioned earlier, we use stemming to remove affixes, most often the suffix. To perform stemming, we have various types of stemmers: the Porter stemmer, the Snowball stemmer, the Lancaster stemmer. Today, in this particular example, we're going to use the Porter stemmer. So: from nltk.stem.porter import PorterStemmer. Once we have imported this, we create an instance of the Porter stemmer: port = PorterStemmer(). Just to give an example of what we're going to do here, let's take a few words in a list, say a. We call the stemmer with port.stem, passing the word we want to stem, inside a for loop: for i in the list. Let's pass a few words like reading and washing; let's give one word which doesn't have a suffix, like wash; and then driving. Let's now print this. The output I'm expecting is that reading should be converted into read, washing into wash, wash would remain the same because there's no suffix, and driving would be converted into drive. Let me print and execute this part. I hope you can see it: reading becomes read, washing becomes wash, and wash remains the same. Okay, I made a small typo here, it's going to be driving, so let me execute this once again. As you can see, we have successfully removed all the suffixes, and the results make sense. This is the case with the Porter stemmer, but the same doesn't always hold when we use the Lancaster stemmer or the Snowball stemmer. So let us now quickly move ahead and see how we can apply the stemmer to our data set here. We all know we need our loop here.
So before that, we create an array, clean_text_5, which starts as an empty list. Then we have a for loop and another empty inner list. We pass each word through the stemmer and append the result: w.append(port.stem(word)). And then we append this smaller list to the main one: clean_text_5.append(w). I hope this is right; let me quickly execute it and see how clean_text_5 looks. Okay, as you can see, there are obviously some oddities, because this stemmer might not recognize every word; that's why we have multiple other stemmers. But in most places you can see that our words have been reduced to their stems. All right, so I hope you now understand how to perform stemming. But as I've mentioned earlier, we have multiple stemmers, like the Porter stemmer and the Lancaster stemmer, and each of those stemmers is unique in its own way. Sometimes what happens when we perform stemming is that we get words which make no sense, and that can sometimes be really annoying. So in order to overcome that, we have something called lemmatization. Let me quickly show you what lemmatization is; let me give it a heading here. To perform lemmatization we use something called WordNet. So from nltk.stem (this is a form of stemming, but it makes sure the word that comes out has some sense) import WordNetLemmatizer, and we create an instance of it: wnet = WordNetLemmatizer(). And now we obviously have to download a couple of packages, so import nltk

### Naive Bayes Classifier [7:19:27]

And then we have nltk.download('wordnet'). Fine, perfect. So now, in the same way, we'll create a list of lemmatized words. We have lemmatized_words, lemm with the double m, as an empty array, and then a for loop: for words in clean_text_4. It's not going to be clean_text_5; it's clean_text_4, because lemmatization is an alternative form of stemming, so we apply it to the un-stemmed text. Now we follow the same drill: we have w, which is an empty list, and another for loop, and we append the lemmatized words: w.append(wnet.lemmatize(word)), passing whatever word we want to lemmatize. Once this is done, we append this w to the bigger list: lem.append(w). Let me execute this now that it's done, and let's see how it looks. Okay, we cannot print all of it, because it says the data is too long, so rather than printing everything I'll just display a slice of lem so we can get a glimpse of how our data looks. So, as you can see here, although we are performing lemmatization, the words now make sense. Just to give you a brief insight, let's compare how our data looked earlier and how it looks now. What we'll do is take our raw text, that part from the beginning, and compare it with our final text, which is nothing but clean_text_5, the output after stemming. Let me quickly print them: print(raw_text), execute that first, and now let me print clean_text_5. As the data is pretty huge, I'm just going to slice it here and take only the first entry, the first document. As you can see, we have all the words tokenized, and unlike the raw version, everything looks organized. So obviously we want to pre-process this data, because this version makes more sense, it is easier for a system to analyze, and the classification will be much more accurate compared to what it would be on the raw text. All right, moving ahead; I hope you now understand why we need to pre-process our data.
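For reference, here is the whole preprocessing walkthrough condensed into one sketch. It folds the stages into a single pass per document (including chaining stemming and lemmatization, which the video applies as separate alternatives), so treat the exact composition as an illustrative simplification:

```python
import re
import nltk
from sklearn.datasets import fetch_20newsgroups
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer

for pkg in ['punkt', 'stopwords', 'wordnet']:
    nltk.download(pkg)

raw_text = fetch_20newsgroups().data[:4]          # first four documents only
stop_words = set(stopwords.words('english'))
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

clean_docs = []
for doc in raw_text:
    tokens = word_tokenize(doc.lower())                        # stages 1-2: lowercase + tokenize
    tokens = [re.sub(r'[^\w\s]', '', t) for t in tokens]       # stage 3: strip punctuation
    tokens = [t for t in tokens if t and t not in stop_words]  # stage 4: stop word removal
    tokens = [lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens]  # stages 5-6
    clean_docs.append(tokens)

print(clean_docs[0][:20])   # a glimpse of the first cleaned document
```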
To classify our text, we use something called the Naive Bayes algorithm. Before we understand the algorithm itself, let's see what classification is. In simple words, classification means grouping data based on common characteristics. Say we have a few figures: triangles, circles, and a square. When we pass them through a classification algorithm, they get categorized into different classes, based entirely on shape, size, and whatever other features there are. The Naive Bayes algorithm works in a similar way. The principle that drives Naive Bayes is called Bayes' theorem, and we use Bayes' theorem to calculate conditional probability. So let's look at the maths behind conditional probability. Mathematically, the probability of event A occurring when event B has already occurred equals the probability of event B occurring when event A has already occurred, times the probability of event A, normalized by the probability of event B. If the notation is confusing: the vertical bar denotes the conditional, so P(A|B) reads "the probability of A given B". To make it concrete, let event A be shopping and event B be rain; then P(A|B) is the probability that you go shopping when it has already started raining. There are a couple of terms you need to know here. P(A|B), the left-hand side, is referred to as the posterior probability, the quantity we actually want to find. P(B|A) is called the likelihood. P(A) is called the prior probability; as the name suggests, it's what we believe before seeing any evidence. And the denominator, P(B), is the least-used part: we call it the marginal likelihood, and it simply normalizes the result, as written out below.
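Written out, the rule and its four named parts look like this:

```latex
\underbrace{P(A \mid B)}_{\text{posterior}}
  = \frac{\overbrace{P(B \mid A)}^{\text{likelihood}} \;\cdot\; \overbrace{P(A)}^{\text{prior}}}
         {\underbrace{P(B)}_{\text{marginal likelihood}}}
```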
Now, speaking about probability, let's see where the concept comes from. We have probability only because we have random variables, and random variables give rise to randomness. To give you a better understanding, take an example with two bags. In bag one we have five red balls and nothing else. Does probability exist here? Obviously not: no matter which ball you pick, you get a red ball, so the randomness is zero. Now take bag two, with five red balls and four blue balls. Does probability exist here? Absolutely. If I reach in and pick any ball, the probability of getting blue is the number of blue balls, four, divided by the total number of balls, nine: P(blue) = 4/9. This bag has more randomness, and that is how probability comes into the picture. Now let's derive the conditional probability equation. We start from the joint probability: P(A ∩ B) = P(A|B) · P(B); call this equation one. Similarly, the same holds for P(B ∩ A): since intersection is commutative, P(A ∩ B) and P(B ∩ A) are the same thing, and the only difference in the equation is that A and B swap roles, giving P(B ∩ A) = P(B|A) · P(A). Equating the two, P(A|B) · P(B) = P(B|A) · P(A), and dividing through by P(B), we get P(A|B) = P(B|A) · P(A) / P(B). This is called Bayes' theorem, and it is exactly the formula we had earlier. So now you might be wondering how to use Bayes' theorem for classification problems. Quick recap: classification is categorizing data based on its characteristics. Here's the setup: we have a dataset X, a group of values, our text data, and we have Y, the classes. A class can be 0, 1, and so on; here let's take 0 and 1, where 0 means not spam, 1 means spam, and X is a group of emails. Putting this into Bayes' theorem: the probability that an email is spam, given the email, equals the probability of that email under the spam class, times the prior probability of spam, all divided by P(X). And similarly for not spam: P(Y=0|X) is the probability of the email given the non-spam label, times P(Y=0), divided by P(X). So this is how Bayes' theorem looks when deciding whether an email is spam or not; the derivation and the spam form are written out below.
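The derivation and the spam-classifier form, in symbols:

```latex
P(A \cap B) = P(A \mid B)\,P(B), \qquad
P(B \cap A) = P(B \mid A)\,P(A), \qquad
P(A \cap B) = P(B \cap A)
\;\Longrightarrow\;
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad\text{and for an email } X:\quad
P(Y{=}1 \mid X) = \frac{P(X \mid Y{=}1)\,P(Y{=}1)}{P(X)}
```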
To better understand this, let's see what each term represents. Picture a dataset with X values and Y values: the X values are emails, and Y is the class, 1 if the email is spam, 0 if it is not, and so on down the table. Call this the training data, X train, along with its output classes. Now I'll be handed an email and asked: figure out whether this email is spam or not. Then there's X test, and the only difference is that for X test we don't know which class the emails belong to; the labels are question marks. We're supposed to train our model and figure out which class each test email falls into. So what P(X|Y=1) represents is: given the class, when we already know an email is spam, what is the probability of observing this particular email within that group? We compute that, and then, when the test data arrives, we ask for the probability of each test email belonging to 0 and to 1, where 0 means not spam and 1 means spam, and we get a numerical value for each. For example, say I get a test email like "free food". This should land in spam, because "free" is exactly the kind of keyword that shows up in junk mail. So the probability that this email is spam might come out high, say 80, meaning 80%, while the probability that it is not spam comes out at 20, meaning 20%. Which is higher? Obviously the first, so this email gets classified as spam. That's how it works at a high level. Now, in order to find these values, that is, to find the posterior probability, we have to calculate the likelihood, the prior probability, and the marginal likelihood, although we can ignore the marginal likelihood, since it only normalizes and is the same for both classes. Finding the prior is pretty simple: the number of spam emails divided by the total number of emails, and likewise the number of non-spam emails divided by the total. The only tricky part is the likelihood, so let's build up to it. Say we are given 100 emails, of which 40 are spam and 60 are not, with not spam represented by 0 and spam by 1; that's our Y. Picture a table with the emails indexed 0, 1, 2, … up to 100 on one side and the class Y on the other, each entry being 0 or 1; we're just taking this as an assumption. Now let's calculate the prior probability: P(Y=1) means count all the spam emails and divide by the total number of emails.
So what does that give? The total number of emails is 100 and the number of spam emails is 40, so P(Y=1) = 40/100. Similarly for P(Y=0): the denominator is again the total number of emails and the numerator is the number of non-spam emails, so P(Y=0) = 60/100. To give this a mathematical form, because we'll obviously want it as a formula, the prior is nothing but an average: 1/m times a summation that adds a one for every example whose label equals the class, with i ranging over the dataset. Basically we're adding up ones, as written below. So that's how we calculate the prior probability, and as mentioned, we don't have to calculate the marginal likelihood. That brings us to the important part: the likelihood, which is the most important quantity and the toughest to calculate, although it's pretty simple once you see the maths behind it.
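In symbols, with m training emails, the prior for class c is the average of indicator values, matching the 40/100 and 60/100 counts above:

```latex
P(y = c) \;=\; \frac{1}{m}\sum_{i=1}^{m} \mathbf{1}\{y_i = c\},
\qquad
P(y{=}1) = \tfrac{40}{100}, \quad P(y{=}0) = \tfrac{60}{100}
```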
So, to calculate the likelihood we look at P(X|Y=1): when we already know an email belongs to the spam class (or the non-spam class, for Y=0), what is the probability of observing that email within that group? Before we move ahead, a quick note on what X looks like. X here is an email, so it contains multiple words; somewhere in the middle it might say "get unlimited 50% off" and so on. These words are the features, and based on these features we decide whether the email belongs to the spam class or the non-spam class. How does the probability work? It takes each feature in turn; take the word "unlimited". P(unlimited | spam) will come out high, say 0.9, because "unlimited" is a word that appears often in spam email, while P(unlimited | non-spam) will be low: you're obviously not going to use "unlimited" much in day-to-day conversation. So let's set up the notation. Capital X is the list of words, that is, the entire email, and small x represents the individual words: x1, x2, x3, x4, up to xn. These are the features. What we want to find is P(x1, x2, …, xn | Y), the probability of all of these individual words together given the class. Expanding it term by term: the comma here represents AND, it's an AND operator, so P(x1, x2, x3, … | Y=0) is the probability of x1 given Y=0, multiplied by the probability of x2 given Y=0 and given that x1 has already occurred, multiplied by the probability of x3 given that we already know x1 and x2 occurred in a non-spam email, and so on down the chain.

### Support Vector Machine [7:39:45]

What I'm saying is that under this full chain rule, each word's probability is conditioned on all the previous words: the term for x2 holds only given x1, the term for x3 only given x1 and x2, and so on. The issue is that by the time we reach xn, the conditioning becomes huge and computationally very expensive. To overcome this, we use something called the naive Bayes assumption. The naive Bayes assumption says that, given the class, each word is treated as totally independent of the others: when we calculate the probability of the second word, it no longer depends on the first. The first word can have a high probability of being spammy and the second not, but they are independent of each other. That's the "naive" part. Let's see how the equation looks after the assumption; I'll write the terms one below the other for clarity. For Y=1: P(x1|Y=1), the probability that the first word of the email is a spam word, multiplied by P(x2|Y=1), which no longer depends on the first word, multiplied by P(x3|Y=1), and so on up to the nth term, P(xn|Y=1). None of the probabilities depends on any other, which drastically reduces the computation needed. In mathematical form: just as we use sigma for a summation, we use pi for a product, so P(X|Y=1), the probability of the email belonging to the spam category, is the product over i from 1 to n of P(xi|Y=1), and in the same way P(X|Y=0) is the product over i from 1 to n of P(xi|Y=0). That is the equation for our likelihood. So now we have the likelihood, we have the prior, and we can skip the marginal likelihood. Finally we come to the posterior probability: for a generalized class c, where c is either spam or not spam, P(Y=c|X) is the likelihood times the prior, with the normalization below omitted, because it doesn't change which class wins. The assumption and the resulting posterior are collected below.
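The naive independence assumption and the resulting (unnormalized) posterior, in one place:

```latex
P(x_1, \dots, x_n \mid y{=}c) \;=\; \prod_{i=1}^{n} P(x_i \mid y{=}c)
\qquad\Longrightarrow\qquad
P(y{=}c \mid X) \;\propto\; \Big(\prod_{i=1}^{n} P(x_i \mid y{=}c)\Big)
\cdot \frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{y_i = c\}
```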
This is the equation for our Naive Bayes algorithm, and this is how we can classify text. Let us now go to the code editor and code the entire algorithm; this will help us understand the underlying mechanics. So, as you can see, I've opened Google Colab. Let me quickly name the notebook "classification implementation" and get started. There are a couple of things to import: import pandas as pd, import numpy as np, and then LabelEncoder. We use LabelEncoder to convert our text into numerical form, and if you're wondering why, it's because a computer, no matter how advanced, cannot work with this data as raw text; it has to take it in a numerical format. So: from sklearn.preprocessing import LabelEncoder, and finally from sklearn.model_selection import train_test_split. Execute that. Next stage: getting our dataset. Here we'll use the mushroom dataset. So df, our DataFrame, equals pd.read_csv with the path where the file lives; I'll copy the path and paste it in, then execute. Now let's look at the shape: df.shape tells us we have 8,124 rows and 23 columns. Let's inspect them with print df.head(). As you can see, all our data is in textual form, with rows indexed 0 up to 8,123. The columns are things like cap shape, cap surface, cap color, properties of the mushroom, plus the class. The class column is our y, and all the feature columns are our X. Now let's encode this, converting the values into numbers. We create le, an instance of LabelEncoder, and then df_encoded = df.apply(le.fit_transform). The apply method works like a for loop: it visits each column and applies whatever function we pass, and we pass le.fit_transform, just the method name, with no call. The default axis is 0, which is what we want, because it applies the encoder column by column. Let's execute this and look at the data now; let me call head() and zoom out so we can compare it with the original table.
As you can see, the class column, cap shape, and the rest are all the same columns; the only difference after label encoding is that every value has been converted into a number. Next we need to turn this into an array, so: data = df_encoded.values. Now we define X and y. The class column is y, and X is everything else: all the feature columns except column zero. So for X we take all the rows, 0 to 8,123, and all columns from column one onward, skipping the zeroth. For y we take all the rows, and I hope you see why it's all the rows, but only the zeroth column, because the zeroth column holds the class. Execute, and just for your satisfaction, look at X: everything is now an array, containing every column except the first. Similarly, y is just a single column. Now we split the data with train_test_split. We pass X and y, a test size, and a random state. The test size says what fraction of the data goes into the test set; let's make it 20%, and execute. Now that we have all the data and every requirement in place, let's jump straight into building the Naive Bayes classifier. A quick recap: to build a Naive Bayes classifier we need the posterior probability, and for the posterior probability we need two things, the likelihood and the prior probability. We'll calculate each, starting with the prior probability because it's the simplest; the data-loading steps we just walked through are collected into the sketch below.
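Collected into one runnable sketch, the loading, encoding, and splitting steps look roughly like this; the file name 'mushrooms.csv' is a stand-in for wherever the dataset actually lives:

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# 'mushrooms.csv' is a placeholder path for the mushroom dataset.
df = pd.read_csv('mushrooms.csv')
print(df.shape)          # (8124, 23): 8,124 rows, 23 columns

le = LabelEncoder()
# apply() runs fit_transform column by column (axis=0 is the default),
# turning every categorical string column into integer codes.
df_encoded = df.apply(le.fit_transform)

data = df_encoded.values
X = data[:, 1:]          # every column except column 0 (the features)
y = data[:, 0]           # column 0 is the class: poisonous or edible

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```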
So, def prior_probability: we pass y (generalizing from y_train to any labels array) and a label, where the label is the class value, 1 or 0. We need the size of y, because the prior is the count of one class divided by the total number of examples: if you have 100 emails, 40 spam and 60 not, then the probability of spam is 40/100 and the other is 60/100. The size comes from y.shape, which gives a tuple, so we take the first value, m. Then the count: s = np.sum(y == label). I hope you see why this works: y holds only class values, and the comparison contributes a one exactly where the class value matches the label. The function then returns the prior probability, s divided by m. Fine. Next task: the likelihood, and for it we need a conditional probability. So def conditional_probability, taking X and y, here X_train and y_train, then a feature column, a feature value, and finally a label. The feature column says which column holds the feature we care about: remember this is tabular data, a table with many rows and columns, and the feature column picks a column while the feature value is the value we're matching inside it. First we filter: X_filtered = X_train[y_train == label], keeping only the rows of the given class, the spam emails or the non-spam emails. The numerator is np.sum over X_filtered, checking, across all its rows, where the feature column equals the feature value. The denominator is X_filtered.shape[0], the total number of rows of that class, and we return numerator over denominator as a float. That is the conditional probability, just as in the derivation we discussed. Next we obviously have to predict the class; the two probability helpers, written out, appear below.
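The two helpers as described, sketched under the same assumptions as the walkthrough (numpy arrays of integer-encoded features):

```python
import numpy as np

def prior_probability(y_train, label):
    # P(y = label): fraction of training rows carrying this class label
    m = y_train.shape[0]
    s = np.sum(y_train == label)
    return s / m

def conditional_probability(X_train, y_train, feature_col, feature_val, label):
    # P(x[feature_col] = feature_val | y = label): among rows of the given
    # class, the fraction whose feature_col equals feature_val
    X_filtered = X_train[y_train == label]
    numerator = np.sum(X_filtered[:, feature_col] == feature_val)
    denominator = X_filtered.shape[0]
    return numerator / float(denominator)
```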
Now the predict function: def predict, which takes X_train and y_train, and, since we need to make predictions, a single x_test example as well. The classes here would be spam or not spam; for the mushroom dataset they're poisonous or not poisonous. So classes = np.unique(y_train). We also need the number of features: n_features = X_train.shape[1]; these are all the columns except the class. Then we compute the posterior probabilities, starting from an empty list, posterior_probs; for each class we'll get some value representing the probability of the example belonging to that class. So, for label in classes, meaning we go class by class, 0 or 1, we first initialize likelihood to 1.0. Then, for f in range(n_features), we go through each and every feature, in short each column, and find the conditional probability: cond = conditional_probability(X_train, y_train, f, x_test[f], label), passing the training data, the feature index, the value of that feature in the test example, and the label. We update the likelihood by multiplying: likelihood = likelihood * cond; we started it at 1.0, and on every iteration of the loop it picks up another factor. Next the prior: prior = prior_probability(y_train, label), the function we defined earlier. The posterior is then likelihood times prior; I hope you remember the equation for that. We append each value with posterior_probs.append(post), and finally we need the class with the maximum probability. For that we use argmax: np.argmax tells us at which position the highest value sits, we apply it to posterior_probs, and we return the corresponding class. That's predict done. Let's also measure how accurately we're predicting: def accuracy takes X_train, X_test, y_train, and y_test. Inside, predicted starts as an empty list, and for i in range over all the test rows, X_test.shape[0], we call p = predict(X_train, y_train, X_test[i]) and append it; every call tells us whether one example belongs to the poisonous class or not, and the loop simply collects a prediction for every test row into y_pred, which we convert to a numpy array with np.array. Both functions are sketched below.
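One sketch of each, under the same assumptions; predict scores a single example, accuracy loops it over the whole test set:

```python
def predict(X_train, y_train, x_test):
    # Return the class whose (unnormalized) posterior is largest
    classes = np.unique(y_train)
    n_features = X_train.shape[1]
    posterior_probs = []
    for label in classes:
        likelihood = 1.0
        for f in range(n_features):
            likelihood *= conditional_probability(
                X_train, y_train, f, x_test[f], label)
        prior = prior_probability(y_train, label)
        posterior_probs.append(likelihood * prior)  # P(X) skipped: same for all
    return classes[np.argmax(posterior_probs)]

def accuracy(X_train, y_train, X_test, y_test):
    pred = [predict(X_train, y_train, X_test[i])
            for i in range(X_test.shape[0])]
    y_pred = np.array(pred)
    # fraction of test rows whose predicted class matches the known class
    return np.sum(y_pred == y_test) / y_pred.shape[0]

print(accuracy(X_train, y_train, X_test, y_test) * 100)  # ~99.6 in the video
```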
To get the accuracy as a fraction, we compare the predicted values, which are just ones and zeros, with the values we already know: accuracy = np.sum(y_pred == y_test), since every time the two values match it adds one, divided by the size, y_pred.shape[0], and we return that. This is simply a way of testing how accurate the model is. So let's call accuracy with all these values. Before I run the code, a quick recap: our agenda is to classify whether each mushroom belongs to the poisonous class or not, using the posterior probability. To find the posterior we need two things, the likelihood and the prior. The prior is simple, a count of one class over the total count; the likelihood comes from the conditional probabilities; together they give us the predicted values. Let me run this and see the accuracy. Okay, it says X_test is not defined, and the reason is that we typed a lowercase x somewhere; let me rerun. Right, X has to be uppercase. To get past this cleanly, we'll rerun everything from the start: sometimes when you execute multiple cells, one block runs before another. This time I've restarted the runtime and run it from the beginning, and now there are no errors. Printing the result, we get about 0.99, and multiplying by 100, accuracy * 100, that's 99.63% on this classification task. So that's how to perform classification with the Naive Bayes algorithm. All right. Now that we know how NLP pre-processing works, what Naive Bayes classification is, and how it works, let's do one more thing: take a handful of sentences and see whether we can perform some sentiment analysis on them. Here we'll use the library scikit-learn, and with scikit-learn we won't have to write all the lines we just wrote by hand. Let me switch to the code editor and show you. Let's rename the notebook "sentiment analysis". We'll keep the text small here, because a huge dataset would be hard to follow. In my notepad I have a small dataset: we have X test, sorry, this is X_train, and then y_train. Let's analyze this data.
If you look at what's happening here, we have a dataset we're supposed to train our model on, plus the classes. The class says whether each sentence is positive or negative: 1 if positive, 0 if negative, and so on for all the movie-review sentences. Once the model is trained, we'll test it by passing in the test values. We have three test sentences, things like "I was happy and I loved the acting in the movie" and "the movie I saw was bad", and we can add more examples. So let me copy this text over, paste it in as our dataset, and execute. Let's check the shape of X; and let's make it an uppercase X. The reason I use uppercase X and lowercase y specifically is that this is a standard in the data science community. So, the training set: we have the data here, and if we try shape, no, we won't get a shape, because this is a plain Python list, not a numpy array. Fine. Next we have to clean the data, so let's do the data-cleaning part, which covers tokenization, stemming, and stop-word removal; give it a heading, "data cleaning". Rather than writing each step as a separate function, I'll write it all as one method. Imports first: from nltk.tokenize import RegexpTokenizer, then from nltk.stem.porter import PorterStemmer, and finally, for stop-word removal, from nltk.corpus import stopwords. Let me download the stop words: import nltk, then nltk.download('stopwords'). Now let's create objects for the tokenizer, the Porter stemmer, and the stop words. The tokenizer is a RegexpTokenizer, and I pass the pattern I want: only runs of word characters, so punctuation drops out.

### K- Means Clustering Algorithm [8:05:14]

Then we have the stop words; which language am I using? English, obviously, so sw = stopwords.words('english'). And then the Porter stemmer, ps = PorterStemmer(). All I've done here is create objects of these classes, so let me execute. Now the function itself: def get_clean_text, which takes the text. First we convert the text to lower case, then we tokenize: tokens = tokenizer.tokenize(text), using the tokenizer from above. Next, new_tokens = [token for token in tokens if token not in sw]. Since tokenize gives us a list, I use a list comprehension, effectively combining the tokenizer and the stop-word remover: the "for token in tokens" part walks the token list, each token is compared against the stop-word list, and a token is kept only if it isn't a stop word. So we're performing tokenization and stop-word removal at the same time. Then stemming, the same way, with another comprehension: stemmed_tokens = [ps.stem(token) for token in new_tokens], where ps is the Porter stemmer and stem works on individual words. Finally we join everything back into one sentence, clean_text = join of the stemmed tokens, and the method returns the clean text. Execute; that's done from our end. Now I'll use get_clean_text on the train and test data: X_clean = get_clean_text over X_train, and similarly Xt_clean = get_clean_text over X_test. Executing, we get an error saying the object has no such attribute, and that's fair: I was passing the whole list at once. We don't want that, so I'll wrap it as a list comprehension and pass one sentence at a time, for i in X_train, and similarly for i in X_test. Ah, and we haven't defined X_test yet, so let me quickly copy the test sentences over and paste them in. Execute again. The next error comes from defining new_tokens in one place and new_token in another, so let me fix that and rerun. It says X_test is not defined; fix and rerun once more. Most of the time when you write a program like this you'll encounter a lot of issues, and it's by hitting these issues that you learn the most. Now we have our clean text, so let's compare and see how it looks; the full cleaning pipeline is collected into the sketch below.
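The whole cleaning pipeline just assembled, as one sketch; X_train and X_test are assumed to be plain Python lists of review sentences:

```python
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords

nltk.download('stopwords')

tokenizer = RegexpTokenizer(r'\w+')          # keep runs of word characters only
sw = set(stopwords.words('english'))
ps = PorterStemmer()

def get_clean_text(text):
    text = text.lower()
    tokens = tokenizer.tokenize(text)
    new_tokens = [t for t in tokens if t not in sw]    # drop stop words
    stemmed_tokens = [ps.stem(t) for t in new_tokens]  # reduce words to stems
    return ' '.join(stemmed_tokens)

# One sentence at a time, since the function expects a single string
X_clean  = [get_clean_text(i) for i in X_train]
Xt_clean = [get_clean_text(i) for i in X_test]
```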
As you can see, the text has been reduced. To make it print more nicely, I can pass a space as the join separator; let me restart the runtime, rerun from the start, and run all. Now we're getting spaces; the reason we weren't before is simply that I hadn't put a space into the join. Now for the classification task. Before that, we have to vectorize the text: as I've mentioned, in order to classify we need the text as numbers. So: from sklearn.feature_extraction.text import CountVectorizer. Create an instance, cv = CountVectorizer, with an ngram_range of (1, 2). Then vectorize the input: X_vec = cv.fit_transform, passing X_clean, and convert the result to an array. Executing, we can look at X_vec: basically, for every word we get a count in the vector. Before classifying, let's also get the feature names, print cv.get_feature_names(). Right now the raw numbers mean nothing; you can't tell what a 0 or a 2 at some position represents. The feature names tell us: the first position here corresponds to "act", and the positions line up across all five rows. What the count vectorizer records is, for a word like "act", how many times it appears in each sentence. That's all a vectorizer does, and this kind of model is usually referred to as a bag-of-words model. Similarly we vectorize the test values: Xt_vec = cv.transform on the cleaned test text, again converted to an array; execute. And finally, classification. We'll use multinomial Naive Bayes here; if you didn't know, there are multiple variants of Naive Bayes available, and for text classification the multinomial one is standard. So: from sklearn.naive_bayes import MultinomialNB, create the instance, mnb = MultinomialNB(), and fit the model, mnb.fit, with X_vec, our vectorized input, and the y values. Execute. It says y isn't defined; going back, yes, it's not y, it's y_train, so paste that in and run. Now we have a fitted multinomial Naive Bayes model, and we can perform prediction with mnb.predict; these vectorizing and fitting steps are sketched below.
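The scikit-learn side of the demo, sketched end to end; it assumes X_clean, Xt_clean, and y_train from the cells above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

cv = CountVectorizer(ngram_range=(1, 2))        # unigrams and bigrams
X_vec  = cv.fit_transform(X_clean).toarray()    # bag-of-words counts
Xt_vec = cv.transform(Xt_clean).toarray()       # reuse the fitted vocabulary

mnb = MultinomialNB()
mnb.fit(X_vec, y_train)

y_pred = mnb.predict(Xt_vec)
print(y_pred)        # e.g. [1 1 0]: 1 = positive review, 0 = negative review
```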
To predict, we pass the test vector, Xt_vec, and collect y_pred. Before looking at the output, I'd like you to guess what it will be. We've trained a classifier, and by predicting on the test vectors all we're asking is whether each sentence belongs to the positive class or the negative class; in encoded form, a one or a zero. So y_pred gives us an array: here it shows one and zero. What does that mean? One refers to the positive class and zero to the negative class. And what were our test sentences? We defined them above: "I was happy and I loved the acting in the movie", which is positive; "happy" is a positive word, as we know. Let's see what the machine identified: the first sentence maps to the first prediction, and it says one, which means positive. To make it even clearer, let's put just one value in the test data. We know this next sentence is a bad one, and by "bad" I mean it's full of words that carry a negative signal. Executing, what I expect in the output is zero, and as you can see, the predicted value is zero. With this we can say our classifier is working, and the same approach scales to a pretty huge dataset. Now, the success of the human race comes from the ability to communicate and share information, and that is where the concept of language comes in. Many standards came up, resulting in many languages, each with its own set of basic shapes called alphabets; combinations of alphabets result in words, and words arranged meaningfully result in sentences. Each language has a set of rules used while forming sentences, and this set of rules is known as grammar. Coming to today's world, the 21st century: according to industry estimates, only 21% of the available data is present in a structured format. Data is being generated as we speak, as we tweet, as we send messages on WhatsApp, Facebook, Instagram, or through text messages, and the majority of this data exists in textual form, which is highly unstructured. To produce significant, actionable insights from text data, it's important to get acquainted with the techniques of text analysis. So let's understand what text analysis, or text mining, is: the process of deriving meaningful information from natural language text. Text mining usually involves structuring the input text, deriving patterns within the structured data, and finally evaluating the interpreted output. Compared with the kind of data stored in a database, text is unstructured, amorphous, and difficult to deal with algorithmically. Nevertheless, in modern culture, text is the most common vehicle for the formal exchange of information. Since text mining is about deriving high-quality information from text, the overall goal is to turn text into data for analysis, and this is done by applying NLP, natural language processing.
So let's understand what NLP is. NLP refers to the artificial intelligence method of communicating with an intelligent system using natural language. By utilizing NLP and its components, one can organize massive chunks of textual data, perform numerous automated tasks, and solve a wide range of problems such as automatic summarization, machine translation, named entity recognition, speech recognition, and topic segmentation. Let's look at the basic structure of an NLP application, taking a chatbot as an example. First we have the NLP layer, which is connected to the knowledge base and the data storage. The knowledge base holds the source content: the chat logs, a large history of conversations used to train the algorithm. The data storage holds the interaction history and the analytics of those interactions, which in turn help the NLP layer generate meaningful output. Now for the various applications of NLP. First of all, sentiment analysis, a field where NLP is used heavily. Then speech recognition, which covers the voice assistants like Google Assistant, Cortana, and Siri. Next, chatbots, as just discussed: you may have used a customer-care chat service in some app, which uses NLP to process what you type and respond accordingly. Machine translation is another use case; the most common example is Google Translate, which uses NLP to translate data from one language to another in real time. Other applications include spell checking, keyword search, which is another big field for NLP, and extracting information from a particular website or document. One of the coolest applications of NLP is advertisement matching, which basically means recommending ads based on your history. NLP is divided into two major components: natural language understanding, also known as NLU, and natural language generation, also known as NLG. Understanding involves tasks like mapping natural-language input into useful representations and analyzing different aspects of the language, whereas natural language generation is the process of producing meaningful phrases and sentences in natural language, involving text planning, sentence planning, and text realization. NLU is usually considered harder than NLG. You might be thinking that even a small child can understand a language, so let's see the difficulties a machine faces while understanding one. Understanding a language is genuinely hard; taking English into consideration, there is a lot of ambiguity, and at different levels: lexical ambiguity, syntactic ambiguity, and referential ambiguity. Lexical ambiguity is the presence of two or more possible meanings within a single word; it is sometimes also called semantic ambiguity. For example, consider these sentences and focus on the italicized words. "She is looking for a match." What do you infer from the word "match"? Is she looking for a partner, or for a match in the sporting sense?
Be it a cricket match or a rugby match. The second sentence: "The fisherman went to the bank." Is it the bank where we collect our checks and money, or the river bank? Sometimes it's obvious we mean the river bank, but he might actually be going to withdraw some money; you never know. The second type is syntactic ambiguity: in English grammar, this is the presence of two or more possible meanings within a single sentence or sequence of words, also called structural or grammatical ambiguity. Take these sentences: "The chicken is ready to eat." Is the chicken ready to eat its food, or ready for us to eat? Similarly, "visiting relatives can be boring": are the relatives boring, or is visiting them boring? You never know. Finally, referential ambiguity, which arises when we refer to something using pronouns: "The boy told his father about the theft. He was very upset." I'm leaving this one to you: who does "he" stand for? The boy, the father, or the thief? Coming back to NLP. First we need to install the NLTK library, the Natural Language Toolkit. It is the leading platform for building Python programs that work with human language data, and it provides easy-to-use interfaces to over 50 corpora and lexical resources. We can use it for functions like classification, tokenization, stemming, tagging, and much more. Once you install NLTK, you'll see the NLTK downloader, a pop-up window; select the "all" option and press the download button, and it will fetch all the required files: the corpora, the models, and all the packages available in NLTK. When we process text, there are a few terminologies to understand, and the first is tokenization. Tokenization is the process of breaking strings into tokens, which are small structures or units. It involves three steps: breaking a complex sentence into words, understanding the importance of each word with respect to the sentence, and finally producing a structural description of the input sentence. Looking at the example sentence "Tokenization is the first step in NLP", dividing it into tokens gives one, two, three, four, five, six, seven tokens. NLTK also lets you tokenize phrases containing more than one word. So let's implement tokenization with NLTK. Here I'm using a Jupyter notebook for all the practicals and demos; you're free to use any IDE that supports Python, it's your choice. Let me create a new notebook and rename it "text mining and NLP". First, import the necessary libraries: os, nltk, and nltk.corpus. As you can see, the corpus collection contains files representing many different kinds of text: samples of Twitter data, sentiment lexicons, product reviews, movie reviews.
We have non-breaking prefixes and many more files. Now let's look at the Gutenberg corpus and see what's inside. It contains a set of classic text files: we have Austen's Emma, Shakespeare's Hamlet, Melville's Moby Dick, Carroll's Alice, and many more. And this is just one corpus; NLTK provides many. Let's take a document of type string and understand the significance of its tokens. If you look at the elements of the Hamlet file, you can see it starts with "The Tragedie of Hamlet by William Shakespeare". So if you have a look at the first 500

### Apriori Algorithm Explained [8:25:22]

elements of this particular text file, it reads, as I was saying: "The Tragedie of Hamlet by William Shakespeare 1599. Actus Primus." We can use these files for text understanding and analysis, and this is where NLTK comes into the picture: it helps programmers explore the different features and applications of language processing. Here I have created a paragraph on artificial intelligence; let me execute it. This AI variable is of string type, so it will be easy to tokenize; nonetheless, any of the corpus files could be used, and I'm taking a string purely for simplicity. Next we import word_tokenize from nltk.tokenize; this helps us tokenize the words. We run word_tokenize over the paragraph and assign the result to AI_tokens. Looking at the output, the input has been divided into tokens; in total we have 273. These tokens are a list of the words and special characters as separate items. Now, to find the frequency of the distinct elements of the AI paragraph, we import FreqDist, which lives under nltk.probability. We create fdist with FreqDist, and what we're doing is counting every word in the paragraph. As you can see, the comma appears 30 times, the full stop nine times, "accomplished" once, "according" once, and so on, and "computer" five times. Here we also convert the tokens to lower case, to avoid counting a word with upper case and the same word with lower case as different tokens. Suppose we select the top 10 tokens by frequency: the comma 30 times, "the" 13 times, "of" 12 times, "and" 12 times, whereas the meaningful words, "intelligence" and "intelligent", appear six times each. There is another tokenizer, the blankline tokenizer. Using blankline_tokenize over the same string splits the paragraph on blank lines; the output here is nine, which indicates we have nine paragraphs separated by blank lines. Although it might look like one paragraph, it is not, and the original structure of the data stays intact. Other important key terms in tokenization are bigrams, trigrams, and ngrams. What do these mean? Bigrams are tokens of two consecutive written words; tokens of three consecutive written words are trigrams; and ngrams covers n consecutive written words. A runnable recap of the tokenization and frequency steps so far appears below.
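A sketch of those steps, with a short stand-in paragraph instead of the full AI text used in the video:

```python
import nltk
from nltk.tokenize import word_tokenize, blankline_tokenize
from nltk.probability import FreqDist

nltk.download('punkt')       # tokenizer models used by word_tokenize

AI = ("Artificial intelligence is intelligence demonstrated by machines. "
      "A computer may use artificial intelligence to mimic human reasoning.\n\n"
      "Intelligence of this kind is studied across computer science.")

AI_tokens = word_tokenize(AI)
print(len(AI_tokens))            # 273 for the full paragraph used in the video

fdist = FreqDist()
for word in AI_tokens:
    fdist[word.lower()] += 1     # lowercase so 'The' and 'the' count together
print(fdist.most_common(10))     # the ten most frequent tokens

print(len(blankline_tokenize(AI)))   # paragraphs separated by blank lines
```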
Another set of important terms in tokenization are bigrams, trigrams and n-grams. What do these mean? Bigrams are tokens of two consecutive written words; similarly, tokens of three consecutive written words are known as trigrams, and n-grams generalize this to n consecutive written words. So let's execute a small demo based on bigrams, trigrams and n-grams. First we import bigrams, trigrams and ngrams from nltk.util. Let's take a string to use these functions on: 'The best and most beautiful things in the world cannot be seen or even touched. They must be felt with the heart.' First we split this string into tokens using word_tokenize, and as you can see we have the tokens. Now let us create the bigrams of the list containing the tokens: for that we use nltk.bigrams and pass all the tokens, and since it returns a generator we wrap it in the list function. As you can see in the output, the tokens come out two at a time, in pair form: 'the best', 'best and', 'and most' and so on. Similarly, if we want trigrams, we just replace bigrams with trigrams, and we get tokens in groups of three. And if you want to use ngrams, let me show you how it's done: for ngrams we need to specify a particular number, so in place of n let's say four, and as you can see we get the output in groups of four tokens. Now, once we have the tokens, we usually need to make some changes to them, and for that we have stemming. Stemming refers to normalizing a word into its base or root form. If we look at the words 'affectation', 'affects', 'affections', 'affected', 'affection' and 'affecting', as you might have guessed, the root word is 'affect'. One thing to keep in mind is that the result may not always be a real root word: stemming algorithms work by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word. This indiscriminate cutting is successful on some occasions but not always, and this is why we say the approach presents some limitations. So let's see how we can perform stemming on a given data set. There are quite a few types of stemmers. Starting with the Porter stemmer, we import it from nltk.stem. Let's stem the word 'having' and see what we get: the output is 'have'. Next we define some words to stem, 'give', 'giving', 'given' and 'gave', and use the Porter stemmer on them. The output is 'give', 'give', 'given' and 'gave': we can see that the stemmer removed only the 'ing' and replaced it with an 'e'. Now let's do the same with another stemmer, the Lancaster stemmer. You can see that it truncated all the words, and as a result you can conclude that the Lancaster stemmer is more aggressive than the Porter stemmer. Which of these stemmers to use depends on the type of task you want to perform: for example, if you want to check how many times the stem 'giv' is used above, you can use the Lancaster stemmer, and for other purposes you have the Porter stemmer as well. There are many more stemmers; there is also a Snowball stemmer, where you need to specify the language you are working in before stemming.
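As a quick check of the stemmer comparison above, here is a minimal sketch; the word list is the one used in the demo.

```python
# Comparing the Porter and Lancaster stemmers on the same words.
from nltk.stem import PorterStemmer, LancasterStemmer

words = ['give', 'giving', 'given', 'gave']

porter = PorterStemmer()
lancaster = LancasterStemmer()

print([porter.stem(w) for w in words])     # ['give', 'give', 'given', 'gave']
print([lancaster.stem(w) for w in words])  # ['giv', 'giv', 'giv', 'gav'] - more aggressive
```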
Now, as we discussed, stemming algorithms work by cutting off the end or the beginning of the word. Lemmatization, on the other hand, takes into consideration the morphological analysis of the word. In order to do so, it is necessary to have a detailed dictionary which the algorithm can look into to link a word form back to its lemma. What lemmatization does is group together the different inflected forms of a word under its lemma; it is similar to stemming in that it maps several words onto a common root. One of the most important things to note here is that the output of lemmatization is a proper word, unlike stemming, where we got the output 'giv': 'giv' is not a word, it's just a stem. For example, if a lemmatizer is run on 'go', 'going' and 'went', they all map to 'go', because that is the root of all three words. So let's see how lemmatization works on given input data. For that we import the WordNetLemmatizer from NLTK, and we also import WordNet itself. As I mentioned earlier, lemmatization requires a detailed dictionary, because its output is a root word that must be an actual word, not just any string of characters; to find that proper word it needs a dictionary, so here we provide the WordNet dictionary and use the WordNetLemmatizer. Now, passing the word 'corpora' into it: can you tell me what the output will be? I'll leave this up to you and won't execute this line. Tell me in the comments below what the lemmatization of the word 'corpora' will be, and also what the output of stemming it would be; execute it yourself and let me know in the comment section. Next, let's take the words 'give', 'giving', 'given' and 'gave' and see the output of lemmatization. As you can see, the lemmatizer has kept the words as they are, and this is because we haven't assigned any POS tags, so it has assumed all the words are nouns. You might be wondering what POS tags are; I'll explain them later in this video. For now, keep it simple: POS tags tell us what exactly a given word is. Is it a noun, a verb, or some other part of speech? Basically, POS stands for parts of speech. Now, did you know that there are several words in the English language, such as 'I', 'a', 'the', 'for', 'above' and 'below', which are very useful in the formation of sentences, and without which a sentence would not make sense? Yet these words do not provide any help in natural language processing, and this list of words is known as stop words. NLTK has its own list of stop words, and you can use it by importing it from nltk.corpus. So the question arises: are they helpful or not? Yes, they are helpful in the creation of sentences, but they are not helpful in processing the language. Let's check the list of stop words in NLTK: from nltk.corpus we import stopwords and ask for the stop words defined for English. As you can see, we get the full list, 179 stop words in total. Notice words like 'few', 'more', 'most', 'other' and 'some': these words are very necessary in the formation of sentences and cannot be ignored there, but for processing they are not important at all. Now, if you remember, we had the top 10 tokens from the AI paragraph, given by the fdist top 10. Taking that into consideration, you can see that except for 'intelligent' and 'intelligence', most of the tokens are either punctuation or stop words and hence can be removed. We'll use compile from the re module to create a pattern that matches any digit or special character, and then we'll see how we can remove the stop words.
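A small sketch of lemmatization and stop-word filtering along the lines described above; the token list at the end is illustrative, not the video's exact AI paragraph.

```python
# Lemmatization needs the WordNet dictionary; stop-word removal uses nltk.corpus.
import nltk
nltk.download('wordnet')
nltk.download('stopwords')

import re
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('corpora'))          # 'corpus' - a proper word, not a stem
print([lemmatizer.lemmatize(w) for w in ['give', 'giving', 'given', 'gave']])
# Without POS tags every word is treated as a noun, so the list comes back unchanged.
print(lemmatizer.lemmatize('giving', pos='v'))  # 'give', once we say it's a verb

stop_words = set(stopwords.words('english'))
print(len(stop_words))                          # 179 in the NLTK release used here

# Drop stop words, digits and punctuation from a token list.
punct = re.compile(r'[-.?!,:;()\[\]|0-9]')
tokens = ['AI', ',', 'the', 'simulation', 'of', 'human', 'intelligence', '.']
post_punctuation = [t for t in tokens
                    if t.lower() not in stop_words and not punct.search(t)]
print(post_punctuation)                         # ['AI', 'simulation', 'human', 'intelligence']
```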
So if you have a look at the output of post_punctuation, you can see there are no stop words left in the given output, and if you look at its length, it is 233, compared to 273 for the original AI tokens. This step is very important in language processing, as it removes all the unnecessary words which do not carry much meaning. Now we come to another important topic in natural language processing and text mining: parts of speech. Generally speaking, the grammatical type of a word (verb, noun, adjective, adverb, article) indicates how the word functions in meaning, as well as grammatically, within the sentence. A word can have more than one part of speech, depending on the context in which it is used. For example, take the sentence 'Google something on the internet': here 'Google' acts as a verb, although it is a proper noun. As you can see here, there are many types of POS tags, and we have descriptions of the various tags: the coordinating conjunction CC, the cardinal number CD, JJ for adjective, MD for modal, the proper noun in singular and plural, the different types of verbs, the interjection and symbol tags, and the wh-pronoun and wh-adverb. We can use POS tags in statistical NLP tasks: tagging distinguishes the sense of a word, which is very helpful in text realization; it is easy to evaluate, in terms of how many tags are correct; and you can also infer semantic information from the given text. Let's have a look at some examples of POS tagging. Take the sentence 'The dog killed the bat': here 'the' is a determiner, 'dog' is a noun, 'killed' is a verb, and again 'the' and 'bat' are a determiner and a noun respectively. Now consider another sentence, 'The waiter cleared the plates from the table': as you can see, every token here corresponds to a particular parts-of-speech tag. Now let's take a string and check how NLTK performs POS tagging on it. Take the sentence 'Timothy is a natural when it comes to drawing.' First we tokenize it, and then, under NLTK, we have the pos_tag function, to which we pass all the tokens. As you can see, we get 'Timothy' as a proper noun, 'is' as a verb, 'a' as a determiner, 'natural' as an adjective, 'when' as a wh-adverb, 'it' as a pronoun, 'comes' as a verb, 'to' as TO, and 'drawing' as a verb again. So this is how you get the POS tags; the pos_tag function does all the work here. Now let's take another example: 'John is eating a delicious cake.' Here you can see that the tagger has tagged both 'is' and 'eating' as verbs, treating 'is eating' as a single verb group. This is one of the few shortcomings of POS taggers, and an important one to keep in mind.
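Here is a minimal sketch of that tagging step; the tags shown in the comment are what current NLTK versions typically return for this sentence.

```python
# POS tagging with NLTK, matching the Timothy example above.
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

from nltk import pos_tag, word_tokenize

tokens = word_tokenize('Timothy is a natural when it comes to drawing.')
print(pos_tag(tokens))
# [('Timothy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('natural', 'JJ'),
#  ('when', 'WRB'), ('it', 'PRP'), ('comes', 'VBZ'), ('to', 'TO'),
#  ('drawing', 'VBG'), ('.', '.')]
```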
After POS tagging, there is another important topic: named entity recognition. What does it mean? The process of detecting named entities, such as person names, location names, company names, organizations, quantities and monetary values, is called named entity recognition. In named entity recognition we have three phases of identification. First is noun phrase identification: this step deals with extracting all the noun phrases from a text using dependency parsing and parts-of-speech tagging. Then we have the phrase classification step, in which all the extracted noun phrases are classified into their respective categories, such as locations, names, organizations and more; apart from this, one can curate lookup tables and dictionaries by combining information from different sources. And finally, we have entity disambiguation. Sometimes it is possible that entities are misclassified, hence creating a validation layer on top of the results is very useful, and knowledge graphs can be exploited for this purpose; popular knowledge graphs include the Google Knowledge Graph, IBM Watson and Wikipedia. So let's take a sentence into consideration: 'Google CEO Sundar Pichai introduced the new Pixel at the Minnesota Roy Center event.' As you can see, Google is tagged as an organization, Sundar Pichai as a person, Minnesota as a location, and the Roy Center event is also tagged as an organization. For using NER in Python, we have to import ne_chunk from the NLTK module. So let's consider some text data and see how we can perform NER using the NLTK library. First we import ne_chunk. Let's take the sentence 'The US president stays in the White House.' We need to go through all the usual steps again: tokenize the sentence first, then add the POS tags, and then pass the list of tuples containing the POS tags to the ne_chunk function and look at the output. As you can see, 'US' is recognized as an organization here, and 'White House' is clubbed together as a single entity and recognized as a facility. This is only possible because of the POS tagging; without it, it would be very hard to detect the named entities among the given tokens.
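A short sketch of the NER steps just walked through, using nltk.ne_chunk; note that the entity labels can vary between NLTK versions.

```python
# Named entity recognition: tokenize, POS-tag, then chunk.
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

from nltk import ne_chunk, pos_tag, word_tokenize

sentence = 'The US president stays in the White House.'
tokens = word_tokenize(sentence)       # step 1: tokenize
tagged = pos_tag(tokens)               # step 2: POS-tag the tokens
tree = ne_chunk(tagged)                # step 3: chunk the named entities
print(tree)
# 'US' comes back labelled as an organization/GPE and 'White House'
# is grouped into a single FACILITY entity (labels vary by version).
```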
Now that we have understood named entity recognition, let's go ahead and look at one of the most important topics in NLP and text mining: syntax. So what is syntax? In linguistics, syntax is the set of rules, principles and processes that govern the structure of sentences in a given language; the term is also used to refer to the study of such principles and processes. What we have here are certain rules as to which part of a sentence should come at which position, and with these rules one can create a syntax tree for any input sentence. A syntax tree, in layman's terms, is a tree representation of the syntactic structure of a sentence or string. It is the same way of representing the syntax of a programming language as a hierarchical tree structure: there, the structure is used for generating symbol tables for compilers and later for code generation, and the tree represents all the constructs in the language and their subsequent rules. Let's consider the statement 'The cat sat on the mat.' As you can see, the input sentence has been split into a noun phrase and a verb phrase: the noun phrase is classified into an article and a noun, then we have the verb, 'sat', and finally the prepositional phrase, with the preposition 'on' and the article and noun 'the' and 'mat'. Now, in order to render syntax trees in our notebook, you need to install Ghostscript, which is a rendering engine. This takes a little while, so let me show you where you can download it from: just type in 'download Ghostscript' and select the latest version. As you can see, it is offered under two types of license, the GNU General Public License and the commercial license; the commercial license is also available if you need it, but I'm not going to go much deeper here into rendering the trees. So now that we have understood what syntax trees are, let's discuss an important concept with respect to analyzing sentence structure: chunking. Chunking basically means picking up individual pieces of information and grouping them into bigger pieces, and these bigger pieces are known as chunks. In the context of NLP and text mining, chunking means grouping words or tokens into chunks. Let's have a look at an example. The sentence under consideration is 'We caught the black panther': 'we' is a pronoun, 'caught' is a verb, 'the' is a determiner, 'black' is an adjective and 'panther' is a noun. What chunking has done here, as you can see, is that 'black' (an adjective), 'panther' (a noun) and 'the' (a determiner) are chunked together into a noun phrase. So let's go ahead and see how we can implement chunking using NLTK. Take the sentence 'The big cat ate the little mouse who was after the fresh cheese.' We add the POS tags and also use the tokenizing function here, and as you can see, we have the tokens and the POS tags. What we do now is create a grammar for a noun phrase, mentioning the tags that we want in our chunk phrase within the curly braces; that will be our grammar_np, a regular-expression matching string. We now have to parse the chunks, so we create a chunk parser and pass our noun-phrase grammar string to it. As you can see, we get a certain error here, and let me tell you why it occurred: it is because we did not install Ghostscript, so the syntactical tree cannot be drawn. But in the final output we do have a tree structure, just without the visualization part. As you can see, we have an NP noun phrase for 'the little mouse', and again a noun phrase for 'fresh cheese': although 'fresh' is an adjective and 'cheese' is a noun, it has considered these two words one noun phrase. So this is how you execute chunking with the NLTK library.
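A minimal sketch of the noun-phrase chunking demo; the grammar string here is an assumption that matches the rule described (an optional determiner, any adjectives, then a noun), not necessarily the exact one typed in the video.

```python
# Chunking with a regular-expression grammar over POS tags.
import nltk
from nltk import pos_tag, word_tokenize

sentence = 'The big cat ate the little mouse who was after fresh cheese'
tagged = pos_tag(word_tokenize(sentence))

grammar_np = r'NP: {<DT>?<JJ>*<NN>}'   # optional determiner, any adjectives, a noun
chunk_parser = nltk.RegexpParser(grammar_np)
chunk_result = chunk_parser.parse(tagged)
print(chunk_result)
# NP chunks come back for 'the little mouse' and 'fresh cheese' even when
# the tree cannot be drawn graphically (e.g. without Ghostscript installed).
```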
By now we have learned almost all the important steps in text processing, so let's apply them all in building a machine learning classifier on the movie reviews from the NLTK corpora. For that, first let me import all the libraries: pandas and NumPy, the basic libraries needed in any machine learning workflow. We also import the CountVectorizer; I'll tell you why it is used later, so let's just import it for now. Again, if we look at the different elements of the corpora, as we saw at the beginning of our session, there are many files in the NLTK corpora, and among them we have the movie reviews. So we import movie_reviews from nltk.corpus. If you look at the different categories of the movie reviews, there are two, negative and positive. Looking at the positive category, you can see there are many text files, and similarly the negative category has a thousand files containing negative feedback. Let's take a particular positive one into consideration, cv000_29590; you can take any one of the files here, it doesn't matter. Now, as you can see, the file is already tokenized. Tokenization is generally useful for us, but here it has actually increased our work: in order to use the CountVectorizer and TF-IDF, we must pass strings instead of tokens. To convert the tokens back into strings we could use the detokenizer within NLTK, but that has some licensing issues as of now with the environment, so instead we can use the join method to join all the tokens of a list into a single string, and that's what we are going to use here. First we create an empty list, the review list, and append the reviews to it; while appending, we remove the extra spaces and stray commas, and we perform the same steps for both the negative and the positive reviews. We do this first for the negative reviews, and if you look at the length of the review list at that point, it's 1,000; the moment we add the positive reviews, the length should reach 2,000. So let me define the positive reviews, execute the same steps for them, and check the length of the review list again: it is 2,000. That is good. Now let us create the targets before creating the features for our classifier. While creating the targets, we denote the negative reviews as zero and the positive reviews as one: we create an empty list and add 1,000 zeros followed by 1,000 ones to it. Then we create a pandas Series from the target list; the type of y must come out as a pandas Series, and if you check the type of y, it does. That is good. Now let's look at the first five entries of the series: since it is 1,000 zeros followed by 1,000 ones, the first five entries are all zeros. Now we can start creating features using the CountVectorizer, that is, the bag of words. For that we import the CountVectorizer, initialize it, and fit it on the review list. Looking at the dimensions of the resulting matrix, it is 2,000 by 16,228. Next we create a list with the names of all the features from the vectorizer, and as you can see we have our list. Then we create a pandas DataFrame by passing the sparse matrix as the values and the feature names as the column names. Checking the dimensions of this DataFrame, it is the same, 2,000 by 16,228, and if we look at the top five rows of the DataFrame, we have 16,228 columns with five rows, where all the entries shown are zero, as you would expect from such a sparse matrix. Now we split the DataFrame into training and testing sets and examine them. We have defined the test size as 0.25, so the test set is 25% and the training set gets 75% of the data frame. If you look at the shape of X_train, we have 1,500 rows, and the shape of X_test is 500. So now our data is split.
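The whole classifier pipeline described above and below can be sketched in a few lines; this is a condensed, assumed reconstruction rather than the exact notebook code (the cleanup of stray spaces and commas is omitted for brevity).

```python
# Movie-review sentiment pipeline: join tokens, vectorize, split, fit Naive Bayes.
import nltk
nltk.download('movie_reviews')

import pandas as pd
from nltk.corpus import movie_reviews
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Join each review's tokens into one string (negatives first, then positives).
rev_list = [' '.join(movie_reviews.words(fid)) for fid in movie_reviews.fileids('neg')]
rev_list += [' '.join(movie_reviews.words(fid)) for fid in movie_reviews.fileids('pos')]

# Targets: 1,000 zeros for the negatives followed by 1,000 ones for the positives.
y = pd.Series([0] * 1000 + [1] * 1000)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(rev_list)       # sparse matrix, about 2000 x 16000+

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
print(X_train.shape, X_test.shape)           # roughly (1500, ...) and (500, ...)

clf = MultinomialNB().fit(X_train, y_train)  # word counts -> multinomial Naive Bayes
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))        # near-perfect scores hint at overfitting
print(confusion_matrix(y_test, y_pred))
```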
Now we'll use the Naive Bayes classifier for text classification on the training and testing sets. Most of you might already be aware of what a Naive Bayes classifier is: it is a classification technique based on Bayes' theorem, with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. To know more, you can watch our Naive Bayes classifier video, the link to which is given in the description box below; if you want to pause at this moment and quickly check what a Naive Bayes classifier does and how it works, watch that video and come back here. To implement the Naive Bayes algorithm in Python, we use the following libraries and functions. We import GaussianNB from sklearn, the scikit-learn library, instantiate the classifier, and fit it with the training features and labels. We also import MultinomialNB, since our features here are not just two-valued: they are word counts, that is, multinomial features. So we fit the MultinomialNB on the training data and then use the predict function on the test features. Now let's check the accuracy of this model. As you can see, the accuracy here is 1.0, which is highly unlikely; since it has given exactly 1.0, that suggests the model is overfitting and is overly accurate. You can also check the confusion matrix for the same: for that, you call the confusion matrix on the variables y_test and y_predicted. So although it has predicted with 100% accuracy here, this is very unlikely in general, and you might well get a different output: I got 1.0, but you might get 0.6, 0.7, or any number between 0 and 1. Now, what is a syntax tree? This is an important concept. Syntax is the study of the rules governing the way words are combined to form sentences in a language. Whenever you create a sentence, there are always rules: you might start off with an identifier, maybe a determiner like 'the', then a certain noun comes into the picture, then a certain verb, then maybe some adjectives. There are rules for creating a sentence; you cannot write a sentence without any rules. We have to specify some noun, then a verb, then an adjective, then prepositions, and these rules are called syntax. So syntax is the study of the rules governing the way words are combined, and sentences are composed of discrete units combined according to those rules. Every sentence has certain rules in it, whether it's past continuous, present perfect, past perfect, or just a simple tense.

### Reinforcement Learning [8:55:04]

So we have certain rules for defining a sentence, and those rules are what we call syntax. Now, phrase structure rules describe what can appear where in a phrase: a phrase might start with a noun or a verb, then we can have a determiner and prepositions, then again nouns, prepositions and adjectives, and finally some closing preposition. Any kind of phrase has rules about what it can start with, what it can end with, and what has to come in between; these are called phrase structure rules. In layman's terms, a syntax tree is a tree representation of the syntactic structure of a sentence or a string. Take the sentence 'The old tree swayed in the wind': 'the' is a determiner, 'old' is an adjective, 'tree' is a noun, 'swayed' is a verb, 'in' is a preposition, 'the' is a determiner and 'wind' is a noun. So the rule for this sentence is determiner, adjective, noun, verb, preposition, determiner, noun, and we can arrange it as a hierarchy as well: a noun phrase and a verb phrase combining to become the sentence. Any sentence follows certain rules, so we can define those rules and check whether they hold, whether the words of the sentences come in the expected order or not. In order to render syntax trees in your notebook, you need to install Ghostscript, the rendering engine NLTK uses to draw the trees, from the link shown. If you go to this link (I've already downloaded it), you need to download the relevant .exe file, the Ghostscript AGPL release, and run it; it's an installer like any other software, and it gets installed in the C drive. When you go to Program Files, you get a folder called gs, and within gs9.25 there is a bin directory. You need to copy this path and put it in the PATH variable in Windows: go to This PC, Properties, Advanced system settings, Environment Variables, and under Path click Edit and add this path after a semicolon. I've already added it, so that Ghostscript can be accessed from any location; this is what lets the notebook draw the trees showing how a noun comes after an adjective, how the verb sits in the sentence, and where the prepositions are kept. So you need to download it first, go to the folder where it is installed, open the bin folder, and add the path of that bin folder to your environment variables, so that wherever you run NLP code you can access it. You go to the Path variable, edit it, and add the text of the path there.
Once you have done that, you notify Python of the path through a piece of code: you import os, set path_to_gs to the location where Ghostscript is present, and add it to os.environ['PATH']. The os library exposes the environment variables, the paths that are set automatically when we invoke Python or the notebook, so adding this path there means Ghostscript can be accessed from wherever you run your NLP code. Now that we have modified the PATH environment variable, let's discuss an important concept with respect to analyzing sentence structure: chunking. What is chunking? Chunking basically means picking up individual pieces of information and grouping them into a bigger piece: taking the smaller pieces of information and putting them together to make a bigger piece, where the bigger pieces are known as chunks. In the context of NLP, chunking means grouping words and tokens into chunks. So chunking is picking up the individual words and grouping them into syntactically meaningful units: we take the smaller units, called tokens, club them together, and make a bigger chunk. For example, you can see here we have individual tags: PRP is a pronoun, VBZ is a verb, DT is a determiner, JJ is an adjective and NN is a noun. Here, shown in the different colors, the noun 'dog', the determiner and the adjective are chunked together into a noun phrase; three things combined become a noun phrase, a determiner, an adjective (JJ) and a noun. So there are certain rules about English: a noun phrase has a determiner, an adjective and a noun, and that becomes the rule here. Let's see how we do chunking in Python; I'll show you one example. We import nltk and the os library, then nltk.corpus, and we use word_tokenize and the regular-expression tokenizer; we also use 'from nltk.data import load', which is used because some inbuilt data is loaded through it. Now we have a sentence, 'Mary is driving a big car.' We tokenize it into sent_tokens and print them, then we put POS tags on all those tokens. Once we add the POS tags, we get to know what is a noun there, what is a verb, what is an adjective, and so on. Similarly, we have another sentence, 'John is eating a delicious cake': we tokenize it and POS-tag it the same way, and we get the tags for 'John' and all the other words. Similarly, for 'Jim eats a banana', we tokenize it and run it through the tagger; using the regular-expression tokenizer we can also tokenize it, getting 'Jim eats a banana' split on the spaces in between, and if we pass that to the POS tagger we get tags for the individual words.
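A sketch of the setup just described: pointing the notebook at the Ghostscript bin folder and then tagging one of the example sentences. The gs9.25 path matches the version shown in the video; adjust it for your own install.

```python
# Append Ghostscript to PATH so NLTK can render tree diagrams, then POS-tag.
import os
import nltk
from nltk import pos_tag, word_tokenize

path_to_gs = r'C:\Program Files\gs\gs9.25\bin'   # assumed install location
os.environ['PATH'] += os.pathsep + path_to_gs

sent = 'Mary is driving a big car.'
sent_tokens = word_tokenize(sent)
print(pos_tag(sent_tokens))
# [('Mary', 'NNP'), ('is', 'VBZ'), ('driving', 'VBG'),
#  ('a', 'DT'), ('big', 'JJ'), ('car', 'NN'), ('.', '.')]
```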
Now, about the corpora: if you go into the NLTK data and find 'corpora', you'll see many files there containing the word corpora; this is just for our knowledge, so let me come to the point. What we do here is pick up one of the files: all these text files are already present in the corpus, and we pick shakespeare-hamlet.txt, so we are using an inbuilt file rather than passing our own string. Looking at its length, it has got 37,360 words. We take only the top 2,000 words, pass them to pos_tag, and create hamlet_pos, a list of all these words with their POS characteristics. It will take some time, because it's a big file, and then we get the 2,000 words from this file tagged: 'The' is a determiner, 'Tragedie' is a noun, 'Hamlet' is a noun, 'by' is IN, 'William' is a proper noun, and so on; we get the complete details. Next we filter it to keep only the NNP tokens, the proper nouns: we check each tag, and only if it is NNP do we keep the word, creating a new list called hamlet_NNP. The file contains articles, determiners and verbs too, but we are interested only in the NNP tokens. Then we put one more filter on top: we look at particular names like 'William' and words like 'reading', and get to know the context in which they are used in this file. If you go down, you'll see those contexts: 'William' is used as a proper noun, while 'reading' is used as a verb as well as a noun. Now we import ne_chunk, and we take 'The US president stays in the White House', tokenize it, add the POS tags and chunk it; we have seen this example, where 'US' comes out as an organization and 'White House' as a facility. Similarly, for 'The state of New York touches the Atlantic Ocean', we come to know that New York is a geographical place and Atlantic is tagged as an organization. Similarly, with a sentence mentioning Apple, we get the company name, the geographical location and the person tags, though, as I told you, this is not a very reliable way of analyzing sentences, because it will not give you the right results all the time. So let me open the chunking notebook; this pertains to chunking. We have imported word_tokenize and the RegexpTokenizer from nltk.tokenize, we have added the path C:\Program Files\gs\gs9.25\bin, and we have added that path to the environment variable. This is the first step we have to do: download the gs9.25 release, the reason being that we want Ghostscript so we can render how the words combine into a sentence as a tree. Let me increase the font size. Now we have the sentence 'The little mouse ate the fresh cheese.' We pass it to pos_tag, which categorizes each token: whether it's a noun, an adjective or a verb. Then we introduce a grammar. The grammar always starts with NP; that's the keyword we need to start with, and in the grammar we say we have a determiner, then any single word,
then an adjective (JJ), then a noun, possibly a conjunction word, and then another noun. We define this kind of grammar, parse it through the regular-expression parser, and then look at the chunk results. What is chunking doing here? If I run this, we get a tree-like structure: S is the whole sentence, and the parser determines internally which parts are the noun phrases. 'The little mouse' is one noun phrase, another noun phrase is 'the fresh cheese', and 'ate' is a verb. So from whatever grammar we have decided on, the parser internally creates a tree-like structure showing how the two noun phrases are joined by the verb, and each noun phrase can have a determiner, an adjective (JJ) and a noun. If you put it as a clause, we can say noun phrases are made up of a determiner, an adjective and a noun, and two noun phrases are joined by the verb. So you can define your own grammar: say that we want a sentence in which there are noun phrases joined by a verb, and a noun phrase can have a determiner, an adjective and a noun. You can see that the rule we stated, determiner, adjective and noun, is exactly what comes back in the answer, so the sentence has been chunked according to the grammar. Let's see the next example. Here we have 'She is wearing a beautiful dress.' We POS-tag it, find the adverbs, adjectives and so on, and then we use the default chunk parser on the sentence tokens; note that we have not defined any grammar this time. If I run this, it gives me the results as a pronoun and two verbs, then a determiner, an adjective and a noun: the noun phrase again consists of a determiner, an adjective and a noun, and the sentence has the pronoun and the verbs above it. So it determines the tree-like structure on its own, taking its grammar by default as per the English language and creating its own tree-like structure. So what is the need of this chunking? Why do we do it? The reason is that we need to figure out how the sentences are built in the whole paragraph, or maybe the whole document. Based on those sentences we create a big tree-like structure and filter out all the noun phrases, all the adjectives, all the prepositions, all the verb phrases, and figure out at which level of the tree they lie, much like our decision-making trees: as when we create a random forest or a decision tree, we figure out how noun phrases depend on adjectives, how adjectives depend on determiners, how determiners depend on prepositions. We find the dependencies among them and identify the right clusters: determiners, adjectives and nouns always come together in these sentences; noun phrases, verbs and noun phrases come together in those. So internally we can even do a clustering kind of analysis.
So chunking is dividing the whole data into individual words and looking at how they come together in the context of a tree-like structure, and that tree-like structure is a complete hierarchy. As in the previous case, a noun phrase can consist of a determiner, an adjective and a noun; here the noun phrase is just a noun again. So a noun phrase is something that most commonly comes as a determiner, an adjective and a noun, as in the previous case, 'the little mouse'. Let's take one more example: 'She is walking quickly to the mall.' We tokenize it, and we define the grammar as a pronoun (PRP), then a verb of any form (VB, VBD, VBG and so on), then any form of RB, the adverb. So we define our own rule for which words we want filtered into the tree, parse it using the RegexpParser, and send the tokens in to create the tree. The answer you get is: first PRP, the pronoun, as per our rule; then the verbs of any form; and the rest of the words which do not fit the rule, 'to', DT and NN, stay outside the chunk, where DT is a determiner, NN is a noun, and 'to' is simply tagged as TO. So what we are doing here is classifying a sentence as per the rule we fix, and building up a hierarchy. Now let's change this rule to some extent: say I remove one of the verb forms. You can see I have changed the rule, saying I want a pronoun, then a verb, then the adverb, and accordingly the output changes: first you get the pronoun, then whatever verb forms still match, then 'quickly', and whatever does not fit is left out; the tree changes depending on the grammar. As in the first example, determiner, adjective and noun each branch out: DT gets one branch, the adjective one branch, the noun another, and whatever cannot be placed under a branch falls back to the base of the sentence. In this case we have the pronoun as a separate branch and the verbs and adverb as a separate branch; there is no adjective here, which is why none appears. And if you don't give any rule at all, the parser will decide on its own. Take 'He drives fast on highways': first the pronoun comes, then the verb, then RB, the adverb 'fast', then IN, the preposition 'on', and then the noun 'highways' comes. If I had provided the same kind of rule as at the top, the tree in this graph would have changed accordingly. So this is what is known as chunking; we have discussed all of it.
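A hedged sketch of the verb-phrase rule discussed above; the exact grammar on screen isn't spelled out, so this pattern is an illustrative guess.

```python
# Verb-phrase chunking: a pronoun, one or more verbs, then optional adverbs.
import nltk
from nltk import pos_tag, word_tokenize

tagged = pos_tag(word_tokenize('She is walking quickly to the mall'))

grammar_vp = r'VP: {<PRP><VB.*>+<RB.*>*}'
parser = nltk.RegexpParser(grammar_vp)
print(parser.parse(tagged))
# (S (VP She/PRP is/VBZ walking/VBG quickly/RB) to/TO the/DT mall/NN)
```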
Now let's go through this once more in detail. Consider the sentence 'The little mouse ate the fresh cheese.' Convert the sentence into tokens and add POS tags to them; the POS tagger marks what type each word is, whether noun, adjective or adverb. Now we'll create a grammar for the noun phrase and mention the tags that we want in our chunk phrase: here we have created a regular expression matching this chunk, where we want a determiner, an adjective and then a noun. Now we have to parse the chunks, so we'll make the chunk parser and pass our noun-phrase string to it. We pass the complete grammar string to a RegexpParser; the parser is now ready, and we use it within our chunk parser on the sentence. So first the regular-expression parser determines which tokens are determiners, adjectives and nouns, and then the chunk parser creates a tree for us. The tokens that match the regular expression are chunked together into a noun phrase: we gave it the label NP, so the chunk comes back as NP, a noun phrase. In the previous example, if I give the rule a different label, say VP, we get the chunk labelled as a verb phrase; if I label it NP, it is called a noun phrase. So the chunks get distinguished by whatever label the rule carries. We could even say that our 'noun phrase' should contain a pronoun, a verb and an adjective: it is your choice what kind of rule you put in, and according to the grammar you provide, the system is going to create a tree for you. Maybe it is not really a noun phrase: if my English is not good and I call a determiner-adjective-noun pattern a verb phrase, the system will happily call it a verb phrase. It is you, as the teacher, who trains the machine on what a noun phrase is, what a verb phrase is, how the sentence should look and what it should contain; we need to define that grammar. Next, 'She is wearing a beautiful dress': we tokenize it and pass it to the parser. Convert the sentence to tokens, add the POS tags, and then parse the chunks; this time we create a chunk parser without a grammar and pass the sentence to it. It uses the default parser and automatically figures out what a noun phrase is for us: noun phrases generally have determiners, adjectives and nouns, and it finds the pronoun and the verbs as well, creating the tree on its own. So if you do not have a grammar with you, you can ignore that step and use the default parser; the system will do it automatically. Similarly, let's create a verb phrase. We define a verb phrase as a pronoun, then a verb of any form, then an adverb. I'll create another parser, pass the verb-phrase grammar to it, create another sentence, tokenize it and add POS tags, then pass it to the parser, and we get a verb phrase containing all of this: a verb phrase where a pronoun followed by two verbs followed by an adverb is chunked together. As per the rule I define, the system chunks out that part from the whole sentence. So let's consider another sentence, 'He drives fast on highways', with the default parser: it gives a verb phrase consisting of the pronoun and the verbs. So the first step is to tokenize with word_tokenize; the second step is to pass the tokens to pos_tag, which marks each as a verb, adjective, noun or pronoun; and the third step is to pass them to the chunk parser to make a tree. There are three steps. Now there is another subprocess, called chinking.
Chunking is dividing a whole big sentence into verb phrases, or whatever grammar we have in mind, splitting it into independent phrases or independent tokens. So what is chinking? Chinking helps us define what to leave out of a chunk: a chink is a sequence of tokens which is not included in the chunk. If you want to carve some part back out of a chunk to sharpen its meaning, we call that chinking. It is like the difference we saw between stemming and lemmatization: chunking just divides the words into phrases, a verb phrase or a noun phrase, while chinking goes deeper into the chunk itself. Now, chinking in Python. Let's create a chinking grammar string containing three things: the name of the chunk that we want to parse, the regular-expression sequence of the chunk, and the regular-expression sequence of the chink, that is, whatever we want taken back out of the chunk. So let's say our chunk divides the sentence into a pronoun, a verb of any form and an adverb, and from that chunk we want the adverbs removed. Let's see this example: the grammar is passed to a RegexpParser and the sentence is tokenized. On comparing the syntax tree of the chinked parse with the plain chunk, you can see that the token 'quickly', the adverb, is chinked out of the chunk. In the previous case we were getting the word 'quickly' as part of the chunk, right? The sentence was 'She is walking quickly to the mall', and when I used the grammar of a pronoun, a verb of any form and an adverb with the chunk parser, 'quickly' came out as part of the verb phrase; you can see that. But when I pass it to the chink parser, the adverb has been moved out of it. Why has it been moved? Because we defined in the grammar that whatever chunk you have, the adverbs are to be taken out of it. You can see the curly braces facing outward: with the braces in the opposite direction, }...{, we have specified one or more adverbs to remove, so whatever adverbs occur get chinked out of the chunk, and we get the answer with the adverb outside. Comparing with the previous syntax tree, the adverb was inside the verb phrase before, and now 'quickly' comes out of it. So we create a parser from NLTK's RegexpParser and pass the chinking grammar to it, and it keeps 'quickly' out of the chunk, because we have given the rule that we do not want the adverb inside the chunk.
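A short sketch of chinking: the reversed braces }...{ carve the adverbs back out of the chunk. The grammar mirrors the rule described above, though the on-screen string may differ.

```python
# Chunk first, then chink the adverbs back out of the chunk.
import nltk
from nltk import pos_tag, word_tokenize

tagged = pos_tag(word_tokenize('She is walking quickly to the mall'))

chink_grammar = r'''
VP:
  {<PRP><VB.*>+<RB.*>*}   # chunk: pronoun, verbs, optional adverbs
  }<RB.*>+{               # chink: take one or more adverbs back out
'''
parser = nltk.RegexpParser(chink_grammar)
print(parser.parse(tagged))
# 'quickly' (RB) now sits outside the VP chunk instead of inside it.
```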

### What is Deep Learning? [9:16:30]

Now we'll discuss another topic in NLTK for analyzing sentences, known as context-free grammar. This is a big domain, guys; there's a subject called automata theory, and if you research that subject, how a system understands natural language is beautifully explained there. I'll take one web page, a 'Types of Grammar' tutorial on Tutorials Point, and explain what a context-free grammar is. So, an introduction to grammar: a grammar denotes the syntactical rules for conversation in natural languages. Linguists have attempted to define grammars since the inception of natural languages like English, Sanskrit, Mandarin and so on, and the theory of formal grammar finds extensive applicability in the field of computer science. Noam Chomsky gave a mathematical model of grammar in 1956 that is effective for writing computer languages; he is the father of this area, and any book on automata covers all of these rules. So what is a grammar? A grammar can be written as a four-tuple (N, T, S, P), where N is a set of variables, or non-terminal symbols; T is a set of terminal symbols; S is a special variable called the start symbol, which belongs to N (so the start symbol is also one of the non-terminals); and P is the set of production rules over the terminals and non-terminals, the rules which need to be put in. So, in order to form any grammar, we should have a set of variables called non-terminal symbols, which are the non-ending ones; some terminals, which are the ending ones; a special variable known as the start symbol; and some rules to define it. A production rule has the form α → β, where α and β are strings over the whole set of symbols. Let's take an example: a grammar with the rules S → AB, A → a and B → b. Here S, A and B are the non-terminal symbols, a and b are the terminal symbols, S is the start symbol, and these are the production rules P. What do the rules say? They say that non-terminal symbols can expand into further non-terminal symbols, and that non-terminals eventually end in terminal symbols. Let me take this and analyze it a bit more. The start symbol S expands to AB. So I pick that up, and in the next step I can always replace A: A points to a. Similarly, in the next step, we can replace B with b. So the derived sentence is ab; this is the final answer. Now, going further, suppose we have another set of production rules in which one of them yields epsilon; epsilon means empty. When a production lets a non-terminal expand into a string that contains that same non-terminal again, we can keep substituting: each substitution takes us one level further, growing the string, and we can keep on going like this. Finally, the recursive non-terminal can be replaced by epsilon, the empty string, and the derivation terminates, giving us a final string.
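To make the epsilon derivation concrete, here is a standard textbook grammar (not necessarily the one shown on screen): S → aSb | ε, which generates all strings of the form a^n b^n.

```
Productions:  S → aSb | ε

Derivation:   S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
              (at the last step, S is replaced by ε, the empty string)
```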
So this is the final string which we get from those rules. Coming back to NLTK, then: a context-free grammar is, in layman's terms, a simple grammar in which certain rules describe the possible combinations of words and phrases, exactly as the rules above described strings. Formally, a context-free grammar is a four-tuple (N, Σ, R, S): N is a finite set of non-terminal symbols, Σ is the alphabet of terminal symbols, R is the set of rules, and S is the start symbol. It generates a language by capturing constituency and ordering. For example: a sentence consists of a noun phrase and a verb phrase (S → NP VP); a noun phrase is a determiner followed by a nominal (NP → Det Nominal); a nominal points to a noun (Nominal → Noun); a verb phrase points to a verb (VP → Verb); the determiner is 'a', the noun is 'flight' and the verb is 'left'. Let me put this in the easier, abstract terms we used before: the start symbol expands into two symbols, one playing the role of the noun phrase and one the verb phrase; the noun-phrase symbol expands into a determiner symbol and a nominal symbol, which eventually point to concrete words such as 'a' and 'flight'; and the verb-phrase symbol eventually points to 'left'. So we have a certain set of rules, and based on them we create sentences. This is what it says: there is a set of units S, NP and VP in the language, and an S consists of an NP followed immediately by a VP. It doesn't say that this is the only kind of S, nor does it say that this is the only place an NP and a VP can occur; combinations of them are allowed, just as in the a/b grammar earlier, where any number of strings could be derived. So from one set of rules we can create any number of sentences, from the minimal one up to arbitrarily long ones; that is the intention of a context-free grammar. So let's implement this in NLTK. There is an inbuilt CFG class, and from a string we define the rules: a sentence contains a noun phrase and a verb phrase; a verb phrase contains a verb and a noun; a verb can be 'saw' or 'met'; a noun phrase can be 'John' or 'Jim'; a noun can be 'dog' or 'cat'. With this CFG you can have almost all the permutations of sentences, as long as the above conditions are met: a noun phrase followed by a verb phrase, the verb phrase having a verb and a noun, the verb either 'saw' or 'met', the noun phrase either 'John' or 'Jim', and the noun either 'dog' or 'cat'. So we defined a rule, and based on that rule we can create any number of sentences; the context-free grammar productions are exactly these: a verb phrase can be a verb and a noun, the verb can be 'saw' or 'met', the noun phrase can be 'John' or 'Jim', and the noun can be 'dog' or 'cat'.
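A minimal sketch of that CFG in NLTK; generate() from nltk.parse.generate enumerates the sentences the rules allow.

```python
# Define the grammar from a string and enumerate every sentence it permits.
from nltk import CFG
from nltk.parse.generate import generate

grammar = CFG.fromstring('''
S  -> NP VP
VP -> V N
V  -> 'saw' | 'met'
NP -> 'John' | 'Jim'
N  -> 'dog' | 'cat'
''')

for sentence in generate(grammar):
    print(' '.join(sentence))
# John saw dog, John saw cat, John met dog, ... (8 sentences in total)
```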
What does the productions function do? It lists all of these production rules, from which every permitted combination of sentence can be generated. Now that we've learned most of the concepts, let's automate the entire process of text paraphrasing. We create a function that takes a sentence as input, tokenizes and tags it, and also defines a context-free grammar for that same sentence. This cfg_paraphrase function takes a sentence from you and word-tokenizes it; then, for each tagged token, if it is tagged NNP it wraps the word in single quotes as a terminal, if it is a verb it likewise wraps it in single quotes, if it is a noun it wraps it in single quotes, and otherwise it passes. So for any word it reads after tokenizing (we have already done the POS tagging), if it is found to be a proper noun, a verb in the past or present tense, or a plain noun, we put it in single quotes, and we define a context-free grammar in the usual format: a sentence should have a noun phrase and a verb phrase, the verb phrase contains a noun and a verb, and the quoted nouns and verbs become the terminals. We define the rules this way and then generate from the grammar: if you pass a sentence like 'John saw a long white boat', it will give results such as 'John saw ...' and the other rearrangements the rules allow.

Why should you use time series analysis? First of all, in time series analysis you have just one variable: time. Now, you must have seen that there are a lot of algorithms out there, so why do we need one more, time series? Let me explain with an example. Under supervised learning we have linear or logistic regression, where we have an independent variable and a dependent variable; there, we deduce a function, a mapping of how one variable is related to another, and then go ahead with the analysis. But in time series analysis you have just one variable, time. For example, suppose you own a coffee shop, quite a successful one in town. What do you do? You track how many cups of coffee you sell every month: you add up all your coffee sales. Say you started this coffee shop in the first month, January; you record the data each month and sum it up, so you have all the data up to the present month. But what if you want to know the sales for the next month, or the next year? Imagine you have just one variable, sales, and you need to predict that variable with respect to time. In such cases, where you have only time and need to predict another variable against it, you need time series analysis. Now that we know why we need it, let's move ahead and understand what exactly a time series is. A time series is a set of observations, or data points, taken at specified times. Here, on the x-axis you have time, and on the y-axis you have the magnitude of the data. If you plot a time series, on the x-axis you will always get the time, divided into equal intervals.
So you cannot create a time series if one data point is at the week level and the others are at some other level; there should be one equal interval, whether that is a day, a week, a month, a year, a decade or a century. That is the one constant thing a time series requires. Now let us see the importance of time series analysis. First and foremost is business forecasting, because your past defines what is going to happen in the future. You'll see a lot of traders on the Sensex trying to predict what the price of a stock will be tomorrow; that is nothing but business forecasting. You also see a lot of retailers who try to know how many goods they are going to sell the next day. All of this can be achieved with time series analysis, and it is not limited to one domain like retail or finance; it is applicable almost everywhere. It also helps us analyze past behavior: you can see in which month the sales went up and where the dips were, so you can easily understand your past data. With every dip and peak there is a business reason attached, and you can understand it with respect to time. For example, if some festival is coming and you sell chocolates, your sales will increase around the festival, so you need to think about the seasonality part as well. Don't worry, we'll have a complete discussion on seasonality shortly. Coming back: it also helps you plan future operations, since you can analyze the past and then forecast the future using time series analysis. And apart from all this, you can evaluate current accomplishments, meaning you can determine which goals you have met in the current scenario. Say you predicted you were going to sell around 100 chocolates in a day; did you actually do that? All of this can be analyzed using time series analysis. Moving ahead, let us see the different components of a time series. Most time series have trend, seasonality and irregularity associated with them, and some also have cyclic patterns, but it is not compulsory that every pattern be present. So let us discuss each one in detail. The first is trend. A trend is nothing but a movement to relatively higher or lower values over a long period of time. When the time series shows a general upward pattern, we call it an uptrend; if it exhibits a lower, downward pattern, we call it a downtrend; and if there is no trend, we call it a horizontal, or stationary, trend. Let me explain with an example. Say a new township has been constructed and people are going to come and live there. A hardware guy opens up a shop, and the people moving in will definitely buy stuff from him. But once all these houses are settled and occupied, the need for hardware reduces, so the trend may go down. Let's say sales were up in the first year, and within another year, or maybe six months, they have gone down. That is a trend: for some amount of time selling was high and then it dropped. But this is not a repeating pattern, not something that happens year after year; a trend is something that happens for some time and then disappears. Then we have seasonality. Seasonality also consists of upward or downward swings, but this is quite different.
It's a repeating pattern within a fixed time period. For example, Christmas happens every year on the 25th of December. Say you're in the business of chocolates: year after year, chocolates sell more and more in the last week of December. This is because of Christmas, and you'll notice it across the years, whether you look back 2 years, 4 years, 6 years or 10 years. So it's a repeating pattern within a fixed time period, while with trend that is not the case. Let me take another example, ice cream this time: ice cream sales will be comparatively higher in summer than in winter. That again is seasonality. Then we have irregularity, also called noise or residual. These fluctuations are erratic in nature, or you could say unsystematic; they happen over short durations and are non-repeating. Let me give you an example. Say there's a natural disaster, a flood in your town, out of nowhere in one year. A lot of people buy medicines and ointments for relief, but after some time, when everything has settled, the sales of those ointments go back down. This is something no one could have predicted; it happens erratically, you don't know how many sales will occur, and you cannot forecast an event like the flood. It is random variation, and that is what irregularity is. Moving ahead we have the cyclic component. Cyclic patterns are repeating up-and-down movements that typically span more than a year, and they don't have a fixed period: the swings can come after 2 years, then 4 years, then maybe 6 months. They keep repeating but are much harder to predict. Now let's discuss when not to apply time series analysis. First of all, you cannot use time series analysis when the values are constant. Let me take the same coffee example: say the number of coffees sold in the previous month was 500, and this month the number is again almost the same, 500, and I want to predict the sales for the next month. In such cases, where the values are constant, time series cannot be applied. Similarly, if your values come from a function, say sin(x) or cos(x): given an x value, you can get the output by just plugging it into the function, so there is no point applying time series analysis where you can calculate the values with a formula. You could technically apply time series to these as well, but there is no point if you already have a formula or the values are constant. These are the cases when you should not apply time series. Moving ahead, let us see what stationarity is. No matter how much you try to avoid the stationarity part, it will always be there in time series: time series requires the data to be stationary. Whatever statistical model you apply on a time series, the data should be stationary. So let's understand what exactly that means. Most models work on the assumption that the time series is stationary.
If a time series has had a particular behavior over time, there is a very high probability that it will follow the same behavior in the future. Also, the theories and formulas related to stationary series are more mature and easier to implement than those for non-stationary series. There are two major reasons behind the non-stationarity of a time series: the first is trend, which is a varying mean over time; the second is seasonality, which is variation at specific time frames. But did you get the answer to the question of how exactly stationarity is defined? Stationarity has a fairly strict criterion. First, the series should have a constant mean: the mean should not change with time. Second, constant variance: the variance should be equal at different time intervals. And third, an autocovariance that does not depend on time. For those of you who don't know these terms, in a nutshell: the mean is the average; the variance measures the spread, the average squared distance from the mean, and it should be the same across time; and autocovariance describes how a value relates to earlier values. For example, say you're standing at time t, and your previous time periods were t-1 and t-2. The covariance between the values at t and t-1, or t and t-2, should depend only on the gap between them, the lag, and not on the time period t itself. That is what the autocovariance condition means. When these three conditions are met, we can say our series is stationary, and then we can apply time series analysis to it. Now, to check stationarity in Python we have two popular tests. The first is rolling statistics and the second is the ADF, or augmented Dickey-Fuller, test. In rolling statistics we plot the moving average, or the moving variance, and see whether it varies with time. By moving average or variance I mean that at any instant t we take the average or variance over a time window, say the last 12 months. This is more of a visual technique, so you cannot deploy this kind of thing to production, but it is quite useful for visual inspection. Then we have the ADF, the augmented Dickey-Fuller test, another statistical test for checking stationarity. Here the null hypothesis is that the time series is non-stationary, and once you perform the test you get a result comprising a test statistic and some critical values at different confidence levels. If the test statistic is less than the critical value, we can reject the null hypothesis and say the series is stationary. Don't worry, I'll explain this again when we get to the demo part, but I hope you are now clear on what stationarity is and how we can check it.
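In symbols, the three conditions just listed (this is the usual definition of weak stationarity) can be written as:

```latex
\mathbb{E}[X_t] = \mu
\qquad
\operatorname{Var}(X_t) = \sigma^2
\qquad
\operatorname{Cov}(X_t, X_{t+k}) = \gamma_k \ \text{for every } t
```

where μ, σ² and each γ_k are constants: the mean and variance do not change with time, and the autocovariance depends only on the lag k, never on t itself.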
All right, so now let me move on to my next topic: the ARIMA model. ARIMA is one of the best models to work with time series data. It is basically the combination of two models, AR plus MA, and it's quite powerful: AR is a separate model, MA is a separate model, and what binds them together is the integration part, indicated by the I. AR stands for the autoregressive part and MA stands for moving average. The autoregressive part is nothing but the correlation between previous time periods and the current one. What does that mean? Consider that you are standing at a time period t, and there are previous time periods t-1, t-2, t-3. If you find a correlation between t-3 and t, that is the autoregressive part. And as I told you earlier, there is always some noise or irregularity attached to a time series; we need to deal with that noise, in fact average it out. When we average it, the crests and troughs present in the noise smooth out and we get an averaged forecast of that noise. You can never actually predict when the next customer is going to walk in and buy 100 items at once, so we smooth it by taking its average; that is the moving average part. An ARIMA model has three parameters: p, q and d. p refers to the number of autoregressive lags, q is the moving average order, and d is the order of differencing, so there is one parameter for each piece of the model. If we difference the series just once, the value of d is one; if we difference it twice, d is two. Each parameter has its own method for being estimated: to pick the value of p you use the PACF graph, the partial autocorrelation plot; to pick q you plot the ACF, the autocorrelation plot; and, as I already told you, we difference the data to make it stationary, so the order of differencing gives the value of d. I guess that's enough of the theory part, so let's quickly jump to the demo, see how we can implement all of these things, and forecast the future. Here we have a problem statement: there's an airline which has data on passengers across months, and you need to build a forecast to determine how many passengers are going to board this airline at the month level in the future. We have dates from 1949 till 1960, month by month, and the number of passengers traveling per month. Given this data, we want to predict the number of passengers for, say, the next 10 years. So let me go to my Jupyter notebook, where I have the code, and we'll implement everything we have discussed so far. First of all we import all the necessary libraries: NumPy; pandas, for the data analysis and data processing part; and matplotlib, for data visualization, creating plots and all those things. To have matplotlib render inside the notebook we have also written %matplotlib inline, so a plot does not open in a new window; everything stays in the Jupyter notebook itself. And then I have just defined a figure size. So now let me just run this. Next, I have imported my air passengers data using pandas.
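A hedged sketch of this setup and of the loading step described next (the file name `AirPassengers.csv` and the column names `Month` and `#Passengers` are assumptions based on the classic air-passengers dataset this demo follows):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV, parse the Month column as datetimes, and make it
# the index so the series is ordered by time (names assumed).
dataset = pd.read_csv("AirPassengers.csv")
dataset["Month"] = pd.to_datetime(dataset["Month"])
dataset = dataset.set_index("Month")

ts = dataset["#Passengers"]   # the one variable we track over time
print(ts.head())
print(ts.tail())
```

The later sketches in this section reuse `ts` (and its log transform) from here.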
So pandas has a read_csv function, accessed through pd. We read the file into a variable, dataset, and then parse the date strings into a datetime format: pandas has a to_datetime function, so we convert the month column and set it as the index. Now the index variable is the month. Next I imported datetime and printed the top five values, and this is how the data looks: month as my index and the number of passengers as the second column. This is the data I already showed you in the presentation, running from 1949 till 1960; I have just printed the head of it. Now let me print the tail as well, the last five entries: here we have data up to 1960 and the corresponding passenger counts. Next, we simply plot a graph between them. In a time series we have the date and one other variable; here my other variable is the number of air passengers. So we put the date on the x-axis and the number of passengers on the y-axis, and then we simply plot that graph.

### Introduction to Keras [9:40:24]

So now let me just run this. This is how your data looks. If you notice, there is a trend here, so the next step is to check stationarity. I'll give you 10 seconds: think about whether this data is stationary or not, and give me a reply. Right, Shivani? So, this data is non-stationary. You can see the trend going up: if you calculate the mean around 1951 it lies somewhere down here, and if you calculate the mean for 1960 it lies somewhere up here. There is an upward trend and the mean is not constant, which tells me the data is not stationary. Now, I told you there are two tests that help check the stationarity of the data: rolling statistics and the ADF test. Let us go through each of them, starting with rolling statistics: the rolling mean and the rolling standard deviation. Here we use a window of 12, which is nothing but a window of 12 months: at each month, the rolling mean is the average of that month and the 11 months before it, so you get the mean at a yearly level, and you do the same for the standard deviation. In Python, to calculate the mean and standard deviation you have the .mean() and .std() functions, and these compute them automatically. If you notice, the first 11 rows are NaN, not a number: that is because those months do not yet have a full 12-month window behind them; the first real value is the average of the first 12 months. If you scroll a little, you'll see it's a long dataset and the same applies to the standard deviation; same procedure, the averages are calculated and written out. Now you might ask why exactly 11 values are NaN: it is because we gave a window of 12. If your data were at the day level, your window size would be 365, and your forecast would then be at the day level too; my data is monthly, so the forecast will be monthly and the window is 12. I hope you see why I set the window to 12 and why we calculate the mean and standard deviation. Then we simply plot these rolling statistics: the original data in blue, the rolling mean we just calculated in red, and the rolling standard deviation in black. After that we add a legend and a title, and now let me run this code. Here you can see a plot somewhat like this: blue is my original data, the mean is in red and the rolling standard deviation is in black. You can conclude that neither the mean nor the standard deviation is constant, so the data is not stationary. That is my rolling statistics method, which again is a visual technique, and we have already concluded that this is not a stationary dataset.
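A sketch of the rolling-statistics step just described, reusing the `ts` series loaded earlier (colors and labels follow the demo's description):

```python
# 12-month rolling statistics: each point averages the
# current month and the 11 months before it.
rolmean = ts.rolling(window=12).mean()
rolstd = ts.rolling(window=12).std()

plt.plot(ts, color="blue", label="Original")
plt.plot(rolmean, color="red", label="Rolling Mean")
plt.plot(rolstd, color="black", label="Rolling Std")
plt.legend(loc="best")
plt.title("Rolling Mean & Standard Deviation")
plt.show()
```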
Now let me perform the Dickey-Fuller test as well. To perform the Dickey-Fuller test in Python you import adfuller from statsmodels.tsa.stattools; that is the function statsmodels provides for it. I call adfuller, pass in the dataset, the number of passengers, and set autolag='AIC'. AIC is the Akaike information criterion: in short, it trades off how well a model fits against how complex it is, and here it is used to choose the number of lags automatically. Don't worry too much about it for now; just treat it as a selection criterion and see what happens when we run the test. When we run it, we get the test statistic, the p-value, the number of lags used and the number of observations used, and then we print the values in a loop. So now let me run this cell as well. The first statement prints all the values: the test statistic, a p-value, the number of lags used, the number of observations, and critical values at different percentages. Remember, to reject the null hypothesis the p-value should be small, below roughly 0.05; here we have a very large value, about 0.9, which is nowhere close. The test statistic should also be below the critical values, and it is not. So we cannot reject the null hypothesis, and we say the data is not stationary.
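A minimal sketch of that ADF step, assuming the `ts` series from before (`adfuller` with `autolag="AIC"` is the statsmodels call the demo describes):

```python
from statsmodels.tsa.stattools import adfuller

# Null hypothesis: the series is non-stationary. A p-value below
# ~0.05 and a test statistic below the critical values reject it.
result = adfuller(ts, autolag="AIC")
print("Test Statistic    :", result[0])
print("p-value           :", result[1])
print("#Lags Used        :", result[2])
print("Observations Used :", result[3])
for level, value in result[4].items():
    print(f"Critical Value ({level}): {value}")
```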
Then what do we do? We estimate the trend. With the results of the Dickey-Fuller test we got to know the data is not stationary, so we take the log of the indexed dataset, the dataset indexed month-wise. Let me run this for you. Notice the numbers on the y-axis have changed, because the scale itself has changed: we took the log, so the trend remains the same while the y values differ. Next, let us calculate the moving average with the same window, but keep in mind that this time we work with the logged time series: again the window is 12, the 12 months, and we plot it against the log series. You can see the mean is still not stationary, though it is quite a bit better than before: it still moves with time and the trend is still upward. Next, we take the difference between the moving average and the actual values of the log series. Why do this? Because unless we perform these transformations we will not get a stationary series. You might ask whether this is the standard way to make a time series stationary. No, it's not; it depends on your time series and what the data holds. Sometimes you take a log, sometimes a square root, sometimes a cube root. Here we are on the log scale, so we take the moving average and subtract the two. We print the head of it, the top 12 values, then remove the NaN values, which is done by calling dropna(inplace=True), and print the head again; now we have the month and the number, which is the difference. Moving ahead, I have purposely included the full code of this ADF test; ADF is the augmented Dickey-Fuller test. Above I applied a bare adfuller call, but this is the whole routine to perform whenever you need to determine whether a time series is stationary. I have defined a function called test_stationarity which performs both tests: it computes the rolling statistics with a window of 12 and plots them, and it runs the Dickey-Fuller test. Let me run this, and run the function as well. Now you see the original data as the blue line, the standard deviation in black and the rolling mean in red. Visually there is no real trend left; it is much better than what we saw earlier. Now the ADF results: the p-value is much smaller, around 0.02 where earlier we had about 0.9, and the test statistic is now almost equal to the critical values, which helps you determine whether the data is stationary. I hope by now you have the idea of how the Dickey-Fuller test and the rolling statistics each tell you whether data is stationary. Next, I calculated the exponentially weighted average of the time series, because we want to see the trend present inside it. Let me run this and you'll see: as the series progresses, the weighted average also moves toward the higher side, so the trend is upward and keeps increasing with time. Moving ahead, here is another transformation: take the log scale and subtract the weighted average from it. In the previous scenario we subtracted the simple mean; this time we use the weighted mean and then check for stationarity. We subtract them and pass the result into the test_stationarity function defined above, which runs both tests and displays the results. I'll run the cell: the standard deviation is quite flat, not moving around, and you could even say there is no trend; the rolling mean is also better than the previous one. And the ADF result: a very small p-value, 0.005, so the time series is stationary here as well. You can use either of these transformations to make your data stationary and check it.
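A sketch of the transformations walked through above (log, subtracting the rolling mean, and subtracting an exponentially weighted mean; the `halflife=12` choice is an assumption matching the 12-month window):

```python
# Log transform damps the growing amplitude of the series.
ts_log = np.log(ts)

# Transformation 1: subtract the 12-month rolling mean,
# then drop the NaN rows the window leaves behind.
moving_avg = ts_log.rolling(window=12).mean()
ts_log_ma_diff = (ts_log - moving_avg).dropna()

# Transformation 2: subtract an exponentially weighted mean,
# which weights recent months more heavily (no NaNs here).
exp_weighted_avg = ts_log.ewm(halflife=12).mean()
ts_log_ewma_diff = ts_log - exp_weighted_avg

# Either result can be fed back into the rolling-statistics
# plot and the adfuller test shown earlier.
```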
So now we know our data can be made stationary. Next, we shift the time series so we can use it for forecasting: earlier we subtracted the mean from the actual values, and now we use the shift function to shift the values. Let me run this plot; this is how it looks. We have taken a lag of one, so we shifted the values by one, or in other words differenced the time series once. Now, if you remember, I talked about the ARIMA model: it has three pieces, the AR model for the autoregressive part, the MA model for the moving average, and I for the integration. ARIMA takes three parameters, and the d there stands for the integration part, that is, how many times you have differenced the time series; here that value becomes one. Next, I simply dropped the NaN values. If you run this code you'll see the output is quite flat: the null hypothesis of the augmented Dickey-Fuller test is rejected, and hence we can say the time series is now stationary. Again blue is the original (differenced) data, red the rolling mean and black the standard deviation; visually there is no trend present and it's quite flat, so the series is stationary. Now let us look at the components of the time series. First import seasonal_decompose from statsmodels.tsa.seasonal. seasonal_decompose segregates three components: trend, seasonal and residual. We simply plot these graphs; let me run it and show you how they look. This is the output: my original data, in which we saw a trend; the trend line itself, going upward and fairly linear; alongside that, the seasonality, clearly present, in its own panel; and then the residuals. Residuals are nothing but the irregularities present in your data: they have no shape or size, you cannot tell what will happen next, and they are quite irregular in nature. Now we check whether the noise is stationary: we take the residual, save it in a variable holding the decomposed data, and pass it to the same test_stationarity function we created above, which runs both the rolling statistics and the ADF test. Let me run this cell; this is how the graph looks. Looking at the output visually, you can say this is not stationary, and that is exactly why we need the moving average parameter in place, so the model smooths the noise out when predicting what happens next. So now we know the value of d. But how do you find p and q, the number of autoregressive lags and the moving average order? As I told you, we need to plot the ACF and PACF graphs: to calculate p we plot the PACF graph, and to calculate q we plot the ACF graph. ACF refers to the autocorrelation graph and PACF stands for the partial autocorrelation graph.
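A sketch of the differencing and decomposition steps, continuing from `ts_log` (with a proper monthly DatetimeIndex, `seasonal_decompose` can infer the frequency; that inference is an assumption here):

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# First-order differencing: shift by one month and subtract.
# This is exactly what d = 1 will mean in the ARIMA order.
ts_log_diff = (ts_log - ts_log.shift()).dropna()

# Split the log series into trend, seasonal and residual parts.
decomposition = seasonal_decompose(ts_log)
residual = decomposition.resid.dropna()

decomposition.plot()
plt.show()

# The residual panel is the noise; it can be passed back into
# the same stationarity checks used earlier.
```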
In Python we first import these two functions: from statsmodels.tsa.stattools import acf and pacf. Using acf and pacf, we pass in the dataset, and for pacf we prefer the method 'ols', which is ordinary least squares; there are various methods, but we usually prefer OLS. Then we simply plot the ACF graph and the PACF graph. Let me run this and show you how to calculate the p and q values. This is my autocorrelation graph and this is my partial autocorrelation graph. To calculate p and q, you check where each graph cuts off, or drops to zero, for the first time, that is, where it first crosses the confidence interval. If you look closely, the PACF touches the confidence level around lag 2, so the value of p is about 2; similarly, the ACF drops to zero around lag 2 as well, so the value of q is also 2. That is how you calculate the values of p and q using the PACF and ACF graphs. Now we have p, q and d, so we can simply substitute these values into the ARIMA model. I first imported the ARIMA model, and in the ARIMA call I listed the order: p is 2, I differenced once so d is 1, and q is again 2. I plotted the fit and then calculated the RSS, the residual sum of squares. Let me run this: the residual sum of squares is quite good, 1.02, with the order (2, 1, 2). You can also play around with these p and q parameters. Say I change the order to (2, 1, 1): if I run it again, the RSS goes up, and the lower the RSS, the better the fit. Let me change it to (0, 1, 2): in that case the RSS also increases, to 1.4. So you need to keep an eye on the RSS part; again, the lower the RSS, the better it is for you. We'll revert to (2, 1, 2), where p is 2, q is 2 and we have taken one difference, so d is 1. Now consider the moving average model on its own: there p is zero, so the order is (0, 1, 2), and as we just saw the RSS is 1.4. Next, for the AR model on its own, the order is (2, 1, 0), with q as zero; let me run that for you, and you can see the RSS reaches 1.5.
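A hedged sketch of these steps with the current statsmodels API (the demo follows the older `arima_model` interface, which newer statsmodels removed; with the current API the fitted values come back on the log scale rather than the differenced scale, so the RSS below is computed against `ts_log` and its magnitude won't match the demo's numbers exactly):

```python
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.arima.model import ARIMA  # current statsmodels API

# PACF suggests p, ACF suggests q: read off where each first
# drops inside the confidence band (around lag 2 here).
lag_acf = acf(ts_log_diff, nlags=20)
lag_pacf = pacf(ts_log_diff, nlags=20, method="ols")

# Fit ARIMA with order (p, d, q) = (2, 1, 2) on the log series.
model = ARIMA(ts_log, order=(2, 1, 2))
results = model.fit()

# Residual sum of squares of the in-sample fit: lower is better.
rss = ((results.fittedvalues - ts_log) ** 2).sum()
print("RSS:", rss)
```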
So, to summarize: with AR alone the RSS is 1.5, with MA, the moving average, it is 1.4, and when we apply the combined model, ARIMA, the RSS, or residual sum of squares, drops to 1.02. Now let's fit the model to the data we have. We convert the fitted values into a Series format and print the head, so we have the month together with the predictions. Next we take the cumulative sum of those fitted values, using the cumsum function, and print the head again; that is the result. Finally, keep in mind that after performing these transformations we also need the exponent of the whole thing, so that the data comes back to the original form we started from. So these are the three steps that matter for undoing the transformation: take the cumulative sum of the predictions, add them back onto the base, and take the exponent to return to the original scale. After that we plot the actual values against the model fit: the orange line is the fitted model, and you can see that only the magnitude varies while the shape has been captured properly by the ARIMA model. Now, how do we do predictions? There is a plot_predict function for this. Before predicting, let me check how many rows are in my dataset: we have data from 1949 to 1960, the number of passengers, and 144 rows in one column. So what if I want to predict for the next 10 years? Work out how many data points you want: for 10 years that is 12 × 10 = 120 points. Using the plot_predict function I give the first index of the time series and the total number of data points I want, which is the 144 existing rows plus 120, and 144 + 120 = 264, so I write 264 here. Let me comment this out for now and run it. Here the blue line is the forecast and the gray band is the confidence interval: however the forecast turns out, it stays within that band. So this is the prediction for the next 10 years. And if you don't want the graph, you can ask for the data points directly: I want the prediction for 10 years, so I pass steps=120 and get the result back as an array. That is how you can run the forecast for 6 months, 12 months, next year, 10 years; it's totally up to you. Whatever topics I've covered, I hope they are clear to you.
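A sketch of the forecasting and back-transformation steps with the current statsmodels API (as noted above, the current ARIMA's fitted values are already on the log scale, so only the exponent is needed to undo the transform; the cumulative-sum step in the demo belongs to the older API, where fitted values came back differenced):

```python
# In-sample fit back on the original passenger scale.
fitted = np.exp(results.fittedvalues)

# Forecast 10 years = 120 monthly steps beyond the 144
# observations, then undo the log on the forecast too.
forecast = np.exp(results.forecast(steps=120))
print(forecast.head())

plt.plot(ts, label="Observed")
plt.plot(fitted, label="Fitted")
plt.plot(forecast, label="10-year forecast")
plt.legend(loc="best")
plt.show()
```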
So now let me go back to my presentation and see what we have left. We have built a model and forecast the demand for the next 10 years from a dataset with monthly dates and passenger counts. Now we will discuss the key differences between data science and data analytics. The first key difference: data science involves programming, statistics and machine learning, whereas data analytics involves statistical analysis, data visualization and business intelligence tools. Next, the objective of data science is to uncover new patterns and insights through complex algorithms, whereas data analytics focuses on descriptive analysis for decision support and process optimization. Data science is concerned with long-term strategic decisions based on future predictions, whereas data analytics is concerned with short-term operational decisions based on current insights. Data science deals with exploration and new innovations, whereas data analytics makes use of existing resources. Data science is utilized in predictive modeling, artificial intelligence and automation, whereas data analytics is used in business intelligence, reporting and process optimization. And finally, data science often deals with unstructured data, whereas data analytics mostly deals with structured data. Now that we have an overview of the key differences between data science and data analytics, let's move on to the essential technical skills required for data science. The first is programming skill: you should be familiar with a variety of languages, including Python, C or C++, Java and SQL, with Python being the most commonly required coding language in data science roles. Next is statistical analysis and mathematics. Machine learning requires statistical concepts like linear regression and logistic regression; a data scientist must also understand mean, mode, median, variance and standard deviation, and should be familiar with probability distributions, over- and under-sampling, and dimensionality reduction. Then we have machine learning skill. As a data scientist it is essential to learn machine learning and deep learning, including advanced models like random forest, as well as key algorithms like logistic regression, Naive Bayes, decision tree, linear regression and k-nearest neighbors. The next skill required is data wrangling: cleaning, transforming and organizing raw data to ensure it is structured and ready for analysis. This is a foundational skill for effective data science workflows, with tools such as pandas, NumPy and OpenRefine; a data scientist must also be familiar with database management tools like MongoDB, MySQL and Oracle. The final skill is data visualization. Beyond knowing how to analyze, organize and categorize data, we should also develop data visualization skills; it helps to be comfortable with technologies like Tableau, Microsoft Excel and Power BI. Next, we will see the skills required for data analytics. The first skill is SQL and database knowledge. SQL is a database language used to manage large datasets in relational databases; proficiency in SQL allows the CRUD operations, create, read, update and delete, as well as querying data from databases.
And the next skill is data integration and analysis. Data integration seamlessly combines information from various sources into a unified, comprehensive dataset, from which analysts extract meaningful insights, uncover patterns and transform raw data into actionable intelligence. The next skill is spreadsheet skills: Microsoft Excel is a powerful data management tool, valued for quick analytics and easy data storage; organizations and employers prefer it for analysis, and advanced Excel knowledge helps with data manipulation as well as visualization. Then comes business acumen and domain knowledge. Business acumen refers to understanding organizational goals and strategies, whereas domain knowledge refers to understanding a specific industry or sector; these abilities enable analysts to align data insights with business objectives, making recommendations and decisions actionable. And the final skill is communication, one of the most important skills for every role in the industry. Data analysts collaborate with various departments to create profitable solutions for organizations, so excellent communication skills, active listening, verbal and written communication, are essential: data analysts communicate insights, suggest solutions and write clear performance reports. Now let's see the career opportunities. For data science, some of them are data scientist, machine learning engineer, data engineer and business intelligence manager; for data analytics learners, some of them are data analyst, business intelligence analyst, operations analyst and market research analyst. Coming to the final section of this video, let us look at five top companies hiring for data science: IBM, Accenture, Amazon, Microsoft and Cloudera. Moving forward, the companies hiring for data analysts: Accenture, IBM, Amazon, Oracle and Tableau. Finally, let's see the salary picture for both. The average annual salary for a data scientist in India is around 14 lakhs per annum, whereas in the USA it is around $156,000; for a data analyst it is around 7 lakhs per annum in India and around $82,000 in the USA. — So let's start directly with the questions. In the beginning we'll focus on some fundamental questions, more for you to understand what data science is than because a particular interviewer would ask exactly that. For instance, many people wonder what data science is all about. Though there are many online sources and blogs describing data science, in a nutshell this is what it boils down to: a person who is very good at understanding computer algorithms, who understands statistics and mathematical ideas, and who applies these two bodies of knowledge, from computer science and from mathematics, to a particular business application where somebody sees value coming out of the data. That is how data science approaches a problem. When you combine these two powerful disciplines, computer science and mathematics, on a real-world application, the outcome of the data science project should move in a direction where people see a return on investment. Right?
So the people you bring in and the technology ideas you work on should all give you some return on the investment you have put in; that is why industries have started looking at data science. The subjects that are important for you to know are statistics, computer science, applied mathematics, and then topics like linear algebra and calculus, and a few more. Fundamentally, from computer science, algorithms and data structures will be very useful; from mathematics and statistics, things like calculus, linear algebra, matrix factorization and concepts like that; and on the application side, it is more about your experience from the industry. If you have worked in retail, you know how the business processes in retail work. People also often ask, from the technology end, whether you need experience in a language like Python, or R programming for that matter. Python is one of the most sought-after programming skills, particularly when you want to build solutions in the data science domain, and with the availability of libraries like NumPy and pandas, Python has established its ground very strongly as a robust framework for designing data science solutions. In particular, built-in structures like lists, dictionaries, tuples and sets are among the capabilities that set Python in its own league of programming languages for coming out with data science solutions. There are many other libraries as well for building machine learning algorithms, but these are the common ones you would normally find people using, and with distributions like Anaconda, Python has shown its capabilities even for production-grade solutioning, where you make sure that all the dependencies a library needs for building a data science solution are in one place. So it is quite a popular programming language in recent times. R programming is also equally good for producing a quick prototype for most of your modeling tasks, but Python is moving into production-grade territory, where things can be deployed after the prototype into a production environment and face the customers from day one; that sort of capability is coming up with Python. Right, so let's talk about something a bit more specific to the data. When people do any sort of data analysis, they normally face something we know by the name selection bias. What is selection bias? The fundamental place where you start doing a data analysis is by selecting a representative sample.

### Data Science Roadmap [10:11:51]

Right? So that's where we normally start any analysis. Say you work for a company that has 1 billion records in its databases; 1 billion is a very large number, representing various customers' data, or perhaps data for whichever feature you are working on. Collected together, it can easily come to 1 billion records, which in structured form is simply the number of rows. With that enormous volume, any analysis you take up may need a lot of filters, like saying I only want to analyze one particular feature of my products, or I only want customers from the top four or five regions. You might apply many filters like that, but later on, if you would like to do an analysis that covers most of your customer base, there comes the tricky situation of not being able to use the entire volume of 1 billion records, while still wanting a really good study based on the data you have. So in statistics we normally use the idea of randomized selection: with randomized selection we make sure that out of these 1 billion records we choose a small subset, say 1 million, which is a true representation of the entire population. But even while we make this 1 million record selection in a randomized way, there are chances of some bias in the analysis, simply because you are not using the entire population. Selection bias is exactly this characteristic that arises while you are sampling from a large population of data. A very common example: suppose you want to do an exit-poll analysis of an election even before the results come out, and you have not chosen a representative sample, by which I mean you have only asked questions of a selective few people from a particular constituency, and they happen to lean toward one candidate; that does not represent the overall opinion of the population in that constituency. So selection bias is very important to handle, and most of the time people employ randomized selection or sampling techniques like stratified sampling, by which you can minimize the selection bias. These are some very generic questions.
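A hedged sketch of both selection strategies in pandas (the `customers` table, the `region` column and the sampling fractions are all illustrative stand-ins):

```python
import numpy as np
import pandas as pd

# Toy stand-in for a large customer table.
rng = np.random.default_rng(42)
customers = pd.DataFrame({
    "region": rng.choice(["north", "south", "east", "west"], size=100_000),
    "spend": rng.gamma(2.0, 50.0, size=100_000),
})

# Simple random sample: every row has the same chance of selection.
sample = customers.sample(n=1_000, random_state=42)

# Stratified sample: the same fraction from every region, so no
# region is over- or under-represented in the analysis.
stratified = customers.groupby("region", group_keys=False).apply(
    lambda g: g.sample(frac=0.01, random_state=42)
)
print(stratified["region"].value_counts())
```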
So let's move on to some statistical questions, and also how to deal with different types of data. By structured information I mean there are many rows and many columns, and it looks like table data. With such data in place there are two different formats, the long and the wide. Let me show you an example: you have a record of two customers and you store just two values for each, the height and the weight, as columns. With height and weight sitting as separate columns, that is the wide format. Now transform it to have only one column, say attribute: I bring those two columns in as entries of a single attribute column and put the measurements in one value column; that format is called the long one. So what really happens is that instead of having two separate columns for two of your attributes, you put both of them into one column, and doing that has a lot of benefits with respect to the task in hand. Particularly in data visualization, certain charts need your attributes not as separate columns but as one column holding the attribute names, which can then go into building your legends. These two formats are very common, and people deal with both of them frequently depending on the task, particularly when building visualization dashboards.
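A sketch of that reshaping in pandas with the two-customer height/weight example (`melt` goes wide-to-long, `pivot` goes back):

```python
import pandas as pd

# Wide format: one column per attribute.
wide = pd.DataFrame({
    "customer": ["A", "B"],
    "height": [170, 182],
    "weight": [65, 80],
})

# Long format: attribute names collapse into one column,
# which is handy for building plot legends.
long = wide.melt(id_vars="customer", var_name="attribute",
                 value_name="value")
print(long)

# And back again: pivot restores the wide layout.
back = long.pivot(index="customer", columns="attribute",
                  values="value")
print(back)
```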
Okay, talking a bit more from the data analysis perspective: people know that in stats the normal distribution is kind of the godfather of distributions. There are many distributions people check for in their data, but the moment a normal distribution shows up in any data, things become a bit easier to understand. In a typical case, whatever distribution you find in a given dataset tells you a lot about what the data is like. If I'm analyzing the salaries of the employees in my company, I might see that most employees sit in the thick middle of the curve, with a moderate range of salaries, and then there are extremes on the left and the right. People very commonly refer to a bell curve whenever salaries come up: the top 25% of performers in the company, the bottom 25%, and the middle, which represents normal performance. This bell-shaped distribution is very commonly understood and used in data analysis, as are the other distributions, but the normal distribution has its own significance. So depending on which variable you are trying to analyze in a given dataset, whether it is employee salary, your sales for a business, or, say, the number of customer interactions on your product, any variable you define can have a symmetrical bell-shaped distribution, which we normally refer to as a normal distribution. And the moment we understand that something follows a normal distribution, all the properties of that distribution are revealed. That is the importance of analyzing the distribution of your data, and the normal distribution is a very common one. In many statistical techniques, even in model-building exercises, if something follows a normal distribution, many further modeling possibilities open up evidently; and there are many other modeling techniques in stats and machine learning with the fundamental assumption that things should follow a normal distribution, and if that assumption fails, the model is wrong. So there are many use cases for knowing what a normal distribution is, but in simple terms it is a symmetrical bell-shaped curve. When somebody asks you anything about the normal distribution, the first thing you should visualize is that symmetrical bell-shaped curve, and the moment you have it in your imagination, start thinking of certain properties: what is the mean of a normal distribution, what is the standard deviation, and in particular the special case we call the standard normal distribution, where the mean is exactly zero and the standard deviation is exactly one. There are different places where normal distributions are used, and if you are comfortable with ideas like the central limit theorem or the law of large numbers, you may want to relate those to the normal distribution as well, particularly the central limit theorem. But the core idea is a distribution that is symmetrical around the mean; that's what a normal distribution is. Next, A/B testing: quite a popular approach, particularly among people working on a product. As a company you may have many features inside a product. For instance, LinkedIn has a web page with a lot of features inside it: a jobs portal, places where you can connect to professionals in a similar industry, a feed where you can read the posts people are making, and so on. So there are different places on the website with many features. If LinkedIn is looking at some changes, for instance changing the entire website's design and aesthetics, or changing one particular feature inside the website, such changes are normally accompanied by a process called A/B testing. As an analyst you might be working with LinkedIn, and one fine day they come out with a new feature, a new design, some new change on their website. You, as the analyst, say: here is my framework for testing this change, by defining a metric. In simple terms my metric might be: if I change this website from A to B, is the number of footprints on the website going to go down or not? That is my metric, and suppose I successfully establish the fact that after rolling out this new website, the number of customers visiting it is not going to go down.

### Data Science vs Data Analytics [10:21:05]

Then I can be confident that, okay, fine, this works: now roll out this new feature. In this framework we normally have two sets of users, to identify the risk associated with bringing the new feature onto the platform: in a randomized way we put one user group on the older website and expose the other group to the new features, or the new website. When we compare the results on a particular metric, like the number of clicks or the number of purchases, we should be able to see whether these two groups are essentially the same or quite different. If the difference is on the negative side, we say the feature is not good; and if there is no difference at all, we say that even if we bring in this new feature, nothing is going to happen. This A/B testing framework is quite robust in its own way, and it is a very common question: if you have worked as a data analyst, or expect to be sitting for a data analyst kind of role, knowing the A/B testing framework is very important. Okay. Now, when you do this kind of A/B testing analysis, people often ask what the sample size of the users participating in the A/B testing framework should be. Similarly, when you build models, there are certain statistical measures that have to be evaluated at the end of the model-building exercise: if you're building a machine learning model, you want to see whether the metrics you are evaluating on are really good or not. Sensitivity is one of those metrics we normally evaluate, and I'm going to show you something we refer to by the name confusion matrix; I'll spend some time explaining this and then come to what we mean by sensitivity. Let's say you are building a model for predicting whether a particular customer is going to purchase from my platform within one month or not, a very simple problem statement. It might involve many variables that we bring in, and finally we build the model and say: this is my final model, which tells with 90% accuracy whether my customer is going to buy from my platform, say an online e-commerce platform, within the next month. So this is my model. Without going into the details of the model, let's assume that after you build it, you have the results, and while we analyze and evaluate what the result is all about, we come out with a confusion matrix. What does it say? When you build a model that follows a supervised way of learning, you can see from the historical data whether people actually purchased on the platform within the next month or not, so I can create a really good training dataset containing, for each customer's first transaction with me, whether they bought the next product within one month. With that dataset I train my model, and the way the confusion matrix puts it is: my actual data says something, and you have predicted something. So let's get into the details. This particular box says my actual data says the customer will buy, and you are also predicting the same.
This we call a true positive (TP): the prediction is true, in the positive direction, saying the purchase happened. Diagonally opposite to TP is TN, the true negative, where your model's prediction that the customer will not buy matches the actual data as well. Both of these, the true positives and the true negatives, are the right predictions from your model. But consider the off-diagonal elements, the false negative and the false positive: in these two cases there is an error. Why? Because, in one case, your actual data says the customer is not going to buy, but your prediction says he is going to buy; the prediction is positive whereas the actual data is on the negative side. This false prediction is the false positive (FP). The other off-diagonal element is FN, the false negative, covering the cases where you predict the customer is not going to buy, but the actual data says the customer did buy the product; here too the model is wrong. These are the type I errors (false positives) and type II errors (false negatives), and both need to be taken care of when you build any machine learning model: if these errors are low, your model moves toward that 100% accuracy mark, though any machine learning model has its own limitations.

There is one particular metric we call sensitivity. The point is that the positive and negative cases need to be controlled together: if my model is very good on the positive cases, when the customer is buying, but does a very bad job on the cases where the customer is not buying, the model has some sort of issue; it performs well in one place and badly in the other, and I need some metric to find that out. Sensitivity helps with exactly that. In simple terms, it is the ratio of the true positives to all the actual positive cases, TP / (TP + FN). So if the type II error, the false negatives, grows, my sensitivity comes down, and if my true positives are very high, the sensitivity will also be high. This is related to what we call statistical power: if the sensitivity is really good, I would say my positive cases are being predicted well. The exact mirror image of sensitivity is specificity, the same kind of ratio on the negative side, TN / (TN + FP). In a very good machine learning model we need to make sure sensitivity and specificity are both balanced. Both of these play a good role when you want to evaluate a model's output, and interview questions on them often come one after another, because once the model is done, these metrics tell you whether the model is good or not. In those discussions we also come across issues like overfitting and underfitting of a given machine learning model.
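A small sketch of how these quantities might be computed; the label arrays are hypothetical, and scikit-learn's `confusion_matrix` is one convenient way to get the four cells.

```python
# Minimal sketch: confusion matrix, sensitivity and specificity.
# y_true / y_pred are hypothetical labels (1 = customer bought).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# With labels=[0, 1], ravel() returns tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

sensitivity = tp / (tp + fn)  # true positive rate (recall)
specificity = tn / (tn + fp)  # true negative rate

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```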
These words are very common, and the idea is that, depending on the complexity of your model, you might adapt too exactly to your data points or you might generalize. For instance, suppose I have red and blue dots, and I draw a curve that separates the red from the blue; in making that separation I am actually building a classifier using some sort of modeling technique. By drawing a smooth curve, the one shown in black, I might be overgeneralizing, by which I mean there might be some red dots left on the other side of the boundary; you can obviously see that. The moment I become a bit more flexible and draw the green boundary, it takes care of the issue of those red dots on the wrong side. But the point of building any model is that you need to generalize to the pattern found in the data: if you don't generalize well, you are underfitting, and if you make the generalization too specific, you are overfitting. That zigzag green boundary might be represented by some polynomial, and a zigzag polynomial is more complex than a smooth curve like the black one. So you need to be very careful when building a model, particularly in the case of regression models represented by a line or a polynomial: make sure the polynomial is not too complex and at the same time not too simple, otherwise you end up in either an overfitting or an underfitting situation. We need a good balance between the two.

In summary, when statistical questions come in, they mostly cover basic statistical properties you will be very aware of: standard deviation, averages, how to interpret the median, how to interpret quartiles (the first quartile, the second quartile, and so on), and what percentiles mean. Questions a bit more complex in nature might be discussions around sensitivity, overfitting, and underfitting; these are statistical ideas too. So prepare from the basics, properties like standard deviation and mean, up to things like overfitting, underfitting, and the sensitivity and specificity kind of ideas. That will make your ground stronger when you go for interviews, and these are the bare minimum statistical concepts to understand; with anything less than that you might face some difficulties in the interview. Okay, but let's also talk now about
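To make the trade-off concrete, here is a minimal sketch fitting polynomials of increasing degree to noisy synthetic data; all names and numbers are invented for illustration.

```python
# Sketch: under- vs. overfitting with polynomial regression.
# Synthetic data, purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.shape)

# Hold out alternate points to see how each fit generalizes.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

for degree in (1, 4, 12):  # too simple, reasonable, too complex
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")

# Typically degree 1 underfits (high error everywhere) and degree 12
# overfits (small training error, larger test error); a middle degree
# balances the two.
```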

### Data Science for Non Programmers [10:31:05]

questions which are related a bit more to data analysis. Let's see what kind of data analysis questions might pop up in the interview. A generic one: people normally do analysis on structured data, which sits in rows and columns, but there are cases when the data is not so well structured. In those places the data might be textual, for instance on Twitter, where you might run a commonly known algorithm like sentiment analysis; the sentiment analysis could be for a brand, for an election campaign, or maybe around your product features. Text analytics is a really large domain in its own right, and in Python as well as R there are a number of libraries for it. In particular, R has libraries like tm, the text mining package; in Python we have packages like pandas and NumPy, and also packages like NLTK, which is built specifically for natural language processing and can deal with many different text mining or text analytics approaches. In comparison, as I said, the robustness of the Python ecosystem is greater than R's, but in terms of features both are powerful enough with the libraries and packages they offer.

One fundamental starting place for any analysis: you are given a data set and asked to do some basic analysis of what that data tells you. A typical question is, I am in a retail business and my sales in a particular region are going down; this is the analysis expected of you, and you need to dig through to understand what the problem with the declining sales really is. You might first look at the transactional data present in the system; then you might also go outside your own network, maybe picking up the sentiments of your customers from social media platforms, so there will be different sources of data to collect. But collecting the data is not the only task, and building a model or doing statistical analysis comes very late in the process. What comes right after you have collected the data is making sure the integrity of the data is maintained: you get rid of all the unwanted noise, and only then do you prepare the data for the modeling exercise or for descriptive analytics on top of it. This cleaning and understanding of the data, doing a lot of exploration with plots, in essence takes close to 70 to 80% of your time in any data analysis task. If a company maintains its data in a very well-structured way, this heavy cleaning time might be reduced; otherwise you need to take it up yourself for any new project where prepared data is not available, and if you don't have a pipeline that does this cleaning, you have to write it on your own.
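For a flavor of what that cleaning step can look like, here is a minimal pandas sketch; the column names, values, and rules are all hypothetical.

```python
# Minimal data-cleaning sketch with pandas (hypothetical columns).
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "south", None, "East"],
    "sales":  [1200.0, 1200.0, -50.0, 800.0, None],
})

df = df.drop_duplicates()                               # remove duplicate rows
df["region"] = df["region"].str.strip().str.title()     # normalize text labels
df = df.dropna(subset=["region"])                       # drop rows missing a region
df["sales"] = df["sales"].fillna(df["sales"].median())  # impute missing sales
df = df[df["sales"] >= 0]                               # discard impossible values

print(df)
```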
It is very important: if you don't do the cleaning part and understand the data well, the analysis or the models you build might end up giving you very bad performance; as I said, 80% of the time is normally spent on this task. And often, when you are analyzing something like the example I gave, my sales are going down, what do I do, it is not possible to come out with answers to complex problems like this with just one variable. You sometimes need to move beyond one variable and talk about bivariate or multivariate kinds of analysis. A question that often comes up asks you to distinguish between univariate, bivariate, and multivariate analysis, and the idea is very simple: in any analysis it is not only one variable that decides the end output; there are multiple factors involved. With multiple variables involved, you also want to look at things like correlation: you want to see whether there is any correlation between them. Sales are going down, but because of what? Is it because my sales representatives are not going to the market, or are my products bad, or is there some other reason? With all the variables in one place, you can dig deeper to see whether any relationships emerge among them, and when we collectively bring all these variables together and do a coherent analysis around the problem, you come out with really crisp answers to what you are trying to analyze.

There are also times when people do some sort of grouping of the data, that is, sampling. You get a data set into your system, or onto whichever servers you are doing the analysis on, but there are many cases when even randomized sampling, meant to get a true representative of the population, might not work well. In those cases you might want to do systematic sampling, or perhaps cluster-based sampling, where you might decide to analyze the issue for only five regions and form different clusters from those five regions; or, in systematic sampling, within those five regions you might analyze only the one product that is not doing that well in sales. Sampling techniques like these, the cluster-based or the systematic ones (there are different names for them), let people give a very good interpretation of what really went wrong in whatever analysis they are doing. Sales going down is one example, but you can adapt this to other analyses as well. The idea is that with randomized sampling we are never very sure which data ends up in the set we use for analysis, whereas with cluster-based or systematic sampling you know exactly which clusters, or which regions in this example, you are analyzing, and at the end of the analysis you can clearly say that this is not a randomized sample but data from these five regions.
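A short sketch of moving from univariate to multivariate analysis in pandas; the data frame and its columns are invented for illustration.

```python
# Sketch: from one variable to several with a correlation matrix.
import pandas as pd

# Hypothetical monthly data for the "sales going down" question.
df = pd.DataFrame({
    "sales":      [120, 115, 108, 99, 90, 84],
    "rep_visits": [40, 38, 33, 30, 25, 22],
    "complaints": [3, 4, 6, 7, 9, 11],
    "ad_spend":   [10, 10, 9, 10, 9, 10],
})

# Univariate: one column at a time.
print(df["sales"].describe())

# Bivariate/multivariate: pairwise correlations between all variables.
print(df.corr().round(2))
# Strong correlations (positive with rep_visits, negative with
# complaints) hint at where to dig deeper; ad_spend barely moves,
# so it is unlikely to explain the decline.
```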
There are many different ways of doing cluster-based or systematic sampling, and they help put your end results in the right perspective, instead of relying on a randomized sample. One more quite useful idea, widely borrowed from the field of linear algebra, and related to what we saw earlier about moving from one variable to multiple variables, is eigenvalues and eigenvectors. This concept helps us bring different variables together as linear combinations. For instance, in some complex analyses a given data set might have very many columns: assume a data set with 1 million rows and 10,000 columns, where the 10,000 columns are features. There are complex problems like that, but most of the time not all 10,000 input variables are useful. What we can do is transform the data set into a lower-dimensional space, by which we mean the 10,000 columns can be reduced to, let's say, only 100 columns. Eigenvalues and eigenvectors are the ideas that enable this transformation, and the question is: can these 100 variables be represented as some linear combination of the 10,000 variables? If I am able to do that, my dimensionality is reduced, the time I take to do the analysis is also reduced, and the representability that comes with only 100 variables goes up. Quite a powerful idea: an eigenvector is that kind of linear combination of many variables, and the calculation of eigenvectors normally happens on a correlation or covariance matrix, which, as you know, measures how strongly two variables are related. That is why we say eigenvectors can help us compress the data we have: one eigenvector can represent a hundred variables together. A commonly used method for reducing the dimensions of a large data set, PCA (principal component analysis), is actually based on eigenvalues and eigenvectors, so if somebody asks you about eigenvalues and eigenvectors in an interview, also talk about PCA, which is built on these two concepts.
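A minimal sketch of the mechanics, essentially what PCA does under the hood: eigen-decompose a covariance matrix and project onto the top eigenvectors. The data is random and purely illustrative.

```python
# Sketch: eigenvectors of a covariance matrix as the basis of PCA.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))                    # 200 rows, 5 features (toy scale)
X[:, 1] = X[:, 0] * 2 + rng.normal(0, 0.1, 200)  # a redundant feature

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # 5x5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]       # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top 2 eigenvectors: each is a linear combination of the
# original 5 features, and projecting onto them reduces 5 -> 2 dims.
X_reduced = Xc @ eigvecs[:, :2]
explained = eigvals[:2].sum() / eigvals.sum()
print(f"reduced shape: {X_reduced.shape}, variance kept: {explained:.1%}")
```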
That gives the interviewer a good signal: you know about eigenvalues and eigenvectors, and you can also think of an application like PCA. Now, we talked about the false positive and false negative cases in our confusion matrix example; these are exactly the type I and type II errors we also discussed. Let's drill further into scenarios where the false positives are important and scenarios where the false negatives are important. By importance we mean: if you are building a machine learning model, are we even allowed to make a mistake on either of the cases, the positives or the negatives? For instance, take an example from the medical domain, a process called chemotherapy, which is normally given to cancer patients; it is a radiation-based therapy that kills the cancerous cells, a very focused therapy on exactly those cells. Suppose you are building a model for detecting cancer from a given CT image. This model will obviously not be 100% correct, every machine learning model has its limitations, but you are required to predict whether a patient has cancer or not, and based on that a radiologist might decide whether chemotherapy is right for this patient. Now imagine you have predicted somebody to be positive for cancer, but the patient does not actually have cancerous cells. In that case you might end up saying, let's go ahead with the chemotherapy, and the side effects of chemotherapy are very adverse, because you are applying the therapy to healthy cells. In these cases the false positive becomes the more important error. It is arguably less harmful for your model to say the patient does not have cancer, even if there is a slight possibility of cancer present in the patient's cells, because then you are not exposing the patient to chemotherapy. The false negative itself is not good either, both are bad, as you know from the confusion matrix discussion, but in this particular example the false positive gets more importance than the false negative. In simple terms, it is better not to expose a patient to a treatment like chemotherapy on the basis of a false positive. Very similar examples come up in other contexts, so you might like to think of some of your own in the same vein.
Now for the other case, where the false negative is the one that matters. We talked about the importance of false positives, but there are also cases where the false negative becomes more important. For example, suppose you are building a model to decide whether to convict a particular suspect, based on all the records and the arguments that happened in court. What happens if you make a criminal go free because your model produced a false negative: the person is actually a criminal, but based on all the evidence you had, your model predicted he is not, so you are letting a criminal walk free in society. That is arguably more harmful than convicting the person, and over a prolonged period you might also gather more evidence and build a stronger case. It can be better to keep a suspect behind bars for a longer period than to let the suspect go free when we know it might be a case of a criminal escaping the judicial system. So in these cases the other error, the false negative, becomes more important. Keep in mind it is very easy to get yourself confused between false negatives and false positives, but if you always keep an example in mind, you leave no room for confusion: although these two ideas come from the confusion matrix, you can always put the examples up front and talk from them. If you start to explain what a false negative is in terms of the formula, you might get confused, but if you take an example and then explain, things are much clearer for you as well as for the person hearing it in the interview.

Then there are cases where both are important, typically the one that relates to the banking industry. You are building a model that decides whether to give a loan to a person, based on the many input attributes collected in the customer's application. If the customer is really good and you miss the opportunity by not giving the loan, versus the customer being really bad in terms of his or her credit history and you give the loan: in one case you lose the business, in the other you take a risk in which you lose your money. So in this example both errors play an equal role; if either your false positives or your false negatives are high, you end up losing a chunk of your money. Keep these three examples in mind, and every time you hear false positive and false negative, things should not be confusing at all. The sketch after this paragraph shows one way to put numbers on that trade-off.

Okay, so now let's also talk about building a machine learning model. So far we have discussed what happens after building the model; let's take one step back and see how we normally build a machine learning model and what kind of process we follow. When building a machine learning model, given a data set, we need to divide it into different buckets or parts. The commonly known divisions are the training data and the test data, and sometimes people also keep one portion of the large data set called the validation data. People often confuse the test data with the validation data.
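One way to make "which error matters" concrete is to attach a cost to each kind of mistake; here is a toy sketch for the loan example, with every number invented.

```python
# Sketch: weighing false positives vs. false negatives with costs.
# All numbers are hypothetical, purely for illustration.

# Confusion-matrix counts from some loan-approval model
# (positive = "will repay, grant the loan").
tp, tn, fp, fn = 400, 450, 80, 70

cost_fp = 5000   # loan granted to a bad customer: money lost
cost_fn = 1200   # loan denied to a good customer: business missed

total_cost = fp * cost_fp + fn * cost_fn
print(f"expected cost of errors: {total_cost}")

# Changing the model's decision threshold trades fp against fn;
# the threshold that minimizes this total cost depends on which
# error the business considers more expensive.
```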
What happens is this: in the training process, there are certain models where, while you are training, you obviously use the training data, but the process can also involve a validation step, which makes sure that one part of the data is dedicated to validating the model as it trains. When training is done, the final model has been both well trained on the data and validated. Only when the model is completely done do you get into the process we call testing. You can imagine it like this: you have a data set of 1,000 records; you keep some 700 records for training, 100 records for validation, and the remaining 200 records for testing, so there are three splits.

There is also a process called k-fold cross-validation, in which k can be any number, usually between 5 and 10; five-fold and ten-fold cross-validation are the common standards. The idea is that when you build your model you work with a training set and a validation set, keeping a small portion of the data for validation and using the rest for training. The held-out subset is a rolling window that you keep changing in each fold: in the first fold of the iteration you keep one validation set and the rest is the training set; in the next fold you move the window to another subset and the rest is training data, and so on. When the model built with this k-fold cross-validation approach is done, you can use it on the testing data to see whether the accuracies are good. This brings in a lot of performance improvement, and people have also found the validation set to be a really good way of tuning hyperparameters: many machine learning models, typically neural network models, have hyperparameters that need to be tuned as the modeling proceeds, and since we cannot use the testing data set for tuning them, the validation set comes in very handy there.

So that is cross-validation: as you keep moving the validation set through each fold, first fold, second fold, third fold, with the validation set always changing, you carry out the process of cross-validation. The idea behind doing it is to see how well your final model generalizes to the data you have, independent of which data you used for training. It often happens that when you train a machine learning model it does very well on the training part, but when it comes to testing, the model does very badly; this is the same problem as the overfitting and underfitting cases. With cross-validation you have made sure your model has trained on various subsets of the data, and at every stage of the process a different small subset served for validation. That means you have trained your model very well, and irrespective of which data it sees, it should do well in the testing cases. That is the capability cross-validation brings in.
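A minimal sketch of the hold-out split plus k-fold cross-validation with scikit-learn; the data set and model choice are illustrative, not prescriptive.

```python
# Sketch: train/validation/test thinking via k-fold cross-validation.
# Toy data and model, purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out a final test set (like the 200 of the 1,000 records above).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation on the remaining 800: each fold plays
# the role of the validation set once; the rest is training data.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_trainval, y_trainval, cv=5)
print("fold accuracies:", scores.round(3))

# Only after cross-validation is done do we touch the test set.
model.fit(X_trainval, y_trainval)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```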
With the two pillars, statistical analysis and basic data analysis, covered, I hope you are getting some sense of how people ask a particular question, either from the model-building perspective or from the everyday data-analysis perspective. We are now going to go a bit deeper into questions that relate directly to machine learning. Most of the questions so far were about how we do analysis after the model is built, or how we perform simple data analysis such as the A/B testing framework; but what if you are asked something specifically from the machine learning domain? The next set of questions covers that, and people might start with the basics: what do you mean by machine learning? The idea, I think, must be very clear by now: you are given a set of data points particular to a given domain, and you would like to build a learning algorithm that takes the historical data and predicts something for the future. As we saw in many examples: predicting, given the evidence, whether a person is a convict or not; predicting whether we should give a loan to a customer; predicting the onset of cancer in a patient by using the patient's historical records, and so on. These algorithms are becoming even more complex, starting to work on speech data and face data, which are mostly used in biometric authentication systems, with many use cases coming up from various industries.

In machine learning, the two most commonly used types of learning are supervised and unsupervised learning; there are two other types as well, semi-supervised learning and reinforcement learning. The distinction revolves around whether, given a set of input attributes for your data points, you also have a label that can guide the learning, or you don't. If you have the label, the approach is supervised learning; if you don't, the approach is unsupervised. Some examples of supervised learning algorithms are support vector machines, regression, naive Bayes, and decision trees. In very simple terms: suppose you are given input attributes for identifying fruits, say images of different fruits, and based on the characteristics of each fruit you must identify whether it is an apple, a banana, or an orange. If that label is available to me, the model learns by keeping in mind that, given these characteristics, this is an apple, this is an orange, and this is a banana. In the other case, you go for a clustering approach, where no label saying apple, banana, or orange is available; then we simply segregate the data points by their input features, maybe color, texture, or shape. With that, we might say the fruits of one elongated shape fall into a bucket we can call bananas, while another, spherical shape might be an apple or an orange. So depending on the presence of a label we use either supervised or unsupervised learning; both approaches are quite common, and there are certain algorithms that can go both ways, learning in an unsupervised as well as a supervised manner.
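A tiny sketch contrasting the two modes on the same toy features; the "fruit" measurements and names are fabricated for illustration.

```python
# Sketch: supervised vs. unsupervised on the same toy fruit features.
# Features: [weight_g, elongation]; labels exist only in the
# supervised case. All values are made up.
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X = [[120, 1.0], [130, 1.1], [160, 1.0],   # round-ish: apples/oranges
     [150, 2.8], [140, 3.0], [145, 2.9]]   # elongated: bananas
y = ["apple", "apple", "orange", "banana", "banana", "banana"]

# Supervised: labels guide the learning.
clf = DecisionTreeClassifier().fit(X, y)
print("supervised prediction:", clf.predict([[135, 2.7]]))

# Unsupervised: no labels, only structure in the features.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_)
# The clusters separate elongated from round fruit, but it is up
# to us to name one bucket "banana" afterwards.
```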
Depending on how you model the problem, the fundamental difference comes from whether we have the label or not. Within supervised learning, one of the main problem types is classification, where, given a set of input attributes, the label is a set of categories. For fruit, the categories are bananas, apples, and oranges; for a customer we want to score, the classes are two, one saying the customer is going to be a defaulter and the other saying the customer is going to be a good customer. The same is true when you build a classification algorithm for detecting cancer, whether the patient has cancer or not, or when you want to detect malicious content, where a file might be a virus, a trojan, a worm, or something else; in that case the classes are many, so more than two classes can also occur. The fundamental idea is that we are following a supervised learning algorithm, but the type of problem we are solving is a classification problem; instead of saying classification algorithm, we can also say it is a classification problem using a supervised learning algorithm. There are various types of classification algorithms, like logistic regression, decision trees, support vector machines, and so on.

Let's talk about one of these, logistic regression. It is a very commonly used algorithm; banks and companies as big as American Express have leveraged logistic regression to quite an extent and have built really robust implementations, particularly for banking-sector cases like predicting whether a customer is going to be a defaulter or not if I issue a credit card or give a loan. These kinds of decisions can be taken very robustly with a logistic regression algorithm, and it is best suited to two-class, or binary, problems, where the answer is either yes or no. It is quite a common technique, and in all the cases where you have these binary classes of problems you might use logistic regression: a political leader winning an election or not, somebody succeeding in an examination or not, and, as I mentioned, whether to give a loan to a customer based on whether he or she is going to default. So keep in mind: logistic regression works best for classification problems with two classes. Another widely used kind of algorithm is the recommender system, and I think this one needs no introduction; it is that common nowadays.
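A minimal sketch of a two-class logistic regression in scikit-learn; the loan-style features and values are invented.

```python
# Sketch: binary classification with logistic regression.
# Hypothetical loan data: [income_k, debt_ratio], label 1 = default.
from sklearn.linear_model import LogisticRegression

X = [[25, 0.9], [30, 0.8], [28, 0.7],   # high debt ratio -> defaulted
     [80, 0.2], [95, 0.1], [70, 0.3]]   # low debt ratio  -> repaid
y = [1, 1, 1, 0, 0, 0]

model = LogisticRegression().fit(X, y)

applicant = [[60, 0.5]]
print("predicted class:", model.predict(applicant))
# predict_proba gives P(repaid), P(default): useful when the bank
# wants to set its own approval threshold.
print("probabilities:", model.predict_proba(applicant).round(3))
```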
Take an example from Amazon: you are browsing a product, and at the bottom a widget shows products under "you may also like" or "customers who bought this also bought this". Those recommendations actually come from a recommender system running in the back end. Take the YouTube example: you watch one video and the next videos start to come one after the other; that again is a recommender system working behind the scenes. Or take Netflix: you watch a particular movie and it starts to adapt, suggesting movies you might like; Netflix also uses recommender systems. The applications keep coming as more and more sophisticated systems are built. Facebook uses them for recommending friends: you have a set of friends, and based on data coming from your contacts and mail lists, Facebook starts to curate friend suggestions. All of these algorithms benefit the business in some way or another. For Amazon, a recommendation below a page means people might buy more than one product in a transaction. For Facebook, the network of people grows, the connections between users get stronger, and hence the kind of ads Facebook wants to sell also starts to grow: the more users, the more connections, the more interactions you know about, along with the behaviors people show in a social network. The fundamental idea behind all these recommender systems is to get a meaningful comparison between two users or between two items; for Amazon, between any two products, what is the similarity? If the similarity is really high, recommend that product alongside the product under consideration.
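For a flavor of the similarity computation at the heart of these systems, here is a minimal item-based sketch; the ratings matrix is entirely invented.

```python
# Sketch: item-item similarity for a toy recommender.
# Rows = users, columns = items; values are hypothetical ratings.
import numpy as np

ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 2],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

n_items = ratings.shape[1]
target = 0  # recommend items similar to item 0
sims = [cosine(ratings[:, target], ratings[:, j]) for j in range(n_items)]

# The highest-similarity items (excluding the item itself) are the
# "customers who bought this also bought..." candidates.
ranked = sorted(range(n_items), key=lambda j: -sims[j])
print([j for j in ranked if j != target])
```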

### Data Science Interview Questions [10:59:00]

Or, if you find that two users are very similar, in Facebook, say, you might show each of them that there is another friend they might want to connect with. Many such use cases come out the moment you get a deeper understanding of recommender systems, but the fundamental idea, in simple terms, is: how do I compare two items, where the items might be products, people, movies, and so on, and how do I compare two users? That is what a recommendation system works on. There are quite famous examples like the collaborative filtering approaches, user-based and item-based collaborative filtering algorithms, both commonly used in recommender systems, and nowadays people have also moved on to latent factor-based models like SVD, singular value decomposition, and many others.

And with this we have come to the end of this full course on data science. If you enjoyed listening to this full course, please be kind enough to like it, and you can comment any of your doubts and queries; we will reply to them at the earliest. Do look out for more videos and playlists, and subscribe to Edureka's YouTube channel to learn more. Thank you for watching and happy learning.
