Logistic Regression (and why it's different from Linear Regression)



Contents (2 segments)

Segment 1 (00:00 - 05:00)

Hello everyone. In this video, we will see how logistic regression can be used for classification tasks in machine learning, things like classifying a picture as a dog or as a cat. We'll explore how logistic regression is different from linear regression, what the intuition behind it really is, and how to use it in practice to make predictions for classification tasks. Make sure to like and subscribe if you like these kinds of videos, and let's get started.

Let's start with this simple task. Based on how many hours a student has studied for the final exam, we want to predict whether that student will pass or fail. For example, can we predict whether this student who has studied for 7 hours is going to pass or fail? Like any other supervised learning task, we need training data to learn from. Say we have records of students from previous years: we know how much they studied and whether or not they passed the final exam. And as a side note, I know many of you watch these videos during exam season, so I'm really sorry if this example hits a little too close to home.

Anyway, if we use linear regression here, we will predict the output as a linear function of our features, plus possibly a bias or constant term. Here we have just one feature, the number of hours studied, but we can easily consider more features, for example, the number of cups of coffee consumed by each student in the week before the final exam. Linear regression searches for the coefficients that make this line as close as possible to our training data points. We measure how close we are to a particular data point by taking the difference between the outcome recorded in our training data set and our model's prediction, and squaring it. If we take the average of this distance over all data points, we get a loss function called the mean squared error. Linear regression finds the coefficients that minimize this loss function.
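The mean squared error described above can be sketched in a few lines of Python. The student data here is made up for illustration, and the variable names are my own:

```python
import numpy as np

# Hypothetical training data: hours studied and exam outcome (1 = pass, 0 = fail).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
passed = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def mse(w, b):
    """Mean squared error of the linear model: prediction = w * hours + b."""
    predictions = w * hours + b
    return np.mean((passed - predictions) ** 2)
```

Linear regression then searches for the `w` and `b` that make this quantity as small as possible.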
And if you're curious, there is a closed-form formula for that. Once that's done, we can see what outcome linear regression predicts for our student who has studied for 7 hours. One glaring issue you might have noticed here is that linear regression can output any real number. For example, here it outputs 1.1, even though our target is a binary yes-or-no answer. And that's a problem. Sure, we might treat zero as fail and one as pass, but what do we do with something like 1.3 or minus 0.3? Those don't really make sense for classification.

This is where logistic regression comes in. Logistic regression still uses a linear combination of the input features to make its predictions, just like linear regression. But instead of returning that result directly, it squashes it to fit between zero and one. It does this by using the sigmoid function, a smooth S-shaped curve that maps any number to a value between 0 and 1. And now we can interpret any output we get as a probability: zero means a 0% chance of passing, one means a 100% chance of passing, and something in between, like 0.3, means a 30% chance of passing. We will see that this probability intuition is important to understand how logistic regression works internally.

Now that we have learned about the shape of the function used by logistic regression to make its predictions, we can just compute the mean squared error loss function and minimize it, right? While this is a perfectly valid strategy, it turns out that when the thing we are trying to predict is the probability of an outcome, there is another loss function that is better suited. Let's see how. Say you have two biased coins. For the first coin, you predict that it gives heads 20% of the time, but it turns out the real probability of heads is only 10%. For the other coin, you predict that it gives heads 60% of the time, but it turns out that the real probability is only 30%.
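The sigmoid squashing can be written directly from its formula, 1 / (1 + e^(-z)). A minimal sketch:

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

sigmoid(0.0)  # 0.5: a raw score of zero maps to a 50% chance of passing
```

Large positive scores map close to 1, large negative scores close to 0, which is what lets us read the output as a probability.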
Mean squared error will penalize you more for the second prediction than for the first, because you were off by 30% in your second prediction, while in the first you were off by only 10%. But intuitively, we want to say that in both cases we were equally wrong, because we overshot by a factor of two, so we should be penalized equally. For this reason, instead of looking at probabilities directly, we actually want to take their logarithm. For example, if the real probability is P and you predict 2 × P, then we take the log and look at the difference, which by the rules of logarithms gives log 2. And if you overshoot by a factor of 1.5, you get penalized by log 1.5, no matter what the real probability is. So log space seems to match our intuition about probabilities better. What does this mean for the loss function of logistic regression? Recall that we have training data for multiple students, and for each one of them we

Segment 2 (05:00 - 08:00)

know whether they passed or failed. Now, for any given value of the coefficients and bias term of our logistic regression model, we can compute the probability of passing the exam for each one of these students. Let's call this probability P. Of course, some students passed and some students failed the exam. For the ones who passed, we look at the probability P, and for the ones who failed, we look at 1 minus P. By abusing notation a little bit and using these two expressions as boolean variables, we can write down this quantity, which is just a convenient way of representing what our logistic regression model thinks the probability of the real outcome for each student is. The larger this number is, the better our logistic regression model is at making predictions.

We now take the log of this probability, since that's the natural way to look at probabilities, and sum over all training data points, or students. And since we usually think of loss functions as something we want to minimize rather than maximize, let's add a negative sign in front. This loss function is called cross-entropy. Logistic regression aims to make this loss function as small as possible, which in turn makes the sigmoid curve as close to the training data points as possible.

A nice property of this cross-entropy is that if we ever predict 0% for an outcome that actually happens, that results in an infinite loss. For example, if a student passed the exam but our model predicts a 0% chance of passing for that student, then the loss function will be plus infinity. This incentivizes our model not to be overconfident in its answers. Unfortunately, unlike in linear regression, there isn't a closed-form formula for computing the best coefficients for logistic regression. We usually use something like gradient descent, which adjusts the coefficients in small steps, each time reducing the loss just a little bit, until we converge to a good solution.
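Putting these pieces together, here is a minimal sketch of the cross-entropy loss and a plain gradient-descent loop. The student data, learning rate, and step count are made up for illustration; real libraries use more sophisticated solvers, but the idea is the same:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical training data: hours studied and exam outcome (1 = pass, 0 = fail).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
passed = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def cross_entropy(w, b):
    """Negative log-likelihood: -sum of log(p) over passers and log(1 - p) over failers."""
    p = sigmoid(w * hours + b)
    return -np.sum(passed * np.log(p) + (1 - passed) * np.log(1 - p))

# Gradient descent: nudge w and b downhill in small steps.
w, b, lr = 0.0, 0.0, 0.01
for _ in range(10_000):
    p = sigmoid(w * hours + b)
    error = p - passed              # gradient of the loss w.r.t. the linear score
    w -= lr * np.sum(error * hours)
    b -= lr * np.sum(error)
```

After the loop, the loss is well below its starting value, and the fitted sigmoid assigns our 7-hour student a probability of passing above one half.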
We will not go into the details of all of that here; instead, we will see how to use logistic regression in Python. In fact, you can train a logistic regression model in just a few lines of code. You import the scikit-learn library, declare your model, your data, and your labels, and call the fit function to train your logistic regression model. Once the model is trained, we can use it to classify new data points. For example, we can pass in the data for this poor student who has been studying for 12 hours and consumed five cups of coffee, and it will return a number between 0 and 1 which represents the probability of passing the exam. And as one might expect for this hardworking student, this number is very close to one.

So to recap, logistic regression is a simple but powerful tool for binary classification. It models probabilities using a linear function of the features passed through a sigmoid function, and it adjusts the weights through gradient descent to make accurate predictions. And now you know the intuition, the math, and even the code behind logistic regression. That's logistic regression by Visually Explained. Thanks for watching, and see you next time.
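The few lines of scikit-learn code described in the video might look like the following. The training data and the query point are invented for illustration; only the `LogisticRegression` API itself comes from scikit-learn:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: [hours studied, cups of coffee] for past students.
X = [[1, 0], [2, 1], [4, 2], [6, 3], [8, 5], [10, 4]]
y = [0, 0, 0, 1, 1, 1]  # 1 = passed, 0 = failed

model = LogisticRegression()
model.fit(X, y)

# Probability of passing for a new student: 12 hours of study, 5 cups of coffee.
prob_pass = model.predict_proba([[12, 5]])[0][1]
```

`predict_proba` returns one probability per class; with labels 0 and 1, the second column is the probability of passing.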
