Regularization in a Neural Network | Dealing with overfitting
11:40

AssemblyAI · 20 Nov 2021 · 111,017 views · 4,066 likes


Video description
We're back with another video in our Deep Learning Explained series. In this video, we will learn about regularization. Regularization is a common technique used to deal with overfitting, but how it works and why it helps with overfitting is sometimes hard to understand. Get your free speech-to-text API token 👇 https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_6 We go over regularization techniques such as L1, L2 and dropout regularization, learn the underlying logic of regularization, and see how this technique connects to neural networks.

00:00 Introduction
00:35 The purpose of regularization
02:54 How regularization works
05:01 L1 and L2 regularization
07:29 Dropout regularization
09:13 Early-stopping
10:03 Data augmentation
11:18 Get your Free AssemblyAI API link now!

Table of contents (8 segments)

Introduction

What do you do if your model overfits? Regularization, of course. But regularization can be a little bit of a complicated topic, so in this video we will talk about regularization: what regularization is, how and why it works for neural networks, and the details of some regularization techniques like L1, L2 and dropout regularization. This video is part of the Deep Learning Explained series by AssemblyAI, a company making a state-of-the-art speech-to-text API. If you want to get a free API token, use the link in the description.

The purpose of regularization

Why do we use regularization? We use regularization to fix overfitting in our neural networks or other machine learning models. So what is overfitting? Overfitting is when your model fits the training data so closely that it cannot generalize well to the real world. Look at this example: with only one input and one output, an overfitted model would look something like this. It follows every training point exactly and believes that this perfectly summarizes the real world, which of course is not the case.

How can you tell whether your model is overfitting? By looking at the difference between the validation loss and the training loss. As we train the model for more and more epochs, at some point the validation loss starts getting higher while the training loss keeps getting lower. From that point on we have started to overfit: the model learns ever more closely what is represented in the training dataset, but it no longer captures what is going on in the validation data, or in the real world in general; it no longer understands the underlying pattern in the data. (A minimal version of this check is sketched at the end of this section.)

But how can regularization solve this? Let's take one step back and look at what happens when a model overfits: it means the model has high variance. Remember what variance is: it is how much your model's predictions change if the training data changes a little bit; in other words, how sensitive the model is to the training data you give it. The more sensitive it is, the more it overfits. And what causes high variance? One cause is the flexibility of the model: the more parameters a model has, the more things there are to tweak, and the more likely it is to have high variance. Random forests and neural networks are examples of models with high flexibility and thus high variance. That is exactly where regularization comes in: regularization is a way to limit this flexibility in the hope of avoiding overfitting.
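As a minimal sketch of that loss-tracking check, assuming hypothetical `train_one_epoch` and `evaluate` helpers that each return an average loss for the epoch:

```python
# Track training and validation loss per epoch; train_one_epoch() and
# evaluate() are hypothetical stand-ins for your own training/validation loops.
history = {"train": [], "val": []}
for epoch in range(num_epochs):
    history["train"].append(train_one_epoch(model, train_loader))
    history["val"].append(evaluate(model, val_loader))
    # Training loss still falling while validation loss rises is the
    # classic signature of overfitting.
    if epoch > 0 and history["val"][-1] > history["val"][-2] \
            and history["train"][-1] < history["train"][-2]:
        print(f"Validation loss rising at epoch {epoch}: possible overfitting")
```

In practice you would watch for a sustained rise over several epochs rather than a single blip.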

How regularization works

This could be done in a bunch of different ways, but for neural networks specifically it means lowering the weights of the network. You might of course be asking: what do high weights have to do with overfitting? Let's look into that now. Say we have this neural network: one hidden layer with four neurons, and every connection between neurons carries a weight. What do those weights signify? When you think about it, a weight is basically the importance of the output of a certain neuron: right after the input layer, it is the importance of that input; in between, in the hidden layers, it is the importance of the output of that neuron.

Now look back at the example of what overfitting looks like with one input and one output: the function we are training follows the input values very closely. In other words, when you overfit, you exaggerate the importance of certain inputs or certain data points. And how does that happen in a neural network? A really high weight on a certain input or data point means you are exaggerating the importance of that data point. How do you avoid this? By lowering the weights. That is why, when fighting overfitting in neural networks, what we aim to do with regularization is lower the weights of the network.

Let's see what kinds of solutions regularization offers. There are two types of regularization techniques. The first is, as we discussed, constraining the model, that is, constraining its flexibility so that it has fewer degrees of freedom; the options here are L1 or L2 regularization, dropout regularization, and early stopping. Another thing you can use to avoid overfitting, and it also belongs in the regularization bag, is data augmentation. Let's go through these techniques one by one and see how they work.

L1 and L2 regularization

The idea behind L1 and L2 regularization is to add the weights to the loss calculation, the cost function, to punish the network for having high weights; the two just take slightly different approaches.

With L1 regularization, which you will also see called the L1 norm or lasso in the sources you come across, you add the sum of the absolute values of the weights to the loss. It does not matter whether a weight is negative or positive: you sum up the absolute values and add the result to the cost. As a consequence, L1 regularization encourages the network to push weights all the way down to zero, which can mean that some outputs are not considered in the network's calculation at all. It might look like this: even though this hidden layer used to have four neurons, because this connection is now zero, this neuron is effectively ignored. As a result you can end up with a sparser network.

L2 regularization is a little different; you might see it called ridge regression or weight decay. Here we again sum over all the weights, but we sum their squared values. Squaring also turns negative weights positive, but at the same time it exaggerates the effect of larger values compared to smaller ones. As a result the weights of the network will still be pushed down, but you will not end up with a sparse network after L2 regularization.

Both techniques have a parameter called alpha that you need to tune. Alpha determines how much attention to pay to the penalty: for L1 the penalty is the sum of the absolute values of the weights, for L2 the sum of their squared values. Make the penalty too strong and you regularize too hard; while trying to avoid overfitting you get caught in underfitting. Make it too weak and the model is still allowed to overfit the data because you are not penalizing it enough. So alpha is one parameter you need to tune when using this kind of regularization on your network. A sketch of both penalties follows below.
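As a minimal PyTorch sketch of both penalties, assuming a hypothetical `model` (any `torch.nn.Module`), a task loss `data_loss` already computed for a batch, and a tunable penalty strength `alpha`:

```python
import torch

# Sum of absolute weights (L1 / lasso) and of squared weights (L2 / ridge).
l1_penalty = sum(p.abs().sum() for p in model.parameters())
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())

# Punish the network for high weights by adding the penalty to the loss.
loss = data_loss + alpha * l1_penalty   # or: data_loss + alpha * l2_penalty
loss.backward()

# For L2 specifically, PyTorch optimizers expose the same idea directly
# via the weight_decay argument:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```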

Dropout regularization

Next we have dropout regularization. What we do here is the following: in every training step, each neuron has a probability p of being inactive. That probability is called the dropout rate, and it is again something we need to set before we start training the network. Say we have this network with two hidden layers of four neurons each. If every neuron has a one-in-four chance of being inactive in a given step, then in one step you might get one neuron missing from the first layer and one from the second; in the next step, different neurons missing from the first and the second. In any single step fewer or more than one fourth of the neurons might be missing, of course, but on average one fourth of the neurons will be missing during training while you use dropout regularization.

There is one little trick you need to understand about dropout regularization: during training some neurons are missing, but at test time none of them are. As a result, you need to use something called the keep probability, which is one minus the dropout rate, and multiply the incoming activations by it at test time. Think about it: because a dropout-rate fraction of neurons, one fourth on average in this example, is missing during training, a neuron effectively receives about three inputs; at test time all neurons are present and it receives four. Scaling by the keep probability makes sure the model still predicts sensible values and its outputs are not skewed.
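For illustration, a minimal PyTorch sketch of the network described above, two hidden layers of four neurons each with a dropout rate of 1/4. Note that PyTorch uses "inverted" dropout: the surviving activations are scaled by one over the keep probability during training, so the test-time multiplication described above is handled for you:

```python
import torch.nn as nn

# Dropout with p = 0.25: each hidden neuron has a one-in-four chance of
# being zeroed on every training step (keep probability = 0.75).
model = nn.Sequential(
    nn.Linear(8, 4),
    nn.ReLU(),
    nn.Dropout(p=0.25),
    nn.Linear(4, 4),
    nn.ReLU(),
    nn.Dropout(p=0.25),
    nn.Linear(4, 1),
)

model.train()  # dropout active; activations rescaled by 1/0.75
model.eval()   # dropout disabled; no manual keep-probability scaling needed
```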

Early-stopping

The next one we have is early stopping. We have seen this graph before: the more you train your network, the lower the training loss gets, but after a while the validation loss starts getting higher, and as we said, that is the point where overfitting starts. With early stopping you stop at exactly that point: you say, okay, after 30 epochs my network starts overfitting, so I am just going to stop training there. I have to say, though, that this is a somewhat controversial technique; not everyone agrees it is a good way of dealing with overfitting. The argument is that we want to train the model until it converges to a solution and then have a separate process on top of that to deal with overfitting, rather than stopping training early, because combining the two processes, training and mitigating overfitting, can cause confusion further down the line. A minimal sketch of early stopping follows below.
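As a minimal sketch of early stopping with a patience window, reusing the hypothetical `train_one_epoch` and `evaluate` helpers from earlier:

```python
import torch

# Stop when the validation loss has not improved for `patience` epochs in a row.
best_val, bad_epochs, patience = float("inf"), 0, 3
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)
    val_loss = evaluate(model, val_loader)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")  # remember the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping early at epoch {epoch}")
            break
```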

Data augmentation

The last technique we have is data augmentation. It is always a good idea to feed more, and more varied, data into your network, but that might not always be possible: you might simply not have enough data. This matters especially when you are working with images like these. Say you are trying to find dog faces in images, but in your data the dogs always sit in the same position, perfectly straight, at 90 degrees to the camera and looking directly into it, while you want your model to tolerate other pictures of dogs and to generalize to the real world. What you can do is flip, rotate or apply other transformations to your images to get richer training data. Likewise, if you want your model to be tolerant to changes in exposure, saturation or color in general, you can apply other transformations: increase the saturation, shift the hue, invert the colors, make the image black and white, and feed all of this data into your network so that it can still recognize that this is a parrot even when the colors are completely off. That is how data augmentation helps us regularize and overcome overfitting. A sketch of such an augmentation pipeline is shown below.
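As a sketch of such a pipeline using torchvision's transforms (the folder path is hypothetical):

```python
from torchvision import datasets, transforms

# Random flips, rotations and color changes are applied each time an image
# is loaded, so the network sees a slightly different version of every
# picture in every epoch.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.3, saturation=0.3, hue=0.1),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
])

# Hypothetical image folder with one subfolder per class:
dataset = datasets.ImageFolder("dogs/", transform=augment)
```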

Get your Free AssemblyAI API link now!

All right, that is it for regularization. I hope everything was clear. If you liked this video, please give us a like, and maybe even subscribe to be among the first to know when we publish a new video. If you have any questions or comments, I would love to hear them in the comments section. Before you leave, don't forget to grab your free API token using the link in the description. Have a nice day, and I'll see you in the next video.
