Neural Networks Summary: All hyperparameters
Duration: 17:19


AssemblyAI · 14.02.2022 · 14,749 views · 462 likes


Video description
The correct hyperparameter settings are critical to the success of a feedforward neural network. In this video we take a high-level look at all the main hyperparameters of neural networks. We see where in the lifecycle of NNs they belong, what they mean, and also how to set them using Python and Keras.

👇 Get your free AssemblyAI token here: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_19

Chapters:
Intro 00:00
Input & output layers 01:01
Hidden layers 03:48
Activation functions 04:57
Weight initialization 06:34
Regularization 07:52
Loss functions 10:21
Optimization algorithm & learning rate 11:14
Batch size & Epochs (Number of iterations) 13:13
Wrap-up 16:12

Keras weight initializers: https://keras.io/api/layers/initializers/
Keras regularizers: https://keras.io/api/layers/regularizers/
Keras loss functions: https://keras.io/api/losses/

CONNECT:
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://discord.gg/Cd8MyVJAXd
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

#MachineLearning #DeepLearning

Table of contents (10 segments)

Intro

The exact hyperparameter settings of a deep feedforward neural network can make or break the model. So in this video we will learn what each of these hyperparameters means, how they work, and also how to implement them using Keras and Python. I always like working with visualizations, so that's why I made this diagram to show you where in the training life cycle of neural networks each hyperparameter belongs. In this network, how everything works is that we get a batch, a subset of the whole data set that we have, and with this batch the output of the network is calculated using this equation. Based on that, a loss is calculated using the loss function that we have; it could be a bit different. And based on the optimization algorithm and the learning rate that we set, we calculate how much the weights and biases, so the parameters of the network, should be updated. I want to start with the number of neurons in the input layer and the output layer.
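The loop described above, a forward pass on one batch, a loss, and then a parameter update scaled by the learning rate, can be sketched in miniature. This is my own toy stand-in (a single linear layer with mean-squared-error loss and plain gradient descent), not the network from the video:

```python
import numpy as np

# One batch of 250 points with 10 features, and synthetic targets
# produced by a known linear map.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 10))
true_w = rng.normal(size=(10, 1))
y = X @ true_w

W = np.zeros((10, 1))   # parameters to learn (here deliberately zeros)
lr = 0.1                # learning rate

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

loss_before = mse(X @ W, y)                 # forward pass + loss
grad = 2 * X.T @ (X @ W - y) / len(X)       # gradient of MSE w.r.t. W
W -= lr * grad                              # plain gradient-descent update
loss_after = mse(X @ W, y)                  # loss after one update step
```

One such step per batch, repeated over all batches and epochs, is the whole training life cycle the diagram describes.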

Input & output layers

The number of neurons in the input layer is going to be determined by your data. If you have 10 features in your data set, you're going to need 10 input neurons in your input layer. If you have an image, let's say a 20 by 20 pixel image, you're going to need 400 of these input neurons. So the number of neurons in your input layer is determined by your data, whereas the number of neurons in your output layer again depends on your data, but this one you can tweak a little bit based on your problem. Let's say what you're doing is binary classification: you can have only one output neuron, where if it outputs something close to zero that means it's one class, and if it outputs something close to one, that's the other class. So you can tweak it a little bit based on your preferences. If you have four different classes that you want to classify things into, then it's probably better to have four different neurons, or if you are trying to do regression, for example, then you're probably good with just one output neuron. All right, let's see how we can do this in Keras. It's quite simple to create a model with Keras. This is basically the wrapper of a neural network: you're saying that you want to create a sequential neural network, and what we're going to do is fill it with the different types of layers that we want in there. The first layer that I want in there is the input layer, so I'm going to create it on a new line. Let's say I'm working with image data that is 28 by 28. What I can do in my input layer is say that my input shape is going to be 28 by 28, and what I want to happen is to flatten this and create 784 neurons in my input layer. It is quite simple to create layers with Keras; you can call different types of layers. Flatten is the one where, if you have a matrix of values, it flattens it into one long array, and we will see the other layers in a second. To create an output layer, what I'm going to do is create a layer that's called a dense layer, and I can add as many neurons as I want in there. If, as I said, I'm doing a regression problem, then I can just have one neuron there; if I am trying to classify these images into four different categories, I can have four of them. What else I need to specify here is the activation function of this layer, so for now I can say I want this to have a softmax activation function. As you can see, it's actually quite straightforward. So the next thing to decide is the
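The walkthrough above can be sketched like this, assuming TensorFlow's bundled Keras and the four-class case:

```python
from tensorflow import keras

# A 28x28 image is flattened into 784 input neurons, then a dense
# output layer with 4 neurons (one per class) and a softmax activation.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(4, activation="softmax"),
])
```

For regression, the output layer would instead be `Dense(1)` with a single neuron, as described above.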

Hidden layers

number of hidden layers and the number of neurons in the hidden layers. This is not determined by the data, like the number of neurons in the input and output layers, because it's going to depend on how complex a network you need for the specific problem: the more complex the problem you have, the deeper you want your network to be, and you might also want to increase the number of neurons in your hidden layers. One rule of thumb, though, is that if you need more complexity, if your model is underfitting for example, it's usually better to add more layers before adding more neurons to the layers. Like we talked about for the output layer, the hidden layers can also take the shape of dense layers. So if you want to add a hidden layer, what you need to do is add more layers in between the input and the output layer; let's say I want to add three layers. And if you want to change the number of neurons in these layers, you can just add a number in there, and that means you are adding neurons to these hidden layers. This is basically how you determine the number of hidden layers and the number of neurons in your hidden layers. So the next thing
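The three hidden layers mentioned above could look like this; the width of 128 neurons is my own example value, not one from the video:

```python
from tensorflow import keras

# Three dense hidden layers inserted between the input and output
# layers; 128 neurons per hidden layer is an arbitrary example choice.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128),
    keras.layers.Dense(128),
    keras.layers.Dense(128),
    keras.layers.Dense(4, activation="softmax"),
])
```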

Activation functions

that we want to set up is the activation function. The activation function can be different for all the layers of the network, so you do not have to set one activation function for the whole network. One really important point here is that the hidden layers cannot take linear activation functions, because if you use a linear activation, at the end of the day your neural network is not going to be more than a simple linear transformation of your input, and we want something more complicated than that. That's why in hidden layers we use other activation functions like sigmoid, ReLU, or the hyperbolic tangent function. But for the output layer you can use a linear activation: let's say you're doing a regression, then maybe you actually want a raw number, and in that case a linear activation function will be fine. Setting the activation function in your layers is quite simple: all you have to do is go to your dense layers and set the activation to the name of whatever you want it to be. Let's say for these ones we use ReLU; for another layer you might want to use the hyperbolic tangent function, or you might use ReLU again; and for the last layer, the output layer, you might want to use the softmax function. Depending on your problem, the output activation function would again change, so you have to decide what kind of output you're looking for based on your problem and then decide the activation function of the output layer. Next, let's look into the
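Mixing activations per layer, as described above, could be sketched as (ReLU and tanh in the hidden layers, softmax at the output; the layer sizes are example values):

```python
from tensorflow import keras

# Each layer gets its own activation via the `activation` argument.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(128, activation="tanh"),
    keras.layers.Dense(4, activation="softmax"),   # output layer
])
```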

Weight initialization

weight initialization technique. Before anything happens in this network, we have to set up the parameters, right? Biases most of the time are set to zero and then updated during training, but we cannot set the weights to zero, because if we do that, all of the weights are going to be updated in the same way, and we're not going to be able to achieve the complexity that we want through this deep feedforward neural network. That's why there are different techniques you can use to initialize the weights of your network, and we actually talked about this before in the weight initialization techniques video, so if you want to learn more about that, go ahead and check that video. It is quite simple to set the weight initializer: all you have to do is set kernel_initializer to the name of whatever initializer you'd like to choose. You can go and check out the initializer list on the Keras website and see what options you have; let's say you want to use the He initializer with a normal distribution. The default value, if you don't set anything else, is the Glorot uniform initializer, just so you know, but if you want to set it to something else, as we talked about in the initialization video, also based on your activation function, then you have to specify it separately.
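Setting the He-normal initializer by name, as described above, looks like this (a common pairing with ReLU layers; if you omit the argument, Glorot uniform is the default):

```python
from tensorflow import keras

# He-normal weight initialization set via the kernel_initializer name;
# biases default to zeros and need no explicit setting.
layer = keras.layers.Dense(128, activation="relu",
                           kernel_initializer="he_normal")
```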

Regularization

Another thing you can do is use a regularization technique. This is kind of a branched-out sort of hyperparameter, because you can either use or not use regularization, and you can use different types of regularization techniques. Inside the type of regularization technique that you use, there is another hyperparameter called alpha that you might need to decide, or, if you're using a different regularization, this could be called the dropout rate, but it will have in itself some other hyperparameters that you might need to tune. Regularization is basically what we do to stop the network from overfitting; if you want to learn more about the different techniques, again, go and check out our video on regularization. It is quite simple, similar to how we set the initializer: we set kernel_regularizer, a little bit of a hard word. One way to do it is by using the name, so if you want to use L1 or L2 regularization you can say l1 or l2. Another way, if you want to specify the alpha parameter that we talked about, is to call a function from the Keras library: simply say keras.regularizers.l2 and then specify the alpha parameter. Or, if you want to use more than one regularizer, let's say you want to use L1 and L2, what you can do is again set the kernel_regularizer parameter to keras.regularizers.l1_l2, and then, if you want to set the alpha parameters, you just need to say l1 equals, for example. There is another way to use regularization that's called dropout regularization, and to use dropout regularization you just need to create a whole new layer in between the layer that you want to regularize and the next layer. We're going to use the Keras Dropout layer for this dropout regularization, and I just need to specify the rate at which I want the dropout to happen. After completing the architecture of the network that you want, what we need to do is call two other Keras functions to compile and then train this model, and the rest of the hyperparameters will be determined inside these two functions, so let's see what those are.
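The three variants above could be sketched as follows; the alpha values 0.01 and the dropout rate 0.2 are example values of my own:

```python
from tensorflow import keras

# L2 regularization with an explicit alpha of 0.01.
l2_layer = keras.layers.Dense(
    128, activation="relu",
    kernel_regularizer=keras.regularizers.L2(0.01))

# Combined L1 and L2 regularization with separate alphas.
l1_l2_layer = keras.layers.Dense(
    128, kernel_regularizer=keras.regularizers.L1L2(l1=0.01, l2=0.01))

# Dropout regularization as its own layer, dropping 20% of activations.
dropout = keras.layers.Dropout(rate=0.2)
```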

Loss functions

The next thing we want to do is set the loss function, and again this will be determined by the type of problem that you have. There are a bunch of different options in the Keras documentation where you can see all the alternatives; I will leave a link to the Keras documentation in the description below. Basically, you can go and read, and based on the type of problem you have, see which one is appropriate. What we need to do is just go inside the compile function and set the loss function that we want. One example could be sparse categorical cross-entropy: this loss function is used when you have two or more different labels that you're trying to classify your input into. But as I said, go and check out the documentation to see, based on your problem, what kind of loss function you need to use. So next we
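Setting the loss inside compile, as described above, could look like this (layer sizes are example values; the string optimizer is covered next):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(4, activation="softmax"),
])

# Sparse categorical cross-entropy suits integer class labels spanning
# two or more categories.
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd")
```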

Optimization algorithm & learning rate

have the optimization algorithm. The optimization algorithm is the thing that determines how we should update the weights, or the parameters, of the network to achieve better performance in the next round. The most commonly used optimization algorithm is stochastic gradient descent; it is also the default value in Keras, but there are many other functions already built into the Keras library. There are two different ways you can set the model optimizer. One way is to just set optimizer to the string name of the optimizer that you want to use; if you want to use stochastic gradient descent, sgd is the term that you should use. But there's another way: instead of saying sgd, you can also, like we did with the Keras regularizers, call keras.optimizers, and let's say this time we want to use Adam. The advantage of this is that once you specify the optimization algorithm by calling the Keras function, you can also specify the learning rate that you want, like with the regularizers: when we use just a string name, that's all you can determine, and for everything else a default value is used, but if you want to specify the alpha, you need to call keras.regularizers.l2 specifically. The same holds here: if you want to set the learning rate inside this optimizer, you have to call it through the Keras function, and then you can just set the learning rate to whatever you want. So now we've seen how to set up the optimization algorithm and the learning rate. There are different ways to set the learning rate, for example learning rate scheduling, but that's a little bit more complicated and would make this video very long, so I will make a whole separate video for learning rate scheduling. So now
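The two ways of setting the optimizer described above can be sketched as (the learning rate 0.001 is an example value):

```python
from tensorflow import keras

# Way 1: by string name; every other setting falls back to defaults.
by_name = keras.optimizers.get("sgd")

# Way 2: construct the optimizer object, which also lets you set the
# learning rate explicitly.
optimizer = keras.optimizers.Adam(learning_rate=0.001)
```

Either value can then be passed as the `optimizer` argument of `model.compile`.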

Batch size & Epochs (Number of iterations)

that we've seen how to set up the architecture of the network, what kind of loss function to pay attention to, and what kind of procedure to use to update the parameters of the network, the next thing we want to look at is the actual training process of the network. For that we're going to look at the batch size, so the number of data points that we use at a time, and the number of iterations, so how many times we run the whole data set through the network. Batch size basically tells us how many groups to divide the data set into, by saying how many data points need to be in one of these subgroups. Let's say we have a data set with a thousand data points: if we set the batch size to 500, that means we will have two batches in total. In this example here we have a thousand data points and we make the batch size 250, so that's why we have four batches. How learning happens in a deep neural network is that we take one batch, in this example 250 data points, and run it through the network; we calculate the output, we calculate the error, and then, using the optimization algorithm and the learning rate, we update the weights of the network. Then we do the same thing for the next batch: we run the whole batch through the network, calculate the output and the error, and update the weights. When you do this four times with four different batches, that means you have done one epoch, so you have completed one iteration over the data set. One epoch means running the whole data set through the network once; that's why the number of iterations, or epochs, is different from the number of batches that we have. Now let's see how we can set this in Keras. Inside the fit function, what we first need to do is of course give our network the data set, the x train and y train, and on top of that you can set the batch size to be whatever you want. Normally, the smaller the batch size, the better; there has been some work showing that larger batch sizes could actually work and produce good results, but generally we use small batch sizes, from 2 to 64. Let's say we use a batch size of 32 in this case. Most of the time we use powers of two because of how our computers are built, so because of the binary nature of computers we opt for batch sizes that are powers of two. This is also where we set the epochs, and we can set the epochs to be whatever we want, really; we can set it to 30, and if you run your network and realize that it is underfitting, you can increase the epochs to whatever number you want and then evaluate again. Another thing, even though it's not a hyperparameter, is to set the validation data inside the fit function: if you separated out a validation data set before, in your data preparation phase, you can include it here.
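The batch/epoch arithmetic from the example above can be checked directly; the `model.fit` call in the comment is a sketch, with `x_train`, `y_train`, `x_val`, and `y_val` standing in for data prepared elsewhere:

```python
import math

# 1,000 data points split into batches of 250 gives 4 weight updates
# per epoch; over 30 epochs, that is 120 updates in total.
n_points = 1000
batch_size = 250
epochs = 30

batches_per_epoch = math.ceil(n_points / batch_size)
total_updates = batches_per_epoch * epochs

# In Keras, batch size, epochs, and validation data all go into fit:
# model.fit(x_train, y_train, batch_size=32, epochs=30,
#           validation_data=(x_val, y_val))
```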

Wrap-up

So these are basically the main hyperparameters of a network. As you've seen, some of them depend on the data that you're using and on other hyperparameter settings; for example, the activation function that you're using is going to affect the weight initialization technique that you use, so there can be some interdependencies between your hyperparameters. Some hyperparameters are going to be trial and error, basically seeing what works better for your problem, for example the batch size or the number of iterations. But no matter what you're doing, always make sure to check the documentation of the library that you're using. With the basic information you got from this video today, I'm sure you'll be on your way to creating your first neural network in a confident way. Hey, thanks for watching! I hope this video was helpful for you. If you liked it, don't forget to give us a like and maybe even subscribe to show your support. We would also love to hear from you in the comment section below with any of your questions or comments, and before you leave, don't forget to get your free API token for AssemblyAI using the link in the description. Thanks for watching again, and I will see you in the next video.
