Weight Initialization for Deep Feedforward Neural Networks


AssemblyAI · 31.01.2022 · 20,431 views · 566 likes


Video description
Weight initialization, even though it may seem a minor concern, has serious effects on the deep feedforward neural networks we train. Thanks to Xavier Glorot and Yoshua Bengio, we know that initializing weights from a normal distribution with mean 0 and variance 1 contributes to the unstable gradients problem. New techniques have been proposed to overcome this. In this video we learn what these techniques are, how they differ from each other, and which activation function each pairs best with. 👇 Get your free AssemblyAI token here https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_17

Table of contents (5 segments)

Intro

Unstable gradients are one of the main problems of deep neural networks, and one way to fix this is to make sure we are using the correct initializer for our network. So in this video, let's see what the options are for initializers and in which cases to use them. This video is brought to you by AssemblyAI, a company making a state-of-the-art speech-to-text API. If you'd like to give it a try for free, go ahead and get your free API token using the link in the description.

Weight Initialization

When we start a network, we initialize its parameters. This includes the weight parameters and also the bias parameters, which are most of the time set to zero at the beginning of training; during training they are then updated to take different values. But the weights we cannot initialize to zero, because then the network is basically rendered useless: every neuron in a layer computes the same output and receives the same gradient, so the layers never learn anything different from one another. So we need to come up with different strategies to initialize the weights of the network. Originally, the most common approach was to draw the weights from a random distribution whose mean sits at zero and whose standard deviation is one. But in their 2010 paper, "Understanding the Difficulty of Training Deep Feedforward Neural Networks," Xavier Glorot and Yoshua Bengio found that this type of weight initialization, in combination with the logistic sigmoid activation function, can cause the unstable gradients problem. Since then, a number of new weight initialization techniques have been proposed, so let's look into them one by one. The thing that separates these
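A quick NumPy sketch (my own illustration, not from the video) of why mean-0, variance-1 weights are a problem: each matrix multiply scales the activations by roughly the square root of the layer width, so their magnitude explodes with depth.

```python
import numpy as np

# Illustrative sketch: with weights drawn from N(0, 1), the scale of the
# activations grows by roughly sqrt(fan_in) at every layer -- one face of
# the unstable-gradients problem Glorot and Bengio describe.
rng = np.random.default_rng(0)
fan_in = 256
h = rng.normal(size=(1, fan_in))  # input with roughly unit variance

for layer in range(5):
    W = rng.normal(0.0, 1.0, size=(fan_in, fan_in))  # mean 0, variance 1
    h = h @ W  # linear layers only, to isolate the effect of the weights
    print(f"layer {layer}: std of activations ~ {h.std():.1f}")

# The std multiplies by roughly sqrt(256) = 16 per layer, exploding
# from ~1 toward ~10^6 after five layers.
```

With a saturating activation like the logistic sigmoid in between, these huge pre-activations would instead saturate the units and shrink the gradients, which is the other face of the same instability.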

Glorot

weight initialization techniques is what they set the mean and the variance to be. The first weight initialization technique is called Glorot, or Xavier, initialization. This technique sets the random distribution's mean to zero and its variance to 1/fan_avg, where fan_avg is the average of fan_in and fan_out. fan_in is the number of inputs that come into a layer, and fan_out is the number of outputs, also known as the number of neurons in that layer. Glorot initialization is the default weight initialization technique in the Keras deep learning library, and it is best used with the linear, tanh, softmax, or logistic activation functions. Next we have He (also written Kaiming) initialization. With He initialization, we again set the mean of the random distribution to zero, and the variance to 2/fan_in, so two over the number of inputs to the layer. This technique is best used with the ReLU activation function or its variants. And the last
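As a sketch (my own NumPy code, not from the video), the two initializers differ only in the target variance of the weight distribution:

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    """Glorot/Xavier: mean 0, variance 1/fan_avg = 2/(fan_in + fan_out)."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    """He/Kaiming: mean 0, variance 2/fan_in (pairs well with ReLU)."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(42)
W = glorot_normal(300, 100, rng)
# Empirical variance of the 30,000 samples should sit close to
# 2 / (300 + 100) = 0.005.
print(W.var())
```

In Keras these correspond to the built-in `glorot_normal` and `he_normal` initializer names (with `glorot_uniform` being the actual default for `Dense` layers).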

LeCun

commonly used weight initialization technique is LeCun initialization. Again, we set the mean to 0 here, and the variance to 1/fan_in, so again one over the number of inputs to the layer. LeCun initialization is best used with the SELU activation function. So here's a quick recap of what we just learned: these are the three commonly used weight initialization techniques, what their variances are set to, and which activation functions they work best with. Normally, using He initialization with
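A small sketch (again my own NumPy code) of LeCun initialization, showing why scaling the variance by 1/fan_in keeps the activation scale roughly constant across layers instead of letting it explode:

```python
import numpy as np

def lecun_normal(fan_in, fan_out, rng):
    """LeCun: mean 0, variance 1/fan_in (pairs well with SELU)."""
    std = np.sqrt(1.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
fan = 256
h = rng.normal(size=(1, fan))  # input with roughly unit variance

for layer in range(5):
    h = h @ lecun_normal(fan, fan, rng)  # linear layers, to watch the scale
    print(f"layer {layer}: std of activations ~ {h.std():.2f}")

# Each output unit sums fan_in terms of variance 1/fan_in, so the total
# variance stays near 1 at every depth instead of growing by fan_in.
```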

Recap

the ELU, or any other variant of the ReLU activation function, will really help you deal with the unstable gradients problem. You might still encounter the problem further into training, though; in that case, you might need to add batch normalization, and if you don't know what batch normalization is, go check out our video on it. But that is all about the different types of weight initialization techniques. They're actually quite straightforward, and it's very easy to pick one or the other when you're using Keras as your deep learning library, so make sure that next time you're building a project, you choose the correct weight initialization technique for it. Thank you for watching! I hope you enjoyed this video. Don't forget to give us a like, and maybe even subscribe to show your support. We would also love to hear from you in the comment section with any comments or questions about this video. But before you go, don't forget to grab your free API token for AssemblyAI using the link in the description. Thanks for watching again, and I will see you in the next video.
