What is Layer Normalization? | Deep Learning Fundamentals
Duration: 5:18


AssemblyAI · 07.02.2022 · 48,630 views · 1,414 likes


Video description
You might have heard about Batch Normalization before. It is a great way to make your networks faster and better, but there are some shortcomings of Batch Norm. That's why researchers have come up with an improvement over Batch Norm called Layer Normalization. In this video, we learn how Layer Normalization works, how it compares to Batch Normalization, and for what cases it works best.

👇 Get your free AssemblyAI token here: https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_18

CONNECT
- 🖥️ Website: https://www.assemblyai.com
- 🐦 Twitter: https://twitter.com/AssemblyAI
- 🦾 Discord: https://discord.gg/Cd8MyVJAXd
- ▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
- 🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers

#MachineLearning #DeepLearning

Table of contents (6 segments)

Intro

Batch normalization has been a groundbreaking step toward making neural networks faster and better, but it doesn't work with all kinds of neural networks, for example recurrent neural networks. That's why we have layer normalization, an improvement over batch normalization, and we will see how it works in this video. This video is part of the Deep Learning Explained series by AssemblyAI, a company that is making a state-of-the-art speech-to-text API. If you'd like to give it a try, go ahead and get your free API token using the link in the description.

Problems with batch normalization

There are a bunch of problems with batch normalization. The first one is that it's very hard to use with sequence data: if the sequences are of varying length, batch normalization becomes very complicated to calculate. On top of that, it's very hard to use batch normalization with small batch sizes, because the whole point of batch normalization is to calculate the normalization statistics, like the mean and standard deviation, over the batch. If your batch is very small, the mean and standard deviation you compute won't actually represent the whole dataset. And finally, it's very hard to parallelize a network that uses batch normalization. Most of these problems happen because of the dependency that batch normalization has on batches.
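To make the small-batch problem concrete, here is a minimal NumPy sketch (the dataset, its distribution, and the batch size are made up for illustration): statistics estimated from a tiny batch can be far from the dataset-level statistics that batch norm is supposed to approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
# A toy "dataset" with one feature: true mean 5.0, true std 2.0.
data = rng.normal(loc=5.0, scale=2.0, size=(10_000, 1))

full_mean = data.mean()
tiny_batch = data[:2]        # a batch of just 2 samples
tiny_mean = tiny_batch.mean()

# The 2-sample estimate can land far from the statistics the network will
# see at test time, which is what makes small-batch batch norm unstable.
print(f"dataset mean: {full_mean:.3f}, 2-sample batch mean: {tiny_mean:.3f}")
```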

What is layer normalization

Layer normalization removes that dependency and calculates the normalization over the layers instead of the batches. To quickly summarize what layer normalization does in one sentence: the input values of all neurons in the same layer are normalized for each data sample. That's why, under layer normalization, all neurons in the same layer share the same normalization terms, that is, the same mean and the same variance.

Let's see how this works in practice, first with batch normalization between two layers and then with layer normalization between the same two layers. With batch normalization, say the first layer has four neurons, the next layer has five, and our batch consists of three data points. We calculate the output of the prior layer for each of the three data points in the batch, and before we pass anything on to the next layer, we compute the mean and standard deviation per neuron, over that neuron's three outputs across the batch, and use them to normalize those outputs. The normalized values are then passed to the next layer.

With layer normalization, say we have the exact same structure and the same batch size of three. We calculate the outputs of the prior layer as before, and up to this point everything is the same. But now we normalize "vertically": instead of taking the three values from different data points that correspond to the same neuron, we normalize per data point, across all the neurons in the layer. After the normalization, these values are passed to the next layer. As you can see, there is no dependency on batch size in layer normalization: no matter how big or small your batch is, you just normalize the values per data point.
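The difference between the two schemes comes down to which axis the statistics are computed over. Here is a minimal NumPy sketch with toy numbers (real implementations also add a learnable gain and bias, omitted here):

```python
import numpy as np

# Toy activations from the prior layer: a batch of 3 samples, 4 neurons each.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [0.0, 1.0, 0.0, 1.0]])
eps = 1e-5

# Batch norm: statistics per neuron, computed ACROSS the batch (axis=0).
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Layer norm: statistics per data point, computed ACROSS the neurons (axis=1).
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

# Each row of `ln` is normalized independently of the other rows,
# so every sample ends up with mean ~0 and variance ~1 across its neurons.
print(ln.mean(axis=1))
```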

Training time vs test time

One other advantage that layer normalization has over batch normalization is that, because it doesn't depend on batches, we do the exact same calculations at training time and at test time. This is a little different in batch normalization, and if you don't know exactly how that works, go ahead and watch our batch normalization video to get a better understanding of the difference between training-time and test-time batch normalization.
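Because the statistics are per sample, the layer-norm computation below (a simplified sketch without the learnable parameters) is literally the same function at train and test time, and it even works on a batch of one:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Per-sample statistics only: nothing to accumulate during training
    # and nothing to swap in at inference, unlike batch normalization,
    # which keeps running averages for test time.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
single_sample = rng.normal(size=(1, 8))  # a "batch" of just one data point

out = layer_norm(single_sample)  # batch norm would need stored running stats here
```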

Why layer normalization is better

That's exactly why layer normalization is better for RNNs: it's no longer about the batch but about the layer we're doing the calculations on, or, in RNN terms, the time step we're doing the calculations on.
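In an RNN, the same normalization applies independently at every time step, so the sequence length never enters the statistics. A rough sketch (the tanh recurrence and the weight shapes are illustrative choices, not the exact recipe from the layer-norm paper):

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(2)
hidden = 16
W_h = rng.normal(scale=0.1, size=(hidden, hidden))  # recurrent weights
W_x = rng.normal(scale=0.1, size=(hidden, hidden))  # input weights

# Sequences of different lengths are no problem: the normalization at each
# time step only looks at that step's pre-activation vector.
for seq_len in (3, 7):
    h = np.zeros((1, hidden))
    for t in range(seq_len):
        x_t = rng.normal(size=(1, hidden))       # input at time step t
        h = np.tanh(layer_norm(h @ W_h + x_t @ W_x))
```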

Summary

So to sum up: layer normalization gives us a way to do normalization in recurrent neural networks, because it can deal with sequences of varying length. On top of that, with layer normalization we can choose whatever batch size we want, no matter how small or big. And finally, with layer normalization, parallelization is no longer a problem: with batch normalization you would need extra communication and synchronization between the different machines to parallelize correctly, whereas with layer normalization every data point has its own calculations, so you don't need that extra layer of communication. One downside of layer normalization is that it does not always work well with convolutional neural networks, so if you are using a CNN architecture, you might want to opt for batch normalization instead. And that's it for layer normalization. If you liked this video, don't forget to give us a like and maybe even subscribe to show your support. If you have any questions or comments, leave them in the comment section below; we would love to hear from you. Before you go, don't forget to grab your free API token from AssemblyAI using the link in the description. Thanks for watching, and I will see you in the next video.
