Recurrent Neural Networks (RNNs) Explained - Deep Learning
9:15

AssemblyAI · 15 Jan 2022 · 13,062 views · 288 likes


Video description
In this video, we learn what Recurrent Neural Networks (RNNs) are and how they work. Get your free token for the AssemblyAI Speech-To-Text API 👇 https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_pat_7

Recurrent Neural Networks, or RNNs for short, are an essential part of deep learning. They are widely used and are incredibly effective in many different kinds of applications, especially in the field of natural language processing, like text classification or text generation. But they can also be used when working with images or video data. So in this video, we learn what RNNs are and how they work.

Resources:
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Table of contents (14 segments)

Intro

Recurrent neural networks, or RNNs for short, are an essential part of deep learning. They are widely used and incredibly effective in many different kinds of applications, especially in the field of natural language processing, like text generation or text classification, but they can also be used when working with images or video data. So in this video we learn what RNNs are and how they work. This video is part of the deep learning explained series by AssemblyAI, a company that creates a state-of-the-art speech-to-text API, and if you want to try their API for free, then grab your free API token using the link in the description below. And now let's get started. First, let's have a quick look at

What are RNNs

one definition of RNNs: RNNs are a special class of neural networks that operate on sequence data and allow previous outputs to be used as inputs while having hidden states. So let us reiterate: they are also neural networks, they have something to do with hidden states, and they can keep track of previous outputs in a sequence. If this definition is not clear to you right now, don't worry, we will come back to it in a moment. But first, let's have a look at why RNNs work so well, and for this we look at one example. Let's do a quick test and complete the sentence: "The color of the sky is ___" (blue), or another one: "I grew up in Germany, that's why I'm fluent in ___" (German). And I bet you guessed the correct words, because for our brains it should not be a problem to determine the missing word. We know the word because we know the previous text: we don't just look at one word and throw away the previous information, we remember the previous context and then use this information. And this is exactly what RNNs do. If we tried to do this word prediction with a normal neural net, it would not work, because one major shortcoming of normal neural nets is that they cannot keep track of previous outputs, and this is what RNNs try to fix. Now let's go back

What is sequence data

to the original definition. By now it should be clearer what is meant by sequence data and previous outputs. For example, sequence data can be a sentence, where the RNN looks at each word separately, encodes it into a mathematical representation, and can then remember the previous outputs. But how exactly does this work, and what are these hidden states? A normal neural net gets an input,

Loops

then has different hidden layers, and returns an output. RNNs, on the other hand, have loops in them, allowing information to persist. We can also unroll this loop and look at this

Sequence Data

picture in a different way. We can now interpret this as sequence data with different time steps: we have the first input, where we do calculations in the hidden layers and can then get an output, and then we pass the information from the hidden states on to the second time step. Here again we get an input, then we do calculations where we can now use the previous information, and then again we can get an output and pass on the information, and so on. Each time step contains the information from all previous steps, so the last step has information about all of the previous context. Then, for example, we can take this very last output, pass it on to a simple linear classification layer, and do the word prediction as in the example from before. So by now I hope the explanation and motivation of RNNs is clear to you. Another reason why RNNs are so powerful is their

Flexibility

flexibility. Normal neural nets and convolutional nets have the limitation that they don't remember previous states, and moreover they accept a fixed-size vector as input and produce a fixed-size vector as output. For example, one image can be the input and the probabilities of different classes the output. So this is what we get with traditional neural networks. Recurrent neural nets, on the other hand, operate on sequence data: they operate over a sequence of vectors, and these sequences can be in the input, the output, or both. This allows for many different applications. For example, we can have a one-to-many relationship, which can be used for music generation: we get one input but many outputs.

Many-to-many

We can also have a many-to-one relationship, where we have many inputs but only one output; this is used, for example, for sentiment classification.
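As a sketch, the many-to-one pattern can be written as a plain loop: consume every input vector, update a hidden state, and read a prediction off the final state only. All names, sizes, and weights below are my own illustrative choices, not from the video.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 4, 3, 6

# Illustrative random weights; in a real RNN these would be learned.
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden
b_h = np.zeros(hidden_size)
W_hy = rng.standard_normal((1, hidden_size)) * 0.1            # hidden -> output

xs = rng.standard_normal((steps, input_size))  # many inputs: one vector per time step
h = np.zeros(hidden_size)                      # initial hidden state

for x_t in xs:
    # Core recurrence: the new hidden state mixes the current input with the
    # previous hidden state, so information persists across time steps.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

# One output: e.g. a sentiment score read from the final hidden state only.
y = W_hy @ h
print(y.shape)  # (1,)
```

The same loop with an output read at every step instead of only the last would give the many-to-many pattern.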

RNN Cheat Sheet

And we can have different types of many-to-many relationships, for example for named entity recognition or for machine translation. By the way, these images are taken from the recurrent neural networks cheat sheet that you can find on the official Stanford website; I will put the link in the description for you. Now, how do

How RNNs learn

RNNs learn? Basically the same as other neural nets: they learn by the backpropagation algorithm, and to be more specific, by backpropagation through time. We do a full forward pass through all the time steps, and then a full backward pass through the entire sequence and calculate the gradients. So nothing really special here. By the way, if you want to learn more about backpropagation in detail, we have another video for you on our channel that we link here. Now, this long way of traveling can have one major problem: the more layers we have to travel through, the more multiplicative gradient calculations we have, and this can lead to a so-called vanishing gradient or also to an exploding gradient.
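A toy scalar picture of why this happens: backpropagation through time multiplies one local gradient factor per time step, so factors below 1 shrink the gradient exponentially while factors above 1 blow it up. The numbers here are purely illustrative.

```python
def gradient_after(steps: int, factor: float) -> float:
    """Multiply one local gradient factor per time step, as BPTT does."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad

print(gradient_after(10, 0.5))  # 0.0009765625 -- vanishing
print(gradient_after(10, 1.5))  # roughly 57.7 -- exploding
```

With more time steps the effect gets worse, which is why long-range dependencies are hard for a simple RNN.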

Vanishing and exploding gradients

And this in turn means that this information gets lost and it becomes difficult to capture long-term dependencies. So the more time steps we have, the less information we can keep from time steps that are further in the past. Let's again have a look at the two

Example sentences

sentences from the beginning: "The color of the sky is blue" and "I grew up in Germany, that's why I'm fluent in German." In the first example the relevant information, sky and blue, sits close together. In the second one, we can maybe guess that the word should be a language because we see the word fluent, but to guess the correct language we also have to read the previous sentence, so we have to go further back in time. The further we have to go back, the more difficult it can be for our RNN because of this problem with vanishing gradients. But luckily there are existing solutions that tackle exactly this problem: long short-term memory, or LSTM,

Existing solutions

and gated recurrent units, or GRUs, are two special variants of RNNs. They are capable of learning long-term dependencies using a mechanism called gates, so they often outperform simple RNNs and are important to know. Getting into more detail here would be too much for this video, but I'm sure we will cover them in a later video in the series. Having these two improvements does not mean that simple RNNs can be skipped, because on the downside, LSTMs and GRUs are computationally much more expensive, so in many cases a simple RNN will be just fine. Now, at the end, I also want to

Python code

quickly show you how you can use this in code. Both deep learning frameworks, PyTorch and TensorFlow, make this pretty easy for you because they have existing layers that you can use. For example, here is the PyTorch code: you can set up this RNN layer, which you find in the torch.nn module, and in the same way you will also find the LSTM and the GRU layers. What is important to note is that when we call this layer in our forward pass, we also have to pass in a hidden state. Here we initialize the hidden state with a zeros tensor, then we pass it to our RNN layer, and we get two things back: one is the output, and one is the encoded hidden state. So this is how it works in PyTorch. In TensorFlow the API looks different, but here again you have the layers already available: for example, you can use SimpleRNN, and LSTM and GRU are also available, and then you can put these into, for example, a Sequential model.
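A minimal sketch of the PyTorch usage described above; the layer sizes are my own illustrative choices, not values from the video.

```python
import torch
import torch.nn as nn

input_size, hidden_size, num_layers = 4, 8, 1
# The RNN layer lives in torch.nn; nn.LSTM and nn.GRU are drop-in siblings.
rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

batch, seq_len = 2, 5
x = torch.randn(batch, seq_len, input_size)
# Initialize the hidden state with a zeros tensor, as in the video.
h0 = torch.zeros(num_layers, batch, hidden_size)

# Calling the layer returns two things: the outputs for every time step,
# and the final (encoded) hidden state.
out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([2, 5, 8])
print(hn.shape)   # torch.Size([1, 2, 8])
```

In TensorFlow the equivalent layers are tf.keras.layers.SimpleRNN, LSTM, and GRU, which can be stacked inside a Sequential model.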

Outro

All right, I hope I could explain RNNs in a simple way. If you still have questions, let me know in the comments. If you enjoyed the video, please leave us a thumbs up and consider subscribing to our channel. I can also recommend watching this video about backpropagation if you haven't already. I hope to see you next time, bye!
