But what is a neural network? | Deep learning chapter 1

18:40

3Blue1Brown · October 5, 2017 · 22,596,013 views · 533,765 likes

Video description

What are the neurons, why are there layers, and what is the math underlying it?

Help fund future projects: https://www.patreon.com/3blue1brown
Written/interactive form of this series: https://www.3blue1brown.com/topics/neural-networks

Additional funding for this project was provided by Amplify Partners.

For those who want to learn more, I highly recommend the book by Michael Nielsen that introduces neural networks and deep learning: https://goo.gl/Zmczdy
There are two neat things about this book. First, it's available for free, so consider joining me in making a donation to Nielsen if you get something out of it. And second, it's centered around walking through some code and data, which you can download yourself, and which covers the same example that I introduced in this video. Yay for active learning!
https://github.com/mnielsen/neural-networks-and-deep-learning

I also highly recommend Chris Olah's blog: http://colah.github.io/

For more videos, Welch Labs also has some great series on machine learning:
https://youtu.be/i8D90DkCLhI
https://youtu.be/bxe2T-V8XRs

For those of you looking to go *even* deeper, check out the text "Deep Learning" by Goodfellow, Bengio, and Courville.

Also, the publication Distill is just utterly beautiful: https://distill.pub/

Lion photo by Kevin Pluck

Russian audio track: Vlad Burmistrov.

Thanks to these viewers for their contributions to translations:
German: @fpgro
Hebrew: Omer Tuchfeld
Hungarian: Máté Kaszap
Italian: @teobucci, Teo Bucci

-----------------
Timeline:
0:00 - Introduction example
1:07 - Series preview
2:42 - What are neurons?
3:35 - Introducing layers
5:31 - Why layers?
8:38 - Edge detection example
11:34 - Counting weights and biases
12:30 - How learning relates
13:26 - Notation and linear algebra
15:17 - Recap
16:27 - Some final words
17:03 - ReLU vs Sigmoid

Correction 14:45 - The final index on the bias vector should be "k"

------------------
Animations largely made using manim, a scrappy open source python library: https://github.com/3b1b/manim
If you want to check it out, I feel compelled to warn you that it's not the most well-documented tool, and has many other quirks you might expect in a library someone wrote with only their own use in mind.

Music by Vincent Rubinetti.
Download the music on Bandcamp: https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
Stream the music on Spotify: https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u

If you want to contribute translated subtitles or to help review those that have already been made by others and need approval, you can click the gear icon in the video and go to subtitles/cc, then "add subtitles/cc". I really appreciate those who do this, as it helps make the lessons accessible to more people.

------------------
3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube: if you want to stay posted on new videos, subscribe, and click the bell to receive notifications (if you're into that).

If you are new to this channel and want to see more, a good place to start is this playlist: http://3b1b.co/recommended

Various social media stuffs:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3Blue1Brown
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown
Reddit: https://www.reddit.com/r/3Blue1Brown

Study guide for this video

Structured summary

What is a neural network? The structure of deep learning in plain language

A visual explanation of neural network structure (neurons, layers, weights, biases, and the sigmoid function), using handwritten digit recognition as the running example

Contents (48 segments)

Introduction example

This is a three. It's sloppily written and rendered at an extremely low resolution of 28x28 pixels, but your brain has no trouble recognizing it as a three. And I want you to take a moment to appreciate how crazy it is that brains can do this so effortlessly. I mean, this and this are also recognizable as threes, even though the specific values of each pixel are very different from one image to the next. The particular light-sensitive cells in your eye that are firing when you see this three are very different from the ones firing when you see this three. But something in that crazy-smart visual cortex of yours resolves these as representing the same idea, while at the same time recognizing other images as their own distinct ideas. But if I told you, "Hey, sit down and write for me a program that takes in a grid of 28x28 pixels like this and outputs a single number between 0 and 9, telling you what it thinks the digit is," well, the task goes from comically trivial to dauntingly difficult. Unless you've been

Series preview

living under a rock, I think I hardly need to motivate the relevance and importance of machine learning and neural networks to the present and to the future. But what I want to do here is show you what a neural network actually is, assuming no background, and help you visualize what it's doing, not as a buzzword but as a piece of math. My hope is that you come away feeling like the structure itself is motivated, and that you know what it means when you read or hear about a neural network "learning." This video is devoted to the structure component of that, and the following one will tackle learning. What we're going to do is put together a neural network that can learn to recognize handwritten digits. This is a somewhat classic example for introducing the topic, and I'm happy to stick with the status quo here, because at the end of the two videos I want to point you to a couple of good resources where you can learn more and where you can download the code that does this and play with it on your own computer. There are many variants of neural networks, and in recent years there's been a boom in research into these variants. But in these two introductory videos, you and I are just going to look at the simplest, plain-vanilla form with no added frills. This is a necessary prerequisite for understanding any of the more powerful modern variants, and trust me, it still has plenty of complexity for us to wrap our minds around. But even in this simplest form, it can learn to recognize handwritten digits, which is a pretty cool thing for a computer to be able to do. At the same time, you'll see how it falls short of a couple of hopes that we might have for it.

What are neurons?

As the name suggests, neural networks are inspired by the brain. But let's break that down. What are the neurons, and in what sense are they linked together? Right now, when I say neuron, all I want you to think about is a thing that holds a number, specifically a number between zero and one. It's really not more than that. For example, the network starts with a bunch of neurons corresponding to each of the 28x28 pixels of the input image, which is 784 neurons in total. Each one of these holds a number that represents the grayscale value of the corresponding pixel, ranging from zero for black pixels up to one for white pixels. This number inside the neuron is called its activation, and the image you might have in mind here is that each neuron is lit up when its activation is a high number.
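To make the input layer concrete, here is a minimal Python sketch of turning a 28x28 grayscale image into the 784 input activations. It assumes (the video does not specify this) that pixels arrive as integers from 0 (black) to 255 (white) and that the grid is flattened row by row; the function name `image_to_activations` is hypothetical.

```python
# Minimal sketch: turn a 28x28 grayscale image into the 784 input activations.
# Assumption (not from the video): pixels are integers 0-255,
# where 0 is black and 255 is white.

WIDTH = HEIGHT = 28


def image_to_activations(image):
    """Flatten a 28x28 grid of 0-255 pixel values into 784 activations in [0, 1]."""
    if len(image) != HEIGHT or any(len(row) != WIDTH for row in image):
        raise ValueError("expected a 28x28 grid of pixels")
    # Row-major flattening: neuron index = row * 28 + column.
    return [pixel / 255.0 for row in image for pixel in row]


# Usage: an all-black image with a single white pixel at row 3, column 5.
img = [[0] * WIDTH for _ in range(HEIGHT)]
img[3][5] = 255
activations = image_to_activations(img)
print(len(activations))            # 784
print(activations[3 * WIDTH + 5])  # 1.0
```

Dividing by 255 is just one convenient way to land in the zero-to-one range; any scaling that maps black to 0 and white to 1 fits the description above.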

Introducing layers

So all of these 784 neurons make up the first layer of our network. Now jump over to the last layer. This has 10 neurons, each representing one of the digits. The activation in these neurons, again some number between 0 and 1, represents how much the system thinks that a given image corresponds with a given digit. There are also a couple of layers in between, called the hidden layers, which for the time being should just be a giant question mark for how on earth this process of recognizing digits is going to be handled. In this network, I chose two hidden layers, each one with 16 neurons, and admittedly that's kind of an arbitrary choice. I chose two layers based on how I want to motivate the structure in just a moment, and 16, well, that was just a nice number to fit on the screen. In practice, there is a lot of room to experiment with the specific structure here. The way the network operates, activations in one layer determine the activations of the next layer. And of course, the heart of the network as an information-processing mechanism comes down to exactly how those activations from one layer bring about activations in the next layer. It's meant to be loosely analogous to how, in biological networks of neurons, some groups of neurons firing cause certain others to fire. Now, the network I'm showing here has already been trained to recognize digits, and let me show you what I mean by that. It means that if you feed in an image, lighting up all 784 neurons of the input layer according to the brightness of each pixel in the image, that pattern of activations causes some very specific pattern in the next layer, which causes some pattern in the one after it, which finally gives some pattern in the output layer. And the brightest neuron of that output layer is the network's choice, so to speak, for what digit this image represents.
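As a rough sketch, the structure described here can be written down as a list of layer sizes, and "the brightest output neuron" is simply the index of the largest output activation. The names `LAYER_SIZES` and `predicted_digit` are hypothetical, and the output values in the usage line are invented for illustration rather than produced by a trained network.

```python
# Sketch of the network's shape and of how its answer is read off.
# Layer sizes follow the video; the example outputs are made up.

LAYER_SIZES = [784, 16, 16, 10]  # input, two hidden layers, output


def predicted_digit(output_activations):
    """Return the index of the brightest (largest) output activation.

    Output neuron i stands for digit i, so that index is the network's guess.
    """
    if len(output_activations) != LAYER_SIZES[-1]:
        raise ValueError("expected one activation per digit 0-9")
    return max(range(len(output_activations)), key=lambda i: output_activations[i])


# Usage with hypothetical output activations, each between 0 and 1.
example_output = [0.02, 0.01, 0.05, 0.91, 0.03, 0.10, 0.00, 0.07, 0.20, 0.04]
print(predicted_digit(example_output))  # 3
```

Nothing here says how the hidden layers compute their activations; that mechanism is exactly the open question the transcript raises.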

Why layers?

And before jumping into the math for how one layer influences the next, or how training works, let's just talk about why it's even reasonable to expect a layered structure like this to behave intelligently. What are we expecting here? What is the best hope for what those middle layers might be doing? Well, when you or I recognize digits, we piece together various components. A 9 has a loop up top and a line on the right. An 8 also has a loop up top, but it's paired with another loop down low. A 4 basically breaks down into three specific lines, and things like that. Now, in a perfect world, we might hope that each neuron in the second-to-last layer corresponds with one of these subcomponents: that anytime you feed in an image with, say, a loop up top, like a 9 or an 8, there's some specific neuron whose activation is going to be close to one. And I don't mean this specific loop of pixels; the hope would be that any generally loopy pattern towards the top sets off this neuron. That way, going from the third layer to the last one just requires learning which combination of subcomponents corresponds to which digits. Of course, that just kicks the problem down the road, because how would you recognize these subcomponents, or even learn what the right subcomponents should be? And I still haven't even talked about how one layer influences the next. But run with me on this one for a moment. Recognizing a loop can also break down into subproblems. One reasonable way to do this would be to first recognize the various little edges that make it up. Similarly, a long line, like the kind you might see in the digits 1, 4, or 7, is really just a long edge, or maybe you think of it as a certain pattern of several smaller edges. So maybe our hope is that each neuron in the second layer of the network corresponds with the various relevant little edges.
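The hoped-for last step, going from subcomponents to digits, can be caricatured in a few lines of Python. Everything here is a hypothetical, hand-written illustration: the component names, the digit "recipes," and the choice to score a digit by the mean activation of its components are assumptions, not what the network in the video does (a real network learns its own combination, and its hidden neurons may not correspond to clean loops and lines at all).

```python
# Toy model of the *hoped-for* final step: subcomponents -> digits.
# Hypothetical throughout: component names and recipes are hand-written,
# whereas a trained network learns its own (possibly uninterpretable) features.

DIGIT_RECIPES = {
    9: {"upper_loop", "right_vertical_line"},
    8: {"upper_loop", "lower_loop"},
    # "Three specific lines"; which lines exactly is a guess.
    4: {"left_slanted_line", "horizontal_line", "right_vertical_line"},
}


def score_digits(component_activations):
    """Score each digit by the mean activation (0 to 1) of the parts it needs.

    Components missing from the input are treated as inactive (0.0).
    """
    scores = {}
    for digit, parts in DIGIT_RECIPES.items():
        total = sum(component_activations.get(part, 0.0) for part in parts)
        scores[digit] = total / len(parts)
    return scores


# Usage: an image whose "upper loop" and "right line" detectors fire strongly.
detected = {"upper_loop": 0.95, "right_vertical_line": 0.9, "lower_loop": 0.1}
scores = score_digits(detected)
best = max(scores, key=scores.get)
print(best)  # 9
```

Averaging keeps every score between 0 and 1, like an activation, but it is only one arbitrary way to combine parts; the point is the layered idea, not this particular rule.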
Maybe when an image like this one comes in, it lights up all of the neurons associated with around 8 to 10 specific little edges, which in turn lights up the neurons associated with the upper loop and a long vertical line, and those light up the neuron associated with a 9. Whether or not this is what our final network actually does is another question, one that I'll come back to once we see how to train the network. But this is a hope that we might have, a sort of goal with a layered structure like this. Moreover, you can imagine how being able to detect edges and patterns like this would be really useful for other image recognition tasks. And even beyond image recognition, there are all sorts of intelligent things you might want to do that break down into layers of abstraction. Parsing speech, for example, involves taking raw audio and picking out distinct sounds, which combine to make certain syllables, which combine to form words, which combine to make up phrases and more abstract thoughts, etc. But getting back to how any of this actually works, picture yourself right now designing how exactly the activations in one layer might determine the activations in the next. The goal is to have some mechanism that could conceivably combine pixels into edges, or edges into patterns, or patterns

Why layers?

And before jumping into the math for how one layer influences the next or how training works, let's just talk about why it's even reasonable to expect a layered structure like this to behave intelligently. What are we expecting here? What is the best hope for what those middle layers might be doing? Well, when you or I recognize digits, we piece together various components. A 9 has a loop up top and a line on the right. An eight also has a loop up top, but it's paired with another loop down low. A four basically breaks down into three specific lines and things like that. Now, in a perfect world, we might hope that each neuron in the second to last layer corresponds with one of these subcomponents. That anytime you feed in an image with, say, a loop up top, like a 9 or an 8, there's some specific neuron whose activation is going to be close to one. And I don't mean this specific loop of pixels. The hope would be that any generally loopy pattern towards the top sets off this neuron. That way, going from the third layer to the last one just requires learning which combination of subcomponents corresponds to which digits. Of course, that just kicks the problem down the road because how would you recognize these subcomponents or even learn what the right subcomponents should be? And I still haven't even talked about how one layer influences the next. But run with me on this one for a moment. Recognizing a loop can also break down into subpros. One reasonable way to do this would be to first recognize the various little edges that make it up. Similarly, a long line like the kind you might see in the digits 1 or four or seven. Well, that's really just a long edge. Or maybe you think of it as a certain pattern of several smaller edges. So maybe our hope is that each neuron in the second layer of the network corresponds with the various relevant little edges. 
Maybe when an image like this one comes in, it lights up all of the neurons associated with around 8 to 10 specific little edges, which in turn lights up the neurons associated with the upper loop and a long vertical line, and those light up the neuron associated with a nine. Whether or not this is what our final network actually does is another question, one that I'll come back to once we see how to train the network. But this is a hope that we might have a sort of goal with the layered structure like this. Moreover, you can imagine how being able to detect edges and patterns like this would be really useful for other image recognition tasks. And even beyond image recognition, there are all sorts of intelligent things you might want to do that break down into layers of abstraction. Parsing speech, for example, involves taking raw audio and picking out distinct sounds which combine to make certain syllables, which combine to form words, which combine to make up phrases and more abstract thoughts, etc. But getting back to how any of this actually works, picture yourself right now designing how exactly the activations in one layer might determine the activations in the next. The goal is to have some mechanism that could conceivably combine pixels into edges or edges into patterns or patterns

Edge detection example

into digits. And to zoom in on one very specific example, let's say the hope is for one particular neuron in the second layer to pick up on whether or not the image has an edge in this region here. The question at hand is: what parameters should the network have? What dials and knobs should you be able to tweak so that it's expressive enough to potentially capture this pattern, or any other pixel pattern, or the pattern where several edges make a loop, and other such things? Well, what we'll do is assign a weight to each one of the connections between our neuron and the neurons from the first layer. These weights are just numbers. Then take all of those activations from the first layer and compute their weighted sum according to these weights. I find it helpful to think of these weights as being organized into a little grid of their own, and I'm going to use green pixels to indicate positive weights and red pixels to indicate negative weights, where the brightness of each pixel is some loose depiction of the weight's value. Now, if we made the weights associated with almost all of the pixels zero, except for some positive weights in this region that we care about, then taking the weighted sum of all the pixel values really just amounts to adding up the values of the pixels in the region that we care about. And if you really wanted to pick up on whether there's an edge here, what you might do is have some negative weights associated with the surrounding pixels. Then the sum is largest when those middle pixels are bright but the surrounding pixels are darker. When you compute a weighted sum like this, you might come out with any number, but for this network, what we want is for activations to be some value between 0 and 1. So a common thing to do is to pump this weighted sum into some function that squishes the real number line into the range between 0 and 1. A common function that does this is called the sigmoid function, also known as a logistic curve.
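To make the weighted sum and the sigmoid concrete, here is a minimal sketch in Python, assuming a 28×28 image flattened row by row into 784 activations between 0 and 1. The helper names (`edge_weights`, `weighted_sum`) and the particular strip of pixels are illustrative choices, not anything taken from the video: pixels in a short horizontal strip get weight +1 (green), the rows directly above and below get weight -1 (red), and every other pixel gets 0.

```python
import math

SIZE = 28  # assumed MNIST-style 28x28 image, flattened row by row into 784 activations


def sigmoid(x):
    """Logistic curve: squishes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def edge_weights(row, col_start, col_end):
    """Hypothetical weight grid for one second-layer neuron.

    Pixels on `row` from col_start up to (not including) col_end get weight +1
    (the "green" pixels); the pixels directly above and below that strip get
    weight -1 (the "red" pixels); every other pixel gets weight 0.
    """
    weights = [0.0] * (SIZE * SIZE)
    for c in range(col_start, col_end):
        weights[row * SIZE + c] = 1.0
        weights[(row - 1) * SIZE + c] = -1.0
        weights[(row + 1) * SIZE + c] = -1.0
    return weights


def weighted_sum(weights, activations):
    """Sum of weight * activation over all 784 connections."""
    return sum(w * a for w, a in zip(weights, activations))


w = edge_weights(row=10, col_start=8, col_end=20)

edge_image = [0.0] * (SIZE * SIZE)    # black image...
for c in range(8, 20):
    edge_image[10 * SIZE + c] = 1.0   # ...with a bright strip exactly where the neuron looks
bright_image = [1.0] * (SIZE * SIZE)  # every pixel fully bright
blank_image = [0.0] * (SIZE * SIZE)   # every pixel black

print(weighted_sum(w, edge_image))                  # 12.0
print(weighted_sum(w, bright_image))                # -12.0
print(sigmoid(weighted_sum(w, edge_image)) > 0.99)  # True
print(sigmoid(weighted_sum(w, blank_image)))        # 0.5
```

A bright strip in exactly the right spot gives a large positive sum, which the sigmoid maps close to 1, while a uniformly bright image is pushed negative by the surrounding red weights. An all-black image gives a sum of 0, which the sigmoid maps to 0.5, so with weights alone this neuron is already "half on" for an empty image; shifting that threshold is the job of the bias.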
Basically, very negative inputs end up close to 0, very positive inputs end up close to 1, and it just steadily increases around the input 0. So the activation of the neuron here is basically a measure of how positive the relevant weighted sum is. But maybe it's not that you want the neuron to light up when the weighted sum is bigger than 0. Maybe you only want it to be active when the sum is bigger than, say, 10. That is, you want some bias for it to be inactive. What we'll do then is just add in some other number, like -10, to this weighted sum before plugging it through the sigmoid squishification function. That additional number is called the bias. So the weights tell you what pixel pattern this neuron in the second layer is picking up on, and the bias tells you how high the weighted sum needs to be before the neuron starts getting meaningfully

Counting weights and biases

active. And that is just one neuron. Every other neuron in this layer is going to be connected to all 784 pixel neurons from the first layer, and each one of those 784 connections has its own weight associated with it. Also, each neuron has some bias, some other number that you add on to the weighted sum before squishing it with the sigmoid. And that's a lot to think about! With this hidden layer of 16 neurons, that's a total of 784 × 16 = 12,544 weights along with 16 biases. And all of that is just the connections from the first layer to the second. The connections between the other layers also have a bunch of weights and biases associated with them. All said and done, this network has almost exactly 13,000 total weights and biases: 13,000 knobs and dials that can be tweaked and turned to make this network behave in different ways. So when we talk about learning

How learning relates

what that's referring to is getting the computer to find a valid setting for all of these many, many numbers so that it'll actually solve the problem at hand. One thought experiment that is at once fun and kind of horrifying is to imagine sitting down and setting all of these weights and biases by hand, purposefully tweaking the numbers so that the second layer picks up on edges, the third layer on patterns, etc. I personally find this satisfying, rather than just treating the network as a total black box. Because when the network doesn't perform the way you anticipate, if you've built up a little bit of a relationship with what those weights and biases actually mean, you have a starting place for experimenting with how to change the structure to improve it. Or when the network does work, but not for the reasons you might expect, digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions. By the

Notation and linear algebra

way, the actual function here is a little cumbersome to write down, don't you think? So let me show you a more notationally compact way that these connections are represented. This is how you'd see it if you choose to read up more about neural networks. Organize all of the activations from one layer into a column as a vector. Then organize all of the weights as a matrix, where each row of that matrix corresponds to the connections between one layer and a particular neuron in the next layer. What that means is that taking the weighted sum of the activations in the first layer according to these weights corresponds to one of the terms in the matrix-vector product of everything we have on the left here. By the way, so much of machine learning just comes down to having a good grasp of linear algebra, so for any of you who want a nice visual understanding of matrices and what matrix-vector multiplication means, take a look at the series I did on linear algebra, especially chapter 3. Back to our expression: instead of talking about adding the bias to each one of these values independently, we represent it by organizing all those biases into a vector and adding the entire vector to the previous matrix-vector product. Then, as a final step, I'll wrap a sigmoid around the outside here, and what that's supposed to represent is that you're going to apply the sigmoid function to each specific component of the resulting vector inside. So once you write down this weight matrix and these vectors as their own symbols, you can communicate the full transition of activations from one layer to the next in an extremely tight and neat little expression, a^(1) = σ(W a^(0) + b). And this makes the relevant code both a lot simpler and a lot faster, since many libraries optimize the heck out of matrix multiplication.

Recap

Remember how earlier I said these neurons are simply things that hold numbers? Well, of course, the specific numbers that they hold depend on the image you feed in. So it's actually more accurate to think of each neuron as a function, one that takes in the outputs of all the neurons in the previous layer and spits out a number between 0 and 1. Really, the entire network is just a function, one that takes in 784 numbers as an input and spits out 10 numbers as an output. It's an absurdly complicated function, one that involves roughly 13,000 parameters in the form of these weights and biases that pick up on certain patterns, and which involves iterating many matrix-vector products and the sigmoid squishification function. But it's just a function nonetheless. And in a way, it's kind of reassuring that it looks complicated. I mean, if it were any simpler, what hope would we have that it could take on the challenge of recognizing digits? And how does it take on that challenge? How does this network learn the appropriate weights and biases just by looking at data? Well, that's what I'll show in the next video, and I'll also dig a little more into what this particular network we're seeing is really doing.
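To make the recap's point concrete, here is a minimal pure-Python sketch of the whole network as one function: the same layer transition, sigmoid(W a + b), iterated from 784 inputs down to 10 outputs. It assumes the video's example layout of two hidden layers with 16 neurons each and sigmoid activations; the random Gaussian initialization, the seed, and all helper names (`layer`, `init_network`, `count_parameters`, `network`) are hypothetical choices for illustration, since the weights in the video come from training. The parameter count also shows that the "roughly 13,000" figure is exactly 13,002 for this layout.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def layer(weights, activations, biases):
    """One layer transition, sigmoid(W a + b): each row of W gives one neuron's weighted sum."""
    return [
        sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
        for row, b in zip(weights, biases)
    ]


def init_network(sizes, seed=0):
    """Hypothetical Gaussian random initialization (not something the video specifies)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        params.append((weights, biases))
    return params


def count_parameters(sizes):
    """Each layer contributes n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def network(params, pixels):
    """The whole network as one function: iterate the layer transition."""
    activations = pixels
    for weights, biases in params:
        activations = layer(weights, activations, biases)
    return activations


# Example layout: 784 input pixels, two hidden layers of 16 neurons, 10 outputs.
SIZES = [784, 16, 16, 10]

if __name__ == "__main__":
    params = init_network(SIZES)
    image = [0.0] * 784              # placeholder input: an all-black 28x28 image
    output = network(params, image)
    print(count_parameters(SIZES))   # 13002
    print(len(output))               # 10
```

A real implementation would store W as a matrix in a numerical library so the matrix-vector products run in optimized code; plain lists are used here only to keep the sketch dependency-free.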

Some final words

Now is the point where I suppose I should say subscribe to stay notified about when that video or any new videos come out. But realistically, most of you don't actually receive notifications from YouTube, do you? Maybe more honestly, I should say subscribe so that the neural networks that underlie YouTube's recommendation algorithm are primed to believe that you want to see content from this channel recommended to you. Anyway, stay posted for more. Thank you very much to everyone supporting these videos on Patreon. I've been a little slow to progress on the probability series this summer, but I'm jumping back into it after this project, so, patrons, you can look out for updates there.

Correction (14:45): the final index on the bias vector should be "k".

ReLU vs Sigmoid

To close things off here, I have with me Lisha Li, who did her PhD work on the theoretical side of deep learning and who currently works at a venture capital firm called Amplify Partners, who kindly provided some of the funding for this video. So, Lisha, one thing I think we should quickly bring up is this sigmoid function. As I understand it, early networks used this to squish the relevant weighted sum into that interval between 0 and 1, you know, kind of motivated by this biological analogy of neurons either being inactive or active. Exactly. But relatively few modern networks actually use sigmoid anymore. That's kind of old school, right? Yeah, or rather, ReLU seems to be much easier to train. And ReLU stands for rectified linear unit. Yes. It's this kind of function where you're just taking a max of zero and a, where a is given by what you were explaining in the video. And what this was sort of motivated from, I think, was partially a biological analogy with how neurons would either be activated or not. And so if it passes a certain threshold, it would be the identity function, but if it did not, then it would just not be activated, so it would be zero. So it's kind of a simplification. Using sigmoids didn't help training, or it became very difficult to train at some point, and people just tried ReLU, and it happened to work very well for these incredibly deep neural networks. All right, thank you, Lisha.
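To make the comparison concrete, here is a small pure-Python sketch (all function names are illustrative) of the two activation functions discussed: the sigmoid, and ReLU as max(0, a). It also prints each function's slope. The sigmoid's slope never exceeds 0.25, whereas ReLU's is exactly 1 for any positive input. One commonly cited explanation for why deep sigmoid networks were hard to train is that learning multiplies such slopes together across layers, so small sigmoid slopes can shrink the training signal; treat that as a simplified intuition rather than the full story.

```python
import math


def sigmoid(x: float) -> float:
    """Squishes x into the interval (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    """Rectified linear unit, max(0, x): identity above the threshold 0, zero below it."""
    return max(0.0, x)


def sigmoid_slope(x: float) -> float:
    """Derivative of the sigmoid, sigma(x) * (1 - sigma(x)); its peak is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_slope(x: float) -> float:
    """Derivative of ReLU, taking the slope at exactly 0 to be 0 by convention."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.1f}  "
              f"sigmoid'={sigmoid_slope(x):.3f}  relu'={relu_slope(x):.0f}")
    # Upper bound on ten chained sigmoid slopes (ignoring the weights, which also enter the product):
    print(0.25 ** 10)
```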

