A pretty reason why Gaussian + Gaussian = Gaussian

13:15

3Blue1Brown · 11.07.2023 · 929,287 views · 24,596 likes

Video description

A visual trick to compute the sum of two normally distributed variables.

3b1b mailing list: https://3blue1brown.substack.com/
Help fund future projects: https://www.patreon.com/3blue1brown
Special thanks to these supporters: https://www.3blue1brown.com/lessons/gaussian-convolution#thanks

For the technically curious who want to go deeper, here's a proof of the central limit theorem using Moment generating functions: https://www.cs.toronto.edu/~yuvalf/CLT.pdf
And here's a nice discussion of methods using entropy: https://mathoverflow.net/questions/182752/central-limit-theorem-via-maximal-entropy

Relevant previous videos
Central limit theorem: https://youtu.be/zeJD6dqJ5lo
Why π is there, and the Herschel-Maxwell derivation: https://youtu.be/cy8r7WSuT1I
Convolutions and adding random variables: https://youtu.be/IaSGqQa5O-M

Time stamps
0:00 - Recap on where we are
2:10 - What direct calculation would look like
3:38 - The visual trick
8:27 - How this fits into the Central Limit Theorem
12:30 - Mailing list

Thanks to these viewers for their contributions to translations
German: lprecord, qoheniac
Spanish: Pablo Asenjo Navas-Parejo
Vietnamese: Duy Tran

------------------

These animations are largely made using a custom Python library, manim. See the FAQ comments here:
https://www.3blue1brown.com/faq#manim
https://github.com/3b1b/manim
https://github.com/ManimCommunity/manim/

You can find code for specific videos and projects here: https://github.com/3b1b/videos/

Music by Vincent Rubinetti. https://www.vincentrubinetti.com/
Download the music on Bandcamp: https://vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
Stream the music on Spotify: https://open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted on new videos, subscribe: http://3b1b.co/subscribe

Various social media stuffs:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3blue1brown
Reddit: https://www.reddit.com/r/3blue1brown
Instagram: https://www.instagram.com/3blue1brown
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown

Table of contents (5 segments)

Recap on where we are

The basic function underlying a normal distribution, aka a Gaussian, is e to the negative x squared. But you might wonder, why this function? Of all the expressions we could dream up that give you some symmetric smooth graph with mass concentrated towards the middle, why is it that the theory of probability seems to have a special place in its heart for this particular expression? For the last many videos I've been hinting at an answer to this question, and here we'll finally arrive at something like a satisfying answer. As a quick refresher on where we are, a couple videos ago we talked about the central limit theorem, which describes how as you add multiple copies of a random variable, for example rolling a weighted die many different times, or letting a ball bounce off of a peg repeatedly, then the distribution describing that sum tends to look approximately like a normal distribution. What the central limit theorem says is as you make that sum bigger and bigger, under appropriate conditions, that approximation to a normal becomes better and better. But I never explained why this theorem is actually true. We only talked about what it's claiming. In the last video we started talking about the math involved in adding two random variables. If you have two random variables, each following some distribution, then to find the distribution describing the sum of those variables, you compute something known as a convolution between the two original functions. And we spent a lot of time building up two distinct ways to visualize what this convolution operation really is. Today our basic job is to work through a particular example, which is to ask what happens when you add two normally distributed random variables, which, as you know by now, is the same as asking what do you get if you compute a convolution between two Gaussian functions. 
I'd like to share an especially pleasing visual way that you can think about this calculation, which hopefully offers some sense of what makes the e to the negative x squared function special in the first place. After we walk through it, we'll talk about how this calculation is one of the steps involved in proving the central limit theorem. It's the step that answers the question of why a Gaussian and not something else is the central limit. But first, let's dive in. The full formula for a Gaussian is more complicated than just e to the negative x squared.
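For reference, the full formula mentioned here can be written out in symbols. The transcript only describes it in words, so the typesetting below is an assumption based on the standard textbook form, with sigma as the standard deviation and mu as the mean.

```latex
% Centered Gaussian with standard deviation \sigma
\[
  f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^{2}}
\]
% General form, shifted to have mean \mu
\[
  f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}
\]
```

The fraction out front is what makes the total area under the curve equal to one.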

What direct calculation would look like

The exponent is typically written as negative one half times the quantity x divided by sigma, all squared, where sigma describes the spread of the distribution, specifically the standard deviation. All of this needs to be multiplied by a fraction on the front, which is there to make sure that the area under the curve is one, making it a valid probability distribution. And if you want to consider distributions that aren't necessarily centered at zero, you would also throw another parameter, mu, into the exponent like this. Although for everything we'll be doing here, we just consider centered distributions. Now if you look at our central goal for today, which is to compute a convolution between two Gaussian functions, the direct way to do this would be to take the definition of a convolution, this integral expression we built up last video, and then to plug in for each one of the functions involved the formula for a Gaussian. It's kind of a lot of symbols when you throw it all together, but more than anything, working this out is an exercise in completing the square. And there's nothing wrong with that. That will get you the answer that you want. But of course, you know me, I'm a sucker for visual intuition, and in this case, there's another way to think about it that I haven't seen written about before that offers a very nice connection to other aspects of this distribution, like the presence of pi and certain ways to derive where it comes from. And the way I'd like to do this is by first peeling away all of the constants associated with the actual distribution, and just showing the computation for the simplified form, e to the negative x squared.

The visual trick

The essence of what we want to compute is what the convolution between two copies of this function looks like. If you'll remember, in the last video we had two different ways to visualize convolutions, and the one we'll be using here is the second one involving diagonal slices. And as a quick reminder of the way that worked, if you have two different distributions that are described by two different functions, f and g, then every possible pair of values that you might get when you sample from these two distributions can be thought of as individual points on the xy-plane. And the probability density of landing on one such point, assuming independence, looks like f of x times g of y. So what we do is we look at a graph of that expression as a two-variable function of x and y, which is a way of showing the distribution of all possible outcomes when we sample from the two different variables. To interpret the convolution of f and g evaluated on some input s, which is a way of saying how likely are you to get a pair of samples that adds up to this sum s, what you do is you look at a slice of this graph over the line x plus y equals s, and you consider the area under that slice. This area is almost, but not quite, the value of the convolution at s. For a mildly technical reason, you need to divide by the square root of 2. Still, this area is the key feature to focus on. You can think of it as a way to combine together all the probability densities for all of the outcomes corresponding to a given sum. In the specific case where these two functions look like e to the negative x squared and e to the negative y squared, the resulting 3D graph has a really nice property that you can exploit. It's rotationally symmetric. You can see this by combining the terms and noticing that it's entirely a function of x squared plus y squared, and this term describes the square of the distance between any point on the xy-plane and the origin. So in other words, the expression is purely a function of the distance from the origin.
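The claim that the slice area, divided by the square root of 2, equals the convolution can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `slice_area`, `convolution`, `bell` and `two_sided_exp` are made up for this example, the integrals are approximated with a plain midpoint rule, and the second function is an arbitrary non-Gaussian choice to show that the relation does not depend on the bell-curve shape.

```python
import math

def slice_area(f, g, s, half_len=12.0, n=24000):
    """Area under f(x) * g(y) along the line x + y = s, measured by arc length."""
    dt = 2 * half_len / n
    total = 0.0
    for i in range(n):
        t = -half_len + (i + 0.5) * dt
        # Walk along the line starting from its closest point to the origin,
        # (s/2, s/2), in the unit direction (1, -1) / sqrt(2).
        x = s / 2 + t / math.sqrt(2)
        y = s / 2 - t / math.sqrt(2)
        total += f(x) * g(y)
    return total * dt

def convolution(f, g, s, half_len=12.0, n=24000):
    """Midpoint-rule estimate of the integral of f(x) * g(s - x) dx."""
    dx = 2 * half_len / n
    total = 0.0
    for i in range(n):
        x = -half_len + (i + 0.5) * dx
        total += f(x) * g(s - x)
    return total * dx

def bell(x):
    """Simplified bell curve from the video: e^(-x^2)."""
    return math.exp(-x * x)

def two_sided_exp(x):
    """Arbitrary non-Gaussian density, chosen only for this illustration."""
    return 0.5 * math.exp(-abs(x))

s = 1.3  # any sum value works here
print("slice area / sqrt(2):", slice_area(bell, two_sided_exp, s) / math.sqrt(2))
print("convolution at s:    ", convolution(bell, two_sided_exp, s))
```

The two printed numbers should agree to several decimal places. The factor of the square root of 2 appears because moving one unit in x along the diagonal line covers a length of root 2.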
And by the way, this would not be true for any other distribution. It's a property that uniquely characterizes bell curves. So for most other pairs of functions, these diagonal slices will be some complicated shape that's hard to think about, and honestly, calculating the area would just amount to computing the original integral that defines a convolution in the first place. So in most cases, the visual intuition doesn't really buy you anything. But in the case of bell curves, you can leverage that rotational symmetry. Here, focus on one of these slices over the line x plus y equals s for some value of s. And remember, the convolution that we're trying to compute is a function of s. The thing that you want is an expression in terms of s that tells you the area under this slice. Well, if you look at that line, it intersects the x-axis at the point (s, 0) and the y-axis at the point (0, s), and a little bit of Pythagoras will show you that the straight line distance from the origin to this line is s divided by the square root of two. Now, because of the symmetry, this slice is identical to one that you get rotating 45 degrees where you'd find something parallel to the y-axis the same distance away from the origin. The key is that computing this other area of a slice parallel to the y-axis is much, much easier than slices in other directions because it only involves taking an integral with respect to y. The value of x on this slice is a constant. Specifically, it would be the constant s divided by the square root of two. So when you're computing the integral, finding this area, all of this term here behaves like it was just some number, and you can factor it out. This is the important point. All of the stuff that's involving s is now entirely separate from the integrated variable. This remaining integral is a little bit tricky. I did a whole video on it, it's actually quite famous. But you almost don't really care. The point is that it's just some number.
That number happens to be the square root of pi, but what really matters is that it's something with no dependence on s. And essentially this is our answer. We were looking for an expression for the area of these slices as a function of s, and now we have it. It looks like e to the negative s squared divided by two, scaled by some constant. In other words, it's also a bell curve, another Gaussian, just stretched out a little bit because of this two in the exponent. As I said earlier, the convolution evaluated at s is not quite this area. Technically it's this area divided by the square root of two. We talked about it in the last video, but it doesn't really matter because it just gets baked into the constant. What really matters is the conclusion that a convolution between two Gaussians is itself another Gaussian.
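The steps of this argument can be collected into one chain of equalities. This is a reconstruction from the narration, not a copy of the on-screen formulas; f denotes the simplified function e to the negative x squared, and the asterisk denotes convolution.

```latex
% Convolution of f(x) = e^{-x^2} with itself, via the slice over x + y = s
\begin{align*}
  [f * f](s) &= \int_{-\infty}^{\infty} e^{-x^{2}}\, e^{-(s-x)^{2}}\, dx
      = \frac{1}{\sqrt{2}} \cdot \bigl(\text{area of the slice over } x + y = s\bigr) \\
  \text{area of the slice}
      &= \int_{-\infty}^{\infty} e^{-(s/\sqrt{2})^{2}}\, e^{-y^{2}}\, dy
      && \text{(rotate } 45^\circ \text{ to the line } x = s/\sqrt{2}\text{)} \\
      &= e^{-s^{2}/2} \int_{-\infty}^{\infty} e^{-y^{2}}\, dy
       = \sqrt{\pi}\, e^{-s^{2}/2} \\
  [f * f](s) &= \sqrt{\frac{\pi}{2}}\; e^{-s^{2}/2}
\end{align*}
```

Completing the square in the first integral gives the same constant, which is a useful cross-check on the geometric argument.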

How this fits into the Central Limit Theorem

If you were to go back and reintroduce all of the constants for a normal distribution with mean zero and an arbitrary standard deviation sigma, essentially identical reasoning will lead to the same square root of two factor that shows up in the exponent and out front, and it leads to the conclusion that the convolution between two such normal distributions is another normal distribution with a standard deviation of square root of two times sigma. If you haven't computed a lot of convolutions before, it's worth emphasizing this is a very special result. Almost always you end up with a completely different kind of function, but here there's a sort of stability to the process. Also, for those of you who enjoy exercises, I'll leave one up on the screen for how you would handle the case of two different standard deviations. Still, some of you might be raising your hands and saying, what's the big deal? I mean, when you first heard the question, what do you get when you add two normally distributed random variables, you probably even guessed that the answer should be another normally distributed variable. After all, what else is it going to be? Normal distributions are supposedly quite common, so why not? You could even say that this should follow from the central limit theorem. But that would have it all backwards. First of all, the supposed ubiquity of normal distributions is often a little exaggerated, but to the extent that they do come up, it is because of the central limit theorem. And it would be cheating to say the central limit theorem implies this result, because this computation we just did is the reason that the function at the heart of the central limit theorem is a Gaussian in the first place, and not some other function.
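The square root of two claim is easy to test numerically. The following is a minimal sketch, assuming a midpoint-rule integral is accurate enough for this purpose; the function names `normal_pdf` and `convolve_normals_at` and the value of `sigma` are illustrative choices, not anything from the video.

```python
import math

def normal_pdf(x, sigma):
    """Centered normal density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def convolve_normals_at(s, sigma, half_width=12.0, n=4000):
    """Midpoint-rule estimate of the convolution of two N(0, sigma) densities at s."""
    lo = -half_width * sigma
    dx = 2 * half_width * sigma / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += normal_pdf(x, sigma) * normal_pdf(s - x, sigma)
    return total * dx

sigma = 1.5  # arbitrary choice for the demonstration
for s in (0.0, 1.0, 3.0):
    numeric = convolve_normals_at(s, sigma)
    predicted = normal_pdf(s, math.sqrt(2) * sigma)
    print(f"s={s}: numeric={numeric:.10f}  predicted={predicted:.10f}")
```

In each row the numerically computed convolution should match the normal density with standard deviation root two times sigma, including the constant out front.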
We've talked all about the central limit theorem before, but essentially it says if you repeatedly add copies of a random variable to itself, which mathematically looks like repeatedly computing convolutions against a given distribution, then after appropriate shifting and rescaling, the tendency is always to approach a normal distribution. Technically, there's a small assumption that the distribution you start with can't have infinite variance, but it's a relatively soft assumption. The magic is that for a huge category of initial distributions, this process of adding a whole bunch of random variables drawn from that distribution always tends towards this one universal shape, a Gaussian. One common approach to proving this theorem involves two separate steps. The first step is to show that for all the different finite variance distributions you might start with, there exists a single universal shape that this process of repeated convolutions tends towards. This step is actually pretty technical, it goes a little beyond what I want to talk about here. You often use these objects called moment generating functions that give you a very abstract argument that there must be some universal shape, but it doesn't make any claim about what that particular shape is, just that everything in this big family is tending towards a single point in the space of distributions. So then step number two is what we just showed in this video, prove that the convolution of two Gaussians gives another Gaussian. What that means is that as you apply this process of repeated convolutions, a Gaussian doesn't change, it's a fixed point, so the only thing it can approach is itself, and since it's one member in this big family of distributions, all of which must be tending towards a single universal shape, it must be that universal shape.
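The process of repeated convolution is simple to try on a small discrete example. The sketch below is an illustration only, not a proof of either step: it convolves the distribution of a weighted die with itself and measures how far the result is from a normal density with the same mean and variance. The weights and the names `WEIGHTS`, `convolve` and `max_gap_to_normal` are assumptions made up for this example.

```python
import math

# Hypothetical weighted die: faces 1..6, heavily favouring 1 and 6.
WEIGHTS = [0.4, 0.05, 0.05, 0.05, 0.05, 0.4]

def convolve(p, q):
    """Discrete convolution: distribution of the sum of two independent variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def max_gap_to_normal(weights, n):
    """Largest gap between the pmf of a sum of n rolls and the matching normal density."""
    mean = sum((k + 1) * w for k, w in enumerate(weights))
    var = sum((k + 1 - mean) ** 2 * w for k, w in enumerate(weights))
    dist = weights[:]
    for _ in range(n - 1):
        dist = convolve(dist, weights)
    mu, sigma = n * mean, math.sqrt(n * var)
    gap = 0.0
    for k, p in enumerate(dist):
        total = k + n  # smallest possible sum is n (every roll shows a 1)
        density = math.exp(-0.5 * ((total - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        gap = max(gap, abs(p - density))
    return gap

for n in (1, 5, 20, 50):
    print(n, max_gap_to_normal(WEIGHTS, n))
```

The printed gap should shrink as the number of rolls grows, even though a single roll of this die looks nothing like a bell curve.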
I mentioned at the start how this calculation, step two, is something that you can do directly, just symbolically with the definitions, but one of the reasons I'm so charmed by a geometric argument that leverages the rotational symmetry of this graph is that it directly connects to a few things that we've talked about on this channel before, for example, the Herschel-Maxwell derivation of a Gaussian, which essentially says that you can view this rotational symmetry as the defining feature of the distribution, that it locks you into this e to the negative x squared form, and also as an added bonus, it connects to the classic proof for why pi shows up in the formula, meaning we now have a direct line between the presence and mystery of that pi and the central limit theorem. Also, on a recent Patreon post, the channel supporter Daksha Vaid-Quinter brought my attention to a completely different approach I hadn't seen before, which leverages the use of entropy, and again, for the theoretically curious among you, I'll leave some links in the description.

Mailing list

By the way, if you want to stay up to date with new videos, and also any other projects that I put out there, like the Summer of Math Exposition, there is a mailing list. It's relatively new, and I'm pretty sparing about only posting what I think people will enjoy. Usually I try not to be too promotional at the end of videos these days, but if you are interested in following the work that I do, this is probably one of the most enduring ways to do so.
