# Continuous Probability Distributions: The Normal Distribution (STAT I – Full Lecture)

## Metadata

- **Channel:** jbstatistics
- **YouTube:** https://www.youtube.com/watch?v=lqczxjR6MbA
- **Date:** 18.02.2026
- **Duration:** 42:48
- **Views:** 1,167
- **Source:** https://ekstraktznaniy.ru/video/52882

## Description

In this lecture-style video, I discuss the normal distribution. (Pitched at the level of an applied intro stats course at university.) I discuss some of its general properties, and do some examples of finding areas and percentiles using R. (I do not use statistical tables in my course or in the video; I use R in my courses.)

This is a full lecture for my STAT I course, where I work through my lecture outline document (a condensed version of my full textbook chapter on continuous probability distributions). My students have these documents.

(In the coming year, I'll be releasing additional shorter videos on specific topics in statistics, along with more of these lecture-style videos.)

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

Hello everybody and welcome to our continuing discussion of continuous probability distributions with a discussion of the ever-important normal distribution. The normal distribution is often called the Gaussian distribution in honor of Carl Friedrich Gauss, one of history's great mathematicians, but I typically refer to it as the normal distribution. It's very important for us in a lot of ways. One reason is that a lot of variables we have and plot out in a histogram, say, look approximately normal, and we'll see this in a variety of examples: when we get data and plot it out, there are a lot of times where it looks approximately normal. We'll see this little bell-shaped curve in a moment or two. But also, much statistical theory is based on the assumption of a normally distributed population. The mathematics often works out really well for that, with nice clean results, and combined with the fact that we happen to see it quite a bit, it's a very important distribution for us in statistical inference.

So let's have a quick look here. We have this figure illustrating a normal distribution with a mean mu of 162 and a standard deviation sigma of 7. If we plot that out, we have this probability density function for this continuous random variable, and it is approximately the distribution of the height of a randomly selected adult American woman. We don't know the true distribution of this, of course, but this is approximately the distribution. So we have heights here in the low 160s: heights like 162, 164, 163.7, these kinds of things are fairly common. And then people between 185 centimeters and 190 centimeters, say, that's getting to be quite rare. So values near the mean are quite likely, and then it gets less and less likely as we get further and further away from the mean. That's casual speak for all of this. Of course, we'll shore this up a little bit.
But over here, we have our probability density function f of x, giving us the height of the curve, which is of course not a probability in itself, as we've discussed before, because probabilities are areas under the curve. And we'll look at that in more detail today. But this nice little pleasing-to-the-eye bell-shaped curve has a precise mathematical function, and it is this. So we've got some very important mathematical constants in here. We've got our pi sitting there. We've got e. We've got the root of two. All sorts of interesting mathematical numbers appear in the probability density function of our normal distribution. And when we plot this out, you just get this curve up here. That's what it looks like in this particular case, where mu was 162 and sigma was 7.

Now if a random variable X has this probability distribution, then it can take on, conceptually, any finite value. If we're looking at this curve, the curve never actually touches the horizontal axis. It never truly touches the horizontal axis, but it gets really, really close. So the areas, let's say, out beyond 190 or beyond 130 are really small. Theoretically, conceptually, mathematically, there's some nonzero area no matter how far you go out, but in a practical case that's not a meaningful problem for us for any sort of modeling.

So we have these two parameters. We could say the parameters of the distribution are mu and sigma squared. Or you might say mu and sigma, sigma just being the square root of sigma squared. Of course, these have our usual meanings of mu and sigma squared. Mu is the mean of the distribution, and more on that in a minute. But we can think of mu as the mean of our probability distribution, sigma squared as the variance, and sigma as the standard deviation of our probability distribution.
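The density function shown on screen at this point is not captured in the transcript. For reference, the standard form of the normal probability density function with mean $\mu$ and standard deviation $\sigma$ (the symbols used in the lecture) is:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty
$$

For the height example, this would be evaluated with $\mu = 162$ and $\sigma = 7$.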
And if a random variable X has a normal distribution, we might write it like this in short-form notation: X is distributed normally with a mean of mu and a variance of sigma squared. So I'm going to have this second term be the variance. That's not universal notation. Sometimes people put the standard deviation in there, but I am going to use the variance there. Historically, that's what we've typically done, and that is how I'm going to keep it. So X is distributed normally with mean mu and variance sigma squared. That's how that reads. Now, our mean and variance, or mean

### Segment 2 (05:00 - 10:00) [5:00]

and standard deviation. We're going to look into the meaning of these terms right now. Let's think about this for a minute, though. Just casually, the mean can be any finite value here. There are no restrictions on mu other than it being a finite value. So we can say that it lies between minus infinity and infinity. And the variance for any probability distribution has to be at least zero. But if the variance is zero, it's not a normal distribution. So our variance has to be greater than zero, some finite value. Okay. So these determine the precise nature of your distribution. If you know your random variable is normally distributed and you know its mean and variance, you know all there is to know about it.

So let's see what these mean, though. If we look at this probability density function, at first you see that x only appears in one place: this (x - mu)^2 part. X is appearing in one place in all of this, in (x - mu)^2. So it's simply the distance that x is from the mean, in combination with all these other constants here, that determines the height of the curve. So just looking at this, I think we have the idea that it's got to be symmetric about mu. And let's see what that means visually here. The normal distribution always looks something like this. Mu is the mean, but it's also the median. It's a symmetric distribution, and for a symmetric distribution where we have a finite mean, the mean and the median are going to be equal. So we typically think of mu in terms of the mean when we're talking about things, but here we can just as easily think of it in terms of the median of your normal distribution. And it's also the mode, which we'll talk about less, but think about fairly often: that's where the peak occurs here.
Mu is also where the peak of the distribution occurs. So the mean, the median, and the mode: we could think of it as all three. And when you change the value of mu, you're just shifting where that bell shape happens along your real line there.

The standard deviation is a measure of spread, a measure of dispersion. It has the same meaning as it has had for us all the way along. So, I've plotted two curves here. They both have the same mean mu, so they're both symmetric about mu. And this red dashed curve has a greater variability. The red dashed curve's standard deviation is double the standard deviation of the other one. And as we know, the area under each curve must be one, because it's a continuous probability distribution. So if there's greater variability for this red dashed curve, that means it's got to have greater area in the tails and a lower peak. So that's visually what the standard deviation is telling us.

Now let's think about the standard deviation in another way here. Recall the empirical rule that we talked about back in descriptive statistics. If you remember the empirical rule, it told us stuff like, well, for mound-shaped distributions, approximately 68% of the observations lie within one standard deviation of the mean. Approximately 95% of the observations lie within two standard deviations of the mean. And then we said something else for three, but those are all based on the normal distribution. So if we actually go to the normal curve itself, the area under the curve from one standard deviation below the mean to one standard deviation above is 0.683 to three decimal places. And if we go two standard deviations out from the mean, well, there's 0.954 of the area from two standard deviations below to two standard deviations above.
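The empirical-rule areas quoted in this part of the lecture can be checked numerically. The lecturer works in R; the sketch below instead uses Python's standard-library `statistics.NormalDist` as a stand-in, and the variable names are illustrative only. Because the area within k standard deviations of the mean is the same for every normal distribution, the standard normal is used.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
z = NormalDist(mu=0.0, sigma=1.0)

# Area within k standard deviations of the mean: P(-k <= Z <= k).
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} standard deviation(s): {area:.3f}")
```

This prints 0.683, 0.954, and 0.997 for one, two, and three standard deviations.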
And if we go to three standard deviations, then we have 0.997 of the area from three standard deviations below the mean to three standard deviations above. Where's all this coming from? These areas, I'll show you how to find them in a minute. But for now, I'm tying this in with the empirical rule that we talked about back in descriptive statistics, when we said about 68%, about 95%, that sort of thing. Those are based on the normal distribution. So, if we think about this at three standard deviations, well, from three standard deviations below to three standard deviations above, just looking at the curve, that's capturing almost all of the area. And just keep in mind that this is telling us that about 0.003 of the area falls outside of

### Segment 3 (10:00 - 15:00) [10:00]

three standard deviations from the mean. So you're getting almost all the area in there, in a sense. Okay, how did we get these areas? Finding areas under a normal curve is going to be very important for us. So remember that for continuous probability distributions, probabilities correspond to areas under the curve. If you're asked for the probability of something, that means find the area under the curve. So finding areas under a normal curve is going to be extremely important for us, because this is going to come up in statistical inference as well. We're going to have all sorts of inference procedures, and for some of these inference procedures we're going to need to find areas and percentiles under normal curves. It's just a thing that comes up for us routinely. So we need to get this sort of thing down.

Now how do we find those areas? Well, suppose the random variable X is normally distributed with a mean of five and a standard deviation of one. What is the probability that X takes on a value that is less than or equal to 4.3? Well, what we do in these things: always draw a picture. Just always draw a picture right away, illustrating the situation for yourself. Just a bit of situational awareness here. We've got a normal curve, a mean of five. Yeah, the standard deviation's one. We don't have to get all these numbers down perfectly and draw a perfect curve, but let's just get an idea of what we're looking for. We want the probability that X takes on a value less than or equal to 4.3. Right away in your head, that should just be the area under the curve to the left of 4.3. That's what it means. That's not just a normal distribution thing. That's a continuous probability distribution thing. You want the probability that X takes on a value less than or equal to 4.3: the area under the curve to the left of 4.3. So over here somewhere, we don't have to get it right on the money.
Just somewhere over here is 4.3. And that is the area that we seek. That's the probability that X takes on a value less than or equal to 4.3. That's what we want to find. Now, I'm going to draw this out nicely for us for a moment here. I didn't want to put just the computer plot down. I knew I had this here, but we just wanted to have that casual curve. And then here's the 4.3. And so this is what it really looks like if you use software to plot this out.

To find this, conceptually, we need to integrate the probability density function of the normal distribution from minus infinity to 4.3. We want areas under curves. Calculus does this for us. This is one of the reasons why we integrate, right? We want to find areas under curves. Now, there's no closed-form solution for that. There isn't a nice clean little thing where it works out to, oh, it's just e to the minus 2x or something like that. That's not how it is. It needs to be integrated numerically, which means we need to rely on software for this. So we will rely on software to find areas under the curves. And for us, we're going to use R, and the `pnorm` function in R gives us the area to the left of the value we put in. So to find this area, we would put in `pnorm` with 4.3, and then R needs to know what the mean and standard deviation are. It needs to know what the parameters of the distribution are. So you have to tell it that the mean is five and the standard deviation is one, as in `pnorm(4.3, 5, 1)`. And if you do that, we get this value of 0.242. So 0.242 to three decimal places, that's that area. And so that means that's that probability.
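As a cross-check on the 0.242 figure, here is a minimal sketch in Python rather than the R used in the lecture. `NormalDist(5, 1).cdf(4.3)` plays the role of `pnorm(4.3, 5, 1)`. The helper `left_area` is a hypothetical illustration of "integrating numerically" with a simple trapezoid rule; it is not how R or Python compute the value internally.

```python
from statistics import NormalDist

# X ~ N(mean 5, sd 1), as in the lecture example.
x_dist = NormalDist(mu=5.0, sigma=1.0)

# Area to the left of 4.3 from the cumulative distribution function.
area_cdf = x_dist.cdf(4.3)

def left_area(dist, upper, n_steps=20_000):
    """Trapezoid-rule area under dist.pdf up to `upper`.

    Starts 10 standard deviations below the mean as a practical
    stand-in for minus infinity (an assumption for illustration).
    """
    lower = dist.mean - 10 * dist.stdev
    h = (upper - lower) / n_steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    total += sum(dist.pdf(lower + i * h) for i in range(1, n_steps))
    return total * h

area_trap = left_area(x_dist, 4.3)
print(round(area_cdf, 3), round(area_trap, 3))
```

Both approaches agree to three decimal places, giving 0.242.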
And we're going to need to find probabilities for a normal curve a lot. And we need to keep in mind just a few general principles. The area under the entire curve is one. So if you wanted, let's say, the probability X takes on a value bigger than 4.3, it would just be one minus the area to the left. This is the idea here. Remember, this is a continuous probability distribution. So the probability that X takes on a value equal to 4.3 would be zero, because for any continuous random variable X and any constant a, the probability that X takes on that exact value is zero, for reasons we

### Segment 4 (15:00 - 20:00) [15:00]

discussed earlier. So if you wanted this bit out here, we could find that in a straightforward fashion, just saying that the probability X takes on a value bigger than 4.3, which would also be the probability that X takes on a value bigger than or equal to 4.3 (those two things are the same), is simply 1 minus 0.242. So this is how we can find areas to the right rather easily.

Okay. So we have an infinity of normal distributions, corresponding to any mean and any standard deviation, and it is very useful to have a baseline distribution from which to work. So we have a single distribution, and we can talk in the language of this one normal distribution. Sometimes we distill everything down to that, and we can go out from there, rather than talk about, oh, your normal distribution with your mean and standard deviation. We have a natural language to speak of this one baseline distribution from which to work. And it will become more obvious later why we need such a thing. But take my word for it now. It is useful for us, and needed, quite frankly, to have a baseline distribution from which to work, and that baseline distribution is the standard normal distribution.

So just get this down straight now. Don't be looking this up in three weeks. The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of one, and of course a variance of one as well: sigma squared of one. We can think of it as having a variance of one or a standard deviation of one because, of course, one squared is one. So this is what the standard normal distribution is. You don't want to be rifling through notes trying to figure out what the standard normal distribution is. This is a fairly straightforward concept. So just know this now and remember it forever. The standard normal distribution is a normal distribution with mean zero and variance one.
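Returning to the right-tail area from the start of this segment, the complement rule can be sketched as follows. This again uses Python's `statistics.NormalDist` as a stand-in for the lecturer's R session, with illustrative variable names.

```python
from statistics import NormalDist

# X ~ N(mean 5, sd 1), the same example as before.
x_dist = NormalDist(mu=5.0, sigma=1.0)

left = x_dist.cdf(4.3)   # P(X <= 4.3), area to the left
right = 1.0 - left       # P(X > 4.3) = P(X >= 4.3), area to the right

print(round(left, 3), round(right, 3))
```

The right-tail area comes out to 0.758, which is 1 minus 0.242.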
We typically represent random variables that have the standard normal distribution with the capital letter zed. So if we are dealing with a standard normal random variable, we usually let it be represented by capital zed, and then we say zed is distributed normally, mean zero and variance one. The second term is the variance. One is also the standard deviation here, but that second term we write in is the variance.

So let's see what this looks like. This is what the probability density function of our normal distribution looked like. We had that earlier one for American women's heights that had a mean of 162 and a standard deviation of seven. But here is our standard normal distribution, our probability density function. We have a little zed here, and f of little zed representing the height of the curve. And we know from our previous knowledge that the vast majority, or 0.997, of the area falls between minus 3 and 3. Again, that curve isn't actually touching the axis there. It's just getting really close. So the area out beyond minus 3 and beyond plus 3 is getting very small. And this curve is going to be important for us because we want to find areas under the curve. This is going to be very meaningful for us in a lot of spots.

Now, it's also good to see this curve. This is the cumulative distribution function, or the CDF, which we have spoken of before. And this is an important curve for us, where our F of zed here is the probability that your random variable zed takes on a value less than or equal to the input value here of little zed. And I'm going to put a little cross through my little zed. So the big one here is the random variable, and the little one here is a value of the random variable. Or in other words, this thing's giving us the area under the curve to the left of this value. So we start over here really close to zero.
And then we start to really accumulate some area in here, right? And then it's getting to nearly one. It's not one exactly, but it's getting closer and closer to one, and tending towards one as we shoot off to infinity. And if we look at this at zero, say (it's not a very good straight line, but I tried), we know from the structure that there's half the area to the left and half the area to the right of zero. Zero is the mean and also the median of the standard normal distribution. And so this

### Segment 5 (20:00 - 25:00) [20:00]

cumulative distribution function at zero, that's what it tells us right here: that's 0.5, because it's saying at zero there's an area to the left of 0.5. And this is what `pnorm` is giving us if we look up various values. So let's say I wanted to know the probability that the random variable zed, with the standard normal distribution, takes on a value less than or equal to one and a half. Then conceptually that's a couple of things. That's the area out here to the left of one and a half. And it's also the value of your cumulative distribution function here at one and a half. So going over here, that's the value that we're looking for. This area corresponds to that value. That's what the cumulative distribution function does for us. And that's what the `pnorm` function is doing. So `pnorm` is going to give us the area to the left, under the normal curve, of the input value we put in. And this structure is going to happen for all sorts of other distributions for us. When we have the t distribution, we're going to have `pt`. When we have the F distribution, it's going to be `pf`. This structure is going to stay the same.

So to find this area, this probability, we would use `pnorm`. It's just going to be `pnorm` with 1.5, and zero and one. We put zero and one in there. And that turns out, to three decimal places anyway, to be 0.933. This area here is 0.933, and that's this value here that we found from the cumulative distribution function. Because R knows we want to use the standard normal distribution quite often, R has a default of a mean of zero and a standard deviation of one.

Now, historically, people have often used tables, and some instructors still use tables, but I have ceased to use statistical tables in my teaching for all sorts of reasons.
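The two standard normal CDF values discussed above (0.5 at zero, 0.933 at 1.5) can be reproduced with a short sketch. It uses Python's `statistics.NormalDist` as a stand-in for R's `pnorm`; like `pnorm`, `NormalDist()` defaults to mean 0 and standard deviation 1 when no parameters are given.

```python
from statistics import NormalDist

# With no arguments, NormalDist() is the standard normal (mu=0, sigma=1).
std_normal = NormalDist()

half = std_normal.cdf(0.0)        # area to the left of the mean/median
p_below_1_5 = std_normal.cdf(1.5) # P(Z <= 1.5)

print(half, round(p_below_1_5, 3))
```

The first value is 0.5 and the second rounds to 0.933.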
One of them is that I think students sometimes have the idea that in the real world we actually just look these things up in a table, which of course we do not do. We look these things up with software. So, I try to give you the real deal of statistics all the way along: how things are actually done, what these things actually mean. And so I don't use tables anymore. Some instructors still use tables, and I think that's a reasonable choice. I did up until a number of years ago, but I'm just not using them anymore. So, we are going to use software. I think that is better all around, both for conceptually figuring out what these things mean and for focusing on the right issues in statistics. So, I'm not using my nice pretty standard normal distribution table anymore, but some instructors do. That's totally okay. We're going to rely on software.

Okay. So, why this standard normal distribution? I said earlier, when we were getting into this, that it's useful to have some baseline distribution from which to work, some single distribution that we can talk about. And one reason why that's a very meaningful thing for us for the normal distribution is that, as we're going to see in a second, any normally distributed random variable can be easily converted into something that has the standard normal distribution. So we can just talk about the standard normal distribution in some spots, and it's really general, because anything with a normal distribution can be converted to that. So let's see what that actually means here.

Okay, standardizing normally distributed random variables. So suppose a random variable X is normally distributed with mean mu and standard deviation sigma. I would write that X is distributed normally with a mean of mu and a variance of sigma squared.
So that's what my notation is going to be. X is distributed normally, mean mu, variance sigma squared. That of course means it has a standard deviation of sigma. And here we can convert X into a variable having the standard normal distribution with a very simple linear transformation. Let's think about this. Let's say I started out with X, which has a mean of mu, and I subtracted off mu. Well, that quantity is going to have a mean of zero, is it not? That's a property of the linearity of expectation that we talked about, but just casually here: you start off with something that has a mean of mu, and when you subtract off mu, that quantity is going to have a mean of zero. And if we divide now by sigma, less

### Segment 6 (25:00 - 30:00) [25:00]

obvious but still true, that is forcing the standard deviation to be one. So if we have this random variable X and we subtract off mu, we end up with something that has a mean of zero. If we divide by the standard deviation, we are forcing the standard deviation of this quantity to be one. Now, you've got to keep in mind an important concept here: this is a linear transformation, and linear transformations do not change the shape of the distribution. So I have forced this quantity to have a mean of zero and a standard deviation of one, but I have not changed its shape. We said as a premise that X was normally distributed to begin with. So then this quantity has to be standard normal. We're going to let that be zed, because then zed is going to have the standard normal distribution: distributed normally with mean zero and variance one. Remember again, that's mu and that's sigma squared when I'm writing it in this form. So zed has the standard normal distribution.

It's important to note that this linear transformation does not change the shape. So if X was not normal to begin with, then zed is not going to be normal either. This doesn't magically transform every shape into the normal distribution. What it does is force this quantity to have mean zero and variance one. And if X is normal to begin with, well, then zed is also normal, because that linear transformation didn't change the shape.

Okay, so now we have this. If something has any normal distribution, any mean and any standard deviation, we can just convert it to the standard normal. And this helps us talk about things in simpler ways. It helps us in a number of ways to think of things in terms of the standard normal distribution.
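The standardization argument above can be written out compactly. This is the standard derivation, stated here with the lecture's symbols; it uses only linearity of expectation and the scaling rule for variance:

$$
Z = \frac{X - \mu}{\sigma}, \qquad
E[Z] = \frac{E[X] - \mu}{\sigma} = 0, \qquad
\operatorname{Var}(Z) = \frac{\operatorname{Var}(X)}{\sigma^2} = \frac{\sigma^2}{\sigma^2} = 1
$$

so that, when $X \sim N(\mu, \sigma^2)$, the shape is preserved and $Z \sim N(0, 1)$.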
It just makes our life simpler and cleaner, rather than having to say, okay, suppose this has this distribution and this has that distribution. We can speak of this in terms of the standard normal, knowing that we can convert any normally distributed random variable to that distribution. Reasons for this, and reasons for its importance, will come up a little bit later, but let's see some examples of this with some applied probabilities.

Parents often want to know if their child's growth is progressing in a typical fashion, of course. One characteristic, of course, is the size of the child, and we can look at a number of things. We can look at the head circumference, which I looked at in an example in an earlier video, and various characteristics like that. But one characteristic is the length of their upper arm. We don't know the true distribution of the upper arm length of one-year-old American girls, but studies have shown that it is approximately normal with a mean of 16.1 cm and a standard deviation of 1.3 cm. So from various studies we can estimate these values. We're going to take those as a given here, that those are the true values. We can't possibly know the exact true values, but for the purposes of this question, let's pretend we know them, and as well, they are pretty darn close to reality. So this, we're going to say, is mu, and this is sigma, and this is a normal distribution with those parameters.

So that's a normal distribution PDF with a mean of 16.1, and I'm going to say over here that the second term is the variance. So I'm just going to write it as 1.3 squared. That way we're still officially holding to that second bit being the variance, but we can just see the standard deviation there as well. Rather than squaring that out and putting 1.69, we can look and see that the standard deviation is 1.3 but the variance is 1.3 squared. So I write it that way sometimes.
That's what this curve looks like. And so let's see if we can find some probabilities. What is the probability that a randomly selected one-year-old American girl has an upper arm length less than 18.0 centimeters? Okay. Well, I'm going to plot this out. So, the probability that a randomly selected one-year-old American girl has an upper arm length less than 18.0 cm. What we want is the probability that X, if we let X be

### Segment 7 (30:00 - 35:00) [30:00]

the upper arm length of this randomly selected one-year-old American girl, then we want the probability X takes on a value less than 18.0. And this curve is that normal distribution PDF with a mean of 16.1 and a variance of 1.3 squared. And we want this, which is just the area to the left of 18. This probability is that area. Probabilities are areas under curves. So we want this area. How do we find that? We just go straight to R. This is going to be very easy for us, because this is precisely the area that the `pnorm` function gives us when we input that value. So if we want that area, which we do because we want this probability, we just go straight to R with this. So `pnorm` with 18.0, because `pnorm` gives us the area to the left of the input value. And then we need to put in the parameters. Remember, R asks for the mean, so 16.1, and the standard deviation. R wants the standard deviation as that second term. So we have to put in the standard deviation, which is 1.3. And if we figure all of that out, then to three decimal places we get 0.928. That's our answer, and we're done. And we can move on with our lives and go on to something else.

But I would like to show you the equivalence here, and what we would have to do if we were standardizing it: showing what's happening when we standardize, and that we get the same answer when we standardize and go through the standard normal distribution. This is something that we would have to do if we were relying on tables. We're not relying on tables, we're using software, so we'd be done here. But it is still a useful thing to know. Just trust me on this one. We do need to know something about standardizing. It is an important thing for us as we go on. And additionally, I would like to show you the equivalence here.
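The direct calculation described above (area to the left of 18.0 under a normal curve with mean 16.1 and standard deviation 1.3) can be sketched as below. Python's `statistics.NormalDist` stands in for the R call the lecturer narrates. As with R, the second parameter is the standard deviation, not the variance.

```python
from statistics import NormalDist

# Upper arm length (cm) of one-year-old girls, per the lecture's assumed values.
arm_length = NormalDist(mu=16.1, sigma=1.3)  # sigma is the sd, so variance = 1.3**2

# P(X < 18.0): area under the curve to the left of 18.0.
p_less_than_18 = arm_length.cdf(18.0)

print(round(p_less_than_18, 3))
```

This gives 0.928 to three decimal places, matching the value quoted in the lecture.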
So if, say, we had to rely on a table (in this case we're mainly showing the equivalence), I've plotted out my standard normal curve here. So that's my normal distribution with a mean of zero, and this time I'm going to explicitly say the variance is one squared, just to show it the same way I had up here. Obviously one squared is one, right? But this is a standard normal distribution.

Let's start up here again. I want the probability that X takes on a value less than 18.0 cm. Sure. Okay, fine. Now, I could standardize here: the probability that X minus mu over sigma is less than 18.0 minus mu over sigma. I haven't done anything untoward here. This is totally fine. I subtracted off mu and I divided by sigma, which has to be a positive quantity, so I don't have to worry about the inequality flipping direction or anything like that. Now what did that do for us? Well, this thing, X minus mu over sigma, is something that has the standard normal distribution. So I could say that this is zed. And on the right I have 18.0 minus the mean, which is 16.1, over the standard deviation, which is 1.3. So if we put that in, I get the probability that zed takes on a value less than 1.4615. I'll put four decimal places here. And 1.4615 is somewhere over here. We don't have to get it right on the money. And that's this area here. Wait a minute. Those two things visually look like the same thing. And they are, of course, the same thing. So if we put this into R with `pnorm` (and to avoid rounding error, we might put this whole quantity in there), along with 0 and 1, we get 0.928 again. So we get the same thing, 0.928.
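The equivalence between the direct and standardized calculations can be checked numerically. The sketch below uses Python's `statistics.NormalDist` in place of R, with illustrative variable names. As the lecturer suggests, the unrounded z-value is passed through to avoid rounding error.

```python
from statistics import NormalDist

mu, sigma = 16.1, 1.3   # assumed mean and sd of upper arm length (cm)
x = 18.0

# Direct route: area to the left of x under N(mu, sigma^2).
direct = NormalDist(mu, sigma).cdf(x)

# Standardized route: convert x to a z-value, then use the standard normal.
z = (x - mu) / sigma
standardized = NormalDist().cdf(z)

print(round(z, 4), round(direct, 3), round(standardized, 3))
```

The z-value rounds to 1.4615, and both routes give 0.928.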
This area is 0.928, which is the same thing as what we had up top. Now again, I know you might be thinking, "Oh my goodness, we just did this already. Why did we have to go through and standardize?" I'm showing you the equivalence. It's something that we need to do at various times. Of course, if all we needed was this probability

### Segment 8 (35:00 - 40:00) [35:00]

we would just put the original one into R like we did and be done. But I'm showing you what happens behind the scenes when we're standardizing, because standardizing is going to be important for us along the way. Okay. So, let's look at another type of problem, something that's really common for us in statistics. We need percentiles from a normal distribution a lot. Trust me, it's going to come up. We need areas under the curve a lot and we need percentiles a lot. It's just stuff that we need in inference. So here is a relevant question: what is the 20th percentile of the upper arm length of one-year-old American girls? Percentiles are a common way of reporting this type of thing. When you're reporting the size of a child or how big a child's head is, if they're in the 99.9th percentile, they've got a really big head, right? And I'm not mocking anyone with a big head. My head was probably pretty darn big at birth. Okay. But here we want the 20th percentile. So if we just draw the curve out, this is the real curve here, the normal distribution under discussion. That's my normal distribution with a mean of 16.1 and a variance of 1.3 squared. That's the probability distribution that I've plotted out here. The 20th percentile is the value of the variable that has an area to the left of 0.2. That's what the 20th percentile means. So there's some value here, and I'll just label this area here because I'm running out of space: that area is 0.2. This value here is the 20th percentile, the value of our random variable such that the area to the left is 0.2. That's what it means. If we need to find this, straight to R we go. But keep in mind this is a different type of problem than what we were just discussing.
Earlier, up top, we had a value of our random variable and we wanted to find the area to the left. This is the inverse problem, and we need the inverse function for it, because we're given an area to the left and we want to find the value that makes that happen. Those are inverse operations. R has a built-in function that does this for us. R knows we want to do this quite a bit. So R has something called the qnorm function, which gives us quantiles of the normal distribution. Quantiles are like percentiles, except we're putting in the area as a proportion, this 0.2 here. Okay. So we would just go straight to R and say qnorm, where the input value is the area to the left, 0.2 in this case, and then the appropriate mean and standard deviation, and we get our value. So our 20th percentile is 15.01 cm. If this randomly selected one-year-old American girl has an upper arm length of 15.01 cm, she is at the 20th percentile: her upper arm length is as big as or bigger than that of 20% of that group. And if our whole goal was finding the percentile here, we'd be done. It's over. It's done. Okay. But I want to show the equivalence here, and what we would have to do if we were using a table, this kind of thing. For us, it's mainly that I'm showing you that they're equivalent operations. So there are reasons for me going through this. This is the standard normal probability density function that I've drawn out here, and we have to go in the opposite direction this time. Say we did in fact have the 20th percentile of the standard normal distribution. If we had that 20th percentile, then we could of course find the corresponding value for X, because we know that Z is equal to X minus mu over sigma. We know we can transform X to Z, so we can also go back the other way: rework this, and X is sigma times Z plus mu. So this implies that X is equal to mu plus sigma times Z.
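The direct percentile calculation described above can be sketched as follows. `qnorm` takes the area to the left and returns the corresponding value, so it undoes `pnorm`. The variable names are illustrative.

```r
mu <- 16.1     # mean of upper arm length (cm)
sigma <- 1.3   # standard deviation (cm)

# 20th percentile: the value with area 0.2 to its left
x20 <- qnorm(0.2, mean = mu, sd = sigma)
print(round(x20, 2))   # the lecture's 15.01 cm

# qnorm and pnorm are inverse operations: feeding the percentile
# back into pnorm recovers the area 0.2
print(pnorm(x20, mean = mu, sd = sigma))
```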
So what we can do then is find the 20th percentile of our standard normal distribution. So we go to R and say qnorm, 0.2, 0 for the mean, 1 for the standard deviation, and we get -0.84 to two decimal places. You want to carry many decimal places all the way through your calculations, of course, but here let's just try to keep this simple and not too junked up. Carry as many decimal places as you can. It's always better to just go

### Segment 9 (40:00 - 42:00) [40:00]

to software and do it that way originally. But here we have that the 20th percentile of the standard normal distribution is about -0.84. So then we can say, hey wait, I can solve for X. My X is going to be mu, which is 16.1, plus sigma, which is 1.3, times the corresponding Z value, the -0.84. And if we do that, you'll probably be shocked to learn that we get a value of 15.01 cm. Or in other words, 15.01 cm is the 20th percentile of the distribution of the upper arm length of a randomly selected one-year-old American girl, which is what we found up here, doing it the easy way and going straight to R. Now it's very reasonable to think: why did we do all this standardizing? Why did we have to do that? I came up with the right probability and percentile right away going straight to R. Why did I need to standardize? It's just something we need going forward. We standardize a lot in statistics. We are going to have to have a very good knowledge of our standard normal distribution. Many times we're going to be finding areas under the standard normal curve, and this is going to be very important for us in a wide variety of statistical inference settings. So it's just something we need to do. It's pretty reasonable to wonder why we need to do it at this point. But it does allow us to speak in a simpler language, having this one baseline distribution from which to work. And we do need to know these properties of standardizing, why this works, and why we're doing these things later on. The standard normal distribution comes up for us a lot in statistics. So I need you to be able to find areas under the normal curve. I need you to be able to find percentiles for normally distributed random variables. These things are important. It takes a little practice to make sure we always get that down straight, right? Perfectly.
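The back-transformation worked through at the start of this segment, X = mu + sigma * Z, can be sketched in R as follows. The variable names are illustrative. The sketch carries the unrounded Z value through the calculation, which is what the lecture recommends over plugging in the rounded -0.84.

```r
mu <- 16.1     # mean of upper arm length (cm)
sigma <- 1.3   # standard deviation (cm)

# 20th percentile of the standard normal distribution (mean 0, sd 1)
z20 <- qnorm(0.2, mean = 0, sd = 1)
print(round(z20, 2))   # about -0.84

# Transform back to the original scale: X = mu + sigma * Z
x20_from_z <- mu + sigma * z20

# Same answer as going straight to qnorm with the original parameters
x20_direct <- qnorm(0.2, mean = mu, sd = sigma)
print(round(c(x20_from_z, x20_direct), 2))   # both about 15.01 cm
```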
Remember to always draw a picture for these things and map out what you're trying to find before running off to software. Always draw out a picture. It just doesn't make any sense to run to software before you know what you're trying to find. So draw a picture illustrating what you're trying to find. Then think, how can I use software to get this value that I need? And that takes a little bit of practice. So work through some of those exercises and get this down, because we do need it going forward. All sorts of important stuff coming up. We'll see you then.
