# Hypothesis testing: Rejection regions and p-values |Full lecture (Intro Stats)

## Metadata

- **Channel:** jbstatistics
- **YouTube:** https://www.youtube.com/watch?v=d0Sm_xOs33Y
- **Date:** 27.02.2026
- **Duration:** 36:31
- **Views:** 1,230
- **Source:** https://ekstraktznaniy.ru/video/52879

## Description

This is a full lecture-style video introducing rejection regions and p-values, pitched at the level of an applied introductory statistics course at university. The focus is on big-picture concepts rather than just quickly following a recipe.

Here I continue to work through my lecture outline document, a condensed version of the hypothesis testing chapter from my textbook. Students in my STAT I course at the University of Guelph have these materials.

In my next lecture-style video I'll work through two real-world examples.

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

Hello everybody. We are going to continue our discussion of hypothesis testing with a discussion of rejection regions and p-values. But first, a brief recap of what we've been talking about. We've been talking about hypothesis testing, first in general and then in the specific case of inference for a single mean μ of a normally distributed population. We were testing the null hypothesis that the true mean equals some hypothesized value μ₀, a value that is of interest to us for some reason, against one of three alternative hypotheses. Our appropriate test statistic was either a z statistic, if σ is known (very rare), or a t statistic, when σ is not known and we estimate it with the sample standard deviation, which is what commonly happens in practice. Then we worked through an example; see the previous video if you're not up to speed on this. The example involved cereal weights: we did a t test and found that our test statistic fell far out in the right tail of the distribution of the test statistic under the null hypothesis. Something far out in the right tail gives us a lot of evidence against the null hypothesis. If the null hypothesis is true, the value we get should just be a random draw from this distribution, so something way out in the right tail or the left tail would be very strange to get under that distribution, and thereby gives us strong evidence against the null hypothesis. This is where we ended off last time. But then the question is: where do you draw the line? A value somewhere in the middle, near zero, is no evidence against the null hypothesis. Values far out in the tails are evidence against the null hypothesis. Where do you draw the line? That is where today's discussion begins: rejection regions and p-values. Let's get to it. We are going to start with the rejection region approach.
I greatly prefer the p-value approach, though, and once we get the p-value approach down, we will be using it in the vast majority of situations. The rejection region approach is very important historically, and still today for a number of reasons, but the p-value approach is better in almost all situations, I would argue. So we are going to focus on the p-value approach, but we still have to know the rejection region approach. Okay, the rejection region approach: it's one method of determining whether the evidence against the null hypothesis is statistically significant, and we'll use it in spots. As we'll see in a second, it has us make a concrete, binary decision between the null and alternative hypotheses, which is one reason I don't like it. After constructing appropriate hypotheses, following the logic we've been using all along, we decide on an appropriate value of α, the significance level of the test. We choose that value; there will be a more detailed discussion of this later on, once we get the basics of hypothesis testing down. The significance level α is the probability of rejecting the null hypothesis if it is in fact true. This is a fundamental idea that we'll talk about later: rejecting the null hypothesis when it's true, because we happened to get some unusual sample data that gave us evidence to reject the null. More details on that later, but we typically want α to be a small value, and people often blindly pick 0.05. One of the weird things about modern science is this blind picking of 0.05 so often; it is very strange, in my opinion. Anyhow, that's a talk for another day. So we pick a value (people often pick 0.05), and then, based on that choice of α, we find an appropriate rejection region.
So based on this choice of α and the test statistic under discussion, we get a rejection region for that test statistic. Then we calculate the value of the test statistic we're using, and we reject the null hypothesis if the test statistic falls in the rejection region. That is the logic behind this. It is a little better, I think, to talk about statistical significance at the α level of significance. If we fall in this so-called rejection region, we can say that the evidence against the null hypothesis is statistically significant, which I think is vastly superior terminology to "reject the null hypothesis." Historically, "reject the null hypothesis" is important. It's nice, clean language when we're talking about the mathematical underpinnings of hypothesis testing, but it is also

### Segment 2 (05:00 - 10:00) [5:00]

absurdly overused. We talk about rejecting the null when, in many, many situations, we're not actually making a decision between the null and the alternative hypothesis; we're simply assessing the evidence that the study provides. So I find this "reject the null" language extremely unsatisfying in a lot of applied settings. Anyhow, it's still important to understand. Here we are working in a very fixed, rigid way, deciding between the null and the alternative hypothesis with a binary decision rule: do we reject the null or not? In a lot of situations this is unsatisfying. Okay, let's look at a visual of this. Suppose we wish to test the null hypothesis that μ = μ₀ against the alternative that μ ≠ μ₀. We pick some value of α, and it's going to be a smallish value because we don't want to go off rejecting the null hypothesis everywhere. Remember that in general we give the null hypothesis the benefit of the doubt, so we typically make α a small probability such as 0.05. In later videos, once we've understood this and worked through some examples, we'll discuss the merits of the choice of α in more detail, including the downsides of choosing an extremely small one. Okay, so this curve here could represent either the standard normal distribution or the appropriate t distribution. This is the standard normal curve, but the t distribution would look very similar, and the same idea holds for either distribution. So we have some distribution under the null hypothesis, and we're doing a two-sided alternative. We take our value of α and split it evenly between the two tails.
We put α/2 in one tail and α/2 in the other. The critical values are the values that make that happen; they depend on the value of α and on the distribution. But there is some critical value, and that is the cutoff point where we reject the null hypothesis. If we get the upper critical value or something farther out to the right, in the rejection region, or if we get the lower critical value or something farther out to the left, we reject the null hypothesis. In the region between the critical values, we would not reject the null hypothesis. So let's look at this for a specific value of α. Suppose we choose α = 0.05, and suppose we are carrying out a z test, so we're working with the standard normal distribution. We have zero here. There is some value such that the area to its right is 0.025, and some value such that the area to its left is 0.025. The standard normal distribution is symmetric about zero, so these two values have the same magnitude and differ only in sign: the one on the left is negative and the one on the right is positive. To find them, we use our regular old methods, which means we go to software. We're going to go to R and say, "Hey R, give me the value such that the area to the left under the standard normal curve is 0.025," and we get the classic value -1.96. It's -1.960 rounded to three decimal places, but typically we just say -1.96. Out on the right, 1.96 is the value with an area of 0.025 to its right.

So this is the logic we would use here. Now let's look at a prettier version of that; this is what it looks like when we draw it up nicely. What it means is that we reject the null hypothesis in favor of the alternative if our z test statistic (in this case I'm doing a z test) is at least 1.96, in which case we're rejecting in the right tail, or if the z statistic is -1.96 or farther to the left. So those are our two
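The lecture gets these critical values from R (where `qnorm(0.025)` would be the usual call). As a rough equivalent, and an assumption on my part rather than the lecture's own code, Python's standard-library `statistics.NormalDist` can compute them; the helper names below are hypothetical. This sketch covers the z test only; a t test would need a t-distribution quantile, which the standard library does not provide.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """z* such that N(0, 1) puts alpha/2 to the right of z* (and alpha/2 left of -z*)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_two_sided(z: float, alpha: float = 0.05) -> bool:
    """Rejection-region rule: reject H0: mu = mu0 in favor of mu != mu0 when |z| >= z*."""
    return abs(z) >= two_sided_critical_value(alpha)


z_star = two_sided_critical_value(0.05)
print(round(-z_star, 3), round(z_star, 3))  # -1.96 1.96
```

Writing the rule as `abs(z) >= z_star` is the one-line form |z| ≥ 1.96 of the two rejection regions.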

### Segment 3 (10:00 - 15:00) [10:00]

rejection regions, and we could bring them together. We reject the null in favor of the alternative if the z statistic falls in those regions, and if we wanted to write it in one line, we could say |z| ≥ 1.96. So if we get a z statistic that falls in the rejection region, we reject the null hypothesis in favor of the alternative; if we do not, we do not reject the null hypothesis. That's what it looks like for a z test with a two-sided alternative and α = 0.05. The values would be different for a t test, depending on the degrees of freedom, but we would go through the same logic and get the critical values from the t distribution rather than the standard normal distribution. Now, what if we were doing a one-sided alternative instead? Let's say we're still carrying out a z test. For a one-sided alternative, it depends which side we're doing. For this one over here, the alternative is that μ < μ₀. Remember our z test statistic: z = (x̄ - μ₀)/(σ/√n). Values in the left tail give us evidence against the null hypothesis and in favor of this alternative. If x̄ is way less than the hypothesized value, this test statistic falls in the left tail, and x̄ being way less than the hypothesized value gives us evidence that μ is less than the hypothesized value. So for the rejection region, we put the entire α in the left tail and find the value of the standard normal random variable that makes that happen. We just say, "Hey R, can you give me the value of a standard normal random variable such that the area to the left is 0.05?" and we get, to three decimal places, -1.645. So we would reject the null hypothesis in favor of the alternative if the z statistic we end up with is less than or equal to -1.645. And of course we know how this goes for the other alternative. If the alternative is μ > μ₀ at the same α level, we put the entire α in the right tail, because values out in the right tail of the distribution give us evidence against the null hypothesis and in favor of this alternative. So for this z test, we would reject the null hypothesis in favor of the alternative if our z statistic is greater than or equal to 1.645.

That's the rule. We pick an α level, and depending on the type of test we're doing, which depends on the alternative hypothesis, we create the appropriate rejection region. Then, based on our sample data, we calculate the value of the test statistic and see whether it falls in the rejection region. If it does, we can reject the null hypothesis in favor of the alternative. If it does not, we don't have enough evidence to reject the null hypothesis in favor of the alternative. Now, in some ways, this is very silly. Sometimes it's very useful, very useful conceptually in working out certain things in statistics. The rejection region idea was much needed, is important in the history of statistics, is still used today, and is not a silly thing overall. But in many situations, a rigid application like this would be a silly idea. In many real-world, applied problems, we're simply assessing the strength of the evidence against the null hypothesis, not making some concrete, rigid, binary decision between the null and the alternative. If we do need to make a concrete decision between the null and the alternative, we have to draw the line somewhere.
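A similar sketch for the one-sided rejection regions, again assuming Python's `statistics.NormalDist` in place of R and using hypothetical helper names. Because the whole α goes into a single tail, the critical value comes from `inv_cdf(1 - alpha)` rather than `inv_cdf(1 - alpha / 2)`:

```python
from statistics import NormalDist


def one_sided_critical_value(alpha: float) -> float:
    """z* with area alpha to its right under N(0, 1); by symmetry, -z* has area alpha to its left."""
    return NormalDist().inv_cdf(1 - alpha)


def reject_one_sided(z: float, alternative: str, alpha: float = 0.05) -> bool:
    """Rejection-region rule for H0: mu = mu0 against a one-sided alternative.

    alternative="less":    reject when z <= -z* (all of alpha in the left tail)
    alternative="greater": reject when z >=  z* (all of alpha in the right tail)
    """
    z_star = one_sided_critical_value(alpha)
    if alternative == "less":
        return z <= -z_star
    if alternative == "greater":
        return z >= z_star
    raise ValueError("alternative must be 'less' or 'greater'")


print(round(-one_sided_critical_value(0.05), 3), round(one_sided_critical_value(0.05), 3))
# -1.645 1.645
```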
So this is an okay approach, but in most real-world situations we are assessing the evidence we have, and drawing the line at an arbitrary fixed place and rejecting the null hypothesis on that basis is, in a lot of places, kind of silly. Let's think about this a little bit for the particular example we're looking at here. Suppose we're doing this one-sided test over here, and let's think of three situations. We

### Segment 4 (15:00 - 20:00) [15:00]

get a z value of 1.64, or one of 1.65. These two situations lead to fundamentally different conclusions. The 1.65 is, of course, statistically significant at 0.05: since our z statistic is at least 1.645, we could reject the null hypothesis in favor of the alternative. If our z statistic is 1.64, we get a completely different conclusion: we do not reject the null hypothesis, and we do not have statistically significant evidence against the null hypothesis at 0.05. And to make this more and more ridiculous, we could let those numbers get closer and closer until they're essentially next to each other on the number line, and we would still reach completely different conclusions. That's quite unsatisfying, at least to me. On the other side of things, what if we got a z value of 28.4? For 1.65 and 28.4 we reach exactly the same conclusion: we have statistical significance at 0.05, and we can reject the null hypothesis in favor of the alternative at 0.05. But when the null hypothesis is true, a value of 1.65 or something farther out happens about one time in 20, whereas 28.4 would not happen in a billion lifetimes. Those two values carry vastly different amounts of evidence against the null hypothesis, yet our conclusions say something very similar, and that is also unsatisfying. So we need some measure of how extreme that value is, how far out in the tail it is, and how much evidence there is against the null hypothesis and in favor of the alternative. We could report the value of the test statistic, and people do, which gives us some idea. I could look at it and say, okay, 1.65 is just inside the rejection region, while 28.4 is way out there, with a lot more evidence against the null hypothesis. We could do that. But we have all sorts of test statistics: z tests, t tests, chi-square tests, F tests.
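To make the rigid rule's behavior on these three values concrete, here's a minimal sketch of the upper-tailed rejection-region rule at α = 0.05, assuming Python's standard library (`rejects_h0` is a hypothetical name). It only reports the binary decision, which is exactly the limitation being described:

```python
from statistics import NormalDist

ALPHA = 0.05
Z_STAR = NormalDist().inv_cdf(1 - ALPHA)  # about 1.645 for H_a: mu > mu0


def rejects_h0(z: float) -> bool:
    """Rigid rejection-region rule for the upper-tailed z test at ALPHA."""
    return z >= Z_STAR


for z in (1.64, 1.65, 28.4):
    print(z, "reject H0" if rejects_h0(z) else "do not reject H0")
# 1.64 do not reject H0
# 1.65 reject H0
# 28.4 reject H0
```

1.64 and 1.65 land on opposite sides of the cutoff, while 1.65 and 28.4 get the identical verdict.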
There are all sorts of different test statistics, and these distributions change with the degrees of freedom. None of us can look at a value and understand right away how far out in the tail it is. So we need some measure of how far out in the tail a test statistic is, and we want that measure to be somewhat universal across all these different distributions. That's where the p-value comes in. Okay, p-values: first of all, they're controversial, misinterpreted, and a bit tricky. People write articles on how p-values are misinterpreted and, within those articles, misinterpret the p-value. Statisticians argue about these things like mad and will forevermore. But p-values still have real value. Sure, there are some problems, and sure, they're misinterpreted, but they're still a good thing and very helpful in the world of hypothesis testing. Big picture, a p-value gives us a measure of the strength of the evidence against the null hypothesis. We end up with one number, and it measures the strength of the evidence against the null hypothesis from this particular study, not overall from everything we have everywhere. We're talking about this study, this test. There are various definitions, various ways of phrasing this, and none of them are especially intuitive or easy, so I'm going to give you a couple, and we'll think about it in a couple of different ways. First: a p-value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed. We're going to try to wrap our minds around this fully, but first a couple of big points. One, it's a probability. If you give me a p-value of 1.7, you've screwed up royally. It's a probability found under the null hypothesis.
That is, we act as if the null hypothesis is true: we calculate the p-value assuming the null hypothesis is true. That's what happens in the calculation, and that's what I mean when I say "under the null hypothesis." We're saying: suppose the null hypothesis is true. Then the p-value is the probability of obtaining the test statistic that we observed or something more extreme, something even farther out. Now, this is not a bad place to start

### Segment 5 (20:00 - 25:00) [20:00]

except: what do we mean by extreme? That's not super obvious here. I'll try to make it clear as we go, but it is not super obvious. So let's look at it from a slightly different perspective; the first definition is a really good starting point, I think. Our p-value is the probability, assuming the null hypothesis is true, of getting the observed value of the test statistic or a value that would yield at least as much evidence against the null hypothesis. So when we say "extreme" up here, we mean extreme in terms of the evidence a value provides against the null and in favor of the alternative. None of these definitions are all that simple or easy to fully wrap your mind around. We're going to look at a couple of examples in a moment, but what should be easy to grasp is that it's a probability and that it's found under the null hypothesis, acting as if the null hypothesis is in fact true. Okay, let's see what this means and how we'd find p-values for the test statistics we've been talking about. Suppose that in a z test or a t test, the observed value of the test statistic is found to be 1.5. We have our hypotheses, we get the data, we calculate the value of our z statistic or t statistic, depending on what we're doing, and we want to find the p-value. The p-value is going to depend on this value, on the distribution of the test statistic under the null hypothesis, and on the alternative hypothesis, so it will be different for the three alternatives. Suppose we're testing the null hypothesis that μ = μ₀; these are the tests of this section, though the basic ideas hold elsewhere as well. We're going to look at the three different alternatives.
Suppose, for this first one over here, the alternative is that μ < μ₀, and we got this value of 1.5. Suppose we're doing a z test again; the same logic holds for a t test, but there we would use the t distribution with the appropriate degrees of freedom rather than the standard normal. So that's our z test statistic. Whether we got a z or a t of 1.5, we'd say, okay, here's 1.5, the value we actually got. Our p-value is the probability, under the null hypothesis, of getting the value we got or something more extreme, but more extreme in a specific sense: more extreme in that it gives us evidence against the null and in favor of the alternative. For this alternative, values in the left tail give us evidence against the null hypothesis and in favor of the alternative. So the p-value is the probability of getting the value we got or something farther left. That is the p-value for this alternative hypothesis. If we're doing a z test, we go to the standard normal distribution; if we're doing a t test, we go to the t distribution with the appropriate degrees of freedom; and we find this area. For a z test, we would go to R or other software and find the area to the left of 1.5 under the standard normal curve, and that area is our p-value: here, 0.933. So that is the p-value of our z test here. Now suppose we're doing this other one: we're testing the null against the alternative that μ > μ₀, whatever our μ₀ is. Our test statistic value is still 1.5, so our p-value is the probability of getting the value we got or something even farther out to the right, which

### Segment 6 (25:00 - 30:00) [25:00]

gives even more evidence against the null and in favor of this alternative. For the alternative μ > μ₀, values in the right tail of the test statistic's distribution give us evidence against the null and in favor of this alternative. So the p-value is the probability, under the null, of getting this value or something farther to the right, and to three decimal places that's 0.067. Okay, now we get to the two-sided alternative. This is the p-value if we're doing a z test; it would differ for a t test. If the alternative hypothesis is that μ differs from μ₀, we go through the same logic. Here's 1.5, and values out in the right tail give evidence against the null and in favor of this alternative, so we are interested in the probability of getting this 1.5 or something farther out. But we would have considered it exactly as much evidence against the null hypothesis and in favor of the alternative had the test statistic been of the same magnitude but on the other side of zero. If we had gotten -1.5 over here, that would have been the same amount of evidence against the null hypothesis, since values way out in the left tail also give evidence against the null and in favor of this alternative. So even though we got a value of 1.5, we think, hey, we could have gotten a value on the other side, at -1.5, and we should take that area into account as well. Our p-value is the sum of these two areas: 0.067 + 0.067, which is of course 2 × 0.067. Conceptually we're summing those two areas, but the t and z distributions are both symmetric about zero.
So the sum of these two areas is the same as taking the little area in the tail beyond the observed value of the test statistic and doubling it. If the test statistic falls in the right tail, we take that right-tail area and double it; if it falls in the left tail, we take that left-tail area and double it. That is how we get our p-value for a two-sided alternative. It might not be the simplest thing to see why we're doubling it: we didn't get a value in the left tail, but we might have. In this case we got 1.5, but we might have gotten -1.5, and under the null hypothesis it would be just as likely to get something that far out in the left tail as in the right tail, so we should take both into consideration. Doing this, we get our final two-sided p-value of 0.134, and that's generally how we get the p-value for a two-sided alternative in these z tests and t tests. Now, big picture, something very important. That's how we get these p-values, but it's kind of hard to fully grasp what's going on here: the probability, under the null, of obtaining a test statistic at least as extreme as the one observed, meaning something with at least as much evidence against the null hypothesis. Let's keep in mind that it's the probability of getting something at least as extreme as what we actually observed, if the null hypothesis is true. So if our p-value is really small, it would be very unlikely to see what we actually saw if the null hypothesis is in fact true. That's slightly casual wording, but it's the gist of it.
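The three p-values for an observed z of 1.5 can be reproduced with a short sketch. It assumes a z test (standard normal null distribution) and Python's `statistics.NormalDist` in place of R; for a t test you'd need the t distribution with the appropriate degrees of freedom, which Python's standard library doesn't include. `z_test_p_value` and its alternative labels are hypothetical names:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): null distribution of the z statistic


def z_test_p_value(z: float, alternative: str) -> float:
    """P-value for an observed z statistic, computed under H0: mu = mu0.

    alternative is one of "less" (mu < mu0), "greater" (mu > mu0),
    or "two-sided" (mu != mu0).
    """
    if alternative == "less":
        # Evidence for mu < mu0 sits in the left tail: P(Z <= z).
        return STD_NORMAL.cdf(z)
    if alternative == "greater":
        # Evidence for mu > mu0 sits in the right tail: P(Z >= z) = P(Z <= -z).
        return STD_NORMAL.cdf(-z)
    if alternative == "two-sided":
        # Tail area beyond |z| doubled, which equals the sum of both tail areas
        # because N(0, 1) is symmetric about zero.
        return 2 * STD_NORMAL.cdf(-abs(z))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


for alt in ("less", "greater", "two-sided"):
    print(alt, round(z_test_p_value(1.5, alt), 3))
# less 0.933
# greater 0.067
# two-sided 0.134
```

Using `-abs(z)` in the two-sided branch means an observed -1.5 gives the same p-value as 1.5, matching the "we might have gotten -1.5" argument.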
Okay, so if we have a very small p-value, it would be very unlikely to see what we actually observed, or something more extreme, if the null hypothesis is true, and that gives a lot of evidence against the null hypothesis. If we look at the drawings here, say this middle one, the farther out in the right tail the test statistic is, the more evidence we know it gives against the null and in favor of this

### Segment 7 (30:00 - 35:00) [30:00]

alternative. And the farther out we go, the smaller the p-value and the greater the evidence against the null hypothesis. So, as a very important big-picture notion: the smaller the p-value, the greater the evidence against the null hypothesis, all else being equal, blah blah blah. Pedants in the back, quiet down. This is a general, important idea: the smaller the p-value, the greater the evidence against the null hypothesis and in favor of the alternative. Now, this really only applies to smallish p-values. A big p-value like this 0.933 gave absolutely no evidence against the null hypothesis. None. Zero. Zilch. For that alternative, it's values in the left tail that give evidence against the null and in favor of the alternative, and we got a huge p-value: absolutely no evidence against the null hypothesis and in favor of the alternative. Had we gotten a value way out in the left tail, that would have resulted in a small p-value and strong evidence against the null hypothesis and in favor of this alternative. So it's very important to recognize that, all else being equal, the smaller the p-value, the greater the evidence against the null hypothesis in this particular study. We're not speaking globally about everything we happen to know about the world; we're talking about this particular data, this particular question, and this particular hypothesis test. Okay. Then, and I'm emphasizing the "if" here, if we are carrying out the test at a fixed significance level α, we can say that the evidence against the null is statistically significant, or that we reject the null hypothesis at the α level of significance, if the p-value we come up with is less than or equal to α. (I don't know why this little red thing comes up. Go away, little red thing.) Anyway: p-value less than or equal to α.
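A sketch checking, for the upper-tailed z test at α = 0.05, that the fixed-α rule "p-value ≤ α" reaches the same decision as the rejection region z ≥ 1.645, and showing how far apart the p-values for 1.65 and 28.4 are. It assumes Python's standard library; `upper_p_value` is a hypothetical name, and using `math.erfc` for the upper tail is my own implementation choice, because `1 - NormalDist().cdf(z)` would round to 0.0 that far out:

```python
import math
from statistics import NormalDist

ALPHA = 0.05
Z_STAR = NormalDist().inv_cdf(1 - ALPHA)  # upper-tail critical value, about 1.645


def upper_p_value(z: float) -> float:
    """P(Z >= z) under H0: the p-value for the alternative mu > mu0."""
    # 0.5 * erfc(z / sqrt(2)) is the standard normal upper tail; unlike
    # 1 - cdf(z), it does not collapse to 0.0 for z far out in the tail.
    return 0.5 * math.erfc(z / math.sqrt(2))


for z in (0.5, 1.64, 1.65, 2.3, 28.4):
    p = upper_p_value(z)
    # The fixed-alpha p-value rule and the rejection-region rule agree.
    assert (p <= ALPHA) == (z >= Z_STAR)
    print(f"z = {z}: p = {p:.3g}, significant at {ALPHA}: {p <= ALPHA}")
```

Both rules give the same verdict for every value, but the printed p-values keep the information the binary verdict throws away: roughly 1 in 20 for 1.65 versus a vanishingly small number for 28.4.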
That's if we are carrying out a test at a fixed significance level α. And again, a lot of the time we should not be carrying out a test at a fixed significance level α; we are simply assessing the evidence. Does your new method of growing figs result in greater fig yield than previous methods? Do we have evidence of that? Those are the questions we have, and we're not just saying, "I will decide yes if this p-value is less than or equal to 0.05." That is silly; it's absurd and ludicrous, really. It is needed sometimes, if we have to make a hard, binary decision between the null and the alternative, sure. But in the real world, a lot of the time we're not doing that. Where might we be doing it? Say we've got a million explanatory variables and we need to purge some of them to get to a simpler model, and it has to be done somewhat automatically because I can't go through and look at a million explanatory variables. In a situation like that, we may want to purge variables that we don't have much evidence are really meaningful, and we might use hard-and-fast cutoffs there. It comes up. It's definitely a thing, but it's an overused thing, and much of the time we should simply be reporting the strength of the evidence against the null hypothesis in terms of the p-value. Now, what we're doing here results in exactly the same conclusion as the rejection region approach at this α level of significance. So you might be saying, "Holy moly, we did a lot of stuff there, with you going on rants about all sorts of things, and now you're just saying it's the same thing." But it's not the same thing. It can be made to be the same thing.
If we carry out the test at a fixed α level, then going the p-value route and making this statement at that α level gives exactly the same conclusions as the rejection region approach. But the p-value approach also gives us the p-value itself. We report the p-value a lot, in our statistical inference procedures and in software output, and p-values get reported in journal articles. This lets the reader make up their own mind. I don't need you telling me whether there was statistical significance at 0.05; I can read the article, see what happened, and make up my own mind. Part of making up my own mind is assessing for myself the strength of the evidence against the null hypothesis, and part of that is what the p-value happens to be. So we can report the p-

### Segment 8 (35:00 - 36:00) [35:00]

value. We can draw conclusions at a specific α level if we want, but if we also report the p-value, the reader can take it into account and make up their own mind. So reporting the p-value is a very, very common thing. There is all sorts of controversy attached to p-values, but I think they are very valuable in hypothesis testing. Despite the problems, and the various ways they get misinterpreted, they are a very beneficial thing to have and a pretty reasonable summary of the evidence against the null hypothesis in the particular test we're talking about. So p-values are very important: highly controversial, we argue about them forever, all that kind of thing, but beneficial. We will use p-values going forward in a variety of statistical inference situations. It won't always be z tests and t tests, but the logic will be very similar in those other settings as well. Going through some examples will help make these concepts concrete in our brains, so that's coming up next. We'll see you in the next video, when I work through some examples.
