# Hypothesis Testing: Introduction | Full Lecture (Intro Stats)

## Metadata

- **Channel:** jbstatistics
- **YouTube:** https://www.youtube.com/watch?v=_HOW_zFRCpk
- **Date:** 23.02.2026
- **Duration:** 42:46
- **Views:** 1,632

## Description

This is a full lecture-style video introducing hypothesis testing, pitched at the level of an applied introductory statistics course at university. The focus is on big-picture concepts rather than the mechanics of any specific test.

Here I work through the start of my lecture outline document, a condensed version of the hypothesis testing chapter from my textbook. Students in my STAT I course at the University of Guelph have these materials.

If you're looking for a quick procedural walkthrough, this isn't the right video for you; these lectures are about statistical thinking. I have many shorter videos dedicated to specific topics that may be more appropriate.

References for the examples:

Manconi et al. (2010). Measuring the error in sleep estimation in normal subjects and in patients with insomnia. Journal of Sleep Research, 19:478–486.

Fink et al. (2007). Male facial appearance signals physical strength to women. American Journal of Human Biology, 19:82–87.

## Contents

### [0:00](https://www.youtube.com/watch?v=_HOW_zFRCpk) Segment 1 (00:00 - 05:00)

Hello everybody, and welcome to our initial discussion of hypothesis testing. We have already been introduced to confidence intervals, a statistical inference method where we used sample data and sample statistics to help us estimate population parameters. And that's what statistical inference is all about, right? Trying to use sample data to say something about the entire population. Hypothesis testing is another statistical inference method. It is a little bit different from the confidence interval idea. It has a very similar underlying mathematical logic, but here we are answering specific questions about population parameters and not simply trying to estimate them. Hypothesis testing is a little bit more nuanced than confidence intervals, more controversial, and very often screwed up, so when you're watching people talk about it, it can be a little bit tricky. The big-picture stuff we talk about today, I think, is very understandable. It's in some of the nuances of precisely what we can say where it does get a little bit tricky, but it is an important statistical inference method that helps us answer questions about the real world. So in hypothesis testing we translate some question of interest into a hypothesis about the value of a parameter or parameters, and then carry out a statistical test of that hypothesis. We have some question in mind: does my newly developed method for implanting baboon hearts in humans result in greater success than the old method? Something along these lines. Then we try to collect appropriate sample data that will allow us to answer that type of question. But that big-picture question gets translated into formal hypotheses about the value of a parameter or parameters, and then we use specific mathematical methodology to help us make appropriate statements about the hypotheses that we come up with. Okay.
So again, this is going to be a bit of a big-picture thing for a while here. Big-picture ideas today. So, examples of questions hypothesis testing might help us answer. Do we have strong evidence that the mean fuel consumption of a new model of car differs from what the manufacturer claims? Pretty natural question. Businesses aren't always forthcoming with the absolute truth, if there is one, right? And so sometimes we might want to look into their claims and see if we have evidence against them. Another extremely natural thing to investigate: do we have evidence of a difference between two groups in some way? Here, is there evidence of a difference in effectiveness between two recently developed vaccines? It could be that you developed a new drug and you want to see if it works better than a standard drug currently in use, or better than a placebo. Or you have a new surgical method and you want to see whether it reduces post-op infections, things along these lines. We're very often comparing two groups: a treatment group to a control group, or two different groups that we're interested in comparing. This is a very, very common thing in the world of science. Or, is there strong evidence of a relationship between blood type and rate of pancreatic cancer? You might not know this, but there are relationships between blood types and certain types of cancer, and we might want to investigate this for various types of cancer. Do people with blood type O have lower rates of pancreatic cancer, say, than people with other blood types? We might want to look into these types of things and get the appropriate data to answer these sorts of questions. Those are big-picture kinds of questions. Let's shore this up a little bit with some real data and real situations in a minute or two.
Inspired by those research questions, we're going to come up with statements about population parameters that are related to those original research questions, collect some data where it's possible to do so, and then use that data to carry out these formal hypothesis tests. That allows us to make statements about those population parameters, and then eventually we go back to the context of those original research questions and try to say, here's what the data showed us about them. So one big thing here is the conversion from these big-picture research questions that we have about the world into formal hypotheses about population parameters. This is an important step for us, and we are going to translate those questions into appropriate null

### [5:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=300s) Segment 2 (05:00 - 10:00)

and alternative hypotheses. Now, the discussion here, I hope, makes a little bit of sense, but I promise it'll make more sense when we work through some examples. This has some nuance and some subtlety, and there are times when we can be a little bit unsure of precisely how to phrase things, but this will all become clearer as we work through some examples. So, we're going to have a null and an alternative hypothesis. The alternative hypothesis, which we're going to denote by Hₐ here, is often the hypothesis the researcher is hoping to show. Not always; these are just overall big-picture ideas. And the null hypothesis, which we're going to write as H₀ (H sub zero, or H naught), is the hypothesis of no effect or no difference. Again, we're going to look at this through examples, but these are the big-picture ideas. The null hypothesis: no effect, nothing's going on. These two drugs have the same effect, nothing's changed. The null hypothesis is kind of your status quo hypothesis that nothing has changed, and we go from there. The alternative hypothesis is typically the one we're trying to show. So let's look at some very casually phrased null and alternative hypotheses. Overall, the null hypothesis is that nothing is going on, and the alternative is that something is going on. A very natural one: the null hypothesis is that the drug and placebo have the same effect, and the alternative hypothesis is that they have different effects, that there's something going on with that drug. Null hypothesis: same as it ever was, nothing's changed. Alternative hypothesis: something has changed. Null hypothesis: sure, they made that claim, and they're telling the truth. Alternative hypothesis: let's say they are lying, or they're telling a falsehood.
And big-picture-wise, here is a very common type of thing. Null hypothesis: there's no relationship between those variables. Alternative hypothesis: there is a relationship. Now, this is all very casually phrased, of course, and we're going to get a little bit more specific in a moment. One thing to keep in mind is that in all of these things, we are going to give the null hypothesis the benefit of the doubt. We set up our null and alternative hypotheses, and in the logic of all of this, we give the null hypothesis the benefit of the doubt, get appropriate sample data, do an appropriate method of analysis, and then see if we have evidence against the null hypothesis and in favor of the alternative hypothesis. This is the overall basic logic that we're going to go with here, and we'll see what happens with the math in various settings. But we have this null hypothesis, very casually: nothing's happening, nothing's going on, same as it ever was, status quo. Give that the benefit of the doubt, and then see, based on sample data, whether we have evidence against the null and in favor of the alternative. So let's look at some specific examples here. Okay. A study investigated the objective-subjective mismatch in sleep perception. This is kind of near and dear to me. I have done several sleep studies in my day. I have a weird quirk where I hold my breath on the exhale, believe it or not. You might go right away to sleep apnea, but they tell me no, sleep apnea is something different. Mine's a little quirky where I hold my breath on the exhale. Yeah, what are you going to do? So, occasionally I go to sleep clinics, and I have done this very thing before. So this sort of thing is near and dear to me.
So here, participants at a sleep study had their total sleep time measured by technology, and the following morning when they woke up they filled out a questionnaire, as I have done, that asked how many minutes they thought they slept. The sleep time mismatch is the difference between the two times: the objective sleep time measured by technology and what they said they thought they got. So that's our sleep time mismatch, and positive mismatch times indicate the individual slept for more time than they thought. Okay. We'll look at some data in a minute here, but in one aspect of the study, the mismatch times were recorded for a sample of 159 self-diagnosed insomniacs and 288 normal sleepers. Now, there are several questions that naturally arise here. We might think: on average, do insomniacs perceive their sleep time correctly? That's a reasonable big-picture question we might ask ourselves

### [10:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=600s) Segment 3 (10:00 - 15:00)

here, and we might translate that into a specific question about population parameters. We could phrase the null hypothesis like this. I'm going to write μ_I (mu sub I) for the true mean mismatch time for insomniacs, and a reasonable null hypothesis is that μ_I is zero, or in other words, that on average the insomniacs are perceiving their sleep time correctly. The alternative hypothesis would be that the null is wrong. So we could say Hₐ: the true mean mismatch time for insomniacs is not zero. It differs from zero. And we could go and collect some data, and we'll talk about ways we can actually analyze this and carry out a hypothesis test on it. So this is our concrete example. We have some big-picture idea (are these insomniacs perceiving their sleep time correctly?), and we translate that into a question about a mean. The null hypothesis is always going to be, sort of, that there's nothing going on: sure, they're perceiving their sleep time correctly on average. We've translated the big-picture question into a statement about the value of a parameter, and our alternative hypothesis is that the parameter differs from zero. Now, that's not a bad question to ask ourselves, but it's also a little bit unsatisfying. Let's say we did find evidence against the null hypothesis and in favor of the alternative. We've only looked at the insomniacs here, right? If we found evidence that insomniacs tend to, let's say, underestimate their sleep time, we wouldn't know if that's an insomniac thing or just a human thing.
So we might want to see what happens with the normal sleepers as well. Do normal sleepers perceive their sleep time correctly? We might do the same thing with normal sleepers and test the null hypothesis that, yeah, sure they do, in the sense that the true mean mismatch time for normal sleepers is zero, against the alternative that this is not the case and the true mean mismatch time for normal sleepers is not zero. Pretty reasonable questions. But what probably makes a little bit more sense here, for the reason I just gave (if we're just looking at the insomniacs, we're not going to know whether it's a human thing or an insomniac thing), is to compare these two groups. And this is a very common thing. Like I said above, we're very often interested in comparing two groups. So does the mean mismatch time for insomniacs differ from that for normal sleepers? And by mean, I'm talking true mean here. These hypotheses always involve parameters and never statistics. I'll say this in a number of places, but hypotheses are always about parameter values. So we might test the null hypothesis that nothing's going on: the insomniacs and the normal sleepers have the same true mean mismatch time. So we test the null hypothesis that these two groups have the same true means, that the population means for the two groups are equal, against the alternative hypothesis that they are different. This is an extremely common type of thing that we want to do, in science and business and everywhere. It's just a very natural question that arises in all sorts of situations. So let's have a look at the data that we have in this case. Okay.
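Written out in symbols, the three pairs of hypotheses from the sleep example might look like the following. This is only a notational sketch: the lecture names the insomniac mean "mu sub I", while the subscript N for normal sleepers is an assumption made here for illustration.

```latex
% mu_I: true mean mismatch time for insomniacs (named in the lecture)
% mu_N: true mean mismatch time for normal sleepers (subscript assumed here)
\begin{align*}
\text{Insomniacs alone:}      \quad & H_0\colon \mu_I = 0     \quad \text{vs.} \quad H_a\colon \mu_I \neq 0 \\
\text{Normal sleepers alone:} \quad & H_0\colon \mu_N = 0     \quad \text{vs.} \quad H_a\colon \mu_N \neq 0 \\
\text{Comparing the groups:}  \quad & H_0\colon \mu_I = \mu_N \quad \text{vs.} \quad H_a\colon \mu_I \neq \mu_N
\end{align*}
```

Every statement is about a population mean; no sample mean appears anywhere, which is the point the lecture keeps returning to.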
So in this particular one I'm showing box plots for the 159 self-declared insomniacs and 288 normal sleepers. See the description for the citation for this particular study, but this is real data. Note that there's a difference in sample sizes here: 159 self-declared insomniacs, 288 normal sleepers. We can compare groups with different sample sizes. This is totally fine. The math takes care of it. The sample sizes will find their way into the equations that we use. When we're looking at box plots and the sample sizes differ a great deal, that

### [15:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=900s) Segment 4 (15:00 - 20:00)

can distort things a bit, because when you have a greater number of observations, you tend to have a greater number of outliers. So it can be a little bit distorting when we're looking at box plots for groups that have greatly different numbers of observations. But here the difference in sample sizes would not account for what we're seeing. Lots of outliers for the normal sleepers. Lots of outliers. No outliers here for the insomniacs. And just sort of overall, the shape looks to be a fair bit different between the two. But perhaps most notably, going to the questions that we're interested in, it really does look like the distribution for insomniacs is shifted up somewhat. It just looks fundamentally different from what's going on with the normal sleepers. And this is the mismatch time. Remember, positive mismatch times indicate that the individuals said they got less sleep than they did as measured by technology. So it looks like there's something going on here. In my box plots, I put a little red dash representing the sample means. The distributions look roughly symmetric, so the sample mean is really close to the median in both groups. But it just looks like something's going on, right? Visually, that's a pretty big difference. Overall, it looks to me like there is a lot of evidence that there is a difference between these groups. And if we use the methods that we'll discuss in great detail later on, things like a two-sample t test, we would find very strong evidence that the true means here differ.
There's very strong evidence in this sample data that the true means differ, and we could analyze this and talk about this for a long time. One thing to keep in mind: I'm glossing over the idea of how we got these samples and the possible sampling bias that might be there. All that important stuff that I've talked about along the way still applies here. But our discussion of hypothesis testing has a lot of important nuance, so I might be glossing over that for a bit, but it's all still underlying this. The ideas of sampling bias, and how our study design might be distorting things, are all important notions. But back to this: it really looks like, on average, the mismatch times for insomniacs are greater than those for normal sleepers. And that's what the math bears out when we do our statistical methods later. If we carry out what we'd call a two-sample t test on this, an extremely common thing that we do in statistics, we would find very, very strong evidence that this is a real thing and not just regular old sampling variability at work, and that these self-described insomniacs tend to underestimate the amount of sleep that they got. More formally, in the hypothesis test here, we would say we have strong evidence that the true mean mismatch time for insomniacs is greater than that for normal sleepers. We'll learn all about the formal details and the nuances of that two-sample t test later, but that's the gist of it. So let's look at another one here. Let's say a study investigated a possible relationship between hand grip strength of young men and their facial attractiveness as perceived by young women. So we're going to have a study here in a minute where these male volunteers have their hand grip strength measured, and they also have their facial attractiveness rated by young women. This is a college study. They do these things, and it's a common type of study.
Big-picture-wise, then, we might think that we'd have a null hypothesis that there's no relationship between these variables. Like really big picture: no relationship between hand grip strength and facial attractiveness as our null hypothesis, right? Nothing's going on. Our alternative hypothesis would be that there is some sort of relationship between them. Now, that's casually phrased. We need to shore that up a little bit and test something a little bit more specific. We'll talk about this much more later, but just to give you some idea, what we do is make that a little bit more mathematical and test the null hypothesis that the true correlation is zero, or in other words, that in reality there's no overall increasing or decreasing trend. If there in fact is absolutely no relationship

### [20:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=1200s) Segment 5 (20:00 - 25:00)

between these variables, then there would be no overall increasing or decreasing trend. But we would have this as a formal mathematical test about a parameter, specifically that the true correlation between these variables is zero. And the alternative hypothesis would be that, hey, it's not zero, that there's something going on: the true correlation is not zero, and in reality there's some sort of increasing trend or decreasing trend between these variables. We want to collect some data and investigate that, and in fact researchers did. So we had these 32 male student volunteers who had their hand grip strength measured in kilograms-force, and their facial attractiveness was assessed by 79 female student volunteers who viewed pictures of the men's faces on a computer. There are 79 females in here, but we have a sample size of 32, because the two variables are measured on each male: one variable is hand grip strength, and the other is facial attractiveness. So, two variables measured on each of these 32 males. And what happens when we have a look? It's certainly not obvious what's going to happen. It would not be the weirdest thing in the world to think that there might be some sort of positive relationship between these, right? And we're only looking at one study here. In this particular example, from what we see here, there doesn't look to be much going on. This looks somewhat like a random scattering of points. There's certainly no obvious clear relationship between these variables in this particular scatter plot, no clear increasing trend or decreasing trend. So overall, I think our gut's going to tell us that here we're not going to have a lot of evidence against the null hypothesis. And that is in fact what happens if we crunch this data using the methods that we'll learn later on. If we do that, we find that we don't have any meaningful evidence against the null hypothesis.
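To get a feel for what "no relationship" looks like with only 32 observations, here is a minimal simulation sketch. It does not use the study's data: every value is synthetic, drawn from independent standard normal distributions (an assumption made purely for illustration), so the true correlation is zero by construction. The function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_null_correlations(n=32, reps=2000, seed=1):
    """Sample correlations from `reps` synthetic data sets of size `n`.

    The two variables are generated independently, so the true
    correlation is 0 (the null hypothesis holds by construction).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        results.append(pearson_r(xs, ys))
    return results


rs = simulate_null_correlations()
rs_sorted = sorted(rs)
print("smallest sample r:", round(rs_sorted[0], 2))
print("median sample r:  ", round(rs_sorted[len(rs_sorted) // 2], 2))
print("largest sample r: ", round(rs_sorted[-1], 2))
```

Even with a true correlation of exactly zero, the sample correlation bounces around from sample to sample, which is why a formal test is needed to judge whether an observed correlation is more than sampling variability.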
This sort of plot is something that you'd see frequently if there in fact is no relationship between these two variables. And so for this particular data set, we don't have any real evidence against the null hypothesis and in favor of the alternative hypothesis. We're very interested in these types of questions in the world of science. Sometimes just the question of whether you have evidence of a relationship between variables is the big question. We'll look at these sorts of methods in later chapters, when we talk about simple linear regression and correlation, investigating relationships between two quantitative variables like this. Now let's look at an example involving a single mean. In this example, mining companies are subject to government regulations on their effluent, and in some areas, including where I am, the mean arsenic level in their effluent has to be no greater than 0.30 mg per liter. Now, I'm glossing over some of the testing procedures. I'm sure the regulations are very detailed in all of that, but this is the spirit here: we need this mean arsenic level to be no greater than 0.30 milligrams per liter. So if we're translating this into hypotheses, it's going to depend on where the burden of proof lies. If we suppose at first that the burden of proof is on the government to show that the company's mean arsenic level exceeds that level, then we're going to put that in the alternative hypothesis. The alternative hypothesis is going to be that the true mean arsenic level in their effluent is in fact greater than that. And this is because we're giving the null hypothesis the benefit of the doubt and then seeing if the data gives us evidence against the null and in favor of the alternative. So, our null here. I'm going to write it like this at first, and this is how we're going to go with it.
I'm going to say the null hypothesis is that the true mean arsenic level in their effluent is actually equal to 0.30 mg per liter. And you might think, hey, wait. Why did we not say the true mean is less than or equal to 0.30, with this same alternative over here? We could, and people do, and that's a very, very reasonable approach. In fact, that's the more intuitive approach to writing out the hypotheses here. We'd say, "Hey, wait. If we did that, and we had this alternative over

### [25:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=1500s) Segment 6 (25:00 - 30:00)

here." Then the null and the alternative span the entire space. They're complements. They split it up really nicely. If the null is true, the null is true; if the null is false, then the alternative is true. Everything's just beautiful and works out really cleanly conceptually. And so this really is the nicer way to write them, I think, at least at this point, conceptually. But at some point we're going to need to test an actual value, to put a number in our formulas. And I'm going to want to use language like "the distribution of the test statistic when the null hypothesis is true," and that sort of language only works when we're testing a point hypothesis, that is, that the parameter is equal to a certain value. So although conceptually it's perfectly legitimate to write a null hypothesis like this with this alternative, and many sources do, and that's even a little bit more understandable right now, at some point we just have to write it a little bit differently in order to use the language that I want to use. And additionally, we'd see that these two things mean the same thing in the end, once we carry out the test. If we find strong evidence against the null hypothesis and in favor of the alternative over here, well, we would have found that same strong evidence with this other method of writing down the null. So they look a little bit different here, and you might wonder why we're setting this parameter exactly equal to a certain value. But this is how we're going to write it, for various reasons that are meaningful to us in hypothesis testing, or at least for the language that I use in hypothesis testing. But it is perfectly legitimate to write the null hypothesis in this other way. Now, let's go back to one of the main points, which is that the burden of proof was on the government, and we're giving the null hypothesis the benefit of the doubt in hypothesis testing.
So if the burden of proof is on somebody, we're not going to put their claim in the null hypothesis, because you don't give the benefit of the doubt to the party that has the burden of proof. It doesn't work that way. So this would be the appropriate null and alternative in this particular setting. But suppose we change the situation a little bit, and the burden of proof is on the mining company to show that there is in fact less than 0.30 milligrams per liter of arsenic on average. If that's where the burden of proof lies, we're going to put that in the alternative hypothesis. The alternative hypothesis is that the true mean arsenic concentration in their effluent is in fact less than 0.30 mg per liter. This is the thing that they need to show. They collect data and see if they have evidence in favor of this alternative. So our null hypothesis is going to be, essentially, that they're not meeting the requirement, and that the true mean is equal to 0.30. We're going to say that our parameter is equal to this certain value. Again, it would be completely appropriate if we had it written the other way over here. You might think, why aren't we writing it like this? This would be a legitimate way to write the null hypothesis in this particular situation. But I am going to use this version over here, with the parameter being equal to a specific value. In the end, we're going to be putting a specific value into the formulas that we have not discussed yet, and we're going to have to put a single value to the test. But in spirit, these two things are really trying to achieve the same thing. So, one thing to keep in mind. I brought this up before and I'll bring it up again, and it is extremely important: these hypotheses always involve parameters and never, ever statistics. Please never even think of writing something like this. This is just wrong in so many ways, because x̄ is a statistic, and that has no place in a hypothesis, ever. Hypotheses do not involve statistics.
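The two arsenic setups can be summarized in symbols as a rough sketch. Here μ stands for the true mean arsenic concentration in the company's effluent (in mg/L), and the limit is taken to be 0.30 mg/L, which is how the garbled figures in the transcript are read here.

```latex
% mu: true mean arsenic concentration in the effluent, in mg/L
% 0.30 mg/L: regulatory limit as read from the transcript
\begin{align*}
\text{Burden of proof on the government:} \quad & H_0\colon \mu = 0.30 \quad \text{vs.} \quad H_a\colon \mu > 0.30 \\
\text{Burden of proof on the company:}    \quad & H_0\colon \mu = 0.30 \quad \text{vs.} \quad H_a\colon \mu < 0.30
\end{align*}
```

In both cases the null is written as a point hypothesis, and whichever side carries the burden of proof has its claim placed in the alternative.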
Hypotheses are statements about the world: statements about parameters, specific values of parameters, not statements about statistics. This is very important, because that is what statistical inference is all about. Okay, let's look at some of the logic of hypothesis testing here. We're going to do this in a simple case, one of the simplest cases we can construct, and we'll go through some of the math behind it. So, let's say you have a friend, Tom. Your friend claims that he's developed a coin-tossing technique that allows him to have tosses of a coin come up heads more than half the time on average. Okay, that's interesting. Tom's a bit of a con artist, a blusterer. So he might just be lying to you, just telling you tales. Or maybe he's

### [30:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=1800s) Segment 7 (30:00 - 35:00)

done it. Maybe he's figured it out. So, we want to put Tom to the test. We have no real idea whether he's telling us the truth or not, but we're certainly not going to take his word for it unless he proves it to us. Tom's going to have to show us evidence that he can do this, so we'd put his claim in the alternative hypothesis. The reasonable alternative hypothesis here would be that Tom is actually telling the truth, and that p, representing Tom's true probability of tossing heads on the coin, is actually greater than 0.5. And our null hypothesis is going to be, nah, man, he's just full of it again, and p is just 0.5. We're going to assume here that we're talking about a fair coin. We're going to have Tom toss a coin in a minute, but conceptually we're going to assume a fair coin, so that when people are just tossing it normally, eyes closed kind of thing, it comes up heads with probability 0.5. So we're going to see now if we have evidence in favor of Tom's claim, which is that he can toss this coin such that the probability of getting heads is greater than 0.5. He's got some technique to make heads come up more often. Okay, we give him an ordinary quarter. Let's suppose this is a fair coin. You know, an ordinary quarter isn't perfectly balanced, et cetera, but we'll say in this case it doesn't make too much of a difference. And we're going to watch him toss this coin. We're going to work under the assumption that he's not conning us and slipping in a fake coin, and that the coin he's tossing is a regular coin that's close enough to fair. He tosses it 100 times, and he makes the coin come up heads 68 times in those 100 tosses.
Well, right away, just casually, there appears to be quite a bit of evidence here that Tom does have a technique that makes heads come up more often. Just by gut feel, 68 out of 100 is quite a few. 68 heads and 32 tails in 100 tosses is a lot. It feels like that anyway, but mathematically we're going to want to quantify that. So, does this result provide strong evidence that Tom's claim is true? Well, in hypothesis testing, what we do is pretend for a little while that the null hypothesis is in fact true. We give it the benefit of the doubt, for one, but then when we're doing our calculations, we say, hey, suppose the null hypothesis is in fact true. What are the chances of getting what we got? This is the basic logic. Let's assume that the null hypothesis is true for a moment or two. What are the chances of getting what we actually observed here? If the null hypothesis is true, then we've got ourselves a straight-up binomial problem. If X is the number of heads Tom gets in 100 tosses, and the null hypothesis is true, then X has a binomial distribution with parameters n = 100 (100 tosses) and p = 0.5. That's the p under the null hypothesis, the value of p if the null hypothesis is in fact true. And our binomial distribution with those parameters looks like this. I've plotted a binomial distribution with these parameters between 20 and 80. I truncated it a little bit. And what Tom got is 68, which is right there. That's Tom's 68. That looks pretty weird. That's pretty far out. Just visually, that is compelling. If this is the distribution of the number of heads when we toss a coin 100 times, Tom managed to get way the heck out here in the right tail. That is pretty far out there.
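The plot itself is not reproduced in the transcript, but the null distribution it shows can be sketched numerically. The snippet below is a minimal illustration in Python (not the software used in the lecture); the mean np and standard deviation sqrt(np(1 - p)) are standard binomial facts that the lecture does not state at this point, added here only to put a number on "pretty far out".

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p0 = 100, 0.5   # number of tosses and the null value of p
observed = 68      # heads Tom actually got

# Full null distribution of the number of heads, k = 0, 1, ..., 100
pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]

mean = n * p0                         # center of the null distribution
sd = math.sqrt(n * p0 * (1 - p0))     # spread of the null distribution
z = (observed - mean) / sd            # how far out 68 is, in SD units

print(f"mean = {mean}, sd = {sd}")
print(f"P(X = 50) = {pmf[50]:.4f}")
print(f"P(X = 68) = {pmf[68]:.6f}")
print(f"68 heads is {z:.1f} standard deviations above the mean")
```

Under the null hypothesis the distribution is centered at 50 heads, so 68 sits several standard deviations out in the right tail, matching the visual impression described above.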
So what we want to do is quantify how far out there that is, because we can't just be wandering around talking like this. I can't be going, I think it's pretty far out there, and you go, I don't think it's that far out there. Right? That's not a good way to go about things. So we need to shore this up mathematically and scientifically. So we ask ourselves this question. We say, hey, if the null hypothesis is true, what is the probability of getting what we got here or something even farther out? Now why farther out? We'll talk about this in detail in a number of places. But it's not just 68, because had he gotten 69 or 72 or 83

### [35:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=2100s) Segment 8 (35:00 - 40:00)

heads or that kind of thing, we would have thought that even more evidence against the null and in favor of the alternative. So what we do is figure out the probability of getting what we actually observed or something with even more evidence against the null hypothesis. What we want, then, is the probability of getting 68 or more heads if the null hypothesis is true. And for that we could just go straight to software, right? We don't want to calculate all these probabilities for 68, 69, all the way up to 100. I don't want to do this. So we could use the pbinom function in R, or another function in other software, and we'd set it up like this, because the pbinom function gives us the probability of getting less than or equal to the value we put in there. So if we gave R `1 - pbinom(67, 100, 0.5)`, with 67 and the parameters 100 and 0.5, what R tells us is that the probability is 0.00020 to five decimal places. That is a very small probability. This is the probability of getting 68 or more heads if the null hypothesis is in fact true, if Tom's just tossing an ordinary coin those times. I'm assuming a fair coin. The probability of getting 68 or more heads is 0.0002. That's about one in 5,000. That's very small. So either Tom has figured something out and is in fact tossing heads with probability greater than 0.5, or we witnessed a very unlikely series of events here, in that he did these 100 tosses and just happened to get this very large number of heads. So there's a very small probability of seeing what we observed or something farther out, which tells us that we either witnessed a very unlikely event under the null hypothesis (the null hypothesis might still be true and we just witnessed something highly unusual), or Tom's actually correct and he's tossing this coin in such a fashion that he's getting heads to come up more than half the time. Okay, so this is the basic logic of hypothesis testing. We're giving the null hypothesis the benefit of the doubt. We collect some data.
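
The lecture does this calculation in R. As a hedged equivalent, here is a short Python sketch using only the standard library that sums the binomial probabilities from 68 up to 100 directly. The function name `binom_upper_tail` is an illustrative assumption, not something from the lecture.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability of 68 or more heads in 100 tosses if p = 0.5 (the null hypothesis)
p_value = binom_upper_tail(68, 100, 0.5)
print(f"P(X >= 68 | p = 0.5) = {p_value:.5f}")  # prints 0.00020
```

The same function can be called with any other observed count to see how the tail probability changes as the result moves closer to or farther from 50.
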
We see how much evidence we have against the null hypothesis and thus in favor of the alternative hypothesis. Here we had a lot of evidence against the null hypothesis and thus in favor of the alternative hypothesis that Tom's probability of getting heads is greater than a half. Had we gotten something just over 50, let's say Tom got 52 heads or something, then the probability of that happening is not that small, right? The probability of getting 52 or more heads would be a little less than 0.5. Now Tom may be going, see, I told you, I got 52 heads out of 100. But it's not going to be extreme enough for us to say, hey, that's a lot of evidence for your claim. We'd say, "Oh, geez, it's quite likely to get 52 or more heads in 100 tosses." So that's not really showing any real evidence that you're telling us the truth here. So, this is the general idea of how we go about showing evidence against the null hypothesis and in favor of the alternative. There'll be many different types of statistics that we use to investigate this, and a lot of nuance along the way, but this is the general idea. So, to recap some of this: hypothesis testing is going to consist of formulating a null hypothesis and an alternative hypothesis. These hypotheses are based on the research question of interest and not on the observed sample data. We have questions we're trying to answer, and we think hypothesis testing can help us answer those questions. If we don't have questions we want to answer with hypothesis testing, then we don't carry out a hypothesis test. This is a very basic bit of logic that seems to be lost on a lot of people who make stats videos, right? We don't just go looking around asking, what hypothesis can I test? Right?
You have a question that you want to investigate, one that hypothesis testing can help you answer. And in those situations, we go through and carry out a hypothesis test. Always remember that these hypotheses get translated into statements about the values of parameters. Maybe that two population means are equal; that's a huge one, something we test a lot as a question of interest, along with all sorts of other things that we're interested in. But they are hypotheses about the values of

### [40:00](https://www.youtube.com/watch?v=_HOW_zFRCpk&t=2400s) Segment 9 (40:00 - 42:00)

parameters and never sample statistics. And they come from big picture questions that we have about the world, things we're trying to answer, and we're trying to use hypothesis testing to help us answer them. So we get some sample data. We choose an appropriate method of analysis and get an appropriate test statistic. In the coin tossing example, I simply used the number of heads as our test statistic, but there will be much more complicated test statistics along the way. When we calculate the value of the test statistic, we then use it to assess the strength of the evidence against the null hypothesis and thus in favor of the alternative hypothesis. Maybe we have no evidence against the null. Maybe we have lots. We'll see what the test statistic has to say. Then eventually we're going to properly interpret the results in the context of the problem at hand. So we get the data, we come up with our appropriate method of analysis, we calculate our test statistic, and we come up with a measure of the strength of the evidence against the null hypothesis. Maybe at first we make a brief statement about what that means in terms of the parameters, but in the end it's going back to the context of the problem at hand. What does this data tell us about our original research question? That is what we're doing. And so in the end our conclusions must relate back to our original problem of interest. We have to know what it means there, or the rest of it is useless. Okay. So these are some big picture ideas of hypothesis testing. I'm going to move on to hypothesis tests for a single mean, mu.
We're then going to talk about some of the nuances of hypothesis testing in that setting, and many of those ideas will hold for other settings as well. And then in future chapters, we're going to talk about all sorts of other scenarios in which we can use hypothesis testing to help us answer questions that are important to us. It is a big part of statistical inference. A little nuanced, a little tricky, a little controversial, but important. And so I will talk about all those details in future videos. We'll see you later.

---
*Source: https://ekstraktznaniy.ru/video/52881*