# [bsr43] Introduction to Biostatistics: Chapter 12  Analysis of Frequencies (part 1/5)

## Metadata

- **Channel:** statisticsmatt
- **YouTube:** https://www.youtube.com/watch?v=I6Yrl9SzCbg
- **Date:** 11.05.2026
- **Duration:** 22:21
- **Views:** 150

## Description

The videos on this YouTube Channel are not affiliated with The University of Missouri or my role as a professor at the University.

Here's a link for pdf's of certain videos. https://statisticsmatt.gumroad.com Also note that if a pdf of the video you are wanting is not uploaded yet, please reply in a comment that you'd like me to upload and I'll do it.

Help this channel to remain great! Donating to Patreon or Paypal can do this! 
https://www.patreon.com/statisticsmatt
https://paypal.me/statisticsmatt

Several playlists are in PDF format and can be purchased at 
https://gumroad.com/statisticsmatt

## Contents

### [0:00](https://www.youtube.com/watch?v=I6Yrl9SzCbg) Segment 1 (00:00 - 05:00)

This is an introduction to biostatistics using R, and we're in chapter 12. I'm scrolling so you can see all the topics we've covered thus far, and let's jump to that chapter. Now, the chi-square distribution is the workhorse for analyzing frequencies in this chapter, and really we're going to summarize contingency tables in several ways. So the setting is that we have two variables or more, but we'll start with two, and both are categorical or qualitative variables, and we want to summarize the joint frequency distribution of those two variables. One test is called the chi-square test of independence, where we ask whether those two categorical variables are independent of each other. We're also going to look at what are called goodness-of-fit tests. So we have a qualitative variable, and can we model it with a Poisson distribution, or a binomial, or even a normal distribution? We're going to show the techniques for that, and of course we're going to look at relative frequencies, odds ratios, and things like that.

Well, first let's look at some of the mathematical properties of the chi-square distribution. And again, it's the workhorse. Part of the chi-square density is something called the gamma function, which appears everywhere in statistics, so some knowledge of it is worthwhile. The true definition of the gamma function, which we'll denote by capital gamma of alpha, where alpha is a positive real number, is given by this integral. This definition uses calculus, which we're not assuming for these lectures, but that symbol is what's called an integral sign, and we're looking at the area under a curve. Anyway, that represents what's called the gamma function, and it's part of the definition of the chi-square distribution. Now, some properties of the gamma function. When alpha is one, we can show that gamma of one is one.
When alpha is strictly greater than one, we can show that it has this recursive property: gamma of alpha equals alpha minus one, times gamma of alpha minus one. The third property: when alpha is a positive integer, gamma of alpha equals alpha minus one factorial. We'll show some R code illustrating these. When alpha is 1/2, we can show that gamma of 1/2 is the square root of pi. The following property makes use of the recursive property and the fact that gamma of 1/2 is the square root of pi. If we have gamma of n plus 1/2, so for instance gamma of 5.5 or 6.5 or 2.5, we can use the recursion to keep reducing it until we get to gamma of 1/2, which equals the square root of pi. Now, this product here can be written in what's called falling factorial notation: that's a number raised to n with an underscore under the n. We take this number and keep reducing it by one, that number of times. So if we have, say, 5.5 to the two falling, that means we take two factors, reducing by one each time.
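
The gamma function facts listed above can be checked numerically. The following is a minimal R sketch, not the code shown in the video: it uses base R's `gamma()`, `factorial()` and `integrate()`, it assumes the integral on the slide is the standard definition (the integral of x^(alpha - 1) * e^(-x) from 0 to infinity), and `falling_factorial()` is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: x * (x - 1) * ... with m factors (falling factorial).
falling_factorial <- function(x, m) prod(x - seq_len(m) + 1)

# Assumed standard definition, evaluated by numerical integration at alpha = 5.5.
alpha <- 5.5
gamma_by_integral <- integrate(function(x) x^(alpha - 1) * exp(-x),
                               lower = 0, upper = Inf)$value

# Each entry is TRUE when the two sides agree up to floating-point tolerance.
checks <- c(
  definition   = isTRUE(all.equal(gamma_by_integral, gamma(alpha), tolerance = 1e-3)),
  gamma_of_one = isTRUE(all.equal(gamma(1), 1)),
  recursion    = isTRUE(all.equal(gamma(5.5), 4.5 * gamma(4.5))),
  integer_case = isTRUE(all.equal(gamma(4), factorial(3))),
  one_half     = isTRUE(all.equal(gamma(0.5), sqrt(pi))),
  half_integer = isTRUE(all.equal(gamma(3.5), falling_factorial(2.5, 3) * sqrt(pi)))
)
print(checks)
```

The `half_integer` line spells out the reduction of gamma of 3.5 to 2.5 times 1.5 times 0.5 times the square root of pi, which is the falling factorial form with three factors.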

### [5:00](https://www.youtube.com/watch?v=I6Yrl9SzCbg&t=300s) Segment 2 (05:00 - 10:00)

So we get 5.5 times 4.5; we do it twice. But here we're doing it all the way until we get down to 1/2, and that's what this first part represents, and then of course times gamma of 1/2. The falling factorial isn't introduced in a lot of statistics courses, so I just thought I'd throw that in there.

Let's do an R illustration of these. Gamma of four: since four is a positive integer, that's equal to three factorial, and you can tell that both of those are six. The recursive property: gamma of 5.5 is 4.5 times gamma of 4.5, and those two are equal. Gamma of 7 over 2: really that's the same as gamma of 3.5, and using the recursive relationship we can subtract one, subtract one, until we get to gamma of 1/2, which is the square root of pi. So those two quantities are equal.

Now the actual chi-square distribution. I always draw the same curve; it's roughly this. There are some degrees of freedom associated with that chi-square, and that's what n represents: n is the degrees of freedom. The density is equal to this quantity here when x is positive. The abscissa is x, the height is the density value f of x, and it's one over the quantity 2 raised to n over 2 times gamma of n over 2 (that's the function we talked about a second ago), times x raised to n over 2 minus 1, times e to the minus x over 2. These create general shapes that look like this. Now if X is a chi-square random variable with n degrees of freedom, we can show that it has a mean of n and a variance of 2n. So as the degrees of freedom increase, the mean increases, and the curve tends to shift to the right. And of course the variance increases too, by 2 for each additional degree of freedom.

Now, here are some properties that statisticians should know, remember, and prove at some point in their career. If we have k independent chi-square random variables, we can add the k variables to create a new random variable, right?
Functions of random variables are random variables. We can show that X, the sum, has a chi-square distribution with n degrees of freedom, where n is determined by adding the individual degrees of freedom of the random variables X1 through Xk.

Property two: if X is a standard normal random variable and we square it to create a new random variable called Y, then Y has a chi-square distribution with one degree of freedom. This property is used over and over in statistical tests. If X is a standard normal, we square it to get a chi-square with one degree of freedom.

Well, these two properties can be combined. If we have a sequence of k independent standard normal random variables, and we square each of them, and then add those squares to create another new random variable X, then X has a chi-square distribution with k degrees of freedom. Right? Because each of the squares is a chi-square with one degree of freedom, and we're adding k independent chi-squares, so the sum is another chi-square with k degrees of freedom. Let's illustrate these properties in R and just show you that we can simulate them; the properties themselves are exact. So we're going to do a simulation of 500,000.
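
The two properties above can be explored with a simulation along the lines the lecture describes. This is a sketch under assumptions rather than the video's script: the seed, the number of histogram bins, and the variable names (`z`, `y`, `x10`) are choices made here, and only base R functions (`rnorm()`, `hist()`, `curve()`, `dchisq()`) are used.

```r
set.seed(2024)        # arbitrary seed, only so the run is repeatable
reps <- 500000        # number of simulated values, as in the lecture

# Property two: the square of a standard normal is chi-square with 1 df.
z <- rnorm(reps)
y <- z^2
hist(y, breaks = 200, freq = FALSE, xlim = c(0, 8),
     main = "Squared standard normal vs chi-square(1)")
# Start slightly above 0 because the 1-df density is unbounded at 0.
curve(dchisq(x, df = 1), from = 0.01, to = 8, add = TRUE, col = "red", lwd = 2)

# Combined property: the sum of k squared standard normals is chi-square with k df.
k <- 10
zmat <- matrix(rnorm(reps * k), nrow = reps, ncol = k)  # one row per replicate
x10 <- rowSums(zmat^2)
hist(x10, breaks = 200, freq = FALSE,
     main = "Sum of 10 squared standard normals vs chi-square(10)")
curve(dchisq(x, df = k), add = TRUE, col = "red", lwd = 2)

# Sample moments to compare with the theoretical mean n and variance 2n.
moments <- rbind(df1  = c(mean = mean(y),   var = var(y)),
                 df10 = c(mean = mean(x10), var = var(x10)))
print(moments)
```

The printed sample means and variances should land near 1 and 2 for one degree of freedom and near 10 and 20 for ten, up to simulation noise.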

### [10:00](https://www.youtube.com/watch?v=I6Yrl9SzCbg&t=600s) Segment 3 (10:00 - 15:00)

We have a random variable X, a standard normal, and we're going to sample it 500,000 times. We're going to square each of those and call that a new random variable Y. Then the distribution of Y is a chi-square with one degree of freedom. So let's create a histogram of our random variable Y; that's our simulated distribution. Then we're going to plot over the top of it the theoretical chi-square density. And this is it: the simulated values, so they're normal, we squared them and created a histogram, and of course the red line is the exact theoretical density. You can tell those are very close to each other. Now we're going to take k equal to 10. So we want k independent squared standard normals: we take random samples of size k, square them, and sum them. Then we do that 500,000 times, plot the exact density of a chi-square with 10 degrees of freedom over the top of it, and we get this. The histogram is the simulated observations, the line is the theoretical density, and you can tell those agree so closely it's amazing.

Let's look at the general chi-square test statistic. X is a categorical random variable, and let's say it has k categories: A, B, C, D, E, F, G, you know, there are k different options there. We obtain a random sample of size n, and we let O_i be the number of observations in that sample that fall in category i. So for instance, let's say we're rolling a die, a fair die. There are six sides, so the outcome of a roll is categorical because it can only take on the values one through six. If we think about binning these, each roll is one of those six, so we can put a tally mark as the outcomes occur. Then the count, however many are there, is what we're calling O right here. So O1, the observed count for category 1, is five. O2 would be two in this case. O3 would be one. And that's what the number O_i is.
Now we also have to calculate E_i, which is the expected number of observations in category i, and this almost always depends upon the null hypothesis. So in this example, if we roll the die 36 times, each outcome has a probability of 1/6, so we would expect about six observations in each of those categories. Now, here the categories have equal probability of occurring, but in general you would use the respective probability of observing each category. So E1 would be six, E2 would be six, because that's the number of rolls we expect in each category. So now we set up the null and alternative hypotheses. This is very general: we're assuming that X has k categories, and we're testing whether the probability of being in category one equals this value, which stands for the probability of the first category under the null hypothesis. And then this is the probability that

### [15:00](https://www.youtube.com/watch?v=I6Yrl9SzCbg&t=900s) Segment 4 (15:00 - 20:00)

we're assuming for category 2, and so on up to the probability for category k. Now in the silly die-rolling experiment, we'd say p1 equals 1/6, p2 equals 1/6, all the way to p6 equals 1/6. The alternative is that at least one of those is not equal to its null hypothesized value. So our test statistic, which we're going to call X squared, is this: we take the observed minus the expected, square it, divide it by the expected, and sum over all categories. And this is it. Now if we think about this test statistic, if the observed is close to the expected, that difference is going to be pretty close to zero. So when we square it and divide by E, that's still going to be pretty close to zero, which is consistent with the null hypothesis. If the observed and the expected are different, that difference becomes large and then the test statistic becomes large. Now we can show that, under the null hypothesis, the test statistic approximately follows a chi-square distribution with k minus one degrees of freedom. And as a reminder, this is how we calculated the expected value: it's our sample size times the null hypothesis probability of being in that category. Now the chi-square distribution looks like this. The degrees of freedom will be k minus one, there's a rejection region of area alpha in the upper tail, and we want the critical value, because if the test statistic is in the rejection region we reject the null, and if it's not, we don't. That point right there is what we're calling chi-square sub alpha with k minus one degrees of freedom. If our test statistic is greater than that, we reject. If it's not, we do not reject. Now, here's a big note that we must check.
There's an assumption for this test: the expected values must all be greater than or equal to five, for every category, and you have to check that. We'll do an example where that condition is not met, and there we collapse the categories with low expected values into a bigger category. That's what I'm saying here: it's common to combine them.

Let's do an example where we roll a die 50 times. Our null hypothesis is that each outcome is equally likely, with probability 1/6. So for the expected value of each category: we rolled it 50 times, each has a probability of 1/6, so we would expect on average about 8.33 rolls in each category. Okay, so let's simulate this. I like this package called extraDistr, which is extra distributions, and one of them is a discrete uniform distribution, which I like. That way I don't have to program it myself. So we have 50 observations, and of course there are k possible categories, the faces of a die, one through six. Let's take a random sample of 50 from this discrete uniform and store our observations in x. We can table them, and we see that we rolled nine ones, six twos, ten threes, etc. Now to calculate the expected value for each of these cells, we take n times the probability of being in each category, and of course that creates this vector: we expect 8.33 in each cell. To calculate the test statistic, we take the vector of observed counts, which is here, minus the expected, which is here, square it, divide by the expected, and sum over all categories, and we get a value of 4.96. So if this is a chi-square random variable and the degrees of freedom are five

### [20:00](https://www.youtube.com/watch?v=I6Yrl9SzCbg&t=1200s) Segment 5 (20:00 - 22:00)

right, 6 minus 1, and this point is the test statistic, so that's 4.96, then the p-value is this area. And that's the function `pchisq`. We give it our test statistic value, the degrees of freedom would be five, and we do not want the lower tail; we want the upper tail. So that p-value is 0.42, and that's pretty large. So there's not enough evidence to reject the null hypothesis. And that makes sense, because we truly generated the data from a discrete uniform random variable.

Well, there are built-in functions for this. Remember that our observations are in x. We table them, which gives the number of times each category occurred, the O_i's in a sense. We just call `chisq.test`, and this is the output, and notice the test statistic is the same and the p-value is the same. So we do not reject. Now notice there are no extra arguments here. When you just put the table in like this, it assumes that the probabilities of being in each category are all equal. But sometimes that's not the case, and to specify the probability of being in each category we have to add this argument called `p`. Here, since all the probabilities are 1/6, we put one sixth in for each of them, and of course it produces the same output. Okay, so we're at 22 minutes, and we'll pick up with section three in the next video.
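
The die example can be reproduced in outline as follows. This is a hedged sketch, not the lecture's script: base R's `sample()` is swapped in for the extraDistr discrete uniform sampler to avoid a package dependency, the seed is arbitrary, and the names `obs`, `expected`, `x2` and `p0` are chosen here. Because the simulated rolls differ from the ones in the video, the statistic will not equal 4.96.

```r
set.seed(1)                        # arbitrary seed; the video's rolls are not reproduced
n  <- 50                           # number of rolls
k  <- 6                            # number of categories (faces)
p0 <- rep(1 / k, k)                # null hypothesis probabilities

# Assumption: sample() stands in for a discrete uniform draw on 1..6.
x <- sample(1:k, size = n, replace = TRUE)

# Observed counts O_i; factor() keeps a face in the table even if it never occurs.
obs <- as.vector(table(factor(x, levels = 1:k)))
expected <- n * p0                 # E_i = n * p_i under the null (about 8.33 each)
stopifnot(all(expected >= 5))      # the expected-count condition from the lecture

# Test statistic by hand, upper-tail p-value, and the 5% critical value.
x2       <- sum((obs - expected)^2 / expected)
p_value  <- pchisq(x2, df = k - 1, lower.tail = FALSE)
critical <- qchisq(0.05, df = k - 1, lower.tail = FALSE)

# Built-in test: equal probabilities by default, or stated explicitly through p.
fit_default  <- chisq.test(obs)
fit_explicit <- chisq.test(obs, p = p0)

print(c(statistic = x2, p_value = p_value, critical = critical))
print(fit_default)

# The value quoted in the lecture: statistic 4.96 with 5 degrees of freedom.
pchisq(4.96, df = 5, lower.tail = FALSE)   # about 0.42
```

The hand computation and both `chisq.test()` calls should agree on the statistic and p-value, which is the point the lecture makes when comparing its manual result with the built-in output.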

---
*Source: https://ekstraktznaniy.ru/video/52890*