# [bsr47] Introduction to Biostatistics: Chapter 12 - Analysis of Frequencies (part 5/5)

## Metadata

- **Channel:** statisticsmatt
- **YouTube:** https://www.youtube.com/watch?v=GMWyoTiMZ2g
- **Date:** 08.06.2026
- **Duration:** 20:03
- **Views:** 13
- **Source:** https://ekstraktznaniy.ru/video/52886

## Description

The videos on this YouTube Channel are not affiliated with The University of Missouri or my role as a professor at the University.

Here's a link for PDFs of certain videos: https://statisticsmatt.gumroad.com

Also note that if a PDF of the video you want is not uploaded yet, please reply in a comment that you'd like me to upload it, and I'll do it.

Help this channel to remain great! Donating through Patreon or PayPal can do this!
https://www.patreon.com/statisticsmatt
https://paypal.me/statisticsmatt

Several playlists are in PDF format and can be purchased at 
https://gumroad.com/statisticsmatt

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

This is an introduction to biostatistics using R. We're in Chapter 12, the chi-square distribution and the analysis of frequencies. We're in Section 7, which covers relative risk, the odds ratio, and the Mantel-Haenszel test.

First, some nomenclature for the different types of observational studies. One type is called prospective. You start now and follow patients into the future, that is, prospectively. Researchers identify a cohort based on some exposure status and then track outcomes over time. The data are collected as the events unfold, which allows better control over variables and timing. An example of a prospective study would be following a group of patients with high cholesterol to see whether they develop heart disease over the next 10 years.

A retrospective observational study, by contrast, begins after all the outcomes have already occurred. The researchers look backward in time, using existing records or patient recall to assess exposure. This is often quicker and less expensive, but it is more prone to bias. Patient recall can be unreliable and records can be incomplete, and incomplete information can lead to bias. For example, you might review the medical charts of patients who had heart attacks to see whether they had high cholesterol in the past. You go back through the records and look it up retrospectively.

Now, relative risk. It's probably easiest to explain with an example. Say we're evaluating a new vaccine to prevent influenza, and we conduct a randomized controlled trial with two groups: 1,000 people in the vaccinated group and 1,000 in the unvaccinated group. We follow them prospectively. After flu season, once all the data are collected, we see that 50 people in the vaccinated group got the flu and 200 people in the unvaccinated group got the flu.
We can put these counts in a 2 × 2 table. Notice there are 1,000 in each group, and 50 of the vaccinated and 200 of the unvaccinated got the flu.

To calculate the relative risk, you compute two conditional probabilities. The numerator is the probability of flu given that you were vaccinated; the vertical bar means we're conditioning on the vaccinated group. How would you estimate that probability? There are 1,000 vaccinated people and 50 got the flu, so the estimate is 50/1000 = 0.05.

You then divide by the other conditional probability, the probability of flu given that you're unvaccinated. Of the 1,000 unvaccinated, 200 got the flu, which is 0.20, or 20%. The relative risk is the ratio of these two, 0.05/0.20 = 0.25.

How do you interpret that? A relative risk of 0.25 means the vaccinated individuals had only 25% of the risk of getting the flu that the unvaccinated individuals had. Put another way, the vaccine reduced the risk by 75%. In general, a relative risk of 1 means there's no association. A relative risk greater than 1 means an increased risk of flu with the vaccine, and one smaller than 1 means a decreased risk of flu.

Now let's look at the odds ratio, which is used for 2 × 2 contingency tables. First, when we calculate odds, it's the number with the characteristic divided by the number without the

### Segment 2 (05:00 - 10:00) [5:00]

characteristic. You can also write it as the proportion of patients with the characteristic divided by the proportion without it. That gives the same value, because the common denominator cancels, but strictly I think the odds are defined in terms of the counts.

So we have a 2 × 2 table. The rows could be the two samples, say the vaccinated (exposed) group as sample 1 and the unvaccinated group as sample 2. The columns could be the patients who got the flu (with the characteristic) and those who did not (without it).

To get the odds ratio, we calculate two odds and divide them: the odds of the event in the exposed group over the odds in the unexposed group. The numerator is the sample odds of the characteristic in sample 1, which is the number with the characteristic divided by the number without it. The denominator is calculated the same way for sample 2. Simplifying the ratio of fractions gives the familiar cross-product form. If the first row has cells a and b and the second row has cells c and d, the odds ratio is (a/b)/(c/d) = ad/(bc). That's why many people remember the odds ratio as the product of one diagonal divided by the product of the other.

Applying this to the previous example, start with the odds of flu among the vaccinated. There, 50 got the flu and 950 did not, so the odds are 50/950, or 1 to 19. For the odds of flu among the unvaccinated, go back to the previous table: 200 with flu over 800 without, or 1 to 4. Dividing the fractions, (1/19)/(1/4) = 4/19, so the odds ratio is about 0.21.
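As a quick check of the arithmetic in the vaccine example, here is a minimal Python sketch (the video itself works in R). The function names `relative_risk` and `odds_ratio` are illustrative only. The counts are the ones given above: 50 of 1,000 vaccinated and 200 of 1,000 unvaccinated got the flu.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of conditional probabilities P(flu | exposed) / P(flu | unexposed)."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a/b) / (c/d) = ad / (bc) for a 2x2 table whose
    rows are the two groups and whose columns are (with, without) the characteristic."""
    return (a * d) / (b * c)


# Flu-vaccine example: 1,000 people per group.
flu_vacc, n_vacc = 50, 1000
flu_unvacc, n_unvacc = 200, 1000

rr = relative_risk(flu_vacc, n_vacc, flu_unvacc, n_unvacc)
or_value = odds_ratio(flu_vacc, n_vacc - flu_vacc, flu_unvacc, n_unvacc - flu_unvacc)

print(f"relative risk = {rr:.2f}")            # prints: relative risk = 0.25
print(f"odds ratio    = {or_value:.4f}")      # prints: odds ratio    = 0.2105 (that is, 4/19)
print(f"reciprocal OR = {1 / or_value:.2f}")  # prints: reciprocal OR = 4.75
```

Note that the odds ratio (about 0.21) is smaller than the relative risk (0.25). The two measures agree closely only when the event is rare in both groups.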
An odds ratio of 0.21 means the vaccinated individuals have reduced odds of flu compared with the unvaccinated individuals. We can also take the reciprocal by switching the numerator and the denominator: 19/4 = 4.75. So the odds of flu in the unvaccinated individuals are 4.75 times the odds of flu in the vaccinated group. That's another way to say it.

With the Mantel-Haenszel statistic, we're still estimating an odds ratio, but we're controlling for a third variable. We estimate a pooled odds ratio across multiple strata, such as age groups or study centers, and in doing so we adjust for those confounding factors. Essentially, the pooled estimate is a weighted average of the stratum-specific odds ratios. The method is mainly used for 2 × 2 tables across strata, and an example will help illustrate it.

This example is taken from the R help page for the Mantel-Haenszel test. We're looking at applicants to graduate school at Berkeley, in the six largest departments. Within each department, applicants are classified by admission (yes or no) and by sex (male or female). The data are stored in a three-dimensional array, so it is a 2 × 2 × 6 matrix

### Segment 3 (10:00 - 15:00) [10:00]

or, more precisely, an array. The first two dimensions are admission and sex, and the third is department, with departments A, B, C, D, E, and F. Our goal is to compare the odds of males being admitted with the odds of females being admitted. We think department may confound that relationship, and we'll look at that. If we look at the dimensions of the data set, it's 2 × 2 × 6.

In my opinion, a very simple and nice way to display the data is with `ftable()`, where the f stands for flat: it makes a flat table. The last variable forms the columns, and `ftable()` creates a row for every combination of the remaining variables. Here there are two of those, admit and gender, and department forms the columns. With that table we can see the numbers of males and females admitted and rejected in each department.

We can also collapse dimensions. For instance, say we want a 2 × 2 table of admission by gender, collapsed over department. You take the 2 × 2 × 6 array, keep the first two dimensions, and sum over the third, which just collapses the departments together. There are different functions you could use, but here it's `apply()`. We apply over the array, keeping the first two dimensions (the argument is `MARGIN`) and using `sum` as the function. That gives a 2 × 2 table. We could look at the counts, but I like looking at proportions because they're easier to compare, so I wrap the 2 × 2 table in `proportions()`. If we don't factor in department, it does look like males are admitted at a higher rate than females. Of course, that's without controlling for department.
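The collapsing step above, summing over the department dimension as `apply(x, c(1, 2), sum)` does in R, can be sketched in plain Python. The counts are typed in from R's built-in `UCBAdmissions` array (Admit × Gender × Dept), stored here as one 2 × 2 table per department. The dictionary layout and helper names are assumptions for illustration. The proportions are computed within each gender (by column), which may not be the margin used in the video.

```python
# Counts from R's UCBAdmissions array. Each department maps to a 2x2 table:
# rows = Admit (Admitted, Rejected), columns = Gender (Male, Female).
UCB = {
    "A": [[512, 89], [313, 19]],
    "B": [[353, 17], [207, 8]],
    "C": [[120, 202], [205, 391]],
    "D": [[138, 131], [279, 244]],
    "E": [[53, 94], [138, 299]],
    "F": [[22, 24], [351, 317]],
}


def collapse(strata):
    """Sum the 2x2 tables over the third dimension (department)."""
    total = [[0, 0], [0, 0]]
    for table in strata.values():
        for i in range(2):
            for j in range(2):
                total[i][j] += table[i][j]
    return total


def column_proportions(table):
    """Proportion admitted / rejected within each column (gender)."""
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    return [[table[i][j] / col_sums[j] for j in range(2)] for i in range(2)]


collapsed = collapse(UCB)
props = column_proportions(collapsed)
print("collapsed table:", collapsed)  # prints: collapsed table: [[1198, 557], [1493, 1278]]
print(f"admitted: male {props[0][0]:.3f}, female {props[0][1]:.3f}")  # prints: admitted: male 0.445, female 0.304
```

Ignoring department, about 44.5% of male applicants and 30.4% of female applicants were admitted. This is the apparent gap that the stratified analysis revisits.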
Now, suppose you just type the name of the data set. Remember that this is not a data frame, where columns represent variables, so it's not printed the way you might expect. R takes the first two dimensions, prints that contingency table, and repeats it for each level of the remaining dimension. Here that's the third variable, department, so R prints a 2 × 2 table conditional on each department.

Here's department A. I'd have to look at the proportions to say whether males or females are admitted at a higher rate there. Then here's department B (which I highlighted by mistake), then C, and so on. If I slowed down and looked at these individually, some departments admit a higher proportion of males and others a higher proportion of females. So the third variable, department, is a confounding variable. We want to estimate the odds ratio of males to females being admitted, and department may distort that odds ratio. The Mantel-Haenszel test tries to account for that confounding variable. So what the

### Segment 4 (15:00 - 20:00) [15:00]

null hypothesis says is that admission and sex are independent within every department, so the common odds ratio across departments equals 1. The alternative is that the common odds ratio is not 1.

First, let's run Fisher's exact test on just the collapsed 2 × 2 table, the one we built by summing over departments. So we're looking at the odds ratio of males to females being admitted, ignoring department. The p-value for this test is essentially zero, so we reject the hypothesis that the odds are the same. It looks like males have higher odds of being admitted than females, but we didn't factor in department.

That's what the Mantel-Haenszel test does. Since the data are already set up as a 2 × 2 × 6 array, we can pass the array directly, and the function treats the last dimension as the stratifying (confounding) factor. You can also supply the variables separately through the `x`, `y`, and `z` arguments, but the array is already set up, so we just run the Mantel-Haenszel test. Here the p-value is about 0.23. We are comparing the odds of males being admitted with the odds of females being admitted while controlling for department. There's not enough evidence to conclude that the common odds ratio is different from 1, that is, that males and females are admitted at different rates.

To conduct this test by hand, we set up the stratum-specific 2 × 2 tables, each with admitted and not admitted, male and female. The Mantel-Haenszel statistic is written X², with the subscript MH to emphasize that it's the Mantel-Haenszel statistic. It takes the count in one particular cell of each stratum, adds those counts up, and subtracts the sum of their expected values. That difference is squared and divided by a variance term, which we'll look at in a second. The expected value is exactly what you'd expect: the row total times the column total, divided by the sample size, computed within each stratum.
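For reference, here is the statistic written out in the usual textbook form (without a continuity correction), including the variance term in the denominator. The cell labels are assumed for illustration. In stratum $i$ (one department), $a_i$ is the top-left count (admitted males), $b_i$ the top-right, $c_i$ the bottom-left, and $d_i$ the bottom-right. $n_i$ is the stratum total and $k$ is the number of strata.

$$
\chi^2_{MH} = \frac{\left[\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} E(a_i)\right]^2}{\sum_{i=1}^{k} v(a_i)},
\qquad
E(a_i) = \frac{(a_i + b_i)(a_i + c_i)}{n_i},
$$

$$
v(a_i) = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2\,(n_i - 1)}.
$$

$E(a_i)$ is the row total times the column total over $n_i$, as described above. Under the null hypothesis, $\chi^2_{MH}$ is approximately chi-square with one degree of freedom.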
The denominator is the variance associated with all of these tables. The variance is computed within each stratum, and then you sum over the strata to get the total variance. The resulting X² statistic is compared with a chi-square distribution with one degree of freedom. So we choose alpha, use one degree of freedom, and find the critical value. Then we check whether the X² statistic falls in the rejection region and conduct our test.

When you're finished, you may want a common odds ratio across all the strata. If you had to pick one number, this is how you'd calculate it. You can see that it looks very much like an odds ratio: a ratio of sums of cross-products, each divided by its stratum's sample size. That's the common (Mantel-Haenszel) estimator. Going back to the admission example, the common odds ratio is about 0.9. This says that, once you factor in department, females have slightly higher odds of being admitted than males.

Well, that's all I have for this video. In the next chapter, we'll talk about nonparametric statistics.
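Here is a minimal Python sketch of the whole hand calculation for the Berkeley data. It computes the stratum-specific odds ratios, $\chi^2_{MH}$, its p-value against a chi-square distribution with 1 degree of freedom, the critical value for a chosen alpha, and the common odds ratio $\widehat{OR}_{MH} = \sum_i (a_i d_i / n_i) \big/ \sum_i (b_i c_i / n_i)$.

Several choices here are assumptions, not the video's R code:
- The counts are typed in from R's `UCBAdmissions` array.
- The optional 0.5 continuity correction is intended to mirror the default (`correct = TRUE`) of R's `mantelhaen.test`.
- The p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.
- The critical value is found by a simple bisection.

```python
import math

# Counts from R's UCBAdmissions array. Each department is a 2x2 table
# [[a, b], [c, d]]: rows = Admit (Admitted, Rejected), columns = Gender (Male, Female).
UCB = {
    "A": [[512, 89], [313, 19]],
    "B": [[353, 17], [207, 8]],
    "C": [[120, 202], [205, 391]],
    "D": [[138, 131], [279, 244]],
    "E": [[53, 94], [138, 299]],
    "F": [[22, 24], [351, 317]],
}


def stratum_odds_ratios(strata):
    """Odds ratio ad/(bc), male vs female odds of admission, in each stratum."""
    return {k: (t[0][0] * t[1][1]) / (t[0][1] * t[1][0]) for k, t in strata.items()}


def mantel_haenszel(tables, correct=True):
    """Return (X2_MH, common odds ratio) for an iterable of 2x2 tables."""
    sum_a = sum_e = sum_v = 0.0
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n  # E(a_i): row total * column total / n
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))  # v(a_i)
        num += a * d / n  # numerator of the common odds ratio
        den += b * c / n  # denominator of the common odds ratio
    delta = abs(sum_a - sum_e)
    if correct and delta >= 0.5:
        delta -= 0.5  # continuity correction (assumed to match R's default)
    return delta**2 / sum_v, num / den


def chisq1_upper_tail(x):
    """P(X > x) for a chi-square variable with 1 df, via erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


def chisq1_critical(alpha, lo=0.0, hi=100.0):
    """Critical value for 1 df by bisection on the decreasing upper-tail probability."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chisq1_upper_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print({k: round(v, 3) for k, v in stratum_odds_ratios(UCB).items()})
x2, or_mh = mantel_haenszel(UCB.values())
p_value = chisq1_upper_tail(x2)
crit = chisq1_critical(0.05)
print(f"X2_MH = {x2:.3f}, p = {p_value:.3f}, critical value (alpha = 0.05) = {crit:.3f}")
print(f"common odds ratio = {or_mh:.3f}")
print("reject H0" if x2 > crit else "fail to reject H0")
```

With the correction, this should give a statistic of about 1.43, a p-value of about 0.23 (below the critical value of about 3.84), and a common odds ratio of about 0.90. These are in line with the p-value and common odds ratio quoted in the video. Passing `correct=False` gives the uncorrected textbook statistic, which is slightly larger.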