# Today I learned in R with Torsten Blass

## Metadata

- **Channel:** Equitable Equations
- **YouTube:** https://www.youtube.com/watch?v=OiHEvqtOK9w
- **Source:** https://ekstraktznaniy.ru/video/44683

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

Welcome back, everybody, to another Today I Learned in R. It's going to be an exciting lesson today because we've got Torsten Blass with us, and we're going to be learning all about hidden features and objects in R. I'm excited about this because I actually have no idea what to expect. So, very exciting times. We're going to talk a little bit about ChatGPT and some of the pros and cons of how we use it. We'll also cover some interesting stuff if you're coming to R from another language, and why R is really interesting. But enough about all of that. Over to Torsten. Torsten, good to have you with us. Who are you? Where are you? What are you about? What's the lowdown? What's the word on the street? — Hi. Yeah, excellent. Thanks for having me, Andrew and Greg, and thanks for the invitation to be part of this series. I've been following your channels for a long time and found the videos very helpful. They were also a motivation, to see what's possible on YouTube when you produce great content. So, thanks very much. — Glad to have you here. — Andrew, lovely to see you as well. And by the way, Torsten's channel is The Data Digest. We'll have a link in the description below, so go and check it out for sure. — Okay. So, I would like to show two things. One is a hack that I often use, and the other is about how you can investigate the functions you use for statistical tests. You know that when you run `1:10`, it produces the sequence from 1 to 10. But in the background it also creates an object that you can summon. When you start typing `.La`, you see that there's something called `.Last.value`, an object that gets created before the result is printed to your screen. When I hit Tab, it's there, so you don't always have to type the whole thing, and it holds exactly what I just created. I find that helpful on different occasions.
One is when you create a test sequence. For example, you say give me all the numbers from 1 to 30 in steps of three, and you assign it to this test sequence object. You don't really see what happened; it's just there. When you type `.La` and then the Tab key, you see, okay, it goes from 1 to 28 in steps of three, which maybe is not what you expected. — Okay, I'm already learning stuff, because I didn't know any of this. This is amazing, thank you. Brilliant, absolutely amazing. — Great. I hope it gets even better. There is one hack: when you put parentheses around your assignment line, it will print whatever you created. But it's a bit clumsy to have these parentheses in your script. — Yeah, I agree. — Here are some ways I use it. For example, say I investigate new functions, or I need a refresher on something like a t-test, where you compare two vectors, two different distributions, one of them running from 5 to 15. You get this result, right? It gives you the means of x and y and tells you that there's a difference of about four, with a 95% confidence interval ranging from −7 to −1. It doesn't include zero, so the null hypothesis gets rejected and you get a p-value below 5%. Sometimes I'm in the flow and I just want to run `str(.Last.value)` to remind myself of exactly how these results are stored in the background. Then I can go up and follow up with `$p.value` if I need that exact p-value somewhere in my R Markdown document, and extract it. Sometimes I don't remember exactly how that component was named, and then I use `str()` on `.Last.value`. — Right, absolutely fascinating. Okay, I've learned more; my brain is filling up. Keep going, this is great. — Quick question about this: do you know if this has always been there, or is it something that's been added in recent years? — It's part of the base package. — Yeah, I can tell. But that's very cool. — Very cool indeed. Love it.
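Below is a minimal sketch of this habit. It is meant to be typed at the interactive R console, where `.Last.value` holds the value of the most recent top-level expression. That includes invisible results such as assignments. The object names `test_seq` and `tt`, and the pair of vectors `1:10` and `5:15` passed to `t.test()`, are illustrative assumptions. The transcript only mentions one vector running from 5 to 15.

```r
# 1. An assignment prints nothing, but .Last.value still holds the result.
test_seq <- seq(1, 30, by = 3)
.Last.value            # at the console: 1 4 7 10 13 16 19 22 25 28 (stops at 28, not 30)

# 2. Wrapping the assignment in parentheses assigns *and* prints in one step.
(test_seq <- seq(1, 30, by = 3))

# 3. Print a test result, then ask how it is stored without rerunning it.
t.test(1:10, 5:15)     # hypothetical example vectors
str(.Last.value)       # lists components such as statistic, p.value, conf.int

# 4. Once the component names are known, extract exactly what you need.
tt <- t.test(1:10, 5:15)
tt$p.value
tt$conf.int
```

Step 2 is the parentheses trick mentioned above. Step 3 is the `str(.Last.value)` refresher. In a script, storing the result in a named object (step 4) is more robust than relying on `.Last.value`.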
— So, similarly, take the linear model. You can predict a car's mileage from its weight using the `mtcars` data set, and you get the intercept and the slope. You start at some value, and with every one-unit increase in weight (1,000 lb), mileage drops by about 5 miles per gallon. But usually we use these with `summary()`, so you could run `summary(.Last.value)`. Of course, you can always press the up-arrow key and pipe it into `summary()`, or use the native pipe `|>` that we now have. This all works. When I then see the summary of my linear model and I'm in the flow, I just type `str(.Last.value)` to see all the results again. Then I can go up and learn that the R-squared value, which tells you how much of the variance is explained by this one simple predictor, is already 75%. So I just like it. You can even use it while you're building plots. If you're adding another theme or switching an axis to a log scale, you can also save your last plot via `.Last.value` and turn it into a PNG or JPEG file. So it works for
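Here is a base-R sketch of the same workflow for the `mtcars` regression. At the console, `summary(.Last.value)` would work right after printing the model. Here the model is stored in `fit` (an illustrative name) so the example also runs as a script. The native pipe `|>` requires R 4.1 or later.

```r
# Miles per gallon as a function of weight (wt is in 1000 lb) in the built-in mtcars data.
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                 # intercept about 37.3; slope about -5.3 mpg per 1000 lb

# Equivalent ways to get the summary:
summary(fit)
fit |> summary()          # native pipe, R >= 4.1
# summary(.Last.value)    # at the console, right after printing `fit`

# See how the summary object is stored, then extract R-squared.
s <- summary(fit)
str(s, max.level = 1)
s$r.squared               # about 0.75: weight alone explains ~75% of the variance in mpg
```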

### Segment 2 (05:00 - 10:00) [5:00]

plots, for expressions. — Absolutely lovely. Yeah, I did not know any of this. So — do you have any keyboard shortcuts? Do you use, like, Ctrl+1 to switch to the script? — I used to do a little bit of that, but not so much anymore. I did find it quite useful when I was in the habit of doing it. And yourself, Andrew? It's funny you mention that, because I'm also very poor at using keyboard shortcuts, and I've come face to face with this lately. One of my students brought me a list of all the keyboard shortcuts that exist in RStudio. I don't know if it was a Posit cheat sheet or something else. My jaw just dropped, and I realized how much unnecessary clicking I'm doing. Yeah, I think what slows me down when I'm coding isn't so much that I don't have a keyboard shortcut. It's that I usually don't know what I'm doing, and I'm staring at the screen scratching my head. Most of my time is spent feeling a little bit confused. But I must say, when I watch people who are much better than me at coding flipping around on the keyboard, it does look very impressive. You can see how it's a big time saver. Yeah, here's one thing: if you delete something one character at a time, you know that when you hold the Ctrl key, it goes word by word. Ah, — I did not know that. — And that also works with the arrow keys. When I use the left arrow, I go character by character, but with the Ctrl key, I jump word by word. — Also very useful. — This is really helpful. — These are great. — No problem. Okay. So then the second part, if I may. — Yes, please do. — I was brushing up on my statistics with the book *Intuitive Biostatistics*. It focuses more on intuition about the statistical tests and approaches and not so much on the math and the programming.
— I must tell you, when you told me about that book, I bought it. I have it now and I'm reading it myself. So thanks for the suggestion. — Say the name of the book one more time for us all, please, because I don't know it. — *Intuitive Biostatistics*. Yeah, *A Nonmathematical Guide to Statistical Thinking*. — That's great, because we're in an era where we don't need to do any of the math ourselves. Obviously we're sitting here in R, but the understanding has become even more important. — Yeah, 100% agree. — Exactly. And in the chapter about confidence intervals for proportions, I found the following. We all know `prop.test()`, which is about the binomial distribution: you're either a zero or a one, false or true, yes or no, heads or tails. It expects `x`, how many successes there are, like positive tests, and a sample size `n`. Let's say 9 out of 10. Then you get an estimate of the proportion, 90%. It assumes a null hypothesis of 0.5, like a fair coin. Having observed 9 out of 10, it gives you a confidence interval of 0.54 to 0.99. That interval doesn't include 0.5, so the p-value is significant. But the book mentions seven different ways you can actually construct this confidence interval. — Yeah. — And I found the `binom` package and its `binom.confint()` function. When you just call it with 9 and 10, it produces 11 different ways to create the confidence interval, and you notice some of them go above one. — Yeah. — Which cannot happen for a proportion. But the asymptotic method is actually the standard textbook method, and it's not well suited for small sample sizes or for proportions close to zero or one. Oh, one more thing I forgot. If you're into politics and polls, sometimes a poll mentions that they talked to a thousand people, asking if they'd vote for candidate A or B. Then they report a margin of error of ±3%, right?
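Here is a base-R sketch of the two intervals discussed here. `prop.test()` is the built-in test. `wald_ci()` is a hypothetical helper, written for illustration, that implements the textbook "asymptotic" (Wald) formula p̂ ± z·√(p̂(1 − p̂)/n). The `binom` package's `binom.confint(9, 10)` would list all 11 methods side by side, but this block does not need it.

```r
# Built-in test: 9 successes in 10 trials, null hypothesis p = 0.5.
pt <- prop.test(9, 10)
pt$estimate               # 0.9
pt$conf.int               # prop.test's own interval (Wilson score with continuity correction)

# Hypothetical helper: textbook Wald ("asymptotic") interval.
wald_ci <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf.level) / 2)   # about 1.96 for 95%
  se <- sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}
wald_ci(9, 10)            # upper bound about 1.086: above 1, impossible for a proportion

# The same formula is behind a poll's "plus or minus 3 points":
# 1000 respondents and a proportion near 0.5.
qnorm(0.975) * sqrt(0.5 * 0.5 / 1000)    # about 0.031
```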
And as you see here, the interval runs from 47% to 53%, and the 3% margin comes from a sample size of a thousand. So it's really hard to narrow that down. Not to mention that with an alpha of 5%, about 5% of polls don't even include the true value: it lies below 47% or above 53%. But how are these confidence intervals created with all these different methods? Here is my hack, or my tip. When you come from a different tool like SPSS, you do a lot of clicking, but do you really know which statistical test you're using? In R, if you just type the function name without the parentheses (so you have to delete them), you get all the math. The entire code is printed. And don't be intimidated. It looks like a lot, but that's because it computes 11 different methods. If I scroll up to "asymptotic", the standard way, it tells you that all I need is a standard error from a formula. Then I subtract it from or add it to the proportion, scaled by the z value, to get my confidence interval. It

### Segment 3 (10:00 - 15:00) [10:00]

also explains how it creates that. You give it a confidence level like 95% or 99%, it computes alpha, and based on that alpha it uses the `qnorm()` function to get the critical value. That tells you everything lies within roughly two or three standard deviations. That's how you get your z-score, and that's how this interval is produced. And now comes the bummer. First of all, if you open the help for this function, it tells you that the asymptotic method is the textbook definition. Then it mentions another method, with a link to Wikipedia that gives you the formula. It's a bit complicated. What I want to mention is that the book gives you the formula for a modified, better version, which accounts for the fact that the standard confidence interval is probably wrong. It goes over one, while all the others are pulled down more toward 0.5. This asymptotic method is also called the Wald method, and this one is the modified Wald. In a textbook, the modified Wald is described as adding two to the successes and four to the sample size. And that is only correct if you go for a 95% confidence interval. If you use 90% or 99%, it's actually not two and four, and that's what this function shows. It adds z²/2 (0.5 × z²) to the successes, and to n it adds not four but z² again. For a 95% confidence interval, z is 1.96, which you round to two, so z²/2 is about 2 and z² is about 4. That's where the two and four come from. But if you actually look at the Wikipedia article, you see that it works with z² to increase both the sample size and the number of successes. I found it really helpful to see the formula that is actually used to produce these intervals. And it's not complicated: a square root here, a division by the sample size there. It tells you exactly how it produces p′ and n′, these fake values that differ from the actual successes and n. And that works for most functions.
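Here is a sketch of the modified Wald (Agresti–Coull) interval in the z² form described above. You add z²/2 to the successes and z² to the trials, then apply the Wald formula to the adjusted values p′ and n′. The helper name `agresti_coull_ci()` is hypothetical. The loop shows why the textbook's "add 2 and 4" rule matches only the 95% level.

```r
# Hypothetical helper: modified Wald (Agresti-Coull) interval.
agresti_coull_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # critical value from the normal quantile function
  n_tilde <- n + z^2                     # adjusted ("fake") sample size n'
  p_tilde <- (x + z^2 / 2) / n_tilde     # adjusted ("fake") proportion p'
  se <- sqrt(p_tilde * (1 - p_tilde) / n_tilde)
  c(lower = p_tilde - z * se, upper = p_tilde + z * se)
}

# What gets added at common confidence levels: only 95% rounds to 2 and 4.
for (cl in c(0.90, 0.95, 0.99)) {
  z <- qnorm(1 - (1 - cl) / 2)
  cat(sprintf("%2.0f%%: z = %.3f, add %.2f successes and %.2f trials\n",
              100 * cl, z, z^2 / 2, z^2))
}
# 90%: z = 1.645, add 1.35 successes and 2.71 trials
# 95%: z = 1.960, add 1.92 successes and 3.84 trials
# 99%: z = 2.576, add 3.32 successes and 6.63 trials

ac <- agresti_coull_ci(9, 10)
ac                        # centred on p' of about 0.789 rather than on 0.9
(9 + 2) / (10 + 4)        # textbook 95% shortcut for p': (x + 2) / (n + 4), about 0.786
```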
Some of them are quite complicated, but it's nothing you couldn't go through line by line and learn. You could really learn what the Yates continuity correction in the proportion test actually is. And one point I want to make with regard to ChatGPT. I was trying to learn something new about interactive tables, the reactable package, and I would ask questions about how to modify things. It would give me function arguments that didn't actually exist in the function. It would hallucinate, like "change the columns like this", and that wasn't really an argument. So now you could, in theory, clear your whole console, type in the function name, and then copy the whole thing into ChatGPT or Claude. Then it would know all of the capabilities of that function. — What goes into the arguments? How are they actually named? Because it would — that's interesting. — And I asked, "Were you familiar with the package and the function before I gave you the code?" It said it knew of the function and how it's used, but didn't really know all the details. — So interesting, very interesting. — And I'm trying to remember if it's exactly in the ellmer package, which is Hadley's package for interacting with LLMs. I think it has a capability where you can say "look at the help file for these functions". If not, then it's probably one of the ancillary things that Posit puts out, like the btw package or whatever. — Yeah, you're right.
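Here is a base-R sketch of reading a function's source and signature, for example to check an argument name an LLM suggests. `prop.test` stands in for any function. `binom::binom.confint` or `reactable::reactable` would work the same way once those packages are installed. The variable name `src` is illustrative.

```r
# Typing a function's name without parentheses prints its R source.
prop.test

# Just the signature, or just the argument names:
args(prop.test)
names(formals(prop.test))    # includes "correct", the Yates continuity-correction switch

# The source as a character vector, e.g. to search it or paste it into an LLM chat.
src <- deparse(prop.test)
length(src)                  # how many lines you would be pasting
grep("YATES", src, value = TRUE)   # lines implementing the continuity correction
```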
The LLMs tend to do better on things that are more commonly used. So a library that isn't one of the big Posit packages, for instance, is probably going to have a higher chance of hallucinating. And sometimes you might encounter a function that won't show you its code. Then you can use `getAnywhere()`: there's a command, `getAnywhere(median.default)`. It shows that the function usually just does some checks on the numbers, then checks whether n is odd or even to see where it has to cut to get the median. And some functions, like the correlation function, go down to a call to C code. It's telling you, hey, this is a complicated function, a correlation; this is the standard deviation, so I refer to compiled C code. — Yep. This is such a good way of learning to code better. In addition to learning the stats better, seeing the way other people, presumably experts, have done this can improve your own coding so much. I know that's one of the main ways I've learned. — Well, I love that when we were just getting ready to go live on this, Andrew, you said, "Oh my goodness, we're going to nerd out." And — Oh, yeah. Totally. — I smile because this is absolutely up your street. I mean, deep in the math. — Oh gosh. You know, binomial confidence intervals, or confidence intervals for proportions, are a deep well. The stuff taught in Stats 1, where you do a normal approximation (what's called "asymptotic" here), gives you actually really terrible confidence intervals. You mentioned a couple of the reasons, Torsten, but also, even if you do have proportions
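Here is a sketch of the two cases mentioned. The first is a generic whose printed body is just a dispatch call; `getAnywhere()` finds the method that does the work. The second is a function that hands off to compiled code through `.Call()`. Only functions from `base`, `utils` and `stats` are used, and the variable name `m` is illustrative.

```r
# median() is an S3 generic: its body only dispatches with UseMethod().
median

# getAnywhere() (from utils) locates the method, whether exported or not.
m <- getAnywhere("median.default")
m$where          # where the object was found, e.g. the stats namespace
m$objs[[1]]      # the source: partially sorts the data and takes the middle value,
                 # or averages the two middle values when n is even

# cor() prepares its arguments in R, then calls compiled C code via .Call().
grep(".Call", deparse(cor), fixed = TRUE, value = TRUE)
```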

### Segment 4 (15:00 - 18:00) [15:00]

somewhat near the middle, and even if you do have large sample sizes, the confidence levels you claim to be getting are reliably wrong. You never quite get an actual confidence level of 95%, and just making the sample size larger doesn't usually fix it. — Because the underlying distribution is not really normal. — There are issues with the discrete nature of the binomial distribution. Even the continuity correction doesn't fix it. I'm trying to think what else. It's not all in my brain right now, but I do know that this gets complicated, and there's a reason there are 11 methods listed here. Well, I must say, just watching you work there, Torsten, and dig deep is fascinating. I mean, absolutely lovely. It warms my heart to know that there are people like you out there. — Love it. — And when I get stuck, I know where to turn. — One other thing I can add to this. If you want to learn about a function, you can also wrap the function name in the `View()` command, and that'll pull it up in your viewer window. — Yeah. If you try `View()` with whatever the function is. — Yeah. So you can get it off to the side; that may or may not be useful to you. — Yeah. And then go line by line and — interesting. — I did not know that. — You can do a Save As if you want, I think, at least. — Huh. — Anyway, — cool stuff. — Definitely next level. Well, this has been absolutely lovely. For people watching, go to the description of the video below. There will be a link to all three of our channels, and I highly encourage you to subscribe to Andrew's and Torsten's channels. I love them both. I bumped into Torsten's channel and loved it so much that I immediately reached out to him and said, you know, let's connect, and I'm glad we did. — Any last words, Torsten? Andrew?
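The coverage point raised here can be checked exactly rather than by simulation. For a given true proportion p and sample size n, sum the binomial probabilities of every outcome x whose interval contains p. This sketch does that for the textbook Wald interval. `wald_ci()` and `coverage()` are hypothetical helpers written for illustration, and the values of p and n are arbitrary choices.

```r
# Hypothetical helper: Wald interval for every outcome x at once (one row per x).
wald_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  p_hat <- x / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  cbind(lower = p_hat - z * se, upper = p_hat + z * se)
}

# Exact coverage: probability that the interval contains the true p.
coverage <- function(p, n, ci_fun = wald_ci) {
  x <- 0:n
  ci <- ci_fun(x, n)
  covered <- ci[, "lower"] <= p & p <= ci[, "upper"]
  sum(dbinom(x, n, p)[covered])
}

coverage(0.1, 20)     # well below the nominal 0.95 (about 0.88)
coverage(0.5, 100)

# Coverage at p = 0.1 for several sample sizes, to see how it changes with n.
sapply(c(50, 100, 200, 400), function(n) coverage(0.1, n))
```

Passing a different interval function as `ci_fun` lets you compare methods at the same p and n.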
— Well, tell us a little bit about the kind of stuff people can expect if they go to The Data Digest. What do you usually do over there? — Yeah, I have it open. It started with graph tutorials, because I also wanted to learn how to make ggplots for work. There was a bit of a steep learning curve when you're used to making charts in Excel. Then I moved on to looking up functions that I didn't completely understand. First, I tried to recreate the Tidy Tuesday screencasts David Robinson did on his channel as shorter summaries of his live streams. I learned a lot there, because he's such a talented coder. And now I just make videos about whatever interests me, wherever it takes me: historical data sets, web scraping, chess tournaments. Torsten, toward the bottom of the screen you've got a thumbnail that says group_by, summarise and pivot_wider. Andrew and I were in the middle of piecing together a book on R programming, and we were talking about a chapter on reshaping your data. I'll confess I haven't watched that video yet, but having seen the thumbnail, I said to Andrew, "By the way, we should think about an interesting use case for pivot_wider." Usually what we're trying to do is pivot longer. And I referenced your YouTube title. I said, "I know Torsten's got a video that looks into a particular use case for pivot_wider. We should have a look at that; there might be something interesting in it." So when I saw that thumbnail on there, I couldn't help but smile and remember that conversation. Well, Torsten, this has been absolutely amazing. I learned so much from you and have new ideas going forward. Everybody, please check out The Data Digest; I know I will be spending more time over there. Also, as usual, check out Greg's channel, R Programming 101.
And if you're here through one of their channels, come see me on Equitable Equations. This has been great. So, thanks, everybody. — Thanks. Thank you.
