Once again, I'm hanging out with Greg Martin of @RProgramming101 to talk about some of the cool new stuff we've learned recently. I get into the new dplyr function filter_out(), while Greg teaches us all about uncount(). If this vid helps you, please help me a tiny bit by mashing that 'like' button. For more #rstats joy, crush that 'subscribe' button! And don't forget to check out my website, www.EquitableEquations.com!
Table of contents (3 segments)
Segment 1 (00:00 - 05:00)
Welcome back to Today I Learned in R. This is a video that Andrew and I share on our respective YouTube channels, where we teach each other something. Andrew, what are you going to teach me today?
Hey Greg, it's good to see you. I'm really excited. I'm going to be talking about a new dplyr feature, which is the filter_out() function. How about you?
Okay, I'm going to teach you about something I learned recently, which is uncount(). We've all used the count() function many times, and I've bumped into the uncount() function, which is very easy to use. I'm going to teach you about that. If you're watching this on my channel and you haven't yet subscribed to Andrew's channel, Equitable Equations, it's super duper good. I use it all the time. It's a great go-to place to learn about R and statistics.
And what's your channel, Greg? We've got to subscribe to that one, too.
Okay, R Programming 101.
Yeah. Cheers to that one.
Okay. Andrew, will you go first and teach me your stuff? Hit me with the lesson.
Okay, so I'm going to go ahead and share my screen and go over to good old RStudio. The filter_out() function that I want to show you today is new as of dplyr 1.2.0. So, if you haven't updated your R packages in a while, I recommend doing that just on general principles, but in particular you need to do it in order to have access to the function I'm going to show today.
And you know, Andrew, I didn't even know about the update.packages() function. I'm seeing that on the screen thinking, "Oh my goodness, how did I not know that?" Honestly, I hate to say it, but I usually just update my packages when I'm prompted to, or when I occasionally update R and then have to reinstall packages. This is a lovely example of how I'm always learning from you, Andrew.
You know, it's a function that just doesn't come up that often. So, it's one to run every now and again, I guess.
Yeah. Okay, filter_out(). Teach me.
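For anyone following along, here is a minimal sketch of the housekeeping Andrew describes. The transcript doesn't capture what was on his screen, so the version check and the `ask = FALSE` argument are assumptions rather than his exact code, and `has_filter_out` is just an illustrative name:

```r
# Assumption: dplyr is already installed. filter_out() needs dplyr 1.2.0 or later,
# so compare the installed version against that release.
has_filter_out <- packageVersion("dplyr") >= "1.2.0"
has_filter_out

# Update all installed packages from your configured repository without prompting.
# Left commented out here because it needs a network connection and can take a while.
# update.packages(ask = FALSE)
```

If `has_filter_out` comes back FALSE, updating dplyr (or all packages, as above) should make the function available.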
I want to look at the msleep data set, and in particular I'm interested in the conservation variable. Ultimately, I'd like to take the domesticated mammals out of this set. The way we've been doing that in the past was with the filter() command, and filter() is built to keep rows. So filter(conservation == "domesticated") is going to keep all the observations where the conservation status is domesticated. In this case, we have 10 of those. So, if we're trying to take those out of the 83 rows in msleep, we should get 73, right?
The old way to try to do this was with a negation. We would filter to keep only the observations where the conservation status is not domesticated. But that doesn't quite work. If we run this, we don't get 73 observations. We only get 44. The reason is that the rows with missing values for the conservation variable are being removed as well. Is conservation not equal to domesticated? Well, if conservation is NA, the comparison comes back NA rather than TRUE, so that row is not going to survive the filter.
So, this is where the new filter_out() function comes in, and it does sort of what it says on the tin. It takes out observations in the same way that filter() keeps observations. So this next command, filter_out(conservation == "domesticated"), is going to keep all observations that don't satisfy this criterion, and it's going to behave the way we want with respect to NAs. So, there you go. You can see we got the 73 observations that we were hoping for.
Excellent. I love it. I had not seen that function before now. So, as always, Andrew, thank you for that. I've definitely learned something from you today. And I hope people watching this enjoyed that, because I think I'm going to use it all the time as my go-to instead of the exclamation-mark-equals approach.
Always considering what it is that you want to do with the missing values, though.
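The commands Andrew runs aren't captured in the transcript, so what follows is a reconstruction rather than his exact code. It assumes dplyr 1.2.0 or later, takes msleep from ggplot2, and uses the native pipe; the object names are made up for illustration:

```r
library(dplyr)    # filter_out() assumes dplyr >= 1.2.0
library(ggplot2)  # provides the msleep data set

nrow(msleep)  # 83 mammals in total

# filter() keeps the rows where the condition is TRUE
domesticated <- msleep |> filter(conservation == "domesticated")
nrow(domesticated)  # 10

# Negating the condition also drops rows where conservation is NA,
# because NA != "domesticated" evaluates to NA, not TRUE
negated <- msleep |> filter(conservation != "domesticated")
nrow(negated)  # 44

# filter_out() removes only the rows where the condition is TRUE,
# so rows with a missing conservation status are kept
kept <- msleep |> filter_out(conservation == "domesticated")
nrow(kept)  # 73

# The older way to get the same rows is to handle NA explicitly
same_as_kept <- msleep |>
  filter(conservation != "domesticated" | is.na(conservation))
nrow(same_as_kept)  # 73
```

The last two results match, which is the point of the lesson: filter_out() saves you from having to remember the `| is.na(...)` clause whenever you want the missing values to stay.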
I'm generally a slow adopter of new functions and new technology, even when they come from Posit, or sometimes especially then. It can feel like gilding the lily sometimes. This is one that really feels useful and fulfills a need that I have.
Yeah.
What do you got for us today, Greg? I know you said uncount(). Let's see it.
Yeah, let's talk about uncount(). I'm going to share my screen. Okay, can you see that?
There we go. Gotcha.
Okay, and we're going to look at the Titanic data set. So, library(tidyverse), of course. The Titanic data set itself is built into R, so it's available automatically. And I've just run class(Titanic) to show you that this is a table, and that's important for the purposes of this lesson. If I view Titanic, I know it looks like it's just a regular data set, but keep in mind that it's a table, and look at how it's structured. These are the people that survived or
Segment 2 (05:00 - 10:00)
didn't survive on the Titanic when it sank. It's divided by class (first class, second class, third class, meaning the kind of ticket that you had, or crew), male or female, age, and whether they survived or not. And then it's got the frequency. So, here we've got third class passengers that were male, that were children, that didn't survive. There were 35 of those.
Now, how this table differs from a regular neat and tidy data set is that usually when we work with data, we want there to be a separate row for every observation. That applies to many of the functions you use in R. If you're doing a chi-squared test and you want to create a specific table, or you're doing data viz, it's often useful to have a separate row for every observation. In other words, for this row number three, third class passengers that were male, that were children, that didn't survive, we would really want 35 separate observations. We would want 118 observations of first class males that were adult, that didn't survive, etc. So, how do we turn this table into the kind of data frame we're used to, with the observations as I've just described? Easy peasy lemon squeezy. Let's have a look.
So, we start with Titanic. Now, you're of course aware that we can say as_tibble(). The "as" is often a prefix for changing the nature of some kind of data object, and a tibble is kind of like the tidyverse version of a data frame. Not a table, but a data set. I'm not going to get into the nuts and bolts of that. Once we've done that, we simply say uncount(n), and then I'm just piping that into View() so that you can see what it looks like. And there you go. Boom shakalaka. It's done exactly what we expected. Third class male passengers that were children, survived no, and it's repeated that row. If we go down, I'll bet you it changes at 35. It does. It's 35 observations of those. Etc., etc.
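Greg's steps can be sketched roughly as follows. This is a reconstruction, not his exact script: it loads tibble, tidyr and dplyr individually instead of the whole tidyverse, uses the native pipe, skips the View() call so it runs outside RStudio, and the object names are illustrative assumptions:

```r
library(tibble)  # as_tibble()
library(tidyr)   # uncount()
library(dplyr)   # count(), slice()

class(Titanic)   # "table": a contingency table from base R's datasets package

# One row per combination of Class, Sex, Age and Survived, with the frequency in n
titanic_counts <- as_tibble(Titanic)
titanic_counts

# Repeat each row n times and drop the n column; combinations with n = 0 disappear
titanic_long <- titanic_counts |> uncount(n)
nrow(titanic_long)  # 2201, the same as sum(Titanic)

# The first group runs for 35 rows, then the next combination starts
titanic_long |> slice(34:37)

# count() goes the other way, collapsing the long data back into frequencies
titanic_long |> count(Class, Sex, Age, Survived)
```

Note that the round trip through count() does not bring back the combinations that had a frequency of zero, since uncount() produces no rows for them.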
So, this is now a big long data set with a row for every single observation.
Wow, that's really lovely. I think it's another one that really fulfills a need, because the built-in Titanic data set is not really in a format that's amenable to, you know, machine learning applications. The Titanic data is often used as an example when you're starting out learning: can we predict survival? And the version that's in R is just not the one you'd want to use for that.
It's summarized. They've already summarized it. They've packed the data up, and something is lost when you do that. And this just helps you unpack it. There are quite a few of those built-in data sets that are already summarized tables, and if you really want to do anything with them in terms of practicing your coding, you've got to do this sort of thing. You've got to turn it from a table into a tibble and then do an uncount(), get it into a neat and tidy data set, and then do your magic. So, yeah, I bumped into uncount() the other day. I hadn't used it before. I really liked it. Simple, easy to use. As is the case with your lesson, it does what it says on the tin. You know what I mean? uncount() uncounts. It's the opposite of the count() function.
That's fabulous. Thanks for showing that, Greg. You know, those built-in data sets, it's really interesting just going through the list. It's almost like a walk through history, in the sense that you see all these formats and conventions that just aren't used so much anymore. And I think it's fascinating.
There's certainly some stuff in there that's a little bit politically incorrect. In this day and age, it would raise eyebrows. But, yeah, it is what it is. I think the fact that there are those built-in data sets is great for learning.
You know, you don't have to go and fetch data and store it somewhere and then go and load it. You can just use the data, practice your coding, and then apply that in the real world down the line.
Yeah, and when I find packages that have good newer data sets, or more conventional, tidier data sets, it's always something that makes me happy. And a lot of times there are really good accompanying books. You know, I talk about ISLR a lot, An Introduction to Statistical Learning with Applications in R. There are others.
Well, talking about books, Andrew, now that you bring it up. And we hadn't planned this, by the way. But should we just talk a little bit about the book that we're writing together?
Yeah. What's the big picture on this one, Greg?
So, what I think's going to be really nice about the book when we get it out there, and hopefully that's going to be sometime this year, is that it really is an easy-to-follow walk-through from the very beginning all the way to complicated stuff
Segment 3 (10:00 - 13:00)
and everything in between. Now that we're at the stage of editing and re-reading little bits and pieces, you can see how it's come together in a really, really beautiful way, and I'm super excited about getting it out there.
Me, too. It's funny you would mention the journey that the book takes you on. I'm in the process of doing two parts separately, and because I'm in the crunch of my academic semester, it's all moving a little slower than it will next month. I am doing a little bit of work on the very first chapter, where we're just like, "Here's filter and mutate." But the heavier editing I'm doing is on a later chapter that's about coding with AI, using some of the tools that the ellmer package provides so that you can write functions that make calls to Claude or whatever. It's more than just asking AI to write the code for you; it's actually programming with it, and that, I think, is content you just don't get elsewhere.
Exactly. It's a whole new world.
Yeah, I know, it's fun.
The other thing I like about the way the book has ended up being structured is that you can use it either as a book where you're learning from scratch, starting on page one and working through it with examples, or you can dip in and out and use it as a reference. So you can say, "Look, I want to understand how to use that particular package." Boom, there's a chapter that you can go straight to and jump right in. So, yeah, it's looking good.
Realistically, when I'm reading a book about coding, or math for that matter in my case, I'm often not reading it like a novel. Chapter one, chapter two, chapter three.
I'm pulling it off the shelf when I have a need and an interest. So one of the things you suggested right from the start, which I think is a really good perspective, is that we should write the book with that in mind. And so while you can read it start to finish, and it does take you on that journey, you can also just be like, "Hey, I need to learn a little bit more about working with tables in Quarto."
Yeah, yeah. I mean, that's a good example of the sort of thing that a person who's reasonably comfortable with R might say: "Hey, I want to learn how to use Quarto and create a dashboard." Or, "I need to understand how to use GitHub, because that's the next stage in my learning development." It's all in there. And as you've said, the AI stuff's in there, the programming, it's everything from beginning to end, all in one very easy-to-read book. So, happy days. I'm very excited about it.
Me, too. And as soon as that's out, we'll make sure that a link goes down below so that people who find this video later can get right to it. But if you're watching this in real time, stay tuned and make sure you're subscribed to both R Programming 101 and Equitable Equations to get all the updates.
Anyway, Andrew, always good to see you. Thanks a lot. Everybody else, put comments. Where are you watching from? We love the comments. I don't always get to reply to them all, but the comments that I do read are great. So do comment, and send your thoughts, questions, pontifications, criticisms, anything you want.
All right, Greg. Well, it's a joy seeing you, as always. Thanks for teaching about uncount(). I really feel like I learned something new today that I'll be able to use. And I'll see you next time. Take care.
Love it. Bye.
Bye-bye.