# Python Pandas Tutorial (Part 2): DataFrame and Series Basics - Selecting Rows and Columns

## Metadata

- **Channel:** Corey Schafer
- **YouTube:** https://www.youtube.com/watch?v=zmdjNSmRXF4
- **Date:** 2020-01-10
- **Duration:** 33:35
- **Views:** 752,126
- **Source:** https://ekstraktznaniy.ru/video/11781

## Description

In this video, we will be learning about the Pandas DataFrame and Series objects.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning about the DataFrame and Series objects. These are the backbone of Pandas and are fundamental to the library. DataFrames can be thought of as rows and columns, while a Series can be thought of as just a single column of rows. We'll also learn the basic navigation of these datatypes by learning how to select specific rows and columns. Let's get started...

The code for this video can be found at:
http://bit.ly/Pandas-02

StackOverflow Survey Download Page - http://bit.ly/SO-Survey-Download

✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join

✅ One-Time Contr

## Transcript

### Introduction [0:00]

Hey there, how's it going, everybody? In this video we're going to continue learning about pandas, and specifically we're going to be learning about the DataFrame and Series data types. Like I said in the last video, these are basically the backbone of pandas and are the two primary data types that you'll likely be using the most. So in this video we're going to go over how we can think of DataFrames and Series in a different way, and then we'll look at the basics of getting information from these data types. Now, I would like to mention that we do have a sponsor for this series of videos, and that is Brilliant.org. I really want to thank Brilliant for sponsoring the series, and it would be great if you all could check them out using the link in the description section below and support the sponsors. I'll talk more about their services in just a bit. So with that said, let's go ahead and get started.

### DataFrame Basics [0:43]

Okay, so first let's look at what a DataFrame is, and then we'll learn more about how we can think about it in terms of a Python object. We saw DataFrames briefly in our last video when we checked to make sure that our data was loaded in correctly: these were the objects that were displayed in Jupyter as rows and columns, basically a table. So let's take a look at what this looks like. If you were following along with the last video, this is basically the same Jupyter notebook that I had before, except it's been cleaned up a bit. We're importing pandas here, and we're reading in our CSV files: one is our main DataFrame for the survey results, and one is our schema DataFrame for the schema results. Then we're setting some options: max columns is set to 85 so that we can see all the columns, and max rows is set to 85 so that we can see all of the schema. If you haven't been following along with the videos so far, I do have a link in the description section below to where you can download this data and follow along.

Okay, so this is a DataFrame here. Where we print out df.head(), this is what it returns: the first five rows of our DataFrame. You can see that a DataFrame is made up of multiple rows, and we also have multiple columns. In the case of this data these are survey results, but your data can be whatever your data is; it's most likely going to be in rows and columns, kind of like a table. For this data, each row is one person who answered the survey, and each column holds their answer to one question on the survey. So for example, respondent number one here answered that yes, they were a hobbyist. If you want to know what Hobbyist means, then just like we saw in the last video, we can look at our schema DataFrame. Let me print that out here, and if I look at what Hobbyist is, we can see that the question was "Do you code as a hobby?" So that's what this data is, and that gives us an idea of what a DataFrame is: basically, a DataFrame is just rows and columns. But now let me explain how I like to think of DataFrames using native Python. If we were only using Python and not pandas to store information in rows and columns, how would we do this? Well, for those of you familiar with dictionaries, you might think that it's a good idea to store information that way. So let me pull up a new notebook that I have open here with some snippets, and let's take a look at this.

### Dictionary Basics [3:32]

Okay, so let's look at this first cell here. A lot of us are probably familiar with Python dictionaries, where we have keys and values. If I'm representing some data, in this example a person, then we can use a dictionary. First off, I have a key of first, which is going to be the first name, and that has a value of Corey, and then we also have keys and values for the last name and the email as well. So this dictionary represents data for a single person, but how would we represent data for multiple people? There are probably a couple of different ways that we can do this, but the way that I like to think of it in terms of learning pandas is to make all of the values in our dictionary lists. So let's look at the second cell to see what this would look like. In the second cell, we have a pretty similar dictionary to what we had above, but now, instead of a single string for each value, I have a list. Our lists currently just have one person, but since these are lists, we can add more first names and information in here. The first value of each list is going to be our first person. If I go to the third cell down here at the bottom, we can use this as an example to see what this would look like with multiple people: the second value in each list is our second person, and the third value is our third person. So here we have people, with a key of first. If we want the second person, we go to the second value, and that's Jane; the second value in last is Doe, and the second value in email is JaneDoe@email.com. If we want the third person, that would be John; the third value in last is Doe, and the third value in email is JohnDoe@email.com. So we can think of this like rows and columns: the keys are the columns, and the positions within the value lists are the rows. Now, if you look up the definition of a pandas DataFrame online, you'll see a lot of definitions that just say something like "it's a two-dimensional data structure." That might sound a little confusing, but in layman's terms it basically just means rows and columns. So like I said, the email key here would be our email column and contain all of the email values, and if we wanted to see the email column, we can just access that key. Actually, let me run all of these cells really quick, because I think I opened this up without running them, and I want to make sure that we have this registered. Okay, so if I want to see that email column, I can simply say people['email'], and if I run that, we can see that we got all of the emails.
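The dictionary-of-lists idea can be written out in plain Python. This is a sketch: the strings are reconstructed from the transcript, and the dictionary comprehension that rebuilds one "row" is an illustration added here, not code from the video.

```python
# One person stored as a plain dictionary: each key is a field
person = {
    "first": "Corey",
    "last": "Schafer",
    "email": "CoreyMSchafer@gmail.com",
}

# Several people: each key acts like a column, and each list position acts like a row
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}

# Accessing a key returns the whole "column"
emails = people["email"]
print(emails)

# Rebuilding a single "row" means reading the same position from every list
second_person = {key: values[1] for key, values in people.items()}
print(second_person)  # {'first': 'Jane', 'last': 'Doe', 'email': 'JaneDoe@email.com'}
```

Reading a column is one lookup, while reading a row means touching every list; pandas hides that asymmetry behind its row indexers.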

### Creating a DataFrame [6:25]

Now, the reason that I wanted to show you this is because I feel like it really helped me in terms of how I think about DataFrames. DataFrames are very similar to this, but with a lot more functionality than what we have here in standard Python. We can actually create a DataFrame from this dictionary and see what it looks like, so let's do that and look at some basic DataFrame functionality, and then we'll look at this more using the Stack Overflow data from the last video. Here in this bottom cell, in order to create a DataFrame from this information, I'm going to import pandas: import pandas as pd. Now we can create a DataFrame using the dictionary that we have up here. To do that, I can just say df = pd.DataFrame(people), and check the casing there: that's a capital D and a capital F. We're just passing in that dictionary that has lists as values. If I run this, it runs okay without any errors, and if I print out df, we can see that our DataFrame represents this data as rows and columns that we can visualize: we get these people printed out in a nice table. We also have these values over here on the far left that don't have a column name, the 0, 1, and 2. This is an index. I'm not going to go too much into indexes right now, because that's what the next video is going to cover, but basically it's a unique value for our rows. It doesn't actually need to be unique, but again, we'll talk more about that in the video specifically on indexes.

So now that we have a bit of an idea of how to think about DataFrames, let's take a look at how to access information within a DataFrame. First, let's just access the values of a single column. Just like with the dictionary, we can access a single column the same way we accessed the key of a dictionary: just like I did people['email'] up here, I can do something very similar down here and say that I want the email column of my DataFrame, df['email']. That's not actually a dictionary key; it's accessing a column of the DataFrame, but we can see that we get all of the emails back. Again, I do want to emphasize that I only used the pure Python example so that we could get an idea of how to think about a DataFrame; a DataFrame is much more than just a dictionary of lists. For example, when we displayed the email column here, it doesn't look the same as when we displayed the list of values from the dictionary, and that's because this actually returns a Series. We can see this if we check the type: if I check the type of this email column and run that, we can see that it's a pandas.core.series.Series, so this is a Series object.
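A minimal sketch of the step above, reusing the three-person dictionary from the transcript: building the DataFrame and confirming that a single bracketed column comes back as a Series rather than a list.

```python
import pandas as pd

people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}

# Each dictionary key becomes a column; pandas adds a default 0, 1, 2 index
df = pd.DataFrame(people)
print(df)

# Selecting one column with bracket notation returns a Series, not a list
email_col = df["email"]
print(type(email_col))  # <class 'pandas.core.series.Series'>
print(email_col.tolist())
```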

### What is a Series [9:34]

So what is a Series? A Series is still basically a list of data, but just like a DataFrame, it has a lot more functionality than that. If you look up the definition of a Series online, you'll see a lot of definitions that just say it's a one-dimensional array. That might sound a little confusing, but in layman's terms it basically just means rows of data. So again, you can think of a DataFrame as rows and columns and a Series as the rows of a single column, and a DataFrame is basically a container for multiple Series objects. That's important, so let me go over it one more time. A DataFrame is two-dimensional because it has rows and columns: here, first name, last name, and email. Whenever we access just the email column, we get all of these emails back, and that is a Series. Since a DataFrame is a container for multiple Series objects, we can think of this email column as a Series, this last column as a Series, and this first column as a Series. Also, where we printed out the Series of emails, we can see that the Series has an index as well, just like our DataFrame did: the 0, 1, 2 over here on the left.
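To make the "container of Series" picture concrete, this sketch (same three people, with the ndim checks and the rebuilt frame added here for illustration) checks dimensions and shows that every column shares the DataFrame's index.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# A DataFrame is two-dimensional; each column pulled out of it is a one-dimensional Series
print(df.ndim)  # 2
for name in df.columns:
    column = df[name]
    # Every column Series carries the same index labels as the DataFrame
    print(name, type(column).__name__, column.ndim, column.index.tolist())

# Going the other way: a DataFrame can be assembled from Series that share an index
first = pd.Series(["Corey", "Jane", "John"])
last = pd.Series(["Schafer", "Doe", "Doe"])
rebuilt = pd.DataFrame({"first": first, "last": last})
print(rebuilt.shape)  # (3, 2)
```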

### Accessing a Single Column [11:08]

Okay, so we can access a single column of a DataFrame like we're accessing a key, just like we did in this cell, but you might also see some people use dot notation to do the same thing, like df.email. Let me get rid of this cell here so we can compare the two; if I run this cell, we can see that it gives us the same thing. Whether we access the column like a key or with dot notation, we get the same Series object of the email values. Which way you do this is really just a personal preference. I actually prefer the first way, using brackets, and there are a couple of reasons for that. The first is that there's a chance one of your columns is named the same thing as one of the attributes or methods of a DataFrame, and in that case dot notation might give you errors. For example, a DataFrame has a method called count. If you had a column named count and tried to access it using dot notation, you'd actually get the count method of the DataFrame instead of the count column. We don't have a count column in this specific DataFrame, but if we did, we would have to access it with brackets, df['count']. So that's why I prefer brackets, and I'm going to be using them throughout this series. But I wanted you to know about dot notation, because if you're working with other people who use pandas, you might see them access columns that way, so you need to know that it's at least a possibility. And again, that doesn't mean they're doing it wrong; it's just a personal preference, and I prefer the brackets.
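The count clash can be reproduced with a small frame. The column named "count" is hypothetical, added only to trigger the name collision the transcript describes.

```python
import pandas as pd

# Hypothetical frame with a column named "count", which clashes with DataFrame.count()
df = pd.DataFrame({
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com"],
    "count": [3, 5],
})

# For an ordinary name, bracket and dot notation give the same Series
print(df["email"].equals(df.email))  # True

# For a clashing name, dot notation finds the method instead of the column
print(callable(df.count))  # True: this is the bound DataFrame.count method

# Brackets always reach the column
print(df["count"].tolist())  # [3, 5]
```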

### Accessing Multiple Columns [13:09]

Okay, so I said that DataFrames have a lot more functionality than what we saw using standard Python, so let's look at some other things we can do. Let's say that we wanted to access multiple columns. To do this, we can use bracket notation and pass in a list of the columns that we want. If I wanted both the last name and email columns, we could say df and use our brackets just like before, but now I'm going to put a set of inner brackets inside them, a list of the columns that I want to access: the first value is last, for the last name, and the second value is email. If I run this, we can see that now we have a DataFrame returned with the last column and the email column.
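A short sketch of multi-column selection on the three-person frame. The one-column-list case at the end is an extra illustration, not from the video: a list always yields a DataFrame, even when it holds a single name.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Outer brackets are the indexing operator; inner brackets are a Python list of column names
subset = df[["last", "email"]]
print(type(subset).__name__)  # DataFrame
print(subset.columns.tolist())  # ['last', 'email']

# A list with one name still gives a (one-column) DataFrame rather than a Series
one_col = df[["email"]]
print(type(one_col).__name__)  # DataFrame
```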

### Multiple Columns and Series [14:02]

Now, I want to emphasize again that I passed a list inside these brackets, so there are two pairs of brackets. You can't leave off the inner brackets, because then you'll likely get a KeyError: pandas will think that you're passing in both of those strings as a single column name. Another thing I want to point out is that now that we're getting multiple columns, the result can no longer be a Series, because remember, a Series is basically a single column of rows. So when we get multiple columns like this, it returns another DataFrame, in this case a filtered-down DataFrame with just these specific columns: we filtered out the first name column, and we just have last and email. Okay, so that's how we get a single column or multiple columns. We can slice these as well, similar to how we slice a list, but I'll show that on our larger Stack Overflow dataset in a second. Now, if you have a lot of columns and want to see all of them easily, you can just grab the columns by saying df.columns, and if we run this, we can see that our columns are an Index of first, last, and email.

Okay, so now we've seen how to get a column, but how would we get a row? To get rows, we can use the loc and iloc indexers, so let's take a look at these. First, iloc: iloc allows us to access rows by integer location, hence the name iloc, for integer location. So if I wanted to get the first row, we can say df.iloc, use brackets here too, since this is an indexer, and pass in a 0. If I run this, we can see that the first row has a first name of Corey, a last name of Schafer, and an email of CoreyMSchafer@gmail.com. What that did is return a Series that contains the values of that first row of data, which, like I said, is the first name, last name, and email of the first person in this example. We haven't discussed indexes yet, since that's in the next video, but the index of this Series is the column names, so that we know what those values are. Up here our index was 0, 1, and 2, but whenever we access a row, the returned Series is indexed by the column names, because if it just said 0, 1, and 2, we might not know what these values are. And just like when we selected multiple columns, we can select multiple rows by passing in a list of integers. If I want the first and second rows, then again this is going to be a pair of brackets within these brackets, because we're passing a list to our indexer, and I'll pass in a list of 0 and 1. If I run this, we can see that now we get the first two rows of data.
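A minimal sketch of the points above: the missing-inner-list mistake, df.columns, and iloc with one position versus a list of positions. Catching InvalidIndexError alongside KeyError is a safety net added here, not something from the video.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Forgetting the inner list: pandas looks for ONE column labelled ("last", "email")
try:
    df["last", "email"]
    missing_inner_list_failed = False
except (KeyError, pd.errors.InvalidIndexError):
    # a missing label is reported as KeyError; InvalidIndexError is caught as a fallback
    missing_inner_list_failed = True
print(missing_inner_list_failed)  # True

# All column labels at once
print(df.columns.tolist())  # ['first', 'last', 'email']

# iloc selects rows by integer position
first_row = df.iloc[0]       # a Series whose index is the column names
print(first_row["email"])    # CoreyMSchafer@gmail.com
first_two = df.iloc[[0, 1]]  # a list of positions returns a DataFrame
print(first_two.shape)       # (2, 3)
```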

### Selecting Columns [17:30]

And again, be sure to pass an inner list inside those brackets so that it does what you expect. Also, notice that we're now getting a DataFrame back with these multiple rows. With the iloc and loc indexers, we can select columns as well, and that's the second value we pass into the outer brackets. If we thought of iloc and loc as functions, we could think of the rows that we want as the first argument and the columns as the second argument. So let me show you what this looks like.

### Selecting Columns Example [18:08]

Here we have our inner brackets; those are the rows that we want. After that list, we can put a comma and specify the column that we want. With iloc, we can't specify an actual column name, because it uses integer locations; it's for integers only. Remember, first name is the first column, last name is the second column, and email is the third column. So if we wanted the email addresses of the first two rows, we can grab the column at position 2, which is the third column, since all of these start at 0. If I pass in a 2 here and run that, we get the email addresses of these first two rows. Okay, so that's iloc. Now let's look at loc. With iloc we were selecting by integer location; with loc we select by label. When we're talking about labels for rows, these are the indexes, and again, we don't have custom indexes right now, so this index is just a default range of integers. At the moment, loc will look somewhat similar to the iloc indexer, but we'll look at use cases for loc with actual labels in the next video, when we cover indexes. So real quick, let's look at our entire DataFrame again; I'm just going to print it out down here.
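Here is the rows-then-columns form of iloc as a sketch on the same three-person frame; both arguments are integer positions.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Rows at positions 0 and 1, column at position 2 (email, since counting starts at 0)
emails = df.iloc[[0, 1], 2]
print(emails.tolist())  # ['CoreyMSchafer@gmail.com', 'JaneDoe@email.com']

# A single row position and a single column position give back one value
cell = df.iloc[0, 1]
print(cell)  # Schafer
```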

### Index Labels [19:38]

So like I said, over here on the far left are our indexes, the labels for each row. By default the first row just has a label of 0, so I can say df.loc and pass in a 0, and if I run that, we get the row with the label 0. I know that looks similar to iloc at the moment, but we'll see how to use indexes with labels in the next video. And just like with iloc, we can pass in a list to specify multiple rows. If I want the first and second rows, then just like with iloc, I pass in an inner list with the labels of the first row and the second row. If I run that, we can see that now we get the first and second rows.
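With the default index, loc and iloc return the same rows, which can hide the difference. This sketch adds a reversed copy of the frame (an illustration added here, not from the video) to show that loc follows labels while iloc follows positions.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# loc looks rows up by index label; with the default index the first row's label is 0
row = df.loc[0]        # one label gives a Series
rows = df.loc[[0, 1]]  # a list of labels gives a DataFrame

# Reversing the rows keeps each label attached to its row: the index becomes 2, 1, 0
reversed_df = df.iloc[::-1]
print(reversed_df.loc[0, "first"])   # Corey: the row labelled 0
print(reversed_df.iloc[0]["first"])  # John: the row at position 0
```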

### Selecting Specific Columns [20:25]

And again, we're getting a DataFrame back now that we have multiple rows. Just like with iloc, we can pass a second value into the indexer to select specific columns for these rows. With iloc we used integers to select the columns, but now that we're using loc, we can use labels. So if we want the email column of these first two rows, we can just pass in email, and if I run that, we get the email values of these first two rows.

### Selecting Specific Columns List [20:58]

Now, I didn't show this with iloc, but we can also pass a list for the columns. If I want the last name and the email for these rows, then instead of passing a single string as the second value, we can pass a list of strings for the columns that we want. I'm going to wrap this in brackets; I know it can get a little confusing with all these inner brackets. Let's say that we want email and last. If I run this, we get those specific columns, email and last, for these specific rows, the row with label 0 and the row with label 1. Also notice that the columns are displayed in the order we used in our list within loc, which is different from the order of our original DataFrame: up here it's first, last, email, but we asked for email and last, and it gave them back in that order. Okay, so now that we've seen the basics of grabbing certain rows and columns from a small dataset, let's go back to our dataset from the last video and see how to grab some rows and columns from the Stack Overflow data.
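Putting both loc arguments together, as a sketch on the three-person frame: a single column label yields a Series, and a list of labels yields a DataFrame whose columns follow the order of the list.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Row labels 0 and 1, one column label: a Series
emails = df.loc[[0, 1], "email"]
print(emails.tolist())  # ['CoreyMSchafer@gmail.com', 'JaneDoe@email.com']

# Row labels 0 and 1, a list of column labels: a DataFrame in the requested column order
subset = df.loc[[0, 1], ["email", "last"]]
print(subset.columns.tolist())  # ['email', 'last']
```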

### DataFrame Overview [22:20]

So I'm going to go back over to our pandas demo, and again, here's a quick overview of the data that we have. We're importing pandas, we have df as our main survey results and schema_df as our schema results, and we're setting some options. This is what the head of our main DataFrame looks like, the first five rows, and this is what our schema looks like. I'm going to go down below the schema, and now let's mess around with this a little: let's go over some of what we learned and pluck out certain rows and columns. But first, let's see how many rows and columns this DataFrame has. We saw a couple of different ways to do this in the last video, but the easiest way is to use the shape attribute. If I say df.shape and run this, we can see that we have just over 88,000 rows and 85 columns.
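The survey file isn't reproduced here, so this sketch uses a tiny hypothetical stand-in with two of the survey's column names and invented answers. On the real df, the same attribute reports the row and column counts quoted above.

```python
import pandas as pd

# Hypothetical stand-in for the survey DataFrame (the real one is read from the survey CSV)
df = pd.DataFrame({
    "Respondent": [1, 2, 3],
    "Hobbyist": ["Yes", "No", "Yes"],  # invented answers
})

# shape is an attribute (no parentheses) holding a (rows, columns) tuple
n_rows, n_cols = df.shape
print(df.shape)  # (3, 2)

# len() counts rows; len(df.columns) counts columns
print(len(df) == n_rows, len(df.columns) == n_cols)  # True True
```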

### DataFrame Shape [23:07]

So let's grab all of the responses for the Hobbyist column. If we look at our main DataFrame, I want all of the responses for this column right here, Hobbyist. How would we do that? If you remember, if you want to see which columns are available, you can just say df.columns to see all of them.

### DataFrame Columns [23:38]

We can see that this list is kind of long, since there are 85 columns, but here we have Hobbyist, which is the one that we want: it's the question where people answered whether they code as a hobby or not. In the next video, when we cover indexes, I'll show how we can search the schema DataFrame to find the exact questions, so that we can see which question corresponds to which column in the DataFrame. But right now, let's just grab those Hobbyist responses. If you remember from the small dataset that we just saw, to grab the Hobbyist column we can just access it like a key. If I say df['Hobbyist'], we get a Series of all of those responses. Luckily, that doesn't display all 88,000-plus rows in our browser, but we do get the head and the tail of the data to get an idea of what the responses look like. Now, real quick, let me show you something that we'll cover further into the series, because I want to give you an idea of how powerful pandas is. Let's say that we wanted to know how many of these responses were Yes and how many were No. If we were using regular Python, we might import the Counter class or write a quick function or loop to do this, but pandas has so much of this already built in. To get the count of unique values in this column, I can just tack the value_counts method onto this right up here.

### DataFrame Value Counts [25:18]

Again, this is going to be for a future video, but I just want to give you an idea of what pandas can do. Whenever I add this value_counts method to the Series of Hobbyist answers that we returned, we can see that about 71,000 people said yes, they do code as a hobby, and about 18,000 said no, they don't. We'll cover more of this in future videos when we learn more about analyzing data in depth, but I wanted to give you a quick taste of why it's beneficial to learn pandas: it makes this type of thing really easy, and we could go further and plot it out and everything. Okay, with that quick sidetrack out of the way, let's keep going and go over the other things that we learned earlier. So we got a column here; let me get rid of that value_counts.
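To compare the plain-Python route the transcript mentions (collections.Counter) with value_counts, this sketch uses a short hypothetical stand-in for df['Hobbyist']; the six answers are invented, not survey data.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for df["Hobbyist"]; the answers are invented
hobbyist = pd.Series(["Yes", "No", "Yes", "Yes", "No", "Yes"], name="Hobbyist")

# Plain-Python approach: iterating a Series yields its values
python_counts = Counter(hobbyist)

# pandas approach: counts per unique value, sorted from most to least frequent
pandas_counts = hobbyist.value_counts()

print(python_counts)
print(pandas_counts)
print(pandas_counts.to_dict())  # {'Yes': 4, 'No': 2}
```

value_counts returns a Series, so the result can be indexed, sorted, or plotted like any other Series.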

### Selecting Rows [26:14]

So we have our column here; now let's grab a specific row and a specific column. Let's grab the first row, along with the same Hobbyist column for that row. How do we grab rows? Remember, we use the loc or iloc indexers. I'm going to use loc, because that's the one that allows me to use labels, and I want to use a label rather than an integer for the Hobbyist column. Since we're using a default index rather than a custom one, as we can see from the indexes here, 0, 1, 2, 3, 4, our current row labels are just a range of values from 0 to 88,000-something. So to get the first row, I can say df.loc and pass in the label of that first index, which in this case is just 0, and these are all of the responses from the first respondent: one person's entire survey results. Now, if we wanted to see their answer for just the Hobbyist question, remember that within the brackets I can pass a second value for the columns that I'd like. If I pass in Hobbyist, we can see that their answer to whether they code as a hobby is Yes. And like we saw earlier, I can also pass in a list of multiple rows or multiple columns to get exactly the rows and columns that we want. So to get the first three responses for the Hobbyist column, instead of passing a single value, I can put in some inner brackets and pass a list of three rows. If I run this, these are the first three results for that Hobbyist column.
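A sketch of these loc lookups on a hypothetical four-row miniature of the survey. The column names follow the transcript; the Hobbyist answers and the placeholder Country values are invented.

```python
import pandas as pd

# Hypothetical miniature of the survey; answers are invented placeholders
survey = pd.DataFrame({
    "Hobbyist": ["Yes", "No", "Yes", "No"],
    "Country": [f"Country answer {i}" for i in range(4)],
})

# One row label: every answer from the first respondent, as a Series
first_respondent = survey.loc[0]

# Row label plus column label: a single value
first_hobbyist = survey.loc[0, "Hobbyist"]
print(first_hobbyist)  # Yes

# A list of row labels plus one column label: a Series of three answers
first_three = survey.loc[[0, 1, 2], "Hobbyist"]
print(first_three.tolist())  # ['Yes', 'No', 'Yes']
```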

### Slicing [28:14]

Now, one thing we haven't seen yet is that we can also use slicing to grab multiple rows and columns. If you're familiar with list slicing, this is pretty much the same thing; the only difference is that the last value is inclusive, at least with loc. So if we wanted the first three rows, we could slice from 0 to the index label 2. And if I run this... oops.

### Slicing with brackets [28:48]

I accidentally made a mistake here: whenever we're slicing, we don't wrap the slice in brackets, so I'm going to take those out. For the first value, we're no longer passing a list of values; we're just passing the slice 0:2. So if I run that, we can see that now we get the same result as before.

### Slicing with columns [29:08]

And we can do this with the columns as well. Right now we're only getting the Hobbyist column, but let's go back and look at which columns come after Hobbyist. Up here, where we printed them out, are all of our columns, and a few columns after Hobbyist we have OpenSourcer, OpenSource, and Employment. Let's say that we wanted all of the columns from Hobbyist up to this Employment column. To do that, I'm going to copy that name, come down here, and pass in a colon and then Employment, which does a slice from Hobbyist to Employment.

### Why slicing is inclusive [29:43]

Now, I also want to point out that this is the reason slicing is inclusive for these values. Imagine how much of a pain it would be if we wanted all of the columns from Hobbyist to Employment but the last value weren't inclusive: we'd have to come up here and realize that, to get Hobbyist through Employment, we really need to pass Hobbyist to Country, since Country wouldn't be included. That would just be way too confusing, so it's much easier for this to be inclusive. If you were wondering why they did that, that's why. So if I run this, we can see that for these first three rows we get all of the responses for the columns Hobbyist, OpenSourcer, and so on, all the way up to Employment.
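A sketch of label slicing with loc versus position slicing with iloc, on a hypothetical miniature survey. The column order mirrors the transcript (Hobbyist, OpenSourcer, OpenSource, Employment, Country), and every value is an invented placeholder string. Note that the slice goes straight into loc: writing [0:2] inside a list literal, as in the on-screen mistake, is a Python syntax error.

```python
import pandas as pd

# Hypothetical miniature survey; column order mirrors the transcript, values are placeholders
columns = ["Hobbyist", "OpenSourcer", "OpenSource", "Employment", "Country"]
survey = pd.DataFrame({col: [f"{col} answer {i}" for i in range(4)] for col in columns})

# loc slices by label and INCLUDES the end label: rows 0, 1 and 2
rows = survey.loc[0:2, "Hobbyist"]
print(rows.tolist())

# iloc slices by position and EXCLUDES the end position, like a list slice: rows 0 and 1
print(len(survey.iloc[0:2]))  # 2

# Label slices work on columns too: Hobbyist through Employment, both ends included
block = survey.loc[0:2, "Hobbyist":"Employment"]
print(block.columns.tolist())  # ['Hobbyist', 'OpenSourcer', 'OpenSource', 'Employment']
print(block.shape)  # (3, 4)
```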

### Summary [30:31]

So now we've seen an overview of what we've learned so far about exploring DataFrame and Series objects and how to pluck basic information out of them. There's still tons to learn about DataFrames and Series, and we'll keep learning about them throughout this pandas series, since these are the main data types that we'll be using in pandas. We'll learn about more advanced filtering and queries, how to see which data type each column of our data contains, and a lot more. Now, before we end, I do want to mention that we have a sponsor for this video, and that is Brilliant.org.

### Brilliant [31:24]

Brilliant is a problem-solving website that helps you understand underlying concepts by actively working through guided lessons, and it would be an excellent way to supplement what you learn here with their hands-on courses. They have some excellent courses and lessons on data science that do a deep dive on how to think about and analyze data correctly. So if you're watching my pandas series because you're getting into the data science field, I would highly recommend also checking out Brilliant and seeing what other data science skills you can learn. They even use Python in their statistics course and will quiz you on how to correctly analyze the data within the language. Their guided lessons will challenge you, but you'll also be able to get hints or even solutions if you need them; it's really tailored towards understanding the material. So to support my channel and learn more about Brilliant, you can go to brilliant.org/cms to sign up for free, and the first 200 people to go to that link will get 20% off the annual premium subscription. You can find that link in the description section below. Again, that's brilliant.org/cms.

### Conclusion [32:26]

Okay, so I think that's going to do it for this pandas video. I hope you feel like you got a good introduction to the DataFrame and Series objects and how to navigate through some of your data. Like I said, there's a lot more to learn about these data types, including some advanced filtering that we'll cover in future videos, so be sure to stick around for that. In the next video, we're going to learn more about indexes: we saw basic default indexes in this video, but next time we'll learn how to set the index to specific columns and the benefits of doing that. If anyone has questions about what we covered here, feel free to ask in the comment section below, and I'll do my best to answer them. And if you enjoyed these tutorials and would like to support them, there are several ways you can do that. The easiest way is to simply like the video and give it a thumbs up, and it's also a huge help to share these videos with anyone who you think would find them useful. If you have the means, you can contribute through Patreon, and there's a link to that page in the description section below. Be sure to subscribe for future videos, and thank you all for watching.
