# Python Pandas Tutorial (Part 7): Sorting Data

## Metadata

- **Channel:** Corey Schafer
- **YouTube:** https://www.youtube.com/watch?v=T11QYVfZoD0
- **Date:** 06.02.2020
- **Duration:** 15:39
- **Views:** 213,575
- **Source:** https://ekstraktznaniy.ru/video/11767

## Description

In this video, we will be learning how to sort DataFrames in Pandas.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning how to sort our data. We will learn how to sort single columns, sort multiple columns, and view the largest and smallest values in a DataFrame. Let's get started...

The code for this video can be found at:
http://bit.ly/Pandas-07

StackOverflow Survey Download Page - http://bit.ly/SO-Survey-Download

✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join

✅ One-Time Contribution Through PayPal:
https://goo.gl/649HFY

✅ Cryptocurrency Donations:
Bitcoin Wallet - 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet - 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet 

## Transcript

### Untitled Chapter 1 [0:00]

Hey there, how's it going everybody? In this video, we're going to be learning how to sort our data in pandas. We'll look at ways to sort our columns, how to sort on multiple columns, and how to grab the largest and smallest values from different rows. As usual, we'll look at how to do this on a small data set first, and then we'll see how this applies to a larger data set like the Stack Overflow survey data that we've been using throughout the series.

Now, I've been reading all of your comments and suggestions for the series, and I'm trying to take your suggestions to heart. A lot of people have said that they'd like shorter videos, so I'm going to do my best to make shorter videos that don't sacrifice any of the details that I think are important. I'd also like to mention that we do have a sponsor for this series of videos, and that is Brilliant. I really want to thank Brilliant for sponsoring the series, and it would be great if you all could check them out using the link in the description section below and support the sponsors. I'll talk more about their services in just a bit. So with that said, let's go ahead and get started.

Okay, so I have my snippets notebook open that we've seen throughout the series. Again, if anyone would like to follow along, I do have links to this code and the data in the description section below. So let's say that we want to sort this small DataFrame that we have here. First, let's decide how we want to sort it. One way that might make sense is to sort by last name, and to do this, we can use the sort_values method. So to sort this by last name, I can simply say df.sort_values and set the by argument equal to the column name, which in this case is last. If I run this, we can see that we get a DataFrame returned where the last names are sorted alphabetically. If these were numbers instead, they would be sorted from smallest to largest, and we'll see that when we look at our Stack Overflow data.
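
A minimal sketch of the single-column sort described above. The people dictionary is a hypothetical stand-in for the small DataFrame in the snippets notebook; the exact values in the video's notebook may differ.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's small "people" data
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}
df = pd.DataFrame(people)

# Sort the rows by the "last" column. ascending=True is the default,
# so strings come out alphabetically (numbers would go smallest to largest).
sorted_df = df.sort_values(by="last")
print(sorted_df)

# sort_values returns a new DataFrame; df itself keeps its original order
print(df)
```

Because sort_values returns a new object by default, nothing about df changes until the result is assigned or inplace=True is used.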

### sort these in descending order [1:47]

To sort these in descending order, we can pass in an argument and just say that we want ascending to be equal to False. If I run this now, we can see that those are sorted in descending order. Now, sometimes your sorts can get a little more complicated.
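
The descending sort can be sketched the same way, again assuming the hypothetical three-person frame. A small made-up numeric Series is added to show that, for numbers, descending means largest first.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's small "people" data
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}
df = pd.DataFrame(people)

# ascending=False flips the order: reverse-alphabetical for strings
desc_df = df.sort_values(by="last", ascending=False)
print(desc_df)

# For numbers, descending order puts the largest value first (made-up values)
nums = pd.Series([10, 3, 7])
print(nums.sort_values(ascending=False).tolist())  # [10, 7, 3]
```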

### sort on multiple columns [2:08]

Maybe you want to sort on multiple columns. You'd do this when the first column that you sorted on has identical values, and you then want to sort on a second column to break those ties.

### pass in a list for these columns [2:19]

To do this, we can pass in a list of the columns that we want to sort on. Let's say that the first column we want to sort on is the last name, and then, if there are duplicate last names, we want to sort on the first name after that.
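
Passing a list to by can be sketched like this, assuming the same hypothetical three-person frame. A single ascending=False is carried over from the descending sort in the previous step, so it applies to both columns.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's small "people" data
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}
df = pd.DataFrame(people)

# "last" is the primary sort key; "first" only matters for rows that tie on "last".
# A single ascending=False applies descending order to both columns.
multi = df.sort_values(by=["last", "first"], ascending=False)
print(multi[["last", "first"]])
```

With both columns descending, the two Doe rows come out as John before Jane.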

### pass in a list of columns [2:37]

So I'm actually going to go to a new line here, and now, instead of just sorting by last, I can pass in a list. Let's say that we want to sort on the last name first, but then the first name. If I run this, we can see that this is sorted in descending order on the last name, just like we saw up here before, but it's also sorting the first name in descending order if there are identical last names. So we can see that Jane was first up here, and now it's John, because this is in descending order.

Sometimes you might run into a situation where you want to sort on multiple columns but have one in descending order and another in ascending order. So let me add one more name to our DataFrame so that it's clearer when I do this. I'm going to add one more name to our dictionary up here at the top. I'm just going to call this Adam, and we'll keep the last name Doe, with an email of A@email.com. Now let me go ahead and rerun all of these cells, so I'll just say Cell > Run All. Now, down here at the bottom, we can see our last names, and then all of these first names in descending order.

But let's say that we want to sort this DataFrame by last name in descending order like we have here, but then we want the first names to be in ascending order. To do this, we can simply pass in a list of Boolean values to the ascending argument. Let me copy what we have here to show the difference. Instead of just saying ascending=False, let's pass in a list, and these values will correspond to our columns. I want the last name in descending order, so we can keep that as False, but I want the first name to be in ascending order, so I can pass in True for that second value. Now if I run this, we can see that our last names are still in descending order, but our first names are now in ascending order, with Adam coming before these two here.

As we've seen several times throughout the series, if we want to save this DataFrame and make this sort permanent, then we can set inplace equal to True. So I'm just going to add inplace=True here at the end. If I run this and now look at our DataFrame, we can see that it has been modified so that these values are sorted the way we specified. Now, if you want to set this back to how it was before and have the index sorted again, we can do that by sorting the index.
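
The mixed-direction sort and the inplace=True step described above can be sketched as follows. The fourth row ("Adam Doe", "A@email.com") is a hypothetical reconstruction of the name added in the video, appended at the end of each list here; where it sits in the video's dictionary may differ.

```python
import pandas as pd

# Hypothetical data with a fourth person added, as in the video
people = {
    "first": ["Corey", "Jane", "John", "Adam"],
    "last": ["Schafer", "Doe", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com", "A@email.com"],
}
df = pd.DataFrame(people)

# One Boolean per column in `by`: "last" descending, "first" ascending.
# inplace=True reorders df itself and returns None instead of a new DataFrame.
result = df.sort_values(by=["last", "first"], ascending=[False, True], inplace=True)
print(df)
```

After this call, df holds Corey first and then the Doe rows as Adam, Jane, John, while the original index labels travel with their rows.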

### sorting the index [5:21]

To do this, instead of using the sort_values method, we'll use the sort_index method. So I can just say df.sort_index() and run that, and we can see that our index over here on the far left is now sorted back into the order in which we added those rows. I also want to point out that if you simply want to sort a single column, then we can easily sort that single Series as well, because Series objects also have the sort_values method. So if we want to see just the sorted last names and not the entire DataFrame, then I can access that column by passing in the last column name in brackets, and then I can just say sort_values. I'm going to leave all of the arguments as their defaults here and run this, and now we can see that these are sorted in ascending order.

Okay, so that's a brief overview. Now let's go over to our survey data and see what this looks like on a larger data set, and we'll also see some simple ways to view the largest and smallest results from our data.

Okay, so I have our Stack Overflow developer survey open here that we've seen throughout the series. Again, if you'd like to download this data to follow along, then I have links in the description section below. One way that we might sort these survey results is by country name. We can see over here that we have a Country column where each respondent answered which country they were from. Maybe you're doing some analysis on information from different countries, and it's just easier to see them all sorted. For example, let's say that we want to look at countries and salary. To sort these survey results by country, we can simply come down here to the bottom, say df.sort_values, and set by equal to Country, and I'm also going to set inplace=True so that it changes our DataFrame. Now let's take a look at the Country column for the first 50 results or so from this DataFrame. So I'm going to access
that Country column, and I'll just get the first 50 of those by using the head method. If I run this, we can see that these are in alphabetical order: we have results from developers in Afghanistan, and at the bottom here, the results go into Albania. We only have the top 50 results here, but if we were to look through all of them, we'd see that all of these countries are listed in alphabetical order.

Okay, so now let's take a look at the salaries reported from these countries. Let me add that to our output here, and I'll do that by accessing the ConvertedComp column. Now remember, this is a mistake that some people make: whenever we're accessing multiple columns, we have to put them inside of a list within our brackets, so we're going to have two sets of brackets here. Okay, so now let's run this, and we have the salaries for each respondent listed on the right. We can see that we have a lot of NaN (not a number) values here, which just means that those respondents skipped over the question. But if we wanted a general idea of the higher salaries, then we can sort these in descending order. Like we saw in our earlier example, if we want to sort the countries in ascending order and the salaries in descending order, then we can do that by going up here and passing in the multiple columns that we want to sort on: Country, and we also want to sort on this ConvertedComp column. I'm also going to pass in an ascending argument and set it equal to a list, which corresponds to our column names and says whether we want each one in ascending order. For the country, I'll pass in True, since we do want those in ascending order, but I want to see the highest salaries first, which means that we want this to be False so that the salaries are in descending order. So if I run this sort and then take a look at this head again, we can see that now we have Afghanistan here, and all the
highest salaries that people reported are listed at the top. If we scroll down here, we can see that we get to 0 and then NaN as the lowest salaries, and then once we get down to Albania, it restarts with the high salaries from that country. We can see that there are some big outliers here; this is a much larger salary than what other people said they made. There are some techniques that we can use to account for outlier data, and we'll focus on that in the next video, where we cover aggregating and grouping data.

Now, before we end, I'd also like to take a look at some other useful methods for seeing the largest and smallest values. Maybe you're sorting results just so you can grab the largest or smallest values from a specific DataFrame. If you're doing that, then there's actually a much simpler way. Say we want to see the 10 highest salaries from our survey; to do this, we can simply use the nlargest method. First, I'm just going to run this on a Series. I'll grab that ConvertedComp column, which is the salaries, and call nlargest to grab the ten largest salaries. If I run this, we can see that we get the ten largest salaries reported, and these salaries are all the same here at 2 million, so I'm assuming that the survey capped salaries at 2 million for this particular survey. I think that's pretty high, so I'm curious to see what type of developers these people are and whether they're in management roles or not, and again, we'll see how to further analyze these results in the next video. But if any of you all are making 2 million dollars a year as a developer and are hiring, then let me know, because I'm not looking for additional work at the moment, but I do think I would make an exception for 2 million bucks a year. Now, you can see here that when we grab the ten largest values from this Series, from this column,
it only gives us the 10 largest values from that column. But what if we wanted to see the other survey results from these rows? To do that, we can simply run this method on the entire DataFrame and pass in the column for which we want the largest results. So I could say df.nlargest, and I want the 10 largest from the ConvertedComp column. If I run this, it gives me those same rows, but now we have all of their survey results instead of just the salary. If I go up here to their index, this one is 25983, and we can see the first one here is 25983, so this is the same result. If I scroll over here to ConvertedComp, we can see that these are all 2 million dollars. If we wanted to see the smallest values instead of the largest, then instead of using nlargest here, we can simply say nsmallest. If I look at the smallest salaries, this will give us the smallest salary values from our survey. I'm assuming this is probably just 0 for people who aren't currently working, and yes, this is a 0 here.

Okay, so that's a brief overview of how to sort our data, how to sort on multiple columns, and how to get the largest and smallest values. Before we end here, I'd like to mention the sponsor of this video, and that is Brilliant. Brilliant is a problem-solving website that helps you understand underlying concepts by actively working through guided lessons, and it would be an excellent way to supplement what you learn here with their hands-on courses. They have some excellent courses and lessons on data science that do a deep dive on how to think about and analyze data correctly. So if you're watching my pandas series because you're getting into the data science field, then I would highly recommend also checking out Brilliant and seeing what other data science skills you can learn. They even use Python in their statistics course and will quiz you on how to correctly analyze the data within the language. Their guided
lessons will challenge you, but you'll also have the ability to get hints or even solutions if you need them. It's really tailored towards understanding the material. So to support my channel and learn more about Brilliant, you can go to brilliant.org/cms to sign up for free, and the first 200 people to go to that link will get 20% off the annual premium subscription. You can find that link in the description section below. Again, that's brilliant.org/cms.

Okay, so I think that's going to do it for this pandas video. I hope you feel like you got a good overview of how we can sort our DataFrames. In the next video, we'll be learning about aggregating and grouping data. This will be the video that a lot of people have been waiting for, because this is what most people think of when they think of data analysis. For example, we'll see how we can group our survey data by country and then get the median salaries for each country, and things like that. We'll also take care of some of that outlier data that we just saw. It's definitely a good skill to know in pandas, and it'll open up a lot of possibilities for exploring your data further. If anyone has any questions about what we covered in this video, then feel free to ask in the comment section below, and I'll do my best to answer those. And if you enjoy these tutorials and would like to support them, then there are several ways you can do that. The easiest way is to simply like the video and give it a thumbs up. It's also a huge help to share these videos with anyone who you think would find them useful. And if you have the means, you can contribute through Patreon, and there's a link to that page in the description section below. Be sure to subscribe for future videos, and thank you all for watching.
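
The survey steps from the last chapter (sorting by Country and salary, restoring order with sort_index, sorting a single Series, and pulling extremes with nlargest and nsmallest) can be tried without downloading the survey. This is a minimal sketch that assumes a tiny made-up frame with only the Country and ConvertedComp columns named in the video; every value is invented and none of it is real survey data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the survey; only the two columns discussed are included
survey = pd.DataFrame(
    {
        "Country": ["Albania", "Afghanistan", "Albania", "Afghanistan", "Afghanistan"],
        "ConvertedComp": [30000.0, np.nan, 2000000.0, 0.0, 95000.0],
    }
)

# Country A-to-Z, then salary high-to-low within each country.
# NaN salaries land at the end of each country group (na_position="last" is the default).
by_country = survey.sort_values(by=["Country", "ConvertedComp"], ascending=[True, False])
print(by_country)

# sort_index() puts the rows back in their original index order
restored = by_country.sort_index()

# A single column is a Series, which has its own sort_values method
countries = survey["Country"].sort_values()

# nlargest/nsmallest return the top/bottom n values; NaN salaries are skipped
top2_salaries = survey["ConvertedComp"].nlargest(2)  # salary values only
top2_rows = survey.nlargest(2, "ConvertedComp")      # whole rows for those salaries
lowest_row = survey.nsmallest(1, "ConvertedComp")    # the 0.0 salary row
print(top2_rows)
print(lowest_row)
```

Calling nlargest on the column returns only the salaries, while calling it on the DataFrame with a column name returns the full rows, which mirrors the two calls made in the video.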
