Python Pandas Tutorial (Part 3): Indexes - How to Set, Reset, and Use Indexes

17:26


Corey Schafer 13.01.2020 460,143 views 10,860 likes


Video description

In this video, we will be learning about the Pandas indexes.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning about the Pandas index. Indexes allow us to label our rows so that we can access them more easily. We'll learn how to set, reset, and use indexes properly. Let's get started...

The code for this video can be found at: http://bit.ly/Pandas-03
StackOverflow Survey Download Page - http://bit.ly/SO-Survey-Download

✅ Support My Channel Through Patreon: https://www.patreon.com/coreyms
✅ Become a Channel Member: https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join
✅ One-Time Contribution Through PayPal: https://goo.gl/649HFY
✅ Cryptocurrency Donations:
Bitcoin Wallet - 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet - 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet - MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot
✅ Corey's Public Amazon Wishlist http://a.co/inIyro1
✅ Equipment I Use and Books I Recommend: https://www.amazon.com/shop/coreyschafer

▶️ You Can Find Me On:
My Website - http://coreyms.com/
My Second Channel - https://www.youtube.com/c/coreymschafer
Facebook - https://www.facebook.com/CoreyMSchafer
Twitter - https://twitter.com/CoreyMSchafer
Instagram - https://www.instagram.com/coreymschafer/

#Python #Pandas

Study guide for this video

Structured summary

Mastering Indexing in Pandas: Faster Lookups and Optimized Data Structure

Learn how to set, reset, and use custom indexes for efficient data lookup in the Pandas library for Python, in 17 minutes.

Contents (4 segments)

Introduction

Hey there, how's it going everybody? In this video, we're going to be learning more about indexes. We've seen basic default indexes in previous videos, but in this video we'll learn how to set custom indexes and the benefits of doing this. Now, I'd also like to mention that we do have a sponsor for this series of videos, and that is Brilliant.org. So I really want to thank Brilliant for sponsoring this series, and it would be great if you all can check them out using the link in the description section below and support the sponsors. I'll talk more about their services in just a bit. So with that said, let's go ahead and get started. Okay, so I have my snippets file, or rather my snippets notebook, open here so that we can look at indexes using a simple DataFrame with a little bit of data, and then we'll see how to use these with our larger survey data set that we've been using so far in the series. So in these snippets

Setting an Index

we have the same small DataFrame that we saw in the last video, where we just have three people with their first name, last name, and email address, and I have this DataFrame displayed down here at the bottom. Like I said in previous videos, our DataFrames have this thing on the far left that looks like a column without a name, and this is an index. Since we've just seen these default indexes so far, this is currently just a range of numbers that's basically an integer identifier for the rows, so this is a 0, a 1, and a 2.

Now, sometimes it might make more sense to have a different identifier for each row, and that will basically be the label for that row, so it's usually unique. Pandas doesn't actually enforce indexes being unique, and sometimes they won't be, but most of the time these will be unique values. So what might be a better index for our sample data here? Well, maybe the email address would be a good index for this data, since that's usually a unique value for most people.

Right now, if I wanted to view all of the email addresses, then we could say df and access the email column. We saw this in the last video, and we can see that it displays all of these email addresses. Now what if we wanted to set these email addresses as the index for this DataFrame? To do that, we could just say df.set_index and pass in the name of the column that we want for the index. If I run this, we can see that the email is on the far left and it's bold, and it actually does kind of look like a normal column, because this index has a name: it has the same name as our column when we set it.

I want to show you something here. We set this index in this cell, but if I look at my DataFrame again underneath, so if I say df to print out this DataFrame, then we can see that our DataFrame didn't actually change. It still has the default index over here on the left, and that's because pandas doesn't make a lot of these changes in place unless we specifically tell it to. This is actually nice, because it allows us to experiment without worrying about modifying our DataFrame in unexpected ways. So let's say that we actually did want to set our index to the email column and have those changes carry over into future cells. To do this, back up here where we said set_index, we can just add another argument and say inplace is equal to True. Now if I run that and then rerun the DataFrame again, we can see that it actually did set that index and modified the DataFrame. We can also look specifically at that index just by saying df.index, and if I run that, we can see that we have an Index here with a list of all the index values, and it also tells us that the name is equal to email.

Okay, so why would this actually be useful? Like I said before, the email address as the index gives us a nice unique identifier for each row. Remember in the previous video that we used .loc to search our DataFrame by label. Well, these indexes are the labels for these rows. Before, we just used the default range index, but now we can find a specific row by passing that label. This will be easier if we just look at an example. If I say df.loc, before we were passing in a 0 as the label, but now I can say I want to see the information for CoreyMSchafer@gmail.com, and it'll come back and say that person has a first name of Corey, a last name of Schafer, and so on. So now we get the row for that specific email index. And like we saw in the last video, we can still pass in values for specific columns as well. If we wanted the last name, I could just pass that in as the second value, and we can see that we get Schafer.

Now, we no longer have those default integers as our index, because it's using the email. So if I try to use those integers that we used before, if I ask for row 0 using .loc, then we're going to get a TypeError. I get an error because the DataFrame no longer has an index with that label. If you want to use integer location instead of labels, then you still have the .iloc indexer available to you, and we saw that in the last video as well. So if I change this to be iloc instead of loc, then it'll still give us that first row.

Now, if you accidentally set the index and want to reset it, we can do that with the reset_index method. Down here I will just say df.reset_index with inplace equal to True so that those changes carry over, and then I'll print out the DataFrame. If I run this, we can see that we're back to having email as a column and the default range index.

If you actually know what you want the index to be when you're creating your DataFrame, then you can simply set it there instead of setting it later using the set_index method, and we can do that as we're loading in data from a CSV or other source as well. So let me switch over to our other notebook with the Stack Overflow data that we've been using throughout the series, and we'll take a look at some real-world examples of why using indexes is useful. Again, for those of you who have been following along with the series so far, this should look familiar, but if you haven't been following along and this is the first video you've watched, here's a brief overview of what's going on. We are loading in pandas here and some CSV files as well, and I have a link in the description section below to the data that we are using for these CSV files. Then we're also setting some options in pandas, display max columns so that we can see all the columns and max rows so that we can see a lot of these rows. And this is what our DataFrame looks like: these are just survey results from Stack Overflow.

In the series so far we've been using this default index, and we can look over here and see that it's just this range of 0, 1, 2, and 3. Now, if we look at the survey response data, it looks like they actually have a unique value per row within the data itself. If we look at this Respondent column here, it's actually a unique ID, so it's respondent 1, respondent 2, 3, and so on. So really we should probably clean this up a bit and just use that respondent ID as our DataFrame index. We could do this just like we saw before, by coming down here and saying df.set_index, or we can do this while we're actually reading in the data by passing an additional argument to the read_csv method. So up here where we loaded in the data, let's add another argument, index_col, set equal to the name of the column that we want to be the index. In this case I want it to be this Respondent unique ID, so I'm going to say index_col is equal to Respondent. I will rerun that cell, come back down here and rerun our DataFrame head, and now we can see that this is cleaned up, because we have Respondent as our actual index. So now these are the labels. If you wanted the first respondent, then we could just say df.loc and pass in 1, so this is the first respondent there. Okay, let me delete that cell. Okay, so now let me show you a real
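The steps from this segment (set_index, df.index, .loc, .iloc, reset_index) can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the sample values are assumptions modeled on the three-person DataFrame described above, and the results are reassigned to new variables instead of using inplace=True, which is a stylistic swap that leaves the original df untouched. The video shows a TypeError for .loc[0]; as far as I know, recent pandas versions raise KeyError instead, so the sketch catches both.

```python
import pandas as pd

# Sample data modeled on the small DataFrame described in the video;
# the exact values are assumptions for illustration.
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}
df = pd.DataFrame(people)

# set_index returns a new DataFrame; df itself keeps its default integer index.
by_email = df.set_index("email")
print(df.index)        # the default range index
print(by_email.index)  # the email values, with the index named "email"

# Label-based lookup: a whole row, then one column of that row.
row = by_email.loc["CoreyMSchafer@gmail.com"]
last_name = by_email.loc["CoreyMSchafer@gmail.com", "last"]

# The integer labels are gone, so .loc[0] fails. Both exception types are
# caught because the type raised depends on the pandas version (assumption).
try:
    by_email.loc[0]
    lookup_failed = False
except (KeyError, TypeError):
    lookup_failed = True

# Position-based lookup still works regardless of the index labels.
first_row = by_email.iloc[0]

# reset_index moves the email labels back into a regular column.
restored = by_email.reset_index()
print(restored)
```

Reassigning (or using inplace=True, as in the video) is only needed when the change should carry over into later cells; calling set_index on its own is a safe way to preview the result.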

Real World Example

world example of where I would use this. If you remember from earlier in the series, we have our survey data that we can see here, but we also have another DataFrame that tells us what each of these columns actually means in the survey data. So let me display that DataFrame real quick. For example, if I wanted to know what Hobbyist meant, then we can look at our schema DataFrame here and see that the question on the survey for Hobbyist was "Do you code as a hobby?" So when we see yes and no answers up here for Hobbyist, they were answering the question "Do you code as a hobby?"

So here's a question: what if I wanted to locate what a specific column meant without needing to search through this entire DataFrame manually? Well, in this case we can simply set the column name as the index and use the .loc indexer. I'll set this index up here where we loaded in the schema DataFrame, so let me go back up to the top. We want this column here to be the index, because these are all unique values, so I'm going to grab the name of this column, which just so happens to be Column, and then up here I will say that I want index_col to be equal to Column, and let me get that within the string there. Run that, and now let's rerun our schema DataFrame, and we can see that Column is bold, so that is our index. Now we can use .loc to search for those columns directly. If I wanted to see the information for the Hobbyist column, I could just say schema_df.loc and pass in the label of the index that we want. So if I wanted to see what Hobbyist was, then we can see, okay, Hobbyist has the question text of "Do you code as a hobby?"

So let's go back to our survey data and see if we can find a column that doesn't make much sense to us. If I scroll through these here, okay, what would this one mean: MgrIdiot? So let me scroll down here and paste that in to schema_df.loc and rerun that, and now we get the information for that column. The text is actually truncated in Jupyter notebooks by default. We can change that setting if we'd like to see the entire question text, but I kind of want it on since we have so much data to display. Instead, if you want to see the full text for that question, you can just access the data in that row and column directly by also passing the column name into .loc. Just like we've seen before when using .loc, this is the row we want, so what column do we want? We want to read that QuestionText, so I'll paste that in, and if I run this again, we can see that the full question text for what MgrIdiot means is "How confident are you that your manager knows what they're doing?" So this is one nice example of when setting these indexes is useful, because it allows us to search these rows by label very easily, just like we did here with the schema.

Now let me show you one more thing before we finish up. We were able to set our indexes and all of that looks good, but it might make it a bit easier to read this schema DataFrame if the indexes were sorted alphabetically. To do that, it's just as easy as saying schema_df.sort_index. Let's run this, and let me scroll down a little bit, and we can see that now these indexes are sorted alphabetically. So if we knew that we wanted to get to Employment or something, it's going to be in the E's, and that just makes it a little bit easier to find what you're looking for. If you wanted to sort this in descending order instead, then we could just say ascending is equal to False, and now we can see that it is in reverse order. Depending on your data, that might make it easier to read. We're going to go over a lot more advanced sorting in a future video, but these are just some basics on getting our indexes in order.

As usual with our DataFrame, if you wanted the sort to be permanent and carry over into future cells, then you should set inplace equal to True. We'll see that a lot throughout the series, because doing it this way allows us to see what things would look like without actually affecting the DataFrame itself. So down here the DataFrame is still unsorted, but if I come up here and say sort_index with inplace equal to True, rerun that, and then rerun our schema DataFrame, then now our schema DataFrame is permanently sorted. So depending on your data, learning about these indexes will be really useful, especially whenever using .loc, because it allows you to search by label, which is extremely useful depending on what type of data you are
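The schema lookup described in this segment can be sketched as below. This is a hedged illustration only: a three-row in-memory CSV stands in for the real survey schema file so the example is self-contained, the Employment question text is a made-up placeholder, and the name schema_df simply follows how the variable is spoken in the video.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the survey schema CSV. The Employment
# question text is a placeholder, not the real survey wording.
schema_csv = io.StringIO(
    "Column,QuestionText\n"
    "Hobbyist,Do you code as a hobby?\n"
    "MgrIdiot,How confident are you that your manager knows what they're doing?\n"
    "Employment,Placeholder employment question\n"
)

# index_col sets the index while the data is being read, so no separate
# set_index call is needed afterwards.
schema_df = pd.read_csv(schema_csv, index_col="Column")

# Look up a whole row by its label, then a single untruncated cell.
hobbyist_row = schema_df.loc["Hobbyist"]
question = schema_df.loc["MgrIdiot", "QuestionText"]
print(question)

# sort_index returns a sorted copy; schema_df keeps its original order
# unless the result is reassigned or inplace=True is passed.
sorted_schema = schema_df.sort_index()
reverse_sorted = schema_df.sort_index(ascending=False)
print(sorted_schema)
```

With the real files, the same index_col argument would be passed to read_csv along with the path to the schema CSV from the survey download.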

Conclusion

searching through. Okay, so before we end here, I'd like to mention the sponsor of this video, and that is Brilliant.org. In this series we've been learning about pandas and how to analyze data in Python, and Brilliant would be an excellent way to supplement what you learn here with their hands-on courses. They have some excellent courses and lessons that do a deep dive on how to think about and analyze data correctly. For data analysis fundamentals, I would really recommend checking out their Statistics course, which shows you how to analyze graphs and determine significance in the data. I would also recommend their Machine Learning course, which takes data analysis to a new level, where you learn about the techniques being used that allow machines to make decisions where there are just too many variables for a human to consider. So to support my channel and learn more about Brilliant, you can go to brilliant.org/cms to sign up for free, and also the first 200 people that go to that link will get 20% off the annual premium subscription. You can find that link in the description section below. Again, that's brilliant.org/cms.

Okay, so I think that is going to do it for this pandas video. I hope you feel like you got a good idea of how to use indexes and why these might be useful. Like I said, you don't have to use indexes, but sometimes it just makes sense depending on your data. For example, in this video, setting the schema DataFrame index made it very simple for us to look up column names. In the next video, we're going to start learning how to filter DataFrames and grab data that meets specific criteria. So maybe we only want to see data where the salary is above a certain amount, or maybe we only want to see data for people who said that they use Python in this survey. We'll take a look at filtering DataFrames on that type of criteria in the next video.

If anyone has any questions about what we covered here, then feel free to ask in the comment section below and I'll do my best to answer those. If you enjoy these tutorials and would like to support them, then there are some ways you can do that. The easiest way is to simply like the video and give it a thumbs up. It's also a huge help to share these videos with anyone who you think would find them useful. And if you have the means, you can contribute through Patreon, and there's a link to that page in the description section below. Be sure to subscribe for future videos, and thank you all for watching.
