Python Pandas Tutorial (Part 1): Getting Started with Data Analysis - Installation and Loading Data
23:01

Corey Schafer · 09.01.2020 · 1,665,051 views · 24,742 likes

Video description
In this video, we will be learning how to get started with Pandas using Python.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning how to get started with Pandas. Pandas is a Data Analysis Library that allows us to easily read, analyze, and modify data. Pandas is a fundamental tool to learn in the growing field of Data Science. So we'll start by learning how to install Pandas, how to load data into a Jupyter Notebook, and how to see basic information about the data we've loaded in. Let's get started...

The code for this video can be found at: http://bit.ly/Pandas-01
Virtual Environment Tutorial - https://youtu.be/Kg1Yvry_Ydk
Jupyter Tutorial - https://youtu.be/HW29067qVWk
StackOverflow Survey Download Page - http://bit.ly/SO-Survey-Download

✅ Support My Channel Through Patreon: https://www.patreon.com/coreyms
✅ Become a Channel Member: https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join
✅ One-Time Contribution Through PayPal: https://goo.gl/649HFY
✅ Cryptocurrency Donations:
Bitcoin Wallet - 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet - 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet - MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot
✅ Corey's Public Amazon Wishlist http://a.co/inIyro1
✅ Equipment I Use and Books I Recommend: https://www.amazon.com/shop/coreyschafer

▶️ You Can Find Me On:
My Website - http://coreyms.com/
My Second Channel - https://www.youtube.com/c/coreymschafer
Facebook - https://www.facebook.com/CoreyMSchafer
Twitter - https://twitter.com/CoreyMSchafer
Instagram - https://www.instagram.com/coreymschafer/

#Python #Pandas

Table of contents (12 segments)

Introduction

Hey there, how's it going, everybody? In this series of videos we're going to be learning how to use the pandas library in Python. Pandas is a data analysis library that allows us to easily read in and work with different types of data, so we can use it to analyze CSV files, Excel files, and other similar formats. If you're getting into the data science field, then this library is going to be essential to learn. It's one of the most downloaded packages for Python, and that's for a great reason: not only does it allow us to easily read in and analyze data, but it also has great performance, since it's built on top of NumPy. We'll be learning how to do different types of data analysis in this series.

In this video we're going to go over how to get pandas installed, how to download the data that I'll be using for most of this series, and how to get all of this open in a Jupyter notebook so that we're ready to do some coding and analysis.

I'd also like to mention that we do have a sponsor for this series of videos, and that is brilliant.org. I really want to thank Brilliant for sponsoring this series, and it would be great if you all could check them out using the link in the description section below and support the sponsors. I'll talk more about their services in just a bit. With that said, let's go ahead and get started.

First of all, let's install pandas. I'm using a clean virtual environment for this series, but you don't have to use a virtual environment if you don't want to. If you don't know what a virtual environment is and would like to learn more, I'll be sure to leave a link to my video on that topic in the description section below. It's really easy to install pandas: all we need to do is say pip install pandas and let this run through.

Once we have pandas installed, let's also install Jupyter so that we can use Jupyter notebooks. I was a bit hesitant to use Jupyter for this series because some people find it difficult to get the hang of, but honestly, if you're going to be doing a lot of work with pandas, then it's definitely a nice tool to use. It's not necessary, so you should be able to follow along with this series just fine if you're using a regular editor, but Jupyter notebooks allow us to see our data more easily by using the browser to print out our data in tables that make it easier to visualize. So I'm going to use it in this series, but you don't have to in order to follow along. So to install

Installing Jupyter

Jupyter, I want to say pip install, and this is going to be jupyterlab, spelled j-u-p-y-t-e-r-l-a-b. So we'll get that installed. I'm not going to go into a deep dive on how to use Jupyter in this series; I'm mainly going to focus on pandas. But if you'd like a detailed overview, I do have a video on how to use Jupyter in depth, and I'll leave a link to that video in the description section below.

OK, so now we have pandas and Jupyter notebooks installed. Now we're going to need to download the data that I'll be using for most of this series. For anyone who's been watching my latest videos, you know that I like to use the Stack Overflow Developer Survey for different kinds of data analysis. The reason I like this data is that it's real-world data, and it has a lot in there that I think would be interesting to most people who are watching these types of videos. I've seen some other tutorials where the data just seems kind of unrealistic and not very relatable, so hopefully using this data will keep people interested and also give you a good idea of what it's like to actually download real data from a
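Once both pip installs have finished, one way to confirm that the packages are visible to the current Python environment is a quick check like the sketch below. The helper name is_installed is made up for illustration and is not part of the video; it only asks Python whether the module can be found, without importing it.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(package_name) is not None


# "pandas" and "jupyterlab" are the import names of the two packages
# installed in the video; the helper itself is a hypothetical convenience.
for name in ("pandas", "jupyterlab"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports "missing", the most likely cause is that pip installed into a different environment than the one running this code.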

Downloading Data

source and start analyzing it with pandas. To download this data, I have this pulled up here in the browser: we can go over to the Stack Overflow survey results page. This is easy to find if you just Google it, but to keep things easy I'll have a link to this download page in the description section as well. On this page you can download the data in CSV form for any year that they have available. I'm going to download the 2019 data, which is the top entry here. So I'm going to download this CSV, then click on download again, and this should download it for us. OK, it did.

Now I'm going to open this in Finder and unzip the data, since it comes zipped. Once the data is downloaded and unzipped, I'm going to drag that folder to a folder on my desktop, and that's where we'll also create a notebook and analyze this data. Real quick, I don't have this open, so let me open up this pandas demo folder in Finder, and now I'll take the data and drag it into the pandas demo folder on my desktop. Your project can be anywhere; I just created a project folder on my desktop called pandas demo, and it's completely empty except for the data that we just dragged in. Since the folder has kind of a long name, developer survey 2019, I'm going to rename it to data so that it's easy for us to find within our script.

OK, so what files do we have in this data directory that we unzipped? Let me make this a little larger. First of all, if you download data that comes with a README, that is usually helpful. We have a README file right here, and it tells you what the other files are. In this case we have survey_results_public.csv, which contains the main survey results, with one respondent per row and one column per answer, and survey_results_schema.csv, which has the questions that correspond to each column name in the results. If any of this doesn't make sense now, it will once we open up this data in Jupyter. I'm just giving a broad overview here, so don't let everything I'm saying overwhelm you; this will make a lot more sense once we open this up in Jupyter. So let's go ahead and
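Before opening a notebook, it can help to confirm that the renamed data folder contains the two CSV files described above. The following is a minimal sketch, assuming the folder is called data and sits in the current working directory; the helper missing_files is hypothetical and only for illustration.

```python
from pathlib import Path

# File names as read out in the video; adjust if your download differs.
EXPECTED_FILES = ["survey_results_public.csv", "survey_results_schema.csv"]


def missing_files(folder, names):
    """Return the entries of `names` that do not exist inside `folder`."""
    return [name for name in names if not (Path(folder) / name).exists()]


# Assumes the unzipped folder was renamed to "data", as in the video.
missing = missing_files("data", EXPECTED_FILES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Both survey files are in place.")
```

An empty result means both files are where the later read_csv calls expect them.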

Starting a Jupyter Notebook

do that. To open this in a Jupyter notebook, I'm going to go back to my terminal. Let me go ahead and close these Finder windows and go back to my terminal. Within here I'm going to navigate to the folder where I placed that data, and this should be the same command on Mac and Windows. I'm going to say cd and go to my desktop. This is going to be wherever your project directory is, but mine is this pandas demo folder on my desktop. Once I've navigated to that directory, to start up a Jupyter notebook we just need to say jupyter notebook and run that, and we should see a server start up. It seems like it's taking a second. OK, there we go.

Back in our terminal, this runs a Jupyter server, and you will need to leave that terminal open while you're working in Jupyter. Jupyter runs in the browser, so if you shut down this server then you won't be able to access your notebook. OK, so let's go back to the browser, and this is where we have our Jupyter notebooks. Let me zoom in so that everybody can read this fairly well. I'll zoom in to about right there; I think that's good. OK, so we can see our data folder here

Creating a Pandas Notebook

that we downloaded and placed in our pandas demo folder a little bit ago. But now let's create a new notebook. I'm going to click on New up here at the top right, and then I'm going to choose Python 3. Now we can name our notebook: up here where it says Untitled, I'm going to click and call this pandas demo, and rename that.

OK, so now we're ready to start using pandas. We can import it by saying import pandas as pd. Importing pandas as pd is just a common convention when using pandas. Let's run that; I ran that cell by pressing Shift+Enter. Again, I'm not going to go into the specifics of working within Jupyter in this series, but if you'd like a rundown of the features and shortcuts that I'll be using, I do have a link to my Jupyter video in the description section below. OK, so for the rest of this video

Loading Data

we'll see how to load in our data and look at some information about that data. Our data is in CSV format, so in order to read in that CSV we can simply say df, which stands for data frame (we'll learn all about data frames in a bit). We're going to say df = pd.read_csv, using the read_csv function from pandas, and now we just want to pass in a path to our CSV file. Mine is within that data folder, and the file is survey_results_public.csv. Now if I hit Shift+Enter, that will run that cell. Right off the bat we can see that this is pretty simple to work with. When using native Python to read in a CSV file, we need to use the csv module to create a CSV reader and things like that, but here
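To make the comparison with the standard library concrete, here is a small sketch that reads the same CSV text both ways. The three-column sample is made up so that the block runs without the survey download; the column names only imitate the survey's style.

```python
import csv
import io

import pandas as pd

# Made-up sample rows standing in for the real survey file.
SAMPLE = (
    "Respondent,Hobbyist,Country\n"
    "1,Yes,United Kingdom\n"
    "2,No,Canada\n"
)

# Standard library: create a reader and collect the rows ourselves.
# Every value comes back as a string.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(rows[0]["Country"])

# pandas: a single call returns a DataFrame with inferred column types.
df = pd.read_csv(io.StringIO(SAMPLE))
print(df)
```

With the real download in place, the call from the video would be df = pd.read_csv("data/survey_results_public.csv"), assuming the folder was renamed to data.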

Data Frames

we're just doing this all in one line. When it reads this in, it reads it in as a data frame. Data frames are pretty much the backbone of pandas, and we'll go over data frames and series objects in depth in the next video, but for the basics, a data frame is basically just rows and columns of data. We can see what a data frame looks like just by printing it out, and this is the great thing about using Jupyter notebooks, because it allows us to visualize these things in ways that we can't in other editors. Here in Jupyter I can simply say df and run that, and it will print out our data frame. We didn't even need to wrap this in a print function. If you're using a normal editor, you can still print out data frame information, but it's not going to look as good as it does here in Jupyter, where we get this interactive table.

So this is a small look at our data. There are actually 85 columns here, but if I scroll through these, it doesn't look like there are 85 columns printed out. This is truncated by default just to give us a broad overview of the data: by default, Jupyter is displaying 20 columns from our data frame. How did I know that there were 85 columns in this data frame? Well, there are a few attributes and methods that we can use to get an idea of what our data looks like. First we have the shape attribute, which gives us the number of rows and columns as a tuple. Let's look at this. In our next cell down here I'm going to say df.shape and run that. This is an attribute, not a method, so you don't want to put parentheses. So, df.shape, and we can see that we have 88 thousand rows and 85 columns. If you wanted a bit more information, we can use the info method, which will give us the number of rows and columns and also the data types of all the columns as well.

Before I run that, it looks like my text is getting cut off a little bit. Sometimes this happens when I'm within Jupyter. To fix this I usually just come up here and restart and run all my cells again; that usually takes care of the problem. Let's see if that works. OK, that seemed to work. Another thing you can do is just reload the page in the browser. I think it's just because I have this text enlarged, so it's kind of messing with how these look, but now we can see these just fine. OK, so like I was saying, we can see
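The shape attribute described above can be tried on any data frame. The sketch below uses a tiny made-up frame rather than the survey data, so the numbers differ from the video; the point is that shape is read without parentheses and unpacks into a row count and a column count.

```python
import pandas as pd

# Made-up three-row frame; the real survey frame is far larger.
df = pd.DataFrame(
    {
        "Respondent": [1, 2, 3],
        "Hobbyist": ["Yes", "No", "Yes"],
    }
)

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = df.shape
print(df.shape)  # (3, 2)
print(f"{n_rows} rows and {n_cols} columns")
```

Writing df.shape() instead would raise a TypeError, because a tuple is not callable.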

Info Method

here that we have 88,883 rows and 85 columns. If you wanted more information, we can use the info method, and that will give us the number of rows and columns but also the data types of all the columns. Let's run that. If I do df.info, whoops, this actually is a method, so we do want to put the parentheses there. Let me run this, and now let's go over this output. We can see that it says we have 88,883 entries, so those are our rows. We have a total of 85 columns, and then it lists all of the columns for our data, so these are all the columns in the CSV file that we loaded in. It also gives us the data type of each of these columns. We're going to go over data types in a future video, but for the most part, object usually means string, and then we have other things as well: int64 is just an integer, and float64 is a float, so probably a decimal number. There are no other data types in this data set, but there are more data types in general, so I'll be sure to do a video on data types specifically in the near future.

OK, so now that we know the number of rows and columns, let's change a setting within Jupyter so that we can see all of the columns. I think it would be useful to see all of these if we'd like to, even if there are a lot to scroll through. To do this I'm going to come down to the bottom here and change a setting by saying pd.set_option, and within here I will pass display.max_columns and set that to 85 so that we can see all of our columns. I'll run that, and now I'm going to go back up to where we printed out this data frame and rerun that cell. If I scroll through these columns, we can see that now we actually have these 85 different columns. I can keep scrolling, and it didn't just chop us off at 20 like it did before.

Obviously the rows are also being truncated here, and we definitely don't want to print all 89 thousand of these rows, but there are probably some cases with certain data sets where you might want to see all of the rows as well. For example, I said that the survey results schema CSV file that was included in our download gives the matching questions for all of these column names. If we wanted to see what these column names mean for this data, we can load in that schema CSV file as well. So let me do this: I'll go down to the bottom of our notebook and load this in by saying schema_df. Now I don't
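The info call and the column display setting from this segment can be sketched as follows. The three-column frame and the Score column are made up for illustration; only info(), pd.set_option, and the display.max_columns option come from the video.

```python
import pandas as pd

# Made-up sample with one integer, one text, and one float column.
sample = pd.DataFrame(
    {
        "Respondent": [1, 2],
        "Hobbyist": ["Yes", "No"],
        "Score": [1.5, 2.0],  # hypothetical column, not in the survey
    }
)

# info() is a method, so it needs parentheses. It prints the number of
# entries, the number of columns, and the dtype of each column.
sample.info()

# Raise the display limit so that up to 85 columns are shown, as in the video.
pd.set_option("display.max_columns", 85)
print(pd.get_option("display.max_columns"))  # 85
```

The setting applies for the rest of the session, so a cell that displays the data frame has to be rerun to pick it up.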

Loading Schema Data

want to just call this df, because we don't want to overwrite our other data frame. I'll load this in just like we saw before by saying pd.read_csv; this is within the data folder, and the file is called survey_results_schema.csv. I'll run this, and now let's look at this schema data frame that we just loaded in. This column here gives us all of the columns in our other data frame, so we have Respondent, MainBranch, Hobbyist. If I scroll up to that data frame (I'm going to delete this info cell since we no longer need it), we can see Respondent, MainBranch, Hobbyist. If we want to know what these mean, that's what we use the schema for. We can see that Hobbyist means "Do you code as a hobby?" and MainBranch means "Which of the following options best describes", and it actually truncates the text too. To see the full text we could either change an option or just access this value directly, and I will be showing you how to do that in the next video, but for
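The video does not say which option would reveal the full question text. One candidate is pandas' display.max_colwidth option, sketched below under that assumption. The two-row schema is a made-up stand-in, and the header names Column and QuestionText are assumed rather than confirmed by the video; the question texts are the two read out in the video.

```python
import io

import pandas as pd

# Stand-in for survey_results_schema.csv. The header names "Column" and
# "QuestionText" and the spelling "ITperson" are assumptions.
SCHEMA_SAMPLE = (
    "Column,QuestionText\n"
    "Hobbyist,Do you code as a hobby?\n"
    "ITperson,Are you the IT support person for your family?\n"
)

# A separate name keeps the main survey frame (df) from being overwritten.
schema_df = pd.read_csv(io.StringIO(SCHEMA_SAMPLE))

# None removes the per-cell width limit (supported in pandas 1.0 and later).
pd.set_option("display.max_colwidth", None)
print(schema_df)
```

With the real download, the load from the video would be schema_df = pd.read_csv("data/survey_results_schema.csv").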

Rerunning Schema Data

now we can see that we can't see all of the rows with the questions that correspond to each column name. Remember, we have 85 columns, but here we can only see the first five, then we get this ellipsis, and then we can see the last five. Let's set this up so that we can view 85 rows and then reprint this so that we can see all of them. Back in the same cell where we set our max columns, let's also add one for rows. I'm just going to copy and paste that line, but instead of max_columns I'm going to have this be max_rows. I'll run that, and now we'll rerun this schema cell, and we can see all of the columns and the corresponding question text.

If you wanted to know what any of these columns mean, this is how we do it. We can see ITperson: the question was "Are you the IT support person for your family?", so that's probably a yes or no question. If you're going through this data on your own, you can use this as a reference any time you don't know what a certain column means in our survey data. And if you don't want to look through all of these to find a specific column name, in a future video we're going to learn about filtering data frames and see how we can just grab a specific row where the column equals a certain value. OK, so now we have all 85 rows visible
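The row setting works the same way as the column setting. The sketch below uses a made-up 85-row frame in place of the schema and checks that a row from the middle appears in the printed text once display.max_rows is raised. The option_context variant at the end is an addition that is not shown in the video; it applies a setting only inside the with block.

```python
import pandas as pd

# Made-up 85-row frame standing in for the schema data frame.
schema_like = pd.DataFrame({"Column": [f"col{i}" for i in range(85)]})

# As in the video: allow up to 85 rows to be displayed.
pd.set_option("display.max_rows", 85)
full_text = repr(schema_like)
print("col42" in full_text)  # True: the middle rows are no longer cut out

# Not from the video: change a display option temporarily.
with pd.option_context("display.max_rows", 10):
    short_text = repr(schema_like)
print(len(short_text) < len(full_text))  # True: the view is truncated again
```

The temporary form can be handy when only one large frame needs to be shown in full.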

Viewing a Certain Number of Rows

of our schema data frame here, but you might be thinking: that's nice, but I don't want to see 85 rows of my survey data every time I want to look at it. There are a couple of methods that we can use to see only a certain number of rows, which you'll most likely use a lot just to check that your filters and data frames seem to be working correctly. We can see the first five rows by saying df.head() instead of just df, and if I run that, we get just the first five rows. You can pass in a value if you want to see a certain number of rows: if you wanted to see the first ten rows, we could pass 10 to df.head, and this gives us the first ten rows, so we can see it goes all the way down, 0 through 9. If you'd like to see the last rows instead of the first rows, we can use the tail method. If we say df.tail (we could use it without a number as well, but if we pass in a number just like with head), we're saying that we want the last ten entries in our data. So those are the last ten rows of our data.

OK, so this is a brief overview of getting pandas installed, downloading our data, and loading it into Jupyter. Now before we end
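The head and tail calls from this segment can be tried on a small made-up frame. This sketch assumes nothing about the survey data beyond a Respondent column; twenty rows are enough to show the defaults and the optional count.

```python
import pandas as pd

# Made-up frame with 20 rows, indexed 0 through 19.
df = pd.DataFrame({"Respondent": range(1, 21)})

first_five = df.head()    # default: first 5 rows
first_ten = df.head(10)   # first 10 rows, index 0 through 9
last_ten = df.tail(10)    # last 10 rows, index 10 through 19

print(first_five)
print(first_ten.index.tolist())
print(last_ten["Respondent"].tolist())
```

Both methods return new data frames, so the original df is left unchanged.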

Conclusion

here, I'd like to mention the sponsor of this video, and that is brilliant.org. In this series we've been learning about pandas and how to analyze data in Python, and Brilliant would be an excellent way to supplement what you learn here with their hands-on courses. They have some excellent courses and lessons that do a deep dive on how to think about and analyze data correctly. For data analysis fundamentals, I would really recommend checking out their statistics course, which shows you how to analyze graphs and determine significance in the data. I would also recommend their machine learning course, which takes data analysis to a new level: you'll learn about the techniques that allow machines to make decisions where there are just too many variables for a human to consider. To support my channel and learn more about Brilliant, you can go to brilliant.org/cms to sign up for free, and the first 200 people that go to that link will get 20% off the annual premium subscription. You can find that link in the description section below. Again, that's brilliant.org/cms.

OK, so I think that is going to do it for our first pandas video. I hope you feel like you got a good introduction to how to install pandas and load your data into a Jupyter notebook. In the next video we're going to be learning more about data frames and also about the series data type. We'll learn how we can think about data frames in a way that's easier to understand, and also see how we can grab certain elements, columns, and rows from these as well, so be sure to stick around for that.

If anyone has any questions about what we covered in this video, feel free to ask in the comment section below and I'll do my best to answer those. If you enjoy these tutorials and would like to support them, there are several ways you can do that. The easiest way is to simply like the video and give it a thumbs up. It's also a huge help to share these videos with anyone who you think would find them useful. And if you have the means, you can contribute through Patreon; there's a link to that page in the description section below. Be sure to subscribe for future videos, and thank you all for watching.

Study guide for this video

Structured summary

Pandas for data analysis: installation, loading data, and first steps in Jupyter Notebook

A step-by-step guide to getting started with the Pandas library in Python: installation, loading CSV data, and basic methods for exploring data frames in Jupyter Notebook.
