# Python Pandas Tutorial (Part 11): Reading/Writing Data to Different Sources - Excel, JSON, SQL, Etc

## Метаданные

- **Канал:** Corey Schafer
- **YouTube:** https://www.youtube.com/watch?v=N6hyN6BW6ao
- **Дата:** 01.04.2020
- **Длительность:** 32:45
- **Просмотры:** 202,833

## Описание

In this video, we will be learning how to import and export data from multiple different sources. We will cover CSV, JSON, Excel, SQL, and more.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning how to load and save data using multiple different sources. We will learn how to read/write data to CSV, JSON, Excel, SQL, and more. This covers the vast majority of formats you'll see in the data science field and will be extremely useful to know. Let's get started...

Video Timestamps:
Read CSV - 0:56
Write CSV - 3:20
Write TSV - 4:40
Read TSV - 6:00
Write Excel - 6:15
Read Excel - 10:42 (Start at 6:15 to see installed packages)
Write JSON - 12:18
Read JSON - 15:41
Write SQL - 16:59
Read SQL - 24:57 (Start at 16:59 to see installed packages)

The code for this video can be found at:
bit.ly/Pandas-11

StackOverflow Survey Download Page - http://bit.ly/SO-Survey-Download

Environment Variables Tutorial - https://youtu.be/5iWhQWVXosU

✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join

✅ One-Time Contribution Through PayPal:
https://goo.gl/649HFY

✅ Cryptocurrency Donations:
Bitcoin Wallet - 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet - 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet - MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot

✅ Corey's Public Amazon Wishlist
http://a.co/inIyro1

✅ Equipment I Use and Books I Recommend:
https://www.amazon.com/shop/coreyschafer

▶️ You Can Find Me On:
My Website - http://coreyms.com/
My Second Channel - https://www.youtube.com/c/coreymschafer
Facebook - https://www.facebook.com/CoreyMSchafer
Twitter - https://twitter.com/CoreyMSchafer
Instagram - https://www.instagram.com/coreymschafer/

#Python #Pandas

## Содержание

### [0:00](https://www.youtube.com/watch?v=N6hyN6BW6ao) Intro

hey there how's it going everybody in this video we're going to be learning how to read and write data to different sources so we'll learn how to read and write data using CSV files Excel files JSON and also sequel databases now in this series so far we've been reading data from CSV files but in data science there are so many different ways for data to be stored so by the end of this video you should be able to get your data to and from pandas no matter what data format you're using now if you're watching this video because you're looking for how to read and write a specific file format then I'll be sure to add timestamps in the description section below to where we read and write from each different format now I would like to mention that we do have a sponsor for this series of videos and that is brilliant so I really want to thank brilliant for sponsoring this series and it would be great if you all can check them out using the link in the description section below and support the sponsors and I'll talk more about their services in just a bit so with that said let's go ahead and get started ok so first let's look at CSV files since we've already been using these throughout the series we should already be familiar with reading data in from CSV since that's what we've been doing so far but in case this is the first video of the series that you're watching let's go over this one more time and then we'll also learn how to write to a CSV file as well so up here towards the top of my notebook here we can see that I'm reading in this CSV file and this CSV file is within a data folder that is in the same location as this jupiter' notebook on the file system now if you have a CSV file loaded elsewhere on the system then you'll need to pass in the full path to that file instead of just this relative location that we have here and we can see that we have different arguments that we can pass in when reading our CSV files so in this example I'm at eclis setting the index to this respondent column here which is the respondent ID for each person who answered this survey and when I read in the CSV we can see that it sets this data frame equal to the data and we can print this data out down here at the bottom so that is the read CSV method and it allows us to pull data in to pandas now let's learn how to write this data back to a CSV maybe you're gonna make some changes and some different analysis here to your data frame and then we want to export this back to our file system for later use or so that we can share it with someone else or something like that so for example let's filter down for a specific location in this survey you know maybe you're doing some analysis for your specific country and you just want to see the survey results from that location we've seen this in previous videos but if we want to filter then we can simply say I'll create a filter here and just say that I want the country here and I'll grab if the country is equal to India so let's say you're doing some analysis and you only want the survey results from India so now I'm going to create a new data frame here I'll call this India DF and du ADF dot lok and pass in that filter so now if I do an India DF dot head to take a look at the beginning of this new data frame if we look over here in the country column then we can see that all of these countries here are now set to India so now let's say that we want to export this new filter data frame to a CSV file so to do this we can use the to CSV method so we can say I'll just say India underscore DF which is our data frame dot to underscore CSV and now I'm just going to pass it into that same location in that data directory and then I'll just call this modified dot CSV so if I run this we can see that we don't get any errors and now if I go back and look at my file system here then I have this modified D CSV so if I click on this then we can see that this is you know a little bunch together since it's a CSV file a raw CSV file that we're looking at but we can see that we have all of our column names here and then the second row should be the first result and I can see here that we have India for that country if I look at the second result we can see we have India again and India again down here most likely I can't see it but you we can just assume that it's there it's looking good oh and actually there it is right there so we can see that we did actually export that data frame where we filtered that down to a new CSV file okay so that was easy enough so now let's look at how to read and write to some other formats so one thing that you might run into is a tab delimited file these are almost exactly the same thing as CSV files but instead of your data being separated by a comma the data is instead separated by tabs so to do this in pandas we're still going to use the same CSV methods that we've already seen but we're going to pass in a custom separator so we can write to a tab delimited file just by changing the file extension here to dot TSV and I'm also going to specify a separate or argument so I'm going to say set s P and then you want to pass in your separator now you can pass in anything here if you want you know a file that is separated by hashes or anything but commas and tabs are probably the most common so I'm going to put a backslash T there because that's how we specify tabs in Python and now if I run this cell I'm going to go back to our data directory here we can see that now we have this modified dot T SV if I click on that then we can see that now this looks almost exactly the same as the comma separated file but now we have tabs here instead of commas now if you're reading in tab CSV files then all you need to do is take this se P equal to backslash T and you can just add that as an argument up here to read CSV so it's basically the same thing okay so now let's move on to some other

### [6:15](https://www.youtube.com/watch?v=N6hyN6BW6ao&t=375s) Excel

file formats now a very popular file format when working with this kind of data is Excel now if we want to write to excel then we're going to need to pip install a couple of packages so I have my terminal open with the current environment that I am using this is my Jupiter notebook running here let me grab my other terminal so I have the same environment that I'm you within Jupiter you want to be sure that you're using that same environment so that your pip installing and the right location and now we're going to install a couple of packages so first I'm going to say pip install and this is XL WT so XL WT will write to an older XLS Excel format but if you want to write to a newer xlsx Excel format then we'll also need to install open pi excel and you can pip install multiple packages but just by listening them all right here and finally we want if we want to read excel files then we can install the excel our D package so I think that is the three packages that we're going to need in order to work with excel files here so I'll go ahead and install all of those and let those finish and once those are installed let's go back to our notebook and now let's try to write to an excel file so I'm just going to write the same modified data frame that we have here and we are going to use the to underscore excel method and this is just as easy as passing in let's say I'll save it in that data folder again I'll call this modified dot xlsx so I'm going to write to the newer excel format so if I run this then it might take a second here for this to work because it's actually creating this excel file on the back end so let's let this finish and we can tell it's finished when this turns from an asterisk to a number here okay so once that's finished let's flip over to our data folder here and we can see that we do have that dot xlsx file now this likely won't open up in Jupiter because this is an excel file we can see here that we can't open this up in the browser we actually need Excel so let me open up my Finder window here I have this open down here and I am within this data folder and we can see that we have our modified dot xlsx file here now I don't actually have Excel on this machine I have numbers so I'm going to open this up in numbers it should basically be the same on Windows but you can just open it up with Excel now again this might take a second to open up because we do still have a lot of rows here in this data okay so we've got this opened up in Excel again I'm on numbers because I'm on a Mac and I don't have Excel installed but it should open up fine in Excel as well let me zoom in a little bit here so we can see and we can format these if we need to so for example we can change the column sizes here so that all these fit in but we can see here that we have our respondent IDs if I look over at country we can see that it did export the filtered data frame that we were hoping to export so everything looks good here now there are also some more advanced things that we can do with Excel as well if you're familiar with Excel then you might know that we have the concept of different sheets where we can have multiple spreadsheets and one excel file and if you want to read or write to a specific sheet then you can pass in a sheet argument to these methods actually trying to scroll over to my notebook here let me scroll down here to the bottom so like I was saying if you want to read or write to a specific sheet then you can pass in a sheet argument to these methods and there's also a way to start from different columns and rows as well but I'm not gonna go into all these little details here if you google this method name to excel then you can find the arguments that you can pass in and all the additional details and the documentation so for now let's go ahead and move on and see how that we can read in that same excel file that we just created and make sure that this works now by default it's going to load in with a default index just like when we read a CSV file so we'll have to specify that we want our index column to be that respondent column so to do that I'm just going to call this test since we're going to be creating a new data frame here from that excel file that we just created and we're going to use the read underscore excel method here and now we just want to pass in the location and I'll just go ahead and copy this here so that is modified dot xlsx on my machine and now I want to set that index column equal to and that was respondent on your data that might be different but I want my index column to be equal to that respondent so I'm going to run that cell and load that in and then I'm going to look at that test data frame now before I run this I'm going to make sure that this finishes processing here and that this asterisk goes away again it can take some time because it's actually you know loading in that data from Excel now which is a little more tricky than loading it in from a CSV so now if we look at that test data frame let me just look at the head here instead of looking at the whole thing so if I look at the head then we can see that we have the same data frame here that we had up here so that was exported to excel and imported correctly okay so

### [12:18](https://www.youtube.com/watch?v=N6hyN6BW6ao&t=738s) JSON

now let's cover some other popular file formats now JSON is also really popular for this kind of data so let's take a look at that first let's write our modified data frame to a JSON file now for writing then we can use the to JSON method so you're probably starting to see a pattern here these method names are very straightforward now this one is a bit different since there are some different orientations that we can use for JSON so just by using the default arguments I can just say so that was India D F dot to underscore JSON and now I'll pass in a file location here but instead of an excel file we want a JSON file now I'm just going to use the default arguments for now and then I'll show you how we can change this up a bit so if I run this we can see that ran very quickly now if I go back to my data folder here then now we have this JSON file if I look within here okay that took just a second to open up on my machine again we do have a lot of data in here but if we look in here then we can see that this is very dictionary like so we have a main branch key here and then the value for that key are all of the responses just for that column and if I was to scroll down here then I would be able to find the other keys and the other responses as well so by default this is a dictionary like JSON now there are also different ways that we can write JSON files again I'm not going to go into every single little detail here but let's say that we wanted this JSON to be list like instead of dictionary like which is how it is by default so to do this we can change the Orient argument so instead let's add on here to our arguments and I'm gonna say Orient is equal to and if we pass in records and lines equal to true then this will now make this records like which is list like and this lines equal to true let me spell that right we'll just make each of these on a new line so it might be a little bit easier to read now if you want to see the exact arguments that you can pass in to Orient then again just look pandas to JSON method and it'll take you to the documentation with all the different things that you can pass in here so let me run this and now let's go back and reload our JSON file to see how this looks and now what we have here is more lists like so before we had a single dictionary where the values were a list of all of the responses but now we have one response at a time so we have the main branch and then so this is actually this first one here if I scroll down we can see that this is the second response this is actually the entire first response so we have the main branch and then that answer and then open source so on and we can see here that for the country we have India and each response within this survey is actually on a different line so that's a little bit different than and how it was before but there's just different ways that we can export these JSON files depending on your needs okay so now that we've written our data to JSON files now let's also read this JSON file so that we can make sure that we know how that's done as well now since we wrote the JSON file with these different arguments here then we need to use those same arguments when we read the data in as well so if you're reading in JSON files and have any issues then you might need to play around with the different arguments to fit the data that you're trying to read in so in this case I'm just going to copy this whole line here and I'm gonna say test is equal to and actually let me just grab this part and I'll say PD dot read underscore JSON and now I'll pass in all those arguments here so we are reading the JSON file from this location we know that the Orient is list like instead of dictionary like and that all of these are on new lines and again depending on your JSON data you might need to go in and change these around depending on how your data looks so if I run this then let's see if we have the same data that we exported before and it seems like we do this looks exactly like it did whenever we exported this data okay so now the last file format that we're gonna look at let's learn how we can read and write data from sequel databases now this is probably the most complicated simply because you have to have the database setup and all of that good stuff but for the sake of this video I'm going to assume that you already have a database with the correct credentials to log into that database so I have a Postgres database set up on my machine that will be reading and writing to so first let's see how we would connect to this database now just like with excel we're going to need to install a package to do this so let me bring up my terminal here and I'll close this numbers file here let's see let me try to quit out of this actually I'll just minimize it it's having trouble shutting down okay so let me go back to the terminal that I hope opens the two where I can install some different packages and that's my Jupiter notebook where is my other terminal here we go okay so to connect to our database we're going to want to install sequel alchemy and this is a very popular ORM for a Python that allows us to more easily work with databases if you don't know what an ORM is it stands for object relational mapper and it's just a way for us to use Python objects in order to connect to a database I plan on doing a complete video or a complete series on sequel alchemy in the future but for now let's go ahead and just install this so this is pip install SQL alchemy and I'll install that and depending on the database that you're using you might not need to do anything else here so for example if you're using sequel light or something like that but since I'm using a Postgres database for this tutorial I also need to install the psycho PG to package that allows us to work with Postgres I'm not sure if that's actually how you say that package name but that's what I've always called it so pip install and to install this package to work with Postgres it's psycho PG - binary so I'll install that and with those packages installed let's go back to our notebook and see if we can connect to this database using sequel alchemy so first we're going to want to import everything that we need so from SQL alchemy I'm going to want to import there create engine and this will allow us to connect to the database now I'm also going to want to import psycho PG - so let me run this sell and now that those are imported we should be able to create the engine which is basically our database connection and again I'm going to assume that you've already created this database and have a username and password so to create this I can say engine is equal to and use that create engine function that we just imported from seek welcoming and now we need our Postgres connection string now if you don't know how to make Postgres connection or connection strings then you know they have this available on the sequel alchemy site as well let me make sure I spelled this correctly that is PostgreSQL and then we want to pass in the username and password for our database now for my case I just made a user of DB user and a password of DB Pass now another thing here that I'd like to mention is that you probably shouldn't put credentials within code like this I'll leave a link in the description section below where I show how in Python you should use you know something like environment variables or a config file to hide this information but for the sake of this tutorial I'm just going to put it directly in here but if you're doing this in a in production code I would highly recommend using environment variables so that you know you don't expose your username and passwords within your codebase okay so there we have our username and password and now the database that we want to connect to so this is on localhost this is on my local machine it's running on port 5 4 3 2 and now the name of the database now I have PG admin open here where I can see my databases and we can see that I've just created an empty database here called sample underscore DB so that is the database that I'm going to connect to ok so if I typed everything correctly here then I should be able to get a connection to that database so now let's try to write our modified data frame to a table in this database and this table doesn't need to currently exist by default it will create this table for us if it does already exists then we'll need to add another argument to handle that but we'll see that in just a second so to do this I can just say India underscore DF which is the data frame we want to export then this is to underscore SQL and now the table that we want to write this data to I'm just going to call this sample underscore table now again this doesn't currently exist but it should create it now we need to pass in our database connection here I called mine engine so let's pass that in and if I run this let's see if this works ok so we didn't get any errors whenever I read that or whenever I wrote that but now let's go back to my PG admin here and let's see if I can see this table so first I'm just going to right click and refresh I like to do that anytime I've made any changes and we can see there that here that we have a sample table down here I'm going to right click on that and go to view and edit data and look at all the rows here and we can see it does look like this worked I know that this is probably a little difficult to see on my screen here but we have all of our data written here into the database ok so that's good that we were able to get this from pandas into SQL but now what if we updated our data and wanted to rewrite that data to this database so let's go back to our notebook and see what this would look like now if I try to run this same line again where we export this to SQL then we're actually going to get it in error because this table already exists if you want to write over a table then we can add in an additional argument and the argument that we want to add in is called if underscore exists equals and now what we want to do if this table already exists now in my case I'm just going to replace that table with our new data but there are also other options as well we could have it throw an error we could which is what it does by default we could also append data to a table so if you're doing like a daily script where you're analyzing information then you can just append that daily data to your existing table but for this example I'm just going to have this replace the table so let's run this and once this is finished processing then I will go back to PG admin now again let me come up here and refresh this and dig back down into the database and let me close this view here and let's see if we still have this data okay so we can see that this worked we were able to rerun that command and it just replaced that data that was in that existing table with our new data in this case it was the same data but that's how you would do that okay so lastly now that we've seen how to add our data to a database now let's see how we can read in this same data using SQL now if you skip to this part of the video using the timestamps and the description sections below then please go back to when we wrote data to our database and see how I set up this database connection here because we're going to reuse that same connection to read in our data okay so this is pretty simple now that we actually have this database connection set up to do this we can just say I'll call this SQL underscore DF and we will just say PD dot read underscore SQL and now we want to pass in the table that we're going to read from and that was sample underscore table and now pass in our database connection my connection here I called engine and also I'm also going to pass in an index column just like we did when we read in our CSV so I'll say index column is equal to and that is going to be this respondent row right here for your data that might be different so whatever you want to be your index just pass it in there if you want pandas to just do a default index then you can just leave this off entirely okay so if I run this then let's look at SQL D F dot head to make sure this worked and we can see that worked well we still have the same data frame here that we started off with where we filtered down these countries to just be the results from India now

### [26:34](https://www.youtube.com/watch?v=N6hyN6BW6ao&t=1594s) ReadSQL

there might be instances where you don't want to load in an entire table but you want to run a specific SQL query in order to load in our data to do this we can use the method read underscore SQL underscore query to run a specific SQL query so let me just copy what I did here and paste this down here and now instead of reading in this entire table I'm going to actually run a query here so I'll do read underscore sequel underscore query and now instead of the table name here I'm actually going to pass in a sequel query now I'm just going to load in everything here so I'll say select star from sample underscore table and everything else here is going to be the same we're still have we still have our database connection and we still want our index column to be equal to respondent so this is still going to grab all the rows but if you wanted to customize this then you could add in a where clause here to filter this down so let me run this and now let's look at our sequel data frame here and we can see that worked as well so we loaded in this data using a sequel query instead of just reading in the entire table so that can be especially useful when you're working with large databases where you only want to load in specific data using a query okay so we're just about finished up

### [28:03](https://www.youtube.com/watch?v=N6hyN6BW6ao&t=1683s) URL Methods

here but let me show you one more tip before we wrap this up so you may have seen people load in data using a URL instead of a specific file for some of the methods that we've looked at before and we can do that all you need to do is you need to be sure that you're using the correct method for whatever form of data is on the URL so for example in my flask and Django series I created a JSON file of some sample posts for the website that we were creating in that series and I have that JSON file on my github page now if I wanted to bring that into pandas then I could simply use the re JSON method and then pass in that URL I wouldn't actually have to download that JSON first and then pass it in that way so I have this open here if you didn't know on github you can look at the raw files so we can see that this is a long URL here but I will have this code posted in the description section below if you'd like to follow along so I'm just going to copy this URL and this isn't on my file system and now let's see if we can just load this in so I'm gonna call this post underscore D F and I'll set this equal to PD dot read underscore JSON since this is JSON on the URL if it was CSV then you'd want to use read CSV and so on so now I can just paste in that URL there and now let's just run that cell and we can see that we didn't get any errors so let me now I look at the head of our data frame here and we can see that I do have my sample post here these are the sample posts that I used on that website series so depending on the data in that URL you should be able to use the methods that we've seen to load in data from a URL just like we did here now before we end here I would like to

### [29:58](https://www.youtube.com/watch?v=N6hyN6BW6ao&t=1798s) Brilliant

thank the sponsor of this video and that is brilliant I really enjoy the tutorials that brilliant provides and would definitely recommend checking them out brilliant is a problem-solving website that helps you understand underlying concepts by actively working through guided lessons and brilliant would be an excellent way to supplement what you learn here with their hands-on courses they have some excellent courses and lessons on data science that do a deep dive on how to think about and analyze data correctly so if you're watching my panda series because you're getting into the data science field then I would highly recommend also checking out brilliant and seeing what other data science skills you can learn they even use Python in their statistics course and will quiz you on how to correctly analyze the data within the language they're guided lessons will challenge you but you'll also have the ability to get hints or even solutions if you need them it's really tailored towards understanding the material so to support my channel and learn more about brilliant you can go to brilliant org /c ms2 sign up for free and also the first 200 people to go to that link will get 20% off the annual premium subscription and you can find that link in the description section below again that's brilliant dot org forge /c m/s ok so I think that's going to

### [31:10](https://www.youtube.com/watch?v=N6hyN6BW6ao&t=1870s) Outro

do it for this pain this video I hope you feel like you got a good idea for how to read and write data from multiple different sources what we covered here should cover the vast majority of file formats that you're going to be seeing and using in the data science field now I'm probably going to take a break from this Panda series after this video and do a few one-off videos that I've been wanting to cover but I know that there are a lot of topics and pandas left to cover and I will get around to those more advanced topics in future videos but in the meantime if you'd like a good source for learning pandas then I would highly recommend checking out the channel data school that's run by Kevin Marcum and he's done the pandas tutorials at PyCon for several years now he didn't you know asked me to suggest his channel or anything like that I just think that he does a good job and his channel was actually completely devoted to penas and data science so he's already covered some of the more advanced topics that I do plan to cover in future videos but if anyone has any questions about what we covered in this video then feel free to ask in the comment section below and I'll do my best to answer those and if you enjoy these tutorials and would like to support them then there are several ways you can do that the easiest ways is simply like the video and give it a thumbs up and also it's a huge help to share these videos with anyone who you think would find them useful and if you have the means you can contribute through patreon and there's a link to that page in the description section below be sure to subscribe for future videos and thank you all for watching

---
*Источник: https://ekstraktznaniy.ru/video/11756*