# Matplotlib Tutorial (Part 2): Bar Charts and Analyzing Data from CSVs

## Metadata

- **Channel:** Corey Schafer
- **YouTube:** https://www.youtube.com/watch?v=nKxLfUrkLE8
- **Date:** 11.06.2019
- **Duration:** 34:25
- **Views:** 380,493
- **Source:** https://ekstraktznaniy.ru/video/11848

## Description

In this video, we will be learning how to create bar charts in Matplotlib.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning how to create bar charts in Matplotlib. Bar charts are great for visualizing your data in a way where you can clearly see the total values for each category. We'll learn how to create basic bar charts, bar charts with side-by-side bars, and also horizontal bar charts. We will also learn how to load our data from a CSV file instead of having it directly in our script. Let's get started...

The code from this video (with added logging) can be found at:
http://bit.ly/Matplotlib-02

CSV Tutorial - https://youtu.be/q5uM4VKywbA

✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gb

## Transcript

### Introduction [0:00]

Hey there, how's it going everybody? In this video we're going to continue learning about Matplotlib and see how to create some different types of charts. Specifically, we're going to be looking at bar charts. We're also going to see how to load in data from a CSV instead of just having our data directly within our Python script, because most likely when you're plotting data, the data is going to be coming from another source like a CSV file.

Now, I would like to mention that we do have a sponsor for this series of videos, and that is brilliant.org. I really want to thank Brilliant for sponsoring the series, and it would be great if you all check them out using the link in the description section below and support the sponsors. I'll talk more about their services in just a bit. So with that said, let's go ahead and get started.

Okay, so in the last video we learned the basics of Matplotlib and how to plot some data and customize our plots in different ways. I have a stripped-down version of the code that we wrote in that video open up here in my editor, and I'll have a link to this code in the description section below if you'd like to follow along. But just in case you're

### Code Overview [0:54]

not continuing from a previous video, let me go over this code really quick. First, we are importing pyplot from matplotlib up here at the top. We are using the fivethirtyeight style for our plots. Our ages here, this is our x-axis; it's just a list of numbers. dev_y, these are the values that are going to be on our y-axis. And here we are plotting out that data: our x values, which are the ages, and the y values, which is our dev_y. Here we're giving it a custom color and a label. I've also got some commented-out code right here. All of this data is median salaries for different ages: this is for developers in general, this is for Python developers, and this is for JavaScript developers, but I've got those commented out for now. We are also putting a legend on our plot, giving it a title and x and y labels, giving it a tight layout, which helps with the padding, and then lastly we are

### Bar Charts [1:56]

showing it. When we plotted our data in the last video, we used this plt.plot method, and when you use the plot method it will use a line plot by default. So if we run this, we'll see something similar to what we saw at the end of the last video: a line plot for the median salary of developers. Again, this is some data that I took from the annual Stack Overflow Developer Survey.

But let's say that we wanted to show this as a bar chart instead. To do that, we can simply use the bar method instead of the plot method. So if I just change this to use bar instead of plot, then we'll have a bar plot. Just like the plot method, we can pass in the x values first and then the y values for our y-axis, and additional parameters like color and label can be passed in as well. So I'm just going to leave that as is, just like it was with the plot method. If I run this, we can see that it's now plotting our data as a bar chart instead.

Okay, so that is plotting the data for all developers who answered the survey. Like I said, I also have the data for Python and JavaScript developers, and right now those are commented out. So what if I wanted to include those in our bar chart? Well, first of all, you can mix and match some plots. If for some reason you wanted the Python and JavaScript data to remain as line plots and just overlay that onto our bar chart, then we could simply uncomment our code here and run these as plots, and that will overlay line plots on top of our bar plot. That doesn't make much sense in this situation, but depending on your data you might find that useful.

Okay, but what if we wanted to include these in our bar chart as bars side by side with the other data? You might think that we could do this just like we did our line plots and just run those using the bar method as well, but that's actually going to give us some issues. Let's try that real quick and see what it does. I'm going to change these to use bar, so plt.bar. I'm going to run that, and we can see that this doesn't quite look right. We can't even see the data for all of the developers, and the data for Python and JavaScript is overlapping. So how can we put these side by side? Because right now they're all just stacked on top of each other. We can do this by offsetting the x values each time we plot some data. Now, I actually think this is a lot harder than it should be. It seems a bit hacky in my opinion, but this is just
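As a rough illustration of the first part of this section, the sketch below swaps `plt.plot` for `plt.bar` and overlays one line plot on the bars. The salary numbers are made-up placeholders rather than the survey data, and the colors, labels, and title are assumptions; only `plt.bar`, `plt.plot`, and the fivethirtyeight style come from the walkthrough.

```python
from matplotlib import pyplot as plt

plt.style.use("fivethirtyeight")

# Placeholder values for illustration only; not the actual survey data.
ages_x = [25, 26, 27, 28, 29]
dev_y = [38000, 42000, 46000, 49000, 53000]
py_dev_y = [45000, 48000, 53000, 57000, 63000]

# plt.bar accepts the same x, y, color and label arguments used with plt.plot.
bars = plt.bar(ages_x, dev_y, color="#444444", label="All Devs")

# Plot types can be mixed: this line is drawn on top of the bars.
(line,) = plt.plot(ages_x, py_dev_y, color="#008fd5", label="Python")

plt.legend()
plt.title("Median Salary (USD) by Age")  # assumed title
plt.xlabel("Ages")
plt.ylabel("Median Salary (USD)")
plt.tight_layout()
# plt.show()  # uncomment to open the plot window
```

Calling `plt.bar` two or three times with the same `ages_x` would draw every series at the same x positions, which is the overlap problem described above.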

### Numpy [4:45]

how we have to do it. To do this, we're going to have to import NumPy and use that to grab a range of values for our x-axis. If you've never used NumPy before, don't worry too much about it; we're just going to use one simple function. I believe NumPy should be installed when you install Matplotlib, so we should be able to import it without doing any additional installs.

So up here at the top I'm going to say import numpy, and I'm going to import that as np. That's a convention when using NumPy, to import it as np. And now below our x values here, where we have our ages_x, I'm going to create a range from these values. I'm going to say x_indexes and set this equal to np.arange, and I'm going to pass in the length of our ages_x list. What that's going to do is create a variable called x_indexes, and that is an array of values that are a numbered version of our x values. So basically it's a lot like having a list with an index starting at zero and counting up to our last item, but instead it's a NumPy array. Once we have that, we're going to use that for our x values within our bar chart method. So I'm going to copy that, and instead of using our ages here, I'm going to use those x_indexes. So I paste those x
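A small sketch of what `np.arange` produces here, assuming a short placeholder `ages_x` list. The `shifted` variable and its offset value are only there to show why an array is convenient: arithmetic applies to every element at once.

```python
import numpy as np

ages_x = [25, 26, 27, 28, 29]  # placeholder ages

# One integer position per age, starting at zero.
x_indexes = np.arange(len(ages_x))

# A NumPy array supports element-wise arithmetic, which a plain list does not.
# This is what makes shifting every bar by a fixed amount a one-liner.
shifted = x_indexes - 0.25  # arbitrary offset for illustration

print(x_indexes)
print(shifted)
```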

### XIndex [6:13]

indexes into each of our bar methods here. If I were to run this right now, it would look very similar to what we had before, but now we're just using those indexes instead. But now that we're using these indexes, we can shift the locations of these by adding to or subtracting from our values here. If we think about it, they're all stacked up on top of each other right now, so let's shift our first bar to the left and the last bar to the right.

But how far do we actually want to shift these? We want to shift them by the exact width of a bar. To do this, it would be nice if we specified an exact width for our bars so that this is explicit. I believe that they have a default width of 0.8 or something like that, but just to be sure, let's create our own width variable. So up here underneath x_indexes I'm going to create a width and set this equal to 0.25. I think the default of 0.8 is going to be a little thick with three bars side by side, so I think 0.25 would be good here, but you can experiment with different widths if you'd like to get different looks depending on your data. Now that we have a width, let's subtract that width from our first plotted values and add that width to our last plotted values, and that should shift those bars to all be side by side. So with our

### Width [7:40]

first bar plot here, which is right here, we are going to say x_indexes minus width. Then for our second bar chart we're not going to do anything, because that's going to be in the middle. And for our last bar chart we'll say plus width, since we want that to shift over to the right. Lastly, before we plot this, we need to tell our plot that we want the width of the bars to be equal to the width variable that we just created. We can do that just by passing in another argument here. So right before color on all of these I'm going to add width equal to width, and I did that for all three of these bar methods.

Now that we've done that, if we run our code, and if I make this a little larger, we can see that our bar chart has these all lined up side by side instead of being stacked on top of each other like they were before. If you have more or fewer bars that you need to fit side by side, then you'll have to adjust the offsets accordingly for the number of bars that you have. The way that I did this was with three, but if you added another bar, you'd need to do an offset with the width added twice, and so on.

Also, if we look at our x-axis down here, we can see that we no longer have the age ranges that we had before. It's using the indexes, since that's what we needed to do our offset. To fix this, let's go back to our code. So I'm going to shut
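The offset approach from this section can be sketched as follows. The data values, colors, and labels are placeholders chosen for illustration; the `x_indexes`, `width = 0.25`, and the minus/plus offsets follow the walkthrough.

```python
import numpy as np
from matplotlib import pyplot as plt

# Placeholder values for illustration only; not the actual survey data.
ages_x = [25, 26, 27, 28, 29]
dev_y = [38000, 42000, 46000, 49000, 53000]
py_dev_y = [45000, 48000, 53000, 57000, 63000]
js_dev_y = [37000, 43000, 46500, 49500, 53500]

x_indexes = np.arange(len(ages_x))
width = 0.25  # explicit bar width, narrower than the 0.8 default

# Shift the first series one bar-width left and the last one bar-width right,
# so the three bars for each index sit next to each other.
all_bars = plt.bar(x_indexes - width, dev_y, width=width,
                   color="#444444", label="All Devs")
py_bars = plt.bar(x_indexes, py_dev_y, width=width,
                  color="#008fd5", label="Python")
js_bars = plt.bar(x_indexes + width, js_dev_y, width=width,
                  color="#e5ae38", label="JavaScript")

plt.legend()
plt.tight_layout()
# plt.show()  # uncomment to open the plot window
```

Because bars are centered on their x position by default, offsets of exactly one `width` make neighboring bars touch without overlapping.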

### Labels [9:15]

that down, and down here towards the bottom we're going to need to use xticks to change the labels. Right here above the title I'm going to say plt.xticks. Within this xticks method we need to pass in a couple of arguments. I'm going to say ticks is equal to the x_indexes, and the labels for those ticks are going to be equal to our ages list. So we are using those x_indexes for the ticks, and the ages that we saw before in the last video for the labels. Now if I run that, we can see that our plot has our x-axis labeled correctly.

Okay, so we've looked at vertical bar charts and how to add multiple different bars to that plot. In a minute we're going to look at how to create horizontal bar charts, but first I want to load in some data that's more appropriate for a horizontal chart. You usually want to use horizontal bar charts when you have a lot of data and it looks too crowded in a vertical plot. The data that I want to load in is going to be from a CSV file. So far we've only used data that has been directly in our Python script, but most of the time you're likely going to be using data from external sources like a CSV file, and sometimes you're going to need to work with that data a little bit before it's actually ready to be graphed.

So first let me get rid of the data that we've been using, so that we can make room for data that we'll load in from our CSV file. I'm going to remove everything from our plt.xticks there all the way up to our ages, and for now I'm also going to
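The `plt.xticks` step from the start of this section might look like the sketch below. It uses a single placeholder series to stay short; the `ticks` and `labels` keyword arguments are the ones named in the walkthrough.

```python
import numpy as np
from matplotlib import pyplot as plt

# Placeholder values for illustration only.
ages_x = [25, 26, 27, 28, 29]
dev_y = [38000, 42000, 46000, 49000, 53000]

x_indexes = np.arange(len(ages_x))
plt.bar(x_indexes, dev_y, width=0.25, label="All Devs")

# Keep the ticks at the index positions, but display the ages as their labels.
plt.xticks(ticks=x_indexes, labels=ages_x)

plt.tight_layout()
# plt.show()  # uncomment to open the plot window
```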

### Open CSV file [11:07]

comment out our plot titles and plt.show and things like that. Now let me open the CSV file and show you what this looks like. I have this open here in my current directory, and like I said, all of this is going to be available for download in the description section below if you want to follow along. So this is the CSV file that I'm going to be loading in. This is also data from that Stack Overflow Developer Survey, but I cleaned it up a little bit and only grabbed the data for the programming languages respondents said that they worked with.

We can see that the top line here tells us what information this is. This first column is the responder ID, so these are just IDs for each person who answered the survey. And the languages worked with are the languages that specific person said they knew. So this first person said that they knew HTML/CSS, Java, JavaScript, and Python, and we can see that these languages are all delimited by a semicolon. So each line has all these different languages, and using these we can graph the most popular programming languages from that survey.

So let me go back to my script here. Like I said, let's say that we wanted to create a bar chart of the most popular programming languages that people said they work with. First let's grab the data from that CSV file. There are multiple ways that we can load in a CSV file: we could use the csv module from the standard library, we could use the read_csv method from pandas, or we could use the loadtxt method from NumPy. Now, first let's use

### Read CSV from Standard Library [12:49]

the csv module from the standard library, since most people are probably familiar with that, but then I'm also going to show you a faster way using pandas and that read_csv method. So first let's use the standard library to do this. At the top here I'm going to import csv, and now I'm going to read that file using the csv module. If you don't know how to work with CSV files using the csv module from the standard library, I do have a detailed video specifically on that, so I'll be sure to leave a link to that video in the description section below if anyone is interested.

Okay, so the way that we can read this in is I can say with open, and the file we want to open is called data.csv. It's in the same directory as this script, so I don't have to specify a full path. And now we can just say as csv_file and use the csv module to read this in. I'm going to say csv_reader is equal to, and I'm going to use the DictReader class from the csv module to read in this CSV data. The DictReader makes a dictionary where we can access the values by key instead of by index, and I find that pretty helpful. So that is csv.DictReader, and we just want to pass in that csv_file.

Okay, so now we should have that CSV data in our csv_reader variable, and this is an iterator that we can loop over. I don't want to loop over all of these right now, because I think there are like 90,000 rows in that data. So instead let me just print out the first row so that we can see what this looks like. I can grab that first row by saying row is equal to next(csv_reader), and that will grab the first line from that iterator. Now let's print that out, so I'll print out row. If I save that and run it, and let me make my output a little larger here, we can see that this is an OrderedDict, and the keys are what we saw as the headers in the CSV file and the values are the responses for that particular person.

Like I said, we want to plot the most popular programming languages, and those are within the key languages worked with right here. So let me just print out that key instead of printing out that entire row. If I save that and run it, then we can see that now we
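A self-contained sketch of the `csv.DictReader` step. To avoid depending on the real `data.csv`, it reads two invented rows from an in-memory string. The header spellings `Responder_id` and `LanguagesWorkedWith` are assumptions about how the spoken "responder ID" and "languages worked with" columns are written in the file.

```python
import csv
import io

# Stand-in for data.csv so the sketch runs on its own.
# Header names are assumed spellings; the rows are invented.
sample_csv = io.StringIO(
    "Responder_id,LanguagesWorkedWith\n"
    "1,HTML/CSS;Java;JavaScript;Python\n"
    "2,C++;JavaScript;Python\n"
)

csv_reader = csv.DictReader(sample_csv)

# The reader is an iterator, so next() pulls just the first data row.
row = next(csv_reader)

print(row)                         # the whole row, keyed by header name
print(row["LanguagesWorkedWith"])  # only the semicolon-delimited languages
```

With the real file, the `io.StringIO(...)` object would be replaced by `with open("data.csv") as csv_file:` and `csv.DictReader(csv_file)`, as described in the walkthrough.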

### Clean up Data [15:20]

get those languages, and like I said, these are delimited by semicolons. To clean this up a bit and turn this into a list of languages, we can split the values on that semicolon. After we access that key, we can simply say .split and split on those semicolons. If I save that and run it, we can see that we now have a Python list of those languages. Sometimes you're going to run into data that you need to clean up or analyze a bit before you're actually able to plot the data that you want, so that's why I'm showing that process here.

In our case, we want to plot the most popular programming languages from the results of this survey, so we need to keep a count of each language that each respondent said they work with. There are a lot of different ways that we could do this as well. We could keep a list and count them at the end, or we could keep a dictionary and update the counts of that dictionary each time. But this is actually so common that Python has a class in its standard library for this kind of thing called Counter, and it's definitely the best way to do something like this. If you don't know how counters work, they can be extremely helpful, and I plan on making a video specifically about counters in the near future, but I haven't put one together just yet. So first let me show you a quick example of how counters actually work. So
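The split step from the start of this section, shown on one invented response string (the variable names are hypothetical):

```python
# One semicolon-delimited response, as it would come out of the CSV row.
languages_field = "HTML/CSS;Java;JavaScript;Python"

# str.split turns the single string into a list of language names.
languages = languages_field.split(";")

print(languages)  # ['HTML/CSS', 'Java', 'JavaScript', 'Python']
```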

### Counters [16:44]

let me open up my terminal here, and I'm going to run Python and show you how counters work really quick. To import these, I'm going to say from collections import Counter; they are from the collections module. Now that we have Counter, I'm going to say c is equal to Counter, and I'm going to pass in a list here containing Python and JavaScript, those two values in my list. If I look at that counter, we can see that this says, okay, I have a counter here, I have a key of Python and that's currently set to one, and I have a key of JavaScript and that's currently set to one. So it's keeping count of how often it sees these values.

To update this counter, I can simply say c.update, and now I'm going to pass in a new list. This time let's say C++ and Python. Now when we look at the counter, we can see that Python is two, because it's seen Python twice: once when we first created the counter, and once right here when we updated the counter. It's still only seen JavaScript one time, the first time we created it, and it's only seen C++ one time. Now let's do an update one more time. If I run that update statement again with C++ and Python and then look at our counter again, it's now saying, okay, I've seen Python three times, C++ twice, and JavaScript once. So this is what we're going to use to keep track of these languages.

Let me exit out of Python here. I hope that all made sense to you, because these are the kinds of things that you need to do sometimes when you clean up data for plotting. Okay, so I'm going to close down that output. Now, up here at the top of my script, I'm going to import that Counter, so again that's from collections import Counter. Okay, now I'm going to instantiate a new counter right after we read in our CSV data. Right above our row here, I'm going to make a variable called language_counter and set that equal to an empty Counter. Right now we only have the data for a single row, but we want to grab the exact same list of languages from every row. In order to do this, we can copy what we've already printed out here. This big long thing here is what got us that list of languages from that single row, so let's copy that, and now we
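The terminal session described in this section can be reproduced roughly as follows, using the same values that are read out in the walkthrough:

```python
from collections import Counter

# Start with one sighting each of Python and JavaScript.
c = Counter(["Python", "JavaScript"])
print(c)  # Counter({'Python': 1, 'JavaScript': 1})

# update() adds the new sightings to the existing counts.
c.update(["C++", "Python"])
print(c)  # Counter({'Python': 2, 'JavaScript': 1, 'C++': 1})

# Running the same update again bumps both counts once more.
c.update(["C++", "Python"])
print(c)  # Counter({'Python': 3, 'C++': 2, 'JavaScript': 1})
```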

### Updating Counters [19:24]

can loop over all of the rows of our CSV data and update our counter with the data that is within this list. So I'm going to say for row in csv_reader, and this will loop over every row in that CSV file, and I'm going to say language_counter.update, and we want to update that with that list of languages for every single row. So I'm going to paste that in, and this section here is what's going to give us that list of languages. Now our language_counter should get updated with all those languages.

Okay, so now let's print out our language_counter to see if it looks like we have some coherent data. I'm going to do this back on the main level of the Python script, outside of this with context manager. So above our plt.title I'm going to print out language_counter. Let's run that, and it looks like we've got some good data here. Since this is a Counter, it should print out sorted with the most responses at the beginning. So we can see here that we have JavaScript with 59,000, HTML/CSS 55,000, SQL 47,000, Python 36,000, Java 35,000, and so on.

Now, we can see that there are a lot of programming languages here. If I remember correctly, I think there are 28 total, so we probably don't want to plot all of these. Let's say that we just wanted the 15 most common languages. The great thing about using a Counter like we did here is that it actually has a most_common method built in to do this for us. So whenever I'm printing this out, I
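Putting the loop and `most_common` together, a sketch under the same assumptions as before: three invented rows in an in-memory CSV and assumed header spellings. With the real file the call would be `most_common(15)`.

```python
import csv
import io
from collections import Counter

# Invented rows standing in for data.csv; header names are assumed spellings.
sample_csv = io.StringIO(
    "Responder_id,LanguagesWorkedWith\n"
    "1,HTML/CSS;Java;JavaScript;Python\n"
    "2,C++;JavaScript;Python\n"
    "3,Python\n"
)

language_counter = Counter()

# Split each response on semicolons and fold the languages into the counter.
for row in csv.DictReader(sample_csv):
    language_counter.update(row["LanguagesWorkedWith"].split(";"))

# most_common(n) returns a list of (language, count) tuples, largest first.
print(language_counter.most_common(2))  # [('Python', 3), ('JavaScript', 2)]
```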

### Plotting Data [21:07]

could say print language_counter.most_common and just pass in a 15. If I run that, then that is the 15 most common responses. That most_common method actually returned a list, and each item in this list is a tuple, so this is one tuple here, containing the language and the count.

So now let's try to plot this data. How would we do this? Well, first we need to split out the languages into their own list, and these corresponding counts into another. When we did our previous bar charts we had our x- and y-axes, so we'll want all of our languages on one axis and the counts on another. That's why we need to split those up. There are also a couple of ways that we can do this. Let me show you a way that takes a little bit more code, but that I think most everyone will be able to read. To do this, I'm going to keep this print line here for now, but above it I'm going to say languages and set this as an empty list, and then I'll say popularity, and that's going to be for the numbers. So we want the languages in this list and the corresponding popularity in this list.

Now let's loop over all those tuples that we got back from this most_common method. I'll say for item in language_counter.most_common, and let me go to the next line here. Remember, this is going to be looping over a list of tuples, and the first value of that tuple is going to be the language and the second value the popularity. So I'll just say languages.append, item index of zero, to grab that first item and append that to our languages, and we want to append the second item to our popularity. Now if I print out our languages and our popularity, save that and run it, we can see that we have one list here that is all of our top 15 most common languages, and the second list here is the corresponding popularity of each language according to that survey. So now we can actually use these two lists for our plot.

There's actually a way of doing this whole section right here with a one-liner using the zip function and unpacking values and things like that, but I wasn't sure how many people would find that confusing. I think it's easier to read this way, so I just decided to do it this way instead. Okay, so now that we have these
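Both ways of splitting the tuples are sketched below. The counts are the rounded figures read out earlier in the video, not exact survey numbers, and the `zip` line is only one possible form of the one-liner that is mentioned but not shown; the `_zip` variable names are hypothetical.

```python
from collections import Counter

# Rounded counts as read out in the video; not the exact survey figures.
language_counter = Counter({
    "JavaScript": 59000,
    "HTML/CSS": 55000,
    "SQL": 47000,
    "Python": 36000,
    "Java": 35000,
})

# The explicit loop from the walkthrough: one list per axis.
languages = []
popularity = []
for item in language_counter.most_common(3):
    languages.append(item[0])   # first tuple element: the language
    popularity.append(item[1])  # second tuple element: its count

# One possible zip-based one-liner: unpack the tuples and transpose them.
languages_zip, popularity_zip = map(list, zip(*language_counter.most_common(3)))

print(languages)
print(popularity)
```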

### Creating a Bar Chart [23:55]

lists here, let me exit that output, and I'm also going to get rid of those print statements. Now that we have these lists, let's plot these just like we did before. To do that, we can just say plt.bar, because we want to make a bar chart here. On our x-axis we're going to plot the languages, and on the y-axis let's plot the popularity. Let's also uncomment our titles and labels here and change those to match what we're actually plotting. So instead of median salary, I'm going to type in, let's just say, most popular languages. For the xlabel here, I'll say programming languages, and for the ylabel I'll say number of people who use.

Okay, so with that in place, let me save that and run this, and let's take a look at our chart. We can see right off the bat that when we have this many items, it's hard to see all of these using a vertical bar chart like we did here. When you have a lot of items, it might be more readable to use a horizontal bar chart instead, and we can do that easily just by changing our bar method to a barh method. So

### Horizontal Bar Chart [25:19]

right here where we're saying bar, I'm going to change that and say barh. We can leave our arguments exactly as they are, because the horizontal chart expects the y-axis values first, so we'll just keep our languages there. We will have to change our axis labels here, because those are going to be different now. So I'm just going to switch the x and y labels real quick, so that we have programming languages as our ylabel and number of people who use as our xlabel.

Okay, and now I think that's about it. Actually, now that I think about it, I don't even think that we need this ylabel telling us that these are programming languages. That's pretty self-evident, since the names of the programming languages are the labels themselves. That's one thing with plots: it's nice to be descriptive, but you can also be overly descriptive. So I'm going to get rid of that, or actually, let me just comment it out instead.

Okay, so now let me run this, and now we can see that we have, oops, a vertical bar chart here. Let me open this back up and make this a little larger. Okay, so what I meant to say is we have a horizontal bar chart here. We can see that this is much easier to read with a lot of values, and those aren't scrunched together like they were in that vertical bar chart. So whenever you're plotting things out, if you've got a lot of values to plot with a bar chart, then it might be a good idea to use a horizontal one for this type of thing.

One thing here with a horizontal bar chart is that maybe you want the most popular language at the top. Right now it's down here at the bottom, and maybe we want that at the top since we read from the top down. To do this, we could simply reverse the lists that we're passing into the barh method before we actually plot them. So I'm going to close
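A minimal `plt.barh` sketch for this section. The five languages and their rounded counts are the figures read out earlier in the video, used here in place of the full top 15; the title and label text follow what is typed in the walkthrough.

```python
from matplotlib import pyplot as plt

# Rounded counts as read out in the video; not the exact survey figures.
languages = ["JavaScript", "HTML/CSS", "SQL", "Python", "Java"]
popularity = [59000, 55000, 47000, 36000, 35000]

# barh takes the categories first (drawn along the y-axis),
# then the bar lengths (drawn along the x-axis).
bars = plt.barh(languages, popularity)

plt.title("Most Popular Languages")
plt.xlabel("Number of People Who Use")
# The y label is left off: the language names already label that axis.

plt.tight_layout()
# plt.show()  # uncomment to open the plot window
```

In this orientation the first item in the list is drawn at the bottom of the chart, which is why the most popular language ends up lowest until the lists are reversed.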

### Reverse Method [27:06]

that down, and up here before that barh method I'm simply going to say languages.reverse() and popularity.reverse(). The reverse method on a list reverses it in place, so we don't need to set languages equal to this or anything like that; it's going to modify that list in place. Now if I save that and run it, we can see that we have the most popular languages up top, and I think that looks a lot better.

Now, I did say that I was going to show you a faster way to load in that data from the CSV using pandas, so let me show you how to do that, because for the rest of the series I'm probably going to use pandas to load in data, since it's a bit faster and also a bit cleaner. First of all, if we don't have pandas installed, then we'll need to do that, and it's really easy to install. So first let
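A short illustration of the in-place behavior of `list.reverse`, using three placeholder entries. The `result` variable is hypothetical and exists only to show that the method returns `None`.

```python
languages = ["JavaScript", "HTML/CSS", "SQL"]
popularity = [59000, 55000, 47000]

# reverse() modifies the list itself and returns None,
# so there is nothing useful to assign.
result = languages.reverse()
popularity.reverse()

print(languages)   # ['SQL', 'HTML/CSS', 'JavaScript']
print(popularity)  # [47000, 55000, 59000]
print(result)      # None
```

Both lists have to be reversed together, otherwise each language would be paired with another language's count.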

### Installing pandas [28:00]

me install that. I'll just open up my terminal here and clear this out, and we can install it using pip by saying pip install pandas. I'm just going to assume that installs correctly, and it did.

### Importing pandas [28:21]

Okay, so back here in our script, up here at the top, we need to import this. I'm just going to say import pandas as pd. That's another convention when you're using pandas, to import it as pd. Okay, so up here at the top of our

### Reading data with pandas [28:35]

file, instead of opening our file and using the DictReader to read in the data, we can replace that with a pandas method. So I'm going to get rid of this with context manager here, and since we got rid of that context manager, I'll unindent these other lines. Where we were opening that file, I can instead simply say data is equal to pd.read_csv and pass in the name of that CSV file, which was data.csv.

Now I can pull out some columns. I'm going to create this ids variable and set it equal to data with the key, let me see exactly what that column name was, responder ID. So I'll pass in responder ID there, and that's going to set this ids variable equal to all of the IDs in that responder ID column. We can do the same thing with the languages. I'll call this variable lang_responses and set it equal to data, and we want the key to be languages worked with, so I'll grab that.

We still want our language_counter, but for our loop, instead of saying for row in csv_reader, which doesn't exist anymore, we now have this list of languages here. So I can just say for response in lang_responses, update that counter. That simple update to our code should work exactly the way that it worked before. So if I save this and run it, then, whoops, name 'row' is not defined. Okay, so I got an error here that says name 'row' is not defined. I also meant to update this section here, because there's no row anymore. We just want to split the response instead, so response.split, because remember these

### Split lang responses [30:37]

lang_responses here, when we're looping through these, each response is going to be this entire section here of all of the languages, so we can simply split that response. Okay, so I'll save that and run it, and this should work exactly like it worked before, and we can see that it does. That looks pretty good.

Now, like I was saying before, this is real-world data that I grabbed from their actual survey, and I have the charts that Stack Overflow put together when they analyzed their survey data. So let me open those up and see if we got similar results. I'm going to put my chart here on the right, and their chart I have open here in the browser, so let me open that up.
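The pandas version of the loading step might look like the sketch below. As in the earlier sketches, the rows are invented, the CSV is held in memory instead of in `data.csv`, and the header spellings `Responder_id` and `LanguagesWorkedWith` are assumptions.

```python
import io
from collections import Counter

import pandas as pd

# Invented rows standing in for data.csv; header names are assumed spellings.
sample_csv = io.StringIO(
    "Responder_id,LanguagesWorkedWith\n"
    "1,HTML/CSS;Java;JavaScript;Python\n"
    "2,C++;JavaScript;Python\n"
    "3,Python\n"
)

data = pd.read_csv(sample_csv)

# Each column can be pulled out by its header name.
ids = data["Responder_id"]
lang_responses = data["LanguagesWorkedWith"]

language_counter = Counter()

# Each response is one semicolon-delimited string, so split it directly.
for response in lang_responses:
    language_counter.update(response.split(";"))

print(language_counter.most_common(2))  # [('Python', 3), ('JavaScript', 2)]
```

This assumes every row has a value in the languages column, which matches the cleaned-up file described in the video; rows with missing values would need to be handled before calling `split`.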

### Comparing results [31:20]

Okay, so here is their chart plotting out the exact same thing that we just plotted. There could be some small differences here based on how I sanitized the data compared to how they sanitized it, but you can see that as far as the order goes, we got the same results. They've also styled their plot a bit further, but with a little customization we could probably get something very similar. It looks like we just need to change up the colors a bit, add in a little spacing, and make these lines a little thinner, and it would almost be identical. That's why learning things like this can be extremely useful, because companies are constantly looking for people who can analyze their data and present it in ways that can give insights like this. So this is definitely a skill that you're going to be able to apply to a lot of different situations, just like we did here.

Okay, so before we end, I'd like to mention the sponsor of this video, and that is brilliant.org. Brilliant is a problem-solving website that helps you understand underlying concepts by actively working through guided lessons. They have computer science courses ranging from algorithms and data structures to machine learning and neural networks. They even have a coding environment built into their website so that you can run code directly in the browser, and that's a great way to complement watching my tutorials, because you can apply what you've learned in their active problem-solving environment, and that helps to solidify that knowledge. Their guided lessons will challenge you, but you also have the ability to get hints or even solutions if you need them. It's really tailored towards understanding the material. Their computer science material is fantastic, and I really like what they're doing. They also have plenty of courses depending on what you're most interested in, so they have courses in different fields of mathematics, astronomy, solar energy, computational biology, and all kinds of other great content. So to support my channel and learn more about Brilliant, you can go to brilliant.org/cms to sign up for free, and the first 200 people that go to that link will get 20% off the annual premium subscription. You can find that link in the description section below, and again, that's brilliant.org/cms.

Okay, so I think that is going to do it for this video. Hopefully you feel a bit more comfortable working with Matplotlib and with how you can pluck out the data that you need and create the types of charts that you'd like. In this video we covered bar charts, but in the next video we're going to learn how to create pie charts. Pie charts are great for seeing how our data is proportioned and for quickly visualizing which categories make up large and small pieces of your data, so be sure to check that out.

If anyone has any questions about what we covered in this video, feel free to ask in the comment section below and I'll do my best to answer those. If you enjoy these tutorials and would like to support them, there are several ways you can do that. The easiest way is to simply like the video and give it a thumbs up. It's also a huge help to share these videos with anyone who you think would find them useful. And if you have the means, you can contribute through Patreon, and there's a link to that page in the description section below. Be sure to subscribe for future videos, and thank you all for watching.
