# Python Data Science Tutorial: Analyzing the 2019 Stack Overflow Developer Survey

## Metadata

- **Channel:** Corey Schafer
- **YouTube:** https://www.youtube.com/watch?v=_P7X8tMplsw
- **Date:** 17.10.2019
- **Duration:** 51:27
- **Views:** 65,607
- **Source:** https://ekstraktznaniy.ru/video/11789

## Description

In this video, we will be learning how to analyze survey data in Python.

This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.

In this Python Programming video, we will be learning how to download and analyze real-world data from the 2019 Stack Overflow Developer Survey. This is terrific practice for anyone getting into the data science field. We will learn different ways to analyze this data and also some best practices. Let's get started...

The code from this video can be found at:
http://bit.ly/SO-Survey-2019

CSV Tutorial - https://youtu.be/q5uM4VKywbA
Jupyter Notebooks Tutorial - https://youtu.be/HW29067qVWk

✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join

✅ One-Time Contribution Through PayPal:
https://goo.gl/649HFY

✅ Cryptocu

## Transcript

### Introduction [0:00]

Hey there, how's it going everybody? In the Matplotlib series that I recently released, I mentioned several times that I had taken the data from the 2019 Stack Overflow Developer Survey. So in this video I want to show you how you'd go about downloading that raw data from the survey, how we can explore that data to see what's actually in there, and how we can write our own scripts to make some calculations and perform some analysis on that data so that you can pull out the information that you're looking for. This is going to be great practice for anyone who wants to learn more about data science, since this is likely something very common that you'll be doing: downloading data, seeing how the data is organized, and analyzing that data with Python.

Now, I would like to mention that we do have a sponsor for this video, and that is brilliant.org. I'd really like to thank Brilliant for sponsoring this video, and it would be great if you all could go and check them out using the link in the description section below and support the sponsors. I'll talk more about their services in just a bit. With that said, let's go ahead and get started.

When the Stack Overflow Developer Survey results were first released, I saw them posted to Reddit on the programming subreddit. I was reading the comments and this one popped out to me. It had 159 upvotes here, and we can see that this guy seems frustrated and commented: "Another effing survey that fails to report language popularity by developer type. Many of those languages are never or almost never used in the embedded world." I don't want to give this person or the people who upvoted that comment too much pushback, but there are tons of specific questions that could be asked about that data, so it's a bit hard for Stack Overflow to anticipate everything that people want to know. Luckily, the data is available for free and we can do that analysis ourselves. So if the person who made that frustrated comment happens to be watching this video, then you're in luck, because at some point I'll also do the analysis that you were looking for, where we break down the popular languages by developer type. So let's get started and see how to do this. I'm going to open up Google here and first of all let's

### Download the data [2:00]

Google "Stack Overflow developer survey results", and the first link here should take us to the latest survey. Now, this page actually takes you to where they performed some of their own analysis on the data, so you can look through and see what they've already analyzed. If we scroll through here, we can see that they break down some of the survey data by geography, by the developer types that answered the survey, and things like that. But what we want is the actual data so that we can do some analysis ourselves.

If we scroll back up to the top and read these first couple of paragraphs, close to the bottom of them it says: "Do you want to dive into the results yourself? The results of the survey are available for download here." On this page you can download the data in CSV form for any year that they have available. They organize these CSVs really well now, but if you try to download the 2011 data, it's pretty disorganized, so there might be some data cleanup depending on what year you're downloading. I'm going to go ahead and download the 2019 data, which is right here. So I'm going to download the CSV file, and once that is downloaded... whoops, let's try to redo that, it looks like that download didn't work. Okay, it looks like the download worked that time. So now I'm going to open up my

### Analyzing the data [3:30]

downloads folder here, and now I'm just going to unzip the zip file that we downloaded. Once that's downloaded and unzipped, I'm going to drag that folder to a folder on my desktop, and that's where I'll write a script to analyze the data. I have a folder here on my desktop called Stack Overflow demo, so I'm going to drag this over here. We can see that this says "developer survey 2019". That's a little long, so I'm just going to rename that to `data`.

Okay, so what files are in the directory that we just unzipped? Let me close that down, make this a little larger, and look at the files within this directory. First of all, if you download data that comes with a readme file, that's usually really helpful, because it will tell us all about the data. So let me open this up in Sublime Text. I currently have my word wrap turned off, so let me turn that on so this is a little easier to read.

This readme file tells us that we have these three files: `survey_results_public.csv`, `survey_results_schema.csv`, and `so_survey_2019.pdf`. For the survey results public file, it says it's a CSV file with the main survey results, one respondent per row and one column per answer. The survey results schema is a CSV file with the survey schema, i.e. the questions that correspond to each column name. And lastly, `so_survey_2019.pdf` is a PDF of the actual survey, if you want to see how an exact question looked on the survey itself.

So now we know that the survey results public CSV contains the survey results, and the survey results schema has the questions that correspond to each column name in those results. Let me open up both of these to show you what this means. I'm going to use LibreOffice so that it's a bit easier to see everything, but you can use any program that you'd like. I will open the survey results public file using LibreOffice, and I will also open the survey results schema as well. This might take a second to open up since there are so many survey results. Okay, so I have the survey

### Viewing the data [6:10]

results public CSV file open here, and let me make this text a little larger just in case it is difficult for anyone to see. I will bump that up to 12, and I think that should be good. Like we saw in the readme file, this file contains the main survey results with one respondent per row and one column per answer. If we look at the first row here, and let me make this a little larger so that we can see the complete column name, we have a `Respondent` column, and respondent 1 is the first row. This row is that respondent's answers to the survey.

So let's look through here and find a question and an answer. One of the first columns is `Hobbyist`, and their answer to `Hobbyist` was "Yes". We can probably understand pretty easily what `Hobbyist` means, but some of these columns are a bit more vague. To understand what all of these columns mean, we need to look at our other CSV file, the schema. So let me switch over to the schema CSV file, and again I will make this text a little larger so everyone can see. In the readme file we saw that this schema gives us the questions that correspond to the column names. If we find `Hobbyist` down here, it's on row 4, and we can see that the question the respondent actually answered was "Do you code as a hobby?"

Now that we know how to find all of the questions and answers, we can use Python to analyze this data in just about any way that we'd like. All we need to do is think of a question that we'd like answered and then find a way to parse that information out of the data that we have here. Let's start off simple at first, and then we'll look at some more complicated examples. To start off, let's just see what percentage of people who answered this developer survey said that they actually code as a hobby. First off, we'll need to create a new Python script, and I'll just do that here within my editor. So I'm going to open up Sublime Text, and here in my editor I already have my folder open
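The schema lookup described above (finding the question behind a column name) can also be done in Python. This is a minimal sketch, not code from the video: it assumes the schema file has `Column` and `QuestionText` headers, and it uses a small in-memory excerpt (the hypothetical `SAMPLE_SCHEMA`) in place of `data/survey_results_schema.csv`. The `load_schema` helper is an illustrative name.

```python
import csv
import io

# Hypothetical excerpt standing in for data/survey_results_schema.csv.
# The header names and the Respondent text are assumptions for illustration.
SAMPLE_SCHEMA = """Column,QuestionText
Respondent,Placeholder description of the respondent ID
Hobbyist,Do you code as a hobby?
"""


def load_schema(schema_file):
    """Map each column name to the survey question it corresponds to."""
    return {row["Column"]: row["QuestionText"] for row in csv.DictReader(schema_file)}


schema = load_schema(io.StringIO(SAMPLE_SCHEMA))
print(schema["Hobbyist"])
```

With the real file, the same function could be called on `open("data/survey_results_schema.csv")` instead of the in-memory sample.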

### Importing the data [8:30]

on my desktop that I called Stack Overflow demo, and I already have the data here that we downloaded. We can see that we have the readme, the PDF file, and those CSV files within this `data` directory. So now I'm going to create a Python script here, and I'm just going to call this `stack_overflow_demo.py`.

First, since we're working with CSV files, let's import the `csv` module so that we can load in our survey results. Now, if you're familiar with pandas, we could use pandas to analyze this data also, but I'm not going to do that in this video because pandas is a big subject in itself, and I don't want to do anything using that library that might be confusing to anyone watching who doesn't know how to use it yet. I'm actually working on a pandas series right now, so for those of you who are interested, I can do some analysis on the same data using pandas in a future video. But for now we'll just use the built-in `csv` module to load in this data. If you are also unfamiliar with the built-in `csv` module, I do have a video on that already, and I'll be sure to leave a link to that video in the description section below if anyone would like to brush up on that.

Okay, so let's go ahead and import that `csv` module so we can load this data into Python. I'm going to say `import csv`, and let me actually maximize this so that we can take up the entire screen here. Okay, so now let's load in our data.

### Loading the data [10:00]

To do this we can just say `with open`, and remember I put that inside of a directory called `data`, so within `data` we want to open `survey_results_public.csv`, and we will open that just as `f`. Now I'm going to use a CSV `DictReader` to load in that file, so I'm going to say `csv_reader` is equal to `csv.DictReader`, and we will pass in that file. The `DictReader` from the `csv` module will load in our data as ordered dictionaries, one per row.

To see what this looks like, let's just print out the first item, which will be the first response in that survey. Now, this isn't actually a list. It behaves like a generator, so we can't access the items using indexing. It might be easier to work with this data by casting it to a list, but depending on the size of your data that might not always be an option, so I'll leave this as a generator for now. That means we'll have to iterate over the items, and to do this we can just say `for line in csv_reader`, and now we will print out those lines. To be sure we only grab that first item, I'm going to put in a `break` here so that the loop doesn't continue.

Okay, so now if we run this, we can see that we got one response here, and this response is an ordered dictionary of the answers that were given by our first respondent. If we want to look at their answer to a certain question, we can use the column name for that question as a key to the dictionary. If we remember, the column for the question "Do you code as a hobby?" was called `Hobbyist`, so let's access their answer to that question just by using that as a key. So for our line we want to access that `Hobbyist` column.
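The loading step described above can be sketched as follows. This is a minimal illustration rather than the exact script from the video: a tiny in-memory CSV (the hypothetical `SAMPLE_CSV`) stands in for `data/survey_results_public.csv`, and the `first_row` helper is an illustrative name.

```python
import csv
import io

# Hypothetical three-row stand-in for data/survey_results_public.csv.
SAMPLE_CSV = """Respondent,Hobbyist,LanguageWorkedWith
1,Yes,HTML/CSS;Java;JavaScript;Python
2,No,C++;HTML/CSS;Python
3,Yes,HTML/CSS
"""


def first_row(csv_file):
    """Return the first data row of an open CSV file as a dict-like mapping."""
    csv_reader = csv.DictReader(csv_file)
    # DictReader is an iterator, so it cannot be indexed; we loop instead.
    for line in csv_reader:
        return line  # stop after the first row, like the break in the loop
    return None


row = first_row(io.StringIO(SAMPLE_CSV))
print(row["Hobbyist"])
```

With the real data, the file object would come from `with open("data/survey_results_public.csv") as f:` instead of `io.StringIO`.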

### Running the code [12:10]

So if I run that, we can see that we just get the response of "Yes". If we remove this `break` statement, it won't stop the loop after that first iteration, and we'll loop through all of the answers to that question in the survey. So if I remove that `break` and run this code, we can see that it's looping through and printing out everyone's responses. There are a lot of responses here, so it might take a second to print all of those out.

Now, I should mention that a lot of people like to use Jupyter notebooks to analyze data like this, and there's a good reason for that. Jupyter allows you to run single cells at a time, so you can load in your data once and run your analysis without loading in the data every single time. I have a video on Jupyter notebooks if anyone is interested, but we'll be using a regular editor in this video, where it just runs the whole script every time.

Okay, so I said we'd start off pretty simple here and just analyze what percentage of people who answered the survey said that they code as a hobby and how many of them didn't. To do this, let's just track all of the yes and no responses from the survey, and then we can calculate the percentages from there. There are several ways we can do this, so let's walk through a few different ways that we can solve this problem.

This is where I see a lot of people beat themselves up when it comes to programming. They might not be able to think of a solution to this question right away, or if they do think of a solution, they might do something in a different way than I do it in these videos, and sometimes they think that their way is wrong. That's not necessarily the case. So let me quickly show you four different ways to solve this problem, and if none of these jumped out to you immediately, that's fine too. The longer you're programming, the more easily these solutions will pop into your head.

I think some people might be thinking about just having a variable for yes results and a variable for no results and incrementing those each time we see one of those responses. That would be one way to do it, so let's do that first, and then we'll look at some ways that we might be able to improve this. Right above here I'll say `yes_count` is equal to zero and `no_count` is equal to zero. Now, in order to keep count of these, we can just have a conditional in our for loop that increments these by one each time we see a yes or a no. So in here I'm going to overwrite this print line, and I'm going to say: if `line`, accessing that `Hobbyist` column, is equal to "Yes", then we will increment our `yes_count` by one; else, or actually `elif` that value is equal to "No", then we will increment `no_count` by one.

Now let's print out those counts, and make sure that you print these outside of your loop, because if you are inside the loop when you print, it's going to print something out every time it increments. So I will print out our `yes_count` and also our `no_count`. If I run this, again it's going to take a while to load in this data, and we can see that we had roughly 71 thousand people who said that they do code as a hobby and around seventeen to eighteen thousand who said that they didn't. So now let's calculate these percentages. To calculate a percentage, all we need to do is divide each of these by the total number of answers. So first let's create a variable above for the total number of answers. So right

### Creating a variable [15:58]

above our print statements here, I'm going to say that our `total` is equal to our `yes_count` added to our `no_count`. Now we can print out these percentages just by modifying these existing print statements: I'm going to say the `yes_count` divided by the `total`, and the `no_count` divided by the `total`. If I run this, now we get these as fractions, and we can see that about 80% of people who answered the survey said that they code as a hobby and about 20% said that they didn't. If you're making a report or something on this data and wanted to clean this up a bit, we could multiply these numbers by 100 so that we actually have percentages, and we could also round these to two decimal places. So let me do that really quick. So I am
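The first approach described so far, two integer counters and a division by their sum, can be sketched like this. It is a hedged illustration: the four-row `SAMPLE_CSV` is hypothetical stand-in data, and `yes_fraction` / `no_fraction` are illustrative names for the values that are printed.

```python
import csv
import io

# Hypothetical stand-in for the Hobbyist column of the survey results.
SAMPLE_CSV = "Respondent,Hobbyist\n1,Yes\n2,No\n3,Yes\n4,Yes\n"

yes_count = 0
no_count = 0

for line in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    if line["Hobbyist"] == "Yes":
        yes_count += 1
    elif line["Hobbyist"] == "No":
        no_count += 1

# Print outside the loop so the result appears only once.
total = yes_count + no_count
yes_fraction = yes_count / total
no_fraction = no_count / total
print(yes_fraction, no_fraction)
```

Summing the two counters works here only because each respondent gives exactly one of the two answers.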

### Rounding the numbers [16:52]

going to actually create a variable here below `total`, and I will just call this yes percent. I'll set that equal to, and let me just grab this here, the `yes_count` divided by the `total`, and we'll multiply that by 100. I will do the same for the no count, so let me grab that. Sorry, I made a couple of typos. Okay, so no percentage is equal to `no_count` divided by `total` times 100. Those will give us percentages here, so instead of 0.801 it'll give us 80.1.

Now let's actually round those numbers as well, since we don't really need this to be accurate all the way out to these decimal places. I'm just going to do a new line here, and I will say that our yes percent is equal to our previous yes percent rounded to two decimal places. Let me also do this for the no percentage, so I'll say no percentage is equal to the rounded version of the no count... whoops, I didn't mean for that to be no count. Sorry, I need to round that percentage to two decimal places.

Now let's print these out, and I'll use f-strings in order to add some information to what we're printing. Within here I'm going to create an f-string, and I will say "Yes", then fill in a variable with the yes percentage, and then put in a % sign after that as well. Let me copy that and do the same thing for no, so the no line has the no percentage followed by the % sign. If I run this, we can see that this is a bit more cleaned up: the yes responses are 80.17 and the no responses are 19.83. So now we get some good information printed out on how many people code as a hobby.

Okay, so this is one way to solve this problem, and I said that I would show a couple of other ways that might improve on this. One way we could improve this is to use a dictionary to track the number of yes and no responses instead of the integer variables that we have already. Let me show you what this would look like. Let's scroll up to where we initialized the `yes_count` and the `no_count`, and instead of doing it this way, I'm going to have a dictionary called `counts`. In my dictionary I'm going to have a key for the yes results and a key for the no results, and I'm going to initialize both of those with a value of zero. So now we have a dictionary called `counts` where the keys "Yes" and "No" are keeping track of the yes and no responses.

The good thing about this approach is that we don't need the conditional inside of our loop anymore to check whether the response is yes or no. Instead, I can get rid of the entire conditional and just say `counts`, add in a key of our answer to the `Hobbyist` question, and increment that key by one. Let me explain this one more time. Since we have a dictionary here, this `line` of `Hobbyist` is going to be either "Yes" or "No". If it's "Yes", it's going to access this "Yes" key in our `counts` and increment that by one, and if it's "No", it's going to increment that key by one.

Since we got rid of the `yes_count` and `no_count` variables, we need to replace them where we used them. Down here where we have `yes_count`, and we have two of them, I'm going to replace those with our `counts` dictionary and the value of the key "Yes". I'm going to do the same thing with `no_count`, where we also have two, so again I'm going to say `counts` and access the key of "No". So I think that is all the changes that we need to make. If I run
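The dictionary-based variant, together with the rounding and f-string formatting described above, might look like the sketch below. The three-row `SAMPLE_CSV` is hypothetical, and `yes_pct` / `no_pct` are illustrative names, since the transcript does not spell out the exact variable names.

```python
import csv
import io

# Hypothetical stand-in for the Hobbyist column of the survey results.
SAMPLE_CSV = "Respondent,Hobbyist\n1,Yes\n2,No\n3,Yes\n"

# Pre-initialized keys replace the if/elif conditional in the loop.
counts = {"Yes": 0, "No": 0}

for line in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    counts[line["Hobbyist"]] += 1

total = counts["Yes"] + counts["No"]

# Multiply by 100 for a percentage, then round to two decimal places.
yes_pct = round((counts["Yes"] / total) * 100, 2)
no_pct = round((counts["No"] / total) * 100, 2)

print(f"Yes: {yes_pct}%")
print(f"No: {no_pct}%")
```

One trade-off of this version: an answer other than "Yes" or "No" would raise a `KeyError`, because only those two keys were initialized.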

### Using default dictionaries [21:20]

this, then we can see that we still get the same results, but this cleaned up our code a good bit since we got rid of those conditionals. Now, one more little tip here. If you ever find yourself initializing a dictionary with certain values like we did here, where we have "Yes" initialized to zero and "No" initialized to zero, then it might be a good idea to use a default dictionary from the `collections` module instead. Default dictionaries allow us to create a dictionary where we don't need to initialize values like this. So let me
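The idea behind a default dictionary can be sketched as follows. This is a minimal illustration using `collections.defaultdict` from the standard library; the `answers` list is hypothetical stand-in data for the hobbyist responses.

```python
from collections import defaultdict

# Hypothetical stand-in for the answers read from the Hobbyist column.
answers = ["Yes", "No", "Yes"]

# defaultdict(int) creates a missing key with int() == 0 on first access,
# so no keys need to be initialized up front.
counts = defaultdict(int)

for answer in answers:
    counts[answer] += 1

print(dict(counts))
```

Unlike the pre-initialized dictionary, this version accepts any answer value without raising a `KeyError`.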

### Using counter [21:53]

import this and then we'll see how this works. I'll say `from collections import defaultdict`, and now instead of initializing our values like we did right here, I'm just going to make this a default dictionary of integers. Now our dictionary knows to expect integers as the values for our keys, and it will start at zero by default. So we should be able to run this just like we did before, and we can see that we get those same results, but with our code cleaned up a little bit more.

Now let me show you one last way that we could have done this. The reason I'm showing you multiple ways to solve one simple example is that these are some of the same types of problems that you'll likely run into when analyzing data, and it's nice to know what's available to you. There are many times when we run into problems where we're simply counting certain values, and it's so common that there's actually a data type specifically for counting in Python. It's called `Counter`, and it's from the `collections` module as well. So let me import this and we will see how it works. After our `defaultdict` I'm also going to import `Counter`.

Now instead of setting this as a default dictionary here at the top of the file, let's create a new `Counter`. With just that small change, if we run this, it should look almost the same as when we used our default dictionary. We can see that we got the same result, but the nice thing about using a `Counter` is that we get some nice extra features, such as being able to view the most common values with a single method and doing a few other operations that we wouldn't be able to do as easily with a default dictionary.

Okay, so now that we've seen this easier example of figuring out what percentage of people answered yes or no as to whether they code as a hobby, let's take a look at a more complicated example and find out the most popular programming languages among the developers who answered this survey. To do this, we'll need to open up our schema CSV file again and see what column matches the programming language question. So I'm going to open up the schema CSV file. These are actually the results CSV... okay, here's the schema.

There are a lot of different questions on this survey, but just for the sake of keeping the video as short as I can, I memorized where the question is that I'm looking for, and it's down here on row 45. If you read this column and question, the column is called `LanguageWorkedWith`, and the question on the survey was: "Which of the following programming, scripting, and markup languages have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the language and want to continue to do so, please check both boxes in that row.)" So this was actually a two-part question on the survey, where they chose the languages they're currently working with and also the languages that they want to work with next year. Again, if you'd like to see exactly how this question looked on the survey, you can open up that PDF file that was in our download and look at that directly. This `LanguageWorkedWith` column is going to be their answer for the languages that they are currently using. So let's go back to our script and see how we can analyze these most popular languages. Okay, so first of
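The `Counter` version of the hobbyist tally described earlier in this section can be sketched as follows. The `answers` list is hypothetical stand-in data; `Counter` and its `most_common` method are from the standard library `collections` module.

```python
from collections import Counter

# Hypothetical stand-in for the answers read from the Hobbyist column.
answers = ["Yes", "No", "Yes"]

# Like defaultdict(int), a Counter treats missing keys as zero.
counts = Counter()

for answer in answers:
    counts[answer] += 1

# most_common(n) returns the n highest counts as (value, count) tuples.
top_answer = counts.most_common(1)
print(top_answer)
```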

### Rename counter [25:35]

all, I'm just going to go ahead and comment out everything here below our for loop. We might use some of this in a bit, but we just don't need it for now. I'm also going to rename our counter here to `language_counter`, since we are calculating the most popular programming languages. Inside the loop I will now say `language_counter`, and we don't want to use this `Hobbyist` key here. Instead, that was `LanguageWorkedWith`, and again, if you ever forget one of these, you can just look in those CSV files.

Before I actually run this, I'm going to comment that line out for now, and let's instead just print the first result of the survey to see what one of these responses might look like. So I'm going to print out the answer for `LanguageWorkedWith`, and then I'm going to put in a `break` statement so that we only get the first result of the survey. Okay, so if I

### Split languages [26:45]

run this, then we can see that we get the languages that this specific person said that they work with. This person uses HTML/CSS, Java, JavaScript, and Python, and we can see that these are separated by semicolons. So this isn't going to be as easy as simply incrementing our counter with the value from this `LanguageWorkedWith` field, because we need to split those into individual languages first.

To do this we can use the Python `split` method and split these on semicolons. Above our print here, I'm going to say that our `languages` are equal to their answer to this `LanguageWorkedWith` question, split on semicolons, and now let's print out those languages. If I save that and run it, we can see that we have a list of those answers instead of the string separated by semicolons.

So now, to find the most popular languages, we can loop over this list of languages and increment every one of those in our counter. Above our print statement I'm going to say `for language in languages`, and now we will increment our `language_counter` with that language, and we'll just `+= 1` on that. I'm actually not going to be using the old line outside of this inner loop anymore, since that would give us a false answer, so I'm going to get rid of that. Now let's print out that language counter and make sure that it looks right just for this one person. So I'm going to print out `language_counter`
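The splitting and per-language counting described above can be sketched for a single respondent. The `answer` string mirrors the first response read out in the video; the rest is a minimal illustration.

```python
from collections import Counter

# One respondent's LanguageWorkedWith answer, as a semicolon-separated string.
answer = "HTML/CSS;Java;JavaScript;Python"

# str.split(";") turns the single string into a list of languages.
languages = answer.split(";")

language_counter = Counter()
for language in languages:
    language_counter[language] += 1

print(languages)
print(language_counter)
```

Counting the unsplit string instead would treat each unique combination of languages as its own key, which is why the split has to come first.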

### Language counter [28:45]

and let's run this, and we can see that we have a `Counter` object here with one result for HTML/CSS, one for Java, one for JavaScript, and one for Python. That's good.

Another nice thing about `Counter` objects is that you can pass them a list of items and have them increment the count for each one of those items, and we can do that with the `update` method. So what we're doing with this for loop right here, where we are incrementing all of these values by one, we can do simply by saying `language_counter.update` and passing in that list of languages. If I run that, we can see that we got the same results using the `update` method.

So now let's remove this `break` statement and see what we get for everyone in the survey. I'll remove the print statement there as well, and outside of this loop I'm going to print the language counter. Again, make sure that you are outside of your loop, or you'll get something printed for every person in the survey. I will save that and run it, and make this a little larger here. Okay, so now we can see that we get our counts for how many people said they know each language.

The cool thing with the `Counter` is that we can use the `most_common` method in order to see the most popular languages. If I wanted to see just the five most popular languages, then down here in my print statement, instead of printing that entire counter, I could say `language_counter.most_common` and pass in a value of five to say that we want the five most common languages. So now if I run this,
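The `update` and `most_common` steps described above can be sketched across several respondents. The `answers` list is hypothetical stand-in data for the `LanguageWorkedWith` column, and `top_two` is an illustrative name.

```python
from collections import Counter

# Hypothetical LanguageWorkedWith answers for three respondents.
answers = [
    "HTML/CSS;Java;JavaScript;Python",
    "C++;HTML/CSS;Python",
    "HTML/CSS",
]

language_counter = Counter()
for answer in answers:
    # update() adds one to the count of every item in the list,
    # replacing the explicit inner for loop.
    language_counter.update(answer.split(";"))

# The two most common languages as (language, count) tuples.
top_two = language_counter.most_common(2)
print(top_two)
```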

### Most common languages [30:39]

now we can see those top five programming languages. We can see how many people said they know each language, and when we used the `most_common` method there, it returned a list of tuples, where each tuple had the name of the language and the number of people who knew that language. That'll be good to know in just a second, because we're going to loop over these as well.

But before we do that, just like we did before, it would probably be better to have the percentages instead of raw numbers. We can do this similarly to how we did it before, but first we're going to need to get the total of all the responses so that we can divide those numbers by it. Before, when we only had yes and no answers, we just added them together to get the total, and we were able to do that because the respondents could only respond with a yes or a no. But here they can choose as many languages as they want, so if we totaled all of those languages up, we would get a lot more than our total number of respondents, because each respondent can have more than one language. So instead we're going to need to keep a running total up in our loop, where we are looping over the lines from our survey. Now if we had converted this CSV

### Converting CSV to list [31:57]

reader to a list, then we could simply take the length of that list as our total, but since this is a generator I think it'll be easier just to keep a running total. So above our `csv_reader` line here, or actually right below it, it doesn't matter, I'm just going to say `total` is equal to zero, and at the end of our for loop I'm going to say `total += 1`. Now down here at the bottom, let me uncomment what we had here before, and actually I can just get rid of that old total altogether, because we've already calculated the total in our for loop.

Okay, so now let's loop over those most popular languages and calculate the percentage for each one. I'm going to copy these five most common languages here, so we can say `for language, value in language_counter.most_common(5)`. Remember, that returned a tuple where the first value was a language and the second value was the number of people who said that they knew that language. Within this loop I can take the same logic that we used before to calculate percentages. Let me uncomment these here, and instead of yes percentage I'm going to call this language percentage. The value that we are dividing by our total is no longer our yes count. Instead it is going to be this value that was in our `most_common` tuple.

Lastly, let's go ahead and print this out. Again I'm going to take some of the same logic that we used before and copy it down here, and for our values I'm just going to print out the language and then the percentage. So now if I run this... whoops, it looks like I got an error here. Some of you probably noticed as I was typing it out that we are no longer using this yes percentage value, and I forgot to replace that. We wanted to round our language percentage to two decimal places and not that yes percentage. So now if we run this, then
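The running total and per-language percentages described above can be sketched as follows. The four-row `SAMPLE_CSV` is hypothetical stand-in data, `language_pct` and `percentages` are illustrative names, and the sketch takes the top two languages rather than five because the sample is so small.

```python
import csv
import io
from collections import Counter

# Hypothetical four-row stand-in for data/survey_results_public.csv.
SAMPLE_CSV = """Respondent,LanguageWorkedWith
1,HTML/CSS;Java;JavaScript;Python
2,C++;HTML/CSS;Python
3,HTML/CSS
4,Python;SQL
"""

language_counter = Counter()
total = 0  # running count of respondents, not of languages

for line in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    language_counter.update(line["LanguageWorkedWith"].split(";"))
    total += 1

percentages = {}
for language, value in language_counter.most_common(2):
    # Divide by the number of respondents, since each can pick many languages.
    language_pct = round((value / total) * 100, 2)
    percentages[language] = language_pct
    print(f"{language}: {language_pct}%")
```

Because each respondent can select several languages, these percentages are expected to add up to more than 100.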

### Stack Overflow Survey Results [34:22]

Now we can see the percentages of people who said that they use these languages. If you actually go to the Stack Overflow survey results, where they ran some of their own analysis on the data, their results are very similar to this, but some of them differ by a tenth of a percent or so. I'm assuming that they used some different criteria for sanitizing their data and filtering out bad answers. For example, some people didn't put any languages down here, which would have been an NA result, so Stack Overflow likely got rid of those, and there are probably a couple of other criteria that they applied as well. But our results here are very similar to the official results.

Okay, so lastly, now that we have the most popular programming languages for all the people who responded to this survey, let's answer that question that we saw at the very beginning of the video and break down the most popular languages by developer type. So let's think about how we want to do this. Right now we're already getting the most popular languages for all of the people who responded to the survey, so the code is likely going to be similar, but now we want to break up the most popular languages based on the type of developer they are.

What I think we should do here is have a dictionary where each key is the developer type, and the value for each key in that dictionary will be another dictionary, so these will be nested dictionaries. I'm thinking that the dictionary for each developer type will have two key-value pairs: a key for the total number of people in the survey who said that they work as that type of developer, and a language counter key whose value is a Counter object with the language breakdown for that developer type, which is similar to what we already have here.

Now I know that might sound a little complicated just by hearing it, but it should make more sense once we actually see it. And again, if you're solving this differently than I am, that's fine. This is just one way that I thought of to solve this problem. There might even be some faster, more efficient ways of solving this in a more functional way with comprehensions, but this is how I'm going to solve it in this video.

When you're trying to solve a problem like this, I think it's extremely beneficial to break it down into small steps. Right now I know that I want a dictionary with all of the developer types as keys, so let's start off with that. So up here, above my language
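To make the nested-dictionary idea concrete, here is a small hand-written sketch of the shape being described. The key names `'total'` and `'language_counter'` and all of the numbers are assumptions chosen for illustration; only the structure matters.

```python
from collections import Counter

# Hypothetical, hand-written values: one entry per developer type,
# each holding a respondent total and a per-language Counter.
dev_type_info = {
    'Developer, back-end': {
        'total': 2,
        'language_counter': Counter({'Python': 2, 'SQL': 1}),
    },
    'Developer, front-end': {
        'total': 1,
        'language_counter': Counter({'JavaScript': 1}),
    },
}

back_end = dev_type_info['Developer, back-end']
print(back_end['total'])
print(back_end['language_counter'].most_common(1))
```

Storing a total next to each counter means a percentage for one developer type can be computed without touching the data for any other type.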

### Coding the Solution [37:06]

counter, I am going to create a variable called `dev_type_info` and set it equal to an empty dictionary. Right now I'm just going to remove this language counter here, since we're no longer using that specific one, and I'm also going to remove the total, since we are no longer using that specific total. Now within our for loop I'm going to comment out everything that we have so far. As a matter of fact, I'm going to go ahead and comment out everything below this loop as well, so that we can just work on this small part of the solution for now. Okay, so let me get back up here underneath this loop.

So now let's just break this down one step at a time. Right now we just want a dictionary of all of the developer types. Like we saw before, we can look at the schema CSV file that we've seen in this video to get the column name for this information, but I went ahead and wrote this down earlier to save us some time. The column name that we want is called DevType, and just like our programming languages, this is a list of values separated by semicolons, so people can select multiple developer types that match their line of work.

So right now let's just split those developer types on a semicolon, just like we did with the programming languages, and loop over them. I'll go ahead and copy what we did for languages, and I'm going to say `dev_types` is equal to, and this column is called DevType, so we want to split those dev types on a semicolon. Now I will say `for dev_type in dev_types`, and then `dev_type_info`, and remember we wanted this to be a dictionary with all the developer types, so we will set that key equal to an empty dictionary for now.

Now I'm going to print the keys from that `dev_type_info` dictionary to see if it looks like we got all of those developer types. Remember to print this outside of the loop so that it doesn't trigger that print statement multiple times. So outside of the loop here I will say print, or actually what we want to do is loop over those keys, so I'll say `for key in dev_type_info` and we will print out each of those keys.

Now if I run this, our response should be all of the different developer types that people filled in on the survey. If I scroll up here we can see, okay, there's that NA value, which means that somebody didn't answer, but we can see that we have desktop developers, front-end developers, designers, back-end, full-stack, academic researchers, all kinds of different types here. We can even see that we have the embedded applications or devices type that the person was specifically asking about in that Reddit post.

Okay, so right now we have a dictionary, and all of these developer types are the keys for that dictionary. Now we want the values for these developer types to be dictionaries themselves, and these nested dictionaries will have a key for the total number of people who said that they were this type of developer, and they will also have a language counter with the count of the languages for that developer type. Now first we have to think about the loop
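As a rough sketch of this first step, the snippet below collects every developer type as a dictionary key and then prints the keys outside the row loop. The three-row `sample_csv` is made up for illustration; the `DevType` column name and the semicolon separator follow the survey data as described above.

```python
import csv
import io

# Hypothetical sample rows. DevType values contain commas, so they are quoted.
sample_csv = io.StringIO('''Respondent,DevType
1,"Developer, back-end;Developer, full-stack"
2,"Developer, front-end"
3,NA
''')

dev_type_info = {}

for row in csv.DictReader(sample_csv):
    # A respondent can select several developer types, separated by semicolons.
    dev_types = row['DevType'].split(';')
    for dev_type in dev_types:
        dev_type_info[dev_type] = {}  # placeholder value for now

# Print once, after all rows have been processed.
for key in dev_type_info:
    print(key)
```

Note that unanswered rows show up under the literal key `NA`, which is why that value appears among the developer types.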

### Set Default [40:50]

where we are setting the values of the dev type keys. If that key doesn't exist, then we just want to create a new dictionary with the key total set to zero and the key language counter set to a new empty Counter. If the key does exist, then we just want to grab the current total and the language counter and update those accordingly.

One way that we can do this in Python is by using the `setdefault` method, so let me put this in and I'll explain a bit about what it's doing. Instead of setting the key like this, I'm going to say `dev_type_info.setdefault`, and we want to set the default for our `dev_type` key. If this doesn't have a value yet, then we are putting in our default value, which is just going to be a total set to zero and a language counter which is just an empty Counter.

Let me go over this again just to make sure that it's clear. What `setdefault` does here is check whether we already have a value for the key `dev_type`. If we do, then it just returns that value and leaves it unmodified. If it doesn't have a value for that key yet, then it creates a new entry, and this is the new dictionary here, with the key total set to zero and the key language counter set to a new Counter instance.

So now we can just update those values like we did before in our commented-out section of code, and we'll use these keys instead. Let me scroll down to our commented-out code; this is where we were updating the different languages in the loop before. Let me actually cut these out and paste them up here. So now, for each developer type in those dev types, we still want to grab all of the languages that they work with, and we still want to split those on a semicolon. But the language counter that we are updating is no longer going to be the global language counter that we created out here. Instead we want it to be the language counter for this developer type, so we want this to be `dev_type_info`, then access the `dev_type` key, and then access that language counter key. Now we are updating the languages in the language counter for that specific developer type.

And the same thing goes for the total. Instead of this being a global total, it is going to be `dev_type_info`, `dev_type`, and then the total key for that developer type. Okay, so that should give us the values that we're looking for. Now let's loop over all of those developer types and print out the top five languages and percentages, using a similar approach to what we did before with our commented-out code here at the
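The `setdefault` step and the per-developer-type updates might look something like the sketch below. The sample rows are invented, and the key names `'total'` and `'language_counter'` are assumed from the spoken description rather than copied from the video's source.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for the real survey file.
sample_csv = io.StringIO('''Respondent,DevType,LanguageWorkedWith
1,"Developer, back-end;Developer, full-stack",Python;SQL
2,"Developer, back-end",Python;Java
3,"Developer, front-end",JavaScript;HTML/CSS
''')

dev_type_info = {}

for row in csv.DictReader(sample_csv):
    dev_types = row['DevType'].split(';')
    languages = row['LanguageWorkedWith'].split(';')

    for dev_type in dev_types:
        # Inserts the default only if dev_type is not already a key;
        # an existing entry is left unmodified.
        dev_type_info.setdefault(dev_type, {
            'total': 0,
            'language_counter': Counter(),
        })
        # Update the counter and total that belong to this developer type.
        dev_type_info[dev_type]['language_counter'].update(languages)
        dev_type_info[dev_type]['total'] += 1

print(dev_type_info['Developer, back-end'])
```

One design note: the default dictionary passed to `setdefault` is built on every call, even when the key already exists. For a script of this size that cost is negligible, and the approach keeps the "create if missing" logic on a single line.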

### Loop Over Developer Types [44:02]

bottom. So now I'm still going to loop over our `dev_type_info` dictionary here, but let me make this more clear. Instead of using `key`, I'm going to say `for dev_type`, and let's access the values of the keys as well, so I'll call that `info`. Now that we're accessing the key and the value, we need to say `dev_type_info.items()`. Don't forget the `.items()` there, because if you just loop over `dev_type_info`, it'll only loop over the keys. To get both the key and the value, you want `dev_type_info.items()`.

Okay, so first let's print out the developer type, just to see what developer type these languages are for. Now let me uncomment everything else that we have here, and I can get rid of these yes and no percentages, which are from a while ago. All we need now is our language counter. When we're looping over this language counter, we want it to be the language counter for this specific developer type. Remember, this `info` here is the dictionary for that developer type, so we want `info`, and we want to access the language counter key of that developer type.

Just one more change here: instead of `total`, we want this to be the total for that developer type as well, so we will access the total key for that developer type. Lastly, I'm going to add a tab to our inner loop where we are printing this out. That way it visually separates this information for each developer type, and we'll see what this looks like in just a bit when we print it out. Okay, so that should be all of the changes that we need to make. So if I run this, and let me make this a little larger here, then if I
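The reporting loop described in this section can be sketched as follows. The `dev_type_info` contents here are hand-written, hypothetical numbers (not survey results), and the key names and `language_pct` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, hand-written data in the nested shape built earlier.
dev_type_info = {
    'Developer, back-end': {
        'total': 4,
        'language_counter': Counter({'JavaScript': 3, 'SQL': 2, 'Python': 1}),
    },
    'Developer, embedded applications or devices': {
        'total': 2,
        'language_counter': Counter({'C++': 2, 'Python': 1}),
    },
}

# .items() yields (key, value) pairs; iterating the dict alone yields only keys.
for dev_type, info in dev_type_info.items():
    print(dev_type)

    for language, value in info['language_counter'].most_common(5):
        # Divide by this developer type's own total, not a global total.
        language_pct = round((value / info['total']) * 100, 2)
        # The leading tab visually groups languages under their developer type.
        print(f'\t{language}: {language_pct}%')
```

Because respondents can list several languages, the percentages under one developer type are each "share of that type's respondents who listed the language" and are not expected to add up to 100.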

### Most Popular Programming Languages [45:57]

did this correctly, now we should have the five most popular languages for each developer type. Here at the bottom we can see marketing and sales professional; most of them knew HTML/CSS, and that sounds correct. But let's scroll up and look at some of these, and again, you can filter out NA up here in our loop if you want to. We can see that desktop developers use JavaScript and HTML/CSS. Front-end developers mostly use JavaScript, almost 88%, and that makes sense. JavaScript, HTML/CSS, and SQL are the top three, and that makes sense. Designers are mostly HTML. Back-end developers, JavaScript. Full-stack developers. Academic researchers are mostly Python, and I believe that makes sense as well, because Python is becoming very popular in the data science field. And yeah, we can see a lot of them here: data scientist or machine learning specialist, 80% Python. Database administrators, mostly SQL, and that makes sense as well.

Now let's actually go down to the embedded systems, since that was the specific question that the person asked on Reddit. If I keep scrolling down here: system administrator, we can see that Bash made that list, and that makes sense. Scientist, 70% Python. Oh, I think I missed the embedded applications. Yes, here it is, right here. Okay, so we can see that 60% of people who said that they worked in embedded applications knew JavaScript, and 57% said HTML/CSS. Now C++ made this list, and it didn't make many other lists, and that makes sense because C++ is a very popular language in embedded applications. Then we also have SQL and Python here, sitting at roughly 51%.

Okay, so the calculation that we did here, broken down by developer type, is a bit more advanced than what we started out with, but with the data that's available to us we can make just about any analysis that we'd like. This is great practice for anyone who's getting into data science, because it's a good example of downloading some real-world data and parsing out exactly the information that you want. It's also fun: since this is a survey about the developer field, it's kind of fun to just poke around in the numbers and see what we can find. For example, we could parse out the median salaries for these developer types, which programming languages have the highest job satisfaction, what the most preferred development environment is for the different developer types and languages, all kinds of stuff like that to keep it interesting while learning how to do this at the same time.

Like I said before, this would probably be a bit easier if we were using a library like pandas, but I didn't want to overwhelm anyone who isn't familiar with that library. Once I release the pandas series, I'll release an updated video doing the same thing that we did here, but using pandas to parse out this information instead.

Now before we finish up here, I'd like to mention the sponsor of this video, and that is brilliant.org. We've been talking a lot about data science in this video and how to analyze this data, and to learn more about data science I would definitely recommend brilliant.org. Brilliant is a problem-solving website that helps you understand underlying concepts by actively working through guided lessons, and they've recently added some brand-new interactive content that makes solving puzzles and challenges even more fun and hands-on. If you'd like to learn more about data science and programming with Python, then I would recommend checking out their new probability course, which covers everything from the basics to real-world applications and also fun things like casino games. They even use Python in their statistics courses and will quiz you on how to correctly analyze the data within the language. Their guided lessons will challenge you, but you also have the ability to get hints or even solutions if you need them; it's really tailored towards understanding the material. They even have a coding environment built into their website so that you can run code directly in the browser, and that is a great complement to watching my tutorials, because you can apply what you've learned in their active problem-solving environment, and that helps to solidify that knowledge. So to support my channel and learn more about Brilliant, you can go to brilliant.org/cms to sign up for free, and the first 200 people that go to that link will get 20% off the annual premium subscription. You can find that link in the description section below, and again, that is brilliant.org/cms.

Okay, so I think that is going to do it for this video. I hope you found a video like this helpful, where we go over a real-world example of some data analysis using Python. Like I said, this is great practice for anyone looking to get into the field, and it's also pretty fun just playing with the numbers and seeing what we get from that data. If anyone has any questions about what we covered in this video, then feel free to ask in the comment section below and I'll do my best to answer those. If you enjoy these tutorials and would like to support them, then there are several ways you can do that. The easiest way is to simply like the video and give it a thumbs up. It's also a huge help to share these videos with anyone who you think would find them useful. And if you have the means, you can contribute through Patreon, and there's a link to that page in the description section below. Be sure to subscribe for future videos, and thank you for watching.
