# Solving Real-World Data Analysis Questions with Python! (Internet Usage Analysis)

## Metadata

- **Channel:** Keith Galli
- **YouTube:** https://www.youtube.com/watch?v=m6v7a3sZlL8

## Contents

### [0:00](https://www.youtube.com/watch?v=m6v7a3sZlL8) Segment 1 (00:00 - 05:00)

All right, I think we are live. Let's let everyone trickle in for a few minutes, and while we do that I should probably figure out how to share my screen. New year, new me. What's up, everyone? Let me know in the chat where you're tuning in from while I get the slide deck up and running. We're going to do some fun data analysis today. It's always a good day to write some code, that's what I always say.

Where are people tuning in from? I don't know what it is, and I probably say this in every live stream I do, but I'm always overwhelmed by all the different configurations and windows I have open. I also need to work on my video feed, because it's so bright behind me. All right, here we go: Bangladesh, Venezuela, Helsinki, India, Georgia, Moscow. We've got a lot, love it. I'm in the state of New Hampshire in the United States right now, kind of close to the city of Boston.

I should be able to screen share now. Oh no, that's weird, I want to make this full screen. We've got Houston, Texas in here, and Chile. I don't know why the settings are weird and why the screen share is only showing this teeny little bit. I don't want that one either, I want this one. Nope, I want the picture-in-picture. Where's the picture-in-picture? Let's try this one more time. There we go, now you can see me and the slideshow, awesome. Also, where's my chat? All right, now we're rolling. "Turn off your halo": oh no, I don't know what my halo is, that's just how bright it is. I have a light on, but the background is just super bright.

All right, enough chatting on my part, I've just been rambling. Welcome to everyone tuning in now and anyone tuning in in the future. We're doing some internet usage data analysis today. We have a data set from the World Bank, and I'll show how you can collect this data and access it. We'll also be raising some money for GiveInternet.org, an awesome cause, so I'm excited to do a little bit of both, and I'll provide more information.

The overall goals for the presentation: learn some real-world Python analysis skills and raise money for a good cause. A little bit about the sponsor: in the YouTube live stream you should be able to see some information about GiveInternet, and you can make donations. Anything that comes through this live stream goes straight to GiveInternet, and there's even a donor doing one-to-one matching. For a little background context, which I also shared in some LinkedIn posts I made before this, YouTube has been

### [5:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=300s) Segment 2 (05:00 - 10:00)

invaluable for me, at least in my learning journey. A concrete example: when I was a senior in high school I ran out of math classes to take, because I'm a big nerd. Instead of just not taking math that year, I was able to keep learning math and calculus, and I did it basically 100% through different instructors on YouTube who offered lessons for free. One of the biggest ones was PatrickJMT; I watched basically every one of his calculus videos and continued the journey. So I've benefited a ton from YouTube, and I started making videos largely to give back a bit and teach whatever I could.

The big challenge is that while YouTube is great and free of charge for everyone who can access the internet, a lot of people don't even have the ability to access the internet. So this live stream is to hopefully raise some money for people in places where it's harder to get internet access, and GiveInternet ultimately strives to get more and more people connected around the world. Also, a quick mention: we've got Elaine in the chat, and I want to put her on blast. She can answer any questions people have about GiveInternet and will be moderating the chat throughout the live stream, so feel free to ask her anything, or check out GiveInternet.org.

This is the first time I've ever done a live stream fundraiser, so I'm still very much learning myself. A little more information about GiveInternet and some of its partners, recapping some of what I mentioned: ultimately they provide laptops and internet access to people in communities that have a harder time accessing it. Some specific countries (I don't think all of these are currently active) are Nigeria, Kenya, Georgia, Ghana, Indonesia, Uganda, and India. One of the reasons I was asking where everyone was tuning in from was that I was curious whether there was any overlap, and there definitely is with India; I'm seeing a couple of people from there. There are a lot of parts of India where a high percentage of people still don't have internet access.

I was also curious about my own channel. These are views and watch time on my channel over the last 90 days, and India is at the top of the list. I also looked at some of the countries served by GiveInternet, and here's a breakdown of watch time from those places over the last 90 days. It's a decent chunk, so my goal is that we can raise some money and get higher numbers here, with more people connected in these places. Something that's hard for me to realize sometimes with YouTube is that I'm just sitting here in my small office, and it's pretty cool that YouTube can connect all of these places. Even with the comments and everything, I don't always think about that properly.

All right, I think that's about it. Let me see if I can monitor anything here. If you want to make a donation, you can donate directly through the YouTube stream. All the funds go to GiveInternet; YouTube doesn't take any cut for fundraisers, and the donations will be matched one-to-one. This last thing, I don't know if it will actually work, and maybe I'll just do it myself at some point if I need a break from the analysis: if you donate, feel free to type out an exercise. If you donate $25 and type "push-ups" in the chat, I'll do 25 push-ups. So maybe that's an extra bit of incentive to donate, and for me to look after my health and hopefully get a little exercise in while I stream.

I think that's about it, so let's get into some fun data analysis. Feel free to ask questions throughout this entire stream, whether about the actual data we're analyzing, Python, or anything in general. I'll try to pop up some questions and answer additional ones throughout. As I mentioned, we'll be collecting some

### [10:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=600s) Segment 3 (10:00 - 15:00)

internet data from the World Bank. If you want to access the data right away, you can. Oh no, I have a typo in this slide, but it should be github.com, KeithGalli, the internet usage analysis GitHub repo; hopefully that went into the chat. You can find some information about the data source there, but as part of the process I'll also show how we can actually collect the data. The World Bank has a lot of good stuff: if you want to analyze world data, it's a great place to go, so I'll show how you can get data from that source for any projects you might want to do.

Data skills we'll use: a lot of my videos have a Python pandas component, and this one will be no different. We'll probably use some other libraries too, like Matplotlib, Plotly, etc. And we'll make sure to use the fun AI tools, because I feel like you can code 2,000% faster with these tools if they're used properly. We'll start with data collection and cleaning, do some data analysis, and then I'll answer some additional questions. Here are some questions we might aim to answer with this data set, and feel free to type in questions you have as we start actually getting the data. These are just some I wrote down, and we'll see how we can use pandas and other tools to visualize and answer them.

All right, let's actually get into the data collection process. Hopefully you can see this browser window. I'm going to start by going to the GitHub repo I mentioned, and I'll make this a bit bigger so everyone can see. I guess it didn't make my search bar bigger, but it's github.com, KeithGalli, internet usage analysis. The reason I'm going to this GitHub repo first is that the link to the data source portal is right there, and I forgot the link otherwise. I'm going to go to the World Bank to start getting some data on internet usage around the world, and if I click on that link, we get to DataBank (databank.worldbank.org).

To be as inclusive as possible, I'm going to select all of these countries to get our data for. Then we have a bunch of different types of data we can get. This might still be too small, so I'm going to increase it a bit, and I think I can even expand this window. If you can read this, it's all sorts of data you have access to for all these different countries. I feel like it's a great data source if you have any sort of portfolio project you want to build, or you just want to practice your data analysis skills and maybe learn something about your country and the countries around you in the process. There's tons and tons of stuff here; I'm getting overwhelmed by how many series are available. Because we're looking at internet usage data, I'll specifically type "internet" in the search bar. There might be other things you'd want to access, but I think these five series are a good starting point. If you were to take this to the next step, one thing I might recommend is looking up some education metrics and adding those to your data source as well, but we'll start with the internet data. I'm going to select all here, and for the time period I'll use all the years I have access to.
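When the DataBank selection above is exported as CSV, the download arrives as a zip archive (the stream runs into this right after applying the selection). A minimal sketch for reading the data straight out of the zip, assuming the archive holds exactly one data CSV plus, possibly, a companion file whose name contains "Metadata"; that naming is an assumption, so check your own download. `read_databank_zip` is a hypothetical helper, not part of pandas or any World Bank tooling.

```python
import io
import zipfile
from typing import IO, Union

import pandas as pd


def read_databank_zip(source: Union[str, IO[bytes]]) -> pd.DataFrame:
    """Read the data CSV out of a DataBank zip export (hypothetical helper).

    Assumes the archive contains exactly one CSV whose name does not
    mention "metadata"; any metadata CSV alongside it is skipped.
    """
    with zipfile.ZipFile(source) as zf:
        data_members = [
            name
            for name in zf.namelist()
            if name.lower().endswith(".csv") and "metadata" not in name.lower()
        ]
        if len(data_members) != 1:
            raise ValueError(f"expected exactly one data CSV, found: {data_members}")
        with zf.open(data_members[0]) as handle:
            # utf-8-sig also tolerates a leading byte-order mark, if present.
            return pd.read_csv(handle, encoding="utf-8-sig")


if __name__ == "__main__":
    # Build a tiny in-memory archive shaped like an export, purely for illustration.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("Data.csv", "Country Name,1990 [YR1990]\nGermany,..\n")
        zf.writestr("Metadata.csv", "Code,Long definition\nX,Y\n")
    buffer.seek(0)
    print(read_databank_zip(buffer))
```

For a plain CSV, such as the raw.githubusercontent.com link to the file in the repo, `pd.read_csv` can take the URL directly, which is what the stream does next; the zip helper only matters if you work from your own DataBank download.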

### [15:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=900s) Segment 4 (15:00 - 20:00)

So that's from 1960 to 2023, and I'll apply changes. We get a little sample of the data set here for Afghanistan, but I could also preview the data for other countries, as you can see: here's Brazil, and I saw the most recent comment was from Algeria, so here's Algeria in the data set. We can go back here, and I'm going to download all of this data. I'll download a CSV, because that's probably the easiest for us to deal with. Let's call this World data; actually, I want to save it in the data folder of my internet usage analysis project, so maybe I'll call it World internet data. Oh no, it's a zip folder, I don't like that. Hopefully you saw me doing all that. Ah, where did my window go? Sorry, everyone, I'm lost, and my computer's about to die. Okay, there we go.

I downloaded the CSV. It's in a zipped folder, so I don't think you can see the CSV, but basically I now have it saved locally in the directory where I have that GitHub code. So I'll go back to GitHub. If you want to access the data but don't want to download it the way I just showed, you can access this CSV file in the GitHub repo. One cool trick is that pandas can read the data from a URL directly. You have to open the raw version of the file first, which gives you a raw.githubusercontent.com link; you can copy that and access the data through it.

So let's start doing some data cleaning so we can start answering questions about this data. We got the data from DataBank (databank.worldbank.org), and now we're going to actually start analyzing and cleaning it up. Let me share my Visual Studio Code tab. I'll move Visual Studio Code over there; hopefully people can see it now. The only issue is that it's quite small on this screen, but I'll try to make it bigger. I'm going to create a new Jupyter notebook called something like analysis.ipynb and generate a code cell. Maybe that's slightly too big. I can just do `import pandas as pd` and `df = pd.read_csv(...)`, pasting in the link I shared, and that gives me my data. I'm pasting in the raw data link right now; it's a really long link. If I run `df.head()`, we see we have the data here. Let me know in the chat if people can see this well enough. I'll hide this over here, and I also need to check that I can see it on my side. It seems like it's all showing, so here's what the data looks like currently.

I see a good question in the chat that I'll go ahead and answer: how do you come up with analytical questions for projects like this? I think it really depends on who you're

### [20:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=1200s) Segment 5 (20:00 - 25:00)

working with on the project. If it's something you're doing yourself, honestly, just ask yourself what you're curious about in the data. Here, for the internet usage data, we have a bunch of different countries, and one thing I'm curious about just by looking at this data is which countries have had the most growth recently. If a country had very little internet access and then, let's say, 2020 comes by and it spikes up, I want to be able to find that insight in the data. So I'm just thinking about what I'm curious about. If you're working in a corporate setting, you'll want to talk to the business stakeholders and the people on your team; ultimately they'll often dictate what the analytical questions are. Then, as a data analyst or data scientist, your responsibility is often to take those high-level questions and break them down into simpler, concrete steps that ultimately get you to the answer. That's a little bit on analytical questions.

All right, I'll make this full screen. I'm trying to remember all my shortcuts: what's the shortcut to hide the Explorer? I feel like I'm going to accidentally close out of everything if I try to use shortcuts. Command+Shift... no, I want to hide the Explorer. Command+Shift+E, no, that didn't work. How do I hide the sidebar? Command+B, okay, that's the shortcut I need. I need to learn more shortcuts.

So we have our data set here. I'm loading this long link in, but if you download the file locally, you can do something simpler, like pointing `pd.read_csv` at a relative path to the World Bank country internet data CSV in your data folder, and we've loaded that in as our data frame. Oh, and thank you for the Command+B help there too.

Okay, so we have this data. Before I try to answer any questions or do any serious analysis, I think the first step you take with anything is asking how we can clean it up to make it more usable, and there are a few different steps we can take here. The first glaring thing that sticks out to me is the columns: I really hate how these are formatted, with "1960" and then "[YR1960]". So I want to clean up those names, and maybe these other names as well. I'll show two steps we can take to do both.

Starting with the year names, we can basically look for a pattern. The pattern I see is that for anything with this bracket in it, I want to strip away everything to the right. We can use a list comprehension to help us out here: something like `column for column in df.columns`, and honestly I could just do `column[:4]`, which gives us the first four characters of each column. If we run this, we get the years, but it also truncates our first four column names. So we can add an if statement: even if you just check for a bracket, none of the other columns have brackets, so `if "[" in column` will work. Maybe it's not the most elegant, but it's clean enough; this is a one-time thing, so we don't have to worry about it being the most elegant code. Long time! And I see a question: "do you save it?" If you clarify what you mean by that, I'll provide extra information. All

### [25:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=1500s) Segment 6 (25:00 - 30:00)

right. Else, we just want to leave the column as is. Let me see if this works; I might have to move this if statement before the `for`, since I forget the order in these list comprehensions. I'll try this real quick. Okay, that did not work. I think it might mean moving this here; I always forget where my if statements go inside list comprehensions. Okay, that looks like it worked: we take the first four characters of the column name if the bracket is in it, and otherwise we leave the column alone. That's one way to rename our columns, so I'll save this as our columns with `df.columns = ...` and then look at `df.columns` again. Oh yeah, the live stream will be saved, don't worry, it will be accessible here, so if you have to leave you'll be able to access it tomorrow, the next day, or any day in the future. Here we go, we've got our columns. I think these are a little easier to work with now that we've stripped away that annoying year syntax.

Another thing I've liked recently, and there are probably other ways to do this, is a library called janitor. If you don't have it, one cool thing in a Jupyter notebook is that you can install libraries directly with an exclamation mark: `!pip install pyjanitor` gives you access to janitor. We already have it. It's kind of weird, because when you `import janitor` it doesn't necessarily look like you're using anything from it, but importing janitor gives you access to this `clean_names` function. If I run it and look at my data frame, instead of spaces, capitalization, etc. in the names, it lowercases everything and connects words with underscores. I just find that having a standardized way to deal with your columns is nice, so I've been liking this `clean_names` function, which is available once you pip install pyjanitor and import janitor.

All right, so we've cleaned up our column names overall, but we still have a lot of other stuff to clean up in the data to make it workable. One question for the chat: does anyone think we need all of these columns? If you're looking at the data (another nice thing about Visual Studio Code is that I can open the CSV, and it gives you this preview button), hopefully you can see it on the screen. This is all of our data, and the open question is: do I need all of these columns, or is there a subset I can use, and if so, which subset should I filter by?

Yeah, I definitely think I could use replace instead of renaming the columns this way. Another thing the pyjanitor library gives you access to is functionality for renaming columns, with some more English-readable functions. Traditionally in pandas you might call `.str.replace` on `df.columns` with a pattern, and something like that would definitely work, as Alexander is saying. Maybe I should quickly look at the pyjanitor documentation; it's always fun to learn new libraries. Okay, py

### [30:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=1800s) Segment 7 (30:00 - 35:00)

janitor. So it gives us some extra functions. Let me see if there's a replace here. Where am I? Oh, here are the functions: things like rename columns. It gives you special functions to do this in a potentially easier way, depending on who you ask. I was hoping to find the GitHub link here. Okay, here we go: this is how you would do it normally in pandas, with a rename, and this is the way to do it with pyjanitor. I'll share this in the chat.

I do like this idea of transposing the year columns, and I think we'll get to it in a bit, because it's more natural to have our years that way. When we say transpose, we basically mean taking our table of data (I'm blocking myself as I highlight this comment) so that all the rows become columns and vice versa. That's what happens when we transpose the data, so I think transposing does make sense as we get into this.

One second, I just want to pop open one window. Wait, shoot, what just happened? Okay, we're good. The live stream closed on my side, so I'm opening a window so I can see it as we go. For anyone who has joined recently, I encourage you to check out the partner (not the sponsor) we're raising money for, GiveInternet.org. We have Elaine from GiveInternet in the chat to answer any questions about the nonprofit, which is basically trying to provide laptops and internet to people in communities that don't have as much access, and hopefully in the future that will mean they can watch tutorials like this.

I see a question about the HoloViz library. That's definitely a library I want to play around with more, but I haven't personally used it at this point, so it's something I plan to check out soon. I see some other people recommending that you melt the DataFrame. Yeah, I think we'll get to that; it's along the same lines as the transpose. There are different ways to do this: melting would be one, transposing would be the other.

Getting back to the original question of whether there are any columns we can get rid of: one thing that's sometimes not immediately obvious in a data set like this is that we also have to ask ourselves about the internet in general. Back in 1960, 1970, and 1980, the internet didn't even exist, so we're going to have no data for many of these years even though we pulled data for them. If you go back to the early start of the internet and Tim Berners-Lee (when was the internet invented? It's kind of weird for me to even think of a life before the internet), we see that 1983 is when the first research and the first semblance of an internet was created. But I don't think you really saw any public internet until Tim Berners-Lee invented the World Wide Web in 1989. So realistically, I don't think we need to look at any data before 1990, let's say, because it's just so irrelevant, and we can drop all the year columns from 1960 through 1989. It's a fun thing to think about: sometimes you have to ask where the data is even relevant, and a lot of these years aren't relevant to our analysis. So before we even get

### [35:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=2100s) Segment 8 (35:00 - 40:00)

into any transposing or anything of that sort, we can just delete some of these columns. I always have trouble naming my data frames. I like using `df` because it's two letters, but as you iterate, if you rename things and save them over what you had before, it just gets messy. I'm trying to think of what to name this: `df_small`? I don't like `df2` either. Maybe I'll just save it as `df` still.

Okay, I want to take only the columns I need. I want these first four columns, so: `selected_columns`. Again, I'm probably doing this once, so it doesn't matter much. I could just copy the names; there are different ways to do this. I could use index positions, something like `df.columns[0:4]` (or `df.iloc[:, 0:4]` to select columns by position), or I could do it by name, and I'm going to use `country_name`, `country_code`, `series_name`, and `series_code`. That gives us our base, these first four. Then we want 1990 onward, so ultimately we want something like `selected_years = [str(year) for year in range(...)]`. This is doing some good autocomplete; I have Copilot going in the background.

I see someone asking a question about that as I highlight this. If I go to my extensions (I also forget that shortcut: Command+Shift+X), I use Copilot here. I think there's now a version of Copilot that's free, at least up to an extent; I pay for it. Another one I've heard recommended in the past is Codeium, which I know has a free tier. So Copilot is what I use, but Codeium is another one you might check out.

All right, I want 1990 up to, I think, the last year in this data set. Let's see what we have: 2023. I'm trying to think whether 2023 would be included. I think the end of a range is not inclusive, so I'm going to go up to 2024. Let's look at `selected_years`; I'll do this in a new cell so we don't run the `clean_names` function again (it wouldn't do anything anyway). Okay, here are our selected years, and we see they go up to 2023. Good.

So, to filter our data set, we can do something like `df = df[selected_columns + selected_years]`. I'll print `len(df.columns)` before we run this and print it again after, and we see we've now eliminated a bunch of those columns. If we look at our data frame, we have 1990 onward, a much more manageable amount of data to work with. I'd honestly recommend checkpointing this with something like `df.to_csv(...)` to a cleaned World Bank CSV in the data folder as a good starting point, and I usually set `index=False`. So if we open up

### [40:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=2400s) Segment 9 (40:00 - 45:00)

our files... this is so embarrassing right now, that I'm not using the shortcuts. I don't know what it is. Okay, Command+Shift+E; I need to get better at these Explorer shortcuts. Let's see, what am I trying to do? Okay, now I run this command, and we see we have a cleaned World Bank file. I just like checkpointing my files as I go, especially when working in a Jupyter notebook. It's pretty frustrating to accidentally run cells out of order and then delete stuff you've done; my notebooks sometimes get very messy. My typical approach in any data science task is to create a notebook first and play around in it, but when I really want to finalize something and make it more stable and repeatable long term, I convert it into Python scripts that I can run a little more easily, if that makes sense. I also see there were some donations in the chat, so shout out to anyone who has donated so far, much appreciated.

All right, where are we? Now we have our cleaned data frame, which is pretty good, and I think we can get into some of the stuff people were talking about with transposing the data and similar. Before that, let's look at our data again. In the chat real quick: the next three people to comment their country, we'll look at. I'll filter the data by the first three countries I see in the chat, so quickly type your country in; it doesn't matter if you've already typed it, the next three count. I think every country should be in the data source. We've got Greenland in here, wow, that's cool. Let's see if I can find Greenland in this data.

We've got Germany, Greenland, and Turkey as our three, and if a couple more trickle in, maybe I'll include them too. I'm going to do `df.country_name.unique()` just to see what countries we have. Okay, it's in alphabetical order, and we do see Greenland, so let's see how the data for Greenland looks. I might include others as they're typed in, but Greenland, Turkey, and Germany are the first we'll look at. I'll call this `selected_countries = ["Greenland", ...]`.

One thing we have to be careful of is typing the country name exactly as it's listed in our data frame. You don't know what spelling the data will use: if we were in the United States I might say "United States" or "United States of America", and maybe I'd say "Estados Unidos" if I were in Mexico. There are many ways to represent a country in language, so I think the better, more scalable way to get the selected countries would be to use the country codes you see. But Germany and Turkey were the other two. Do I see Turkey in here? We'll see if this gives us our data.

Now, to get `country_df`, I might do `df[df.country_name.isin(selected_countries)]` and then `country_df.head()`. Again, I'd probably use the country code as the more scalable way to do this, but let's see if we get everything we want. I only see Germany right now, but that's probably fine, because there will be five rows for Germany. We want to see Greenland. Turkey is not there; it's probably spelled a different way. What's the country code for Turkey? Let's see if Copilot knows this: "give me the three-letter country code for Turkey." Work smarter, not faster... oh no, just use your brain. I'll just look it up.
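The cleanup steps walked through so far (trimming "1960 [YR1960]" headers to the year, snake-casing names, keeping 1990 onward, and filtering countries) can be sketched as a few small pandas functions. This is an illustrative sketch, not the notebook from the stream: every function name is hypothetical, and `snake_case` is a hand-rolled stand-in for pyjanitor's `clean_names()` so the block runs without that dependency. Two details worth noting: when a list comprehension uses `if ... else`, the conditional goes before the `for` (a filter-only `if` goes after it), and `range()` excludes its stop value, so ending at 2023 takes `last_year + 1`. `to_long` sketches the melt idea raised in the chat, assuming missing values appear as a non-numeric placeholder such as `..`.

```python
import re

import pandas as pd

# Identifier columns after snake-casing (the names used in the stream).
ID_COLUMNS = ["country_name", "country_code", "series_name", "series_code"]


def strip_year_suffix(columns):
    # "1990 [YR1990]" -> "1990"; columns without a bracket are kept as-is.
    # With if/else, the conditional expression comes before the for.
    return [col[:4] if "[" in col else col for col in columns]


def snake_case(name):
    # Rough stand-in for pyjanitor's clean_names(): lowercase, then collapse
    # runs of anything that is not a letter or digit into "_".
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def tidy_world_bank(df, first_year=1990, last_year=2023):
    out = df.copy()
    out.columns = [snake_case(c) for c in strip_year_suffix(out.columns)]
    # range() excludes its stop value, so add 1 to keep last_year.
    years = [str(y) for y in range(first_year, last_year + 1)]
    keep = ID_COLUMNS + [y for y in years if y in out.columns]
    return out[keep]


def select_countries(df, codes):
    # ISO3 codes sidestep spelling differences in country names.
    return df[df["country_code"].isin(codes)]


def find_country(df, fragment):
    # Case-insensitive substring search over the distinct country names.
    names = pd.Series(df["country_name"].unique())
    return names[names.str.contains(fragment, case=False, regex=False)].tolist()


def to_long(df):
    # Melt the wide year columns into one (year, value) row per observation.
    year_cols = [c for c in df.columns if c.isdigit()]
    long_df = df.melt(id_vars=ID_COLUMNS, value_vars=year_cols,
                      var_name="year", value_name="value")
    long_df["year"] = long_df["year"].astype(int)
    # Non-numeric placeholders (for example "..") become NaN.
    long_df["value"] = pd.to_numeric(long_df["value"], errors="coerce")
    return long_df
```

With a downloaded export, usage might look like `tidy = tidy_world_bank(pd.read_csv(path))` followed by `select_countries(tidy, ["DEU", "GRL", "TUR"])`, and `find_country(tidy, "tur")` is one way to discover how a name is spelled before filtering by it. Checkpointing the tidy frame with `to_csv(..., index=False)`, as suggested in the stream, keeps it reusable.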

### [45:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=2700s) Segment 10 (45:00 - 50:00)

country code Turkey... no, I don't want that one. I forget what you call it, it's like ISO 3, I think: TUR. Okay, so let's see how Turkey is listed. I'm going to create a new cell real quick and search the data frame; I bet we'll find Turkey this way. Okay, it's spelled this way. The way it's spelled reminds me of the classic meme that came out of the Olympics, the Turkish Olympic shooter. I should have known the spelling from this legend's shirt, but I did not; I feel like this guy became very famous out of the Olympics. TR would be the two-letter code, and I think TUR is the three-letter one; different codes are used at different times. By the way, I don't think Wakanda will show up in the data set. I do see someone asked for Wakanda; unfortunately I don't think the World Bank is collecting internet usage data on Wakanda at the moment, but maybe in the future.

All right, so we now know how Turkey is spelled. The more scalable way to do it (no, I don't want this suggestion, discard it) would be to use the country codes, but if you're just doing some selected analysis, it's totally fine to write some stuff out manually from time to time; you just have to be careful that you're getting it right. And we now see our three different countries. I think it's going to be hard to do any sort of meaningful analysis using all of these different series names at once, so I would recommend drilling down into a single one. Another quick tip: sometimes it might not be clear what each of these series names means, so when we downloaded the data, another thing we could have done is grab the metadata. Why is it saying blocked? Oh no. I don't think there are enough people in this live stream to cause issues with the data source. What I was trying to say is that we could have also gotten the metadata from our search. So let's see if this works the second time I try it. I want just the internet series, so take all the internet columns, and for this purpose I can use a single year, maybe 2023, and click download metadata. I wonder if the metadata will show now. Come on, metadata. Okay, so one thing to note is that it's sometimes worthwhile to get the metadata too, if you want more information about any of these pieces of data; oftentimes a data set comes with something like the metadata we're looking at right now.

I think the easiest thing to look at to start is IT.NET.USER.ZS, which is individuals using the internet as a percentage of the population, and the metadata tells us where the data comes from: the International Telecommunication Union. So it's kind of cool to see where the World Bank is getting its source, and it gives us some additional information. It's a little more interesting if we look at the secure internet servers column that we extracted, because it tells us more about what that actually means. If any of these columns are hard to understand, what I would do is paste the metadata into ChatGPT or something and ask it to explain what the heck the column means. So, a little bit of extra information, but I think everyone here has a good sense of what percent of population means for this first series, IT.NET.USER.ZS. So I'm going to go back to our data
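To discover how a country is spelled in the data, a case-insensitive substring search works well. This sketch assumes the same hypothetical `country_name`/`country_code` columns and toy rows; note that a short pattern like "tur" can match more than one country, so check the result before copying a name into your list.

```python
import pandas as pd

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "country_name": ["Germany", "Greenland", "Turkiye", "Turkmenistan"],
    "country_code": ["DEU", "GRL", "TUR", "TKM"],
})

# Case-insensitive substring search; na=False treats missing names as "no match".
mask = df["country_name"].str.contains("tur", case=False, na=False)
matches = df.loc[mask, ["country_name", "country_code"]].drop_duplicates()
print(matches)  # both Turkiye and Turkmenistan contain "tur"
```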

### [50:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=3000s) Segment 11 (50:00 - 55:00)

set, and we can go ahead... what am I about to do? Let's filter our data to just individuals using the internet. I'm going to set country DF equal to country DF where the series code matches; I think the series code is easier than trying to match the series name, so I want the series code to be equal to this one. Now if you look at our country DF... maybe I call this percent internet. The challenge is that you want to give your variable a descriptive name, but within pandas, when you have to work with that variable a ton, it gets frustrating because it's a long thing to keep typing. So I'm going to call this CDF, for country data frame, just to make it easier. It doesn't stand for cumulative distribution function, which is a fun math term. I could call it percent internet, but now it's getting long again; CDF is fine. Now if we look at our CDF, we have just three rows, one per country, and this is just percent internet usage.

Now we could do the transpose recommendation someone had, or we could use `melt`. I always forget how to do melt off the top of my head, so I'm going to start with the transpose option. We don't want any of these descriptive columns right now; we just care about the percent values. So I'm trying to think of the best way to get just those columns. Let me go back to my `iloc` documentation, because I always forget the order of things. If I do `iloc` with, let's say, 3 onwards, what does this give me? Okay, this gives me the columns from position 3 onwards. I want all rows, so that part is good, but I want the first column plus everything after. I thought there was something cool you could do, like this, but I think this is not going to work. Yeah, I thought there was a way I could get creative, like selecting 1 to 3 and then 5, and we see we get different things. I think I'm overcomplicating it; I definitely don't need to do this. I could just drop columns. Let's look at our data frame again: we just need to drop these three, so I'm going to set CDF equal to CDF `.drop(columns=...)` with country code, series name and series code. There are different ways to do this, but then I'm left with a much more manageable data frame to work with.

For the next step, it's easier for us to work with something that has our years as the rows and the country names as the columns. I think `melt` is definitely one trick to get this. I wonder if this melt it's suggesting right here would work; I don't know if it would exactly. Anyone in the chat want to put the melt command in? I'm trying to think, yeah,
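Here is a sketch of filtering to one series and then keeping only the country name plus the year columns, assuming a hypothetical layout where the year columns start at position 4 (toy values). `np.r_` is one way to do the "first column plus everything from position N" selection with `iloc`; dropping the metadata columns by name gives the same frame and is easier to read.

```python
import numpy as np
import pandas as pd

# Toy long-format frame: one row per (country, series), one column per year.
# Column names and values are hypothetical.
df = pd.DataFrame({
    "country_name": ["Germany", "Germany", "Greenland", "Greenland"],
    "country_code": ["DEU", "DEU", "GRL", "GRL"],
    "series_name": ["Internet users (%)", "Other series"] * 2,
    "series_code": ["IT.NET.USER.ZS", "OTHER", "IT.NET.USER.ZS", "OTHER"],
    "2019": [1.0, 2.0, 3.0, 4.0],
    "2020": [5.0, 6.0, 7.0, 8.0],
})

# Keep only "individuals using the internet (% of population)".
cdf = df[df["series_code"] == "IT.NET.USER.ZS"]

# Option 1: drop the metadata columns by name.
by_drop = cdf.drop(columns=["country_code", "series_name", "series_code"])

# Option 2: positional selection; np.r_[0, 4:6] builds the positions [0, 4, 5].
by_position = cdf.iloc[:, np.r_[0, 4:cdf.shape[1]]]

pd.testing.assert_frame_equal(by_drop, by_position)  # both routes give the same frame
print(by_drop)
```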

### [55:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=3300s) Segment 12 (55:00 - 60:00)

melt command, I think, should work. We ultimately want melt or pivot; both might work. We want our country names as the columns, the years as the rows (the index), and the values to stay the same. I'm going to do it a different way: I'm going to set my index to be the country name, and `inplace=True` just means I don't have to reassign the result. So now we have the index set as our country name, and I'll call the next result final DF. Let's show what this looks like first. Oh, this is what always happens when rerunning cells: you change a cell and then try to rerun something. If we look at CDF now, we see the index is actually the country name, and final DF can be something like CDF `.transpose()`, which basically flips the index and the year columns. So now if we look at the `head()` of final DF, we get it nicely in this format; much easier to visualize something like this.

As I mentioned previously, right now we're working in a Jupyter notebook. If I wanted this to be repeatable, so that I could run it and do some analysis on a bunch of different countries, I would start baking some of this code into functions that we could easily reuse based on an input variable, such as the selected countries we used here, instead of having to be careful not to rerun cells again and again. Hopefully that makes sense, but as we can see, we get something like this. Anyone have a different command? If we look back at CDF... I'm going to have to create a new cell, because I don't want to rerun some of this in-place code. That's why it's sometimes better not to do things in place and to assign each step to a new variable: if I overwrite CDF, then when I try to rerun this first line, it won't be able to execute. If I give each step its own logically named variable, it makes more sense.

So with CDF: anyone have another command we could use to change its shape, basically to flip the country names and the columns? Actually, maybe you could have even done it back here, when we had a bit more to work with. Also, shout out to everyone that has sent a donation; it's awesome to see the progress. I can't see it easily on my side, but I just checked and we're at $192, which is matched one for one, so that's close to 400 USD in total. That's really cool. All right, what was I trying to look at? Where is my stream? I lost my window... there we go. Does anyone know the pivot command? I'm embarrassed, because I feel like I always screw up my pivots, but it's never bad to try something. So I'm going to set pivot DF equal to CDF `.pivot(...)`... can I do this easily? I don't think I can do it easily like this, because this data is already in wide format. I'm just going to use my transpose method so we can keep moving, but some very useful functions to know are `pivot` and `melt`
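A sketch of the two reshaping routes on a toy wide frame (one row per country, one column per year; names and values are made up). It assigns each step to a new variable instead of using `inplace=True`, which keeps notebook cells safe to rerun. `set_index(...).T` matches the transpose approach, and `melt` followed by `pivot` reaches the same years-by-countries layout through a long format.

```python
import pandas as pd

# Wide toy frame after dropping metadata columns (values are made up).
cdf = pd.DataFrame({
    "country_name": ["Germany", "Greenland"],
    "2019": [1.0, 3.0],
    "2020": [5.0, 7.0],
})

# Route 1: set the index (without inplace), then transpose so years become rows.
via_transpose = cdf.set_index("country_name").T

# Route 2: melt to long format (one row per country-year), then pivot wide again
# with years as the index and countries as the columns.
long_df = cdf.melt(id_vars="country_name", var_name="year", value_name="pct_internet")
via_pivot = long_df.pivot(index="year", columns="country_name", values="pct_internet")

print(via_transpose)
print(via_pivot)
```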

### [1:00:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=3600s) Segment 13 (60:00 - 65:00)

here to easily convert this into different formats. Again, one thing I always say about Python and learning programming in general: you don't have to always remember the exact syntax. What's important is that I know these are two commands that could help me here, and if I need to use them and don't remember the exact syntax, I'll look it up, whether that's asking ChatGPT to convert this or searching Stack Overflow, to get the command I need. So I remember that these are helpful functions, but I don't always remember the exact syntax. Just a good thing to think about.

Okay, so we have this final data frame, and I'm going to give it a more descriptive name, like selected countries internet usage over time DF. Again, this is the challenge: it's very descriptive, but it's not super ideal if I need to run a bunch of pandas functions on it, because it's so long. Maybe I'd also save it with `to_csv("data/selected_countries.csv", index=False)` and then load it back in. None of this was necessary, but maybe I switch back to the name DF so that I can process and visualize things a bit more easily here.

All right, I think the next step is to visualize this data. There are different ways you could do that; a simple way would be to use matplotlib, with something like `plt.plot`. So what is our x-axis, and what is our y-axis? Our x variable will be the years. Oh shoot, I actually did want to save the index in this case. I'm going to go back... that's not country name... okay, this works. Our x-axis is going to be our index, so `df.index` gives us the years, and the y variable would be the values for a given country. For example, we could plot Greenland's internet usage over time and call `plt.show()`, and we see a very "pretty" graph of Greenland over time. Obviously that doesn't look the best, so you'd want to format it a bit better. I might also do this in Plotly Express: `px.line`, taking in the data frame, with the index as the x-axis and the selected countries as y. Let's see if this works. That's a much better graph. I think it's getting confused here, though, and there's also missing data: if we look at our data, there are a lot of zeros for Greenland, so we might want to interpolate or do something else with them to make this a little neater. So let's look at the full data. Okay, we're looking at Greenland. I also want to get some info on our data frame
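Because the years live in the index at this point, writing with `index=False` throws them away. A minimal round-trip sketch, using an in-memory buffer as a stand-in for a file path such as `data/selected_countries.csv` and made-up values: keep the index when writing and pass `index_col=0` when reading.

```python
import io

import pandas as pd

# Made-up values: years as the index, one column per country.
final_df = pd.DataFrame(
    {"Germany": [1.0, 5.0], "Greenland": [3.0, 7.0]},
    index=pd.Index([2019, 2020], name="year"),
)

# Stand-in for a real path such as "data/selected_countries.csv".
buffer = io.StringIO()

# Keep the index (the default), because here the index *is* the year data.
final_df.to_csv(buffer)
buffer.seek(0)

# index_col=0 turns the first CSV column back into the index.
reloaded = pd.read_csv(buffer, index_col=0)
print(reloaded)
```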

### [1:05:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=3900s) Segment 14 (65:00 - 70:00)

I'm going to do DF... just to make this explicit, I'll set a DF year column equal to `df.index`. I might just duplicate our index real quick, since I sometimes have an easier time working with columns than with the index. I want to see what type that is: the year is an object, and we want to convert it to a number to make things easier, so `astype(int)`. Okay, now it's int64, which I feel might work a bit better. Okay, that looks much more reasonable. Why is it getting so confused? These values are also objects instead of numbers. They look like zeros here, but I feel like if we looked at the raw data, something weird is going on. So I also want to make sure all of my data frame values are numeric; we don't want them all to be ints, so DF equals DF `.astype(float)`. Does this work? "Convert all my columns except for one to a float value." Okay, you can do something like this with a little help from GitHub Copilot; let's see if this works. It's having trouble too. You always have to think about how much you actually need to do: it would be very easy to just set the Germany column equal to this conversion. I know that works, and it will give us the values we want. Okay, that's so weird. I know what our issue is, I just don't know why it's displaying things weirdly and saying this is a zero when it's not. So if I print the `head()` of our data frame... this is so strange, it's showing zero here, but if we look at the actual value in our data, it's giving us these two dots.

So we ultimately want to replace these two dots. Sorry that this is getting a little bit messy, but if we do DF equals DF `.replace(...)` on the dots, with, how about, zero? Or NaN? I guess you don't want zero, because that would screw up everything, so I would use `pd.NA` here. Let's see what happens. I'm going to get rid of this, and let's see our `info()` now. We don't want these to be zeros, that's what's screwing things up; we want them to be not-a-number values, and then we can interpolate and do other things with them. I'm just surprised it created these zeros in the first place. To take a step back, this is the challenge with live streaming and working in notebooks: now we have this looming, annoying thing with these dot-dots that I want to get rid of, but it's weird because when I print my data frame it shows me zeros instead of the dot-dots. I wonder if there's a way to
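World Bank exports use `..` for missing values, which keeps whole columns as strings. This sketch (placeholder country labels, made-up values, and a hypothetical layout with string year labels as the index) replaces the placeholder with NaN rather than 0, converts to float, and turns the string years into integers. It uses `np.nan` instead of the stream's `pd.NA` so the result can be a plain `float64` column.

```python
import numpy as np
import pandas as pd

# Toy transposed table: string year labels as the index and ".." where the
# source has no value (values are made up).
df = pd.DataFrame(
    {"Country A": ["1.5", "2.5", "3.5"], "Country B": ["4.0", "..", ".."]},
    index=["2015", "2016", "2017"],
)
print(df.dtypes)  # both columns still hold strings

# Replace the exact ".." placeholder with NaN (not 0, which would look like real
# data), then convert every column to float.
clean = df.replace("..", np.nan).astype(float)

# Integer years behave better for plotting and index-based interpolation.
clean.index = clean.index.astype(int)
print(clean.dtypes)
print(clean)
```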

### [1:10:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=4200s) Segment 15 (70:00 - 75:00)

show the dot-dots. So I'm going to go back and run these lines again. Okay, why... I'm going to look this up real quick: "why is pandas showing .. as a zero when I print my data frame?" This is weird. Okay, let's try the `to_string()` method. Okay, there we go. Oh, I guess these were actually zeros; that's why I was getting tripped up. These were actually zeros, which is fine, and we get the NaNs down here. Sorry, I was confused: I was thinking these should be the dot-dots, but those are actually zeros. Let's look at the `tail()` real quick. Okay, we see NaNs for some of the values we don't have for Greenland. If I look at the entire data frame as a scrollable element to see which values are NaN, it's really just the end values for Greenland. I'm curious whether we can now convert the Germany, Greenland and Turkey columns to float types. I guess we could fill the NaNs first, so maybe one thing that would make sense... sorry, I'm jumping around a little bit here. Why is this window still open?

Let's look at our data frame; I'm going to show the full data frame again. One useful thing we can do here is an interpolation, and I want to call the result interpolated DF, or IDF: DF `.interpolate()`. The `interpolate` function on pandas data frames lets you fill in missing values. One thing that's interesting here, though, is that the gaps are at the very end of the data, so it probably won't work out of the box. If I do DF equals DF `.interpolate()`, let's see what IDF looks like: it doesn't do anything to those columns. If I pass `method="linear"`, I think it would do this automatically, and then I can also pass `limit_direction="both"`. Huh, why are you not filling in values? What I could try is `method="spline"`. Let me get into the details of what we're doing here: these NaNs don't want to change. Basically we're trying to get rid of the NaNs at the bottom of IDF, and while we don't have real values, we can often fill them with values that make sense given the current trends; that's what I'm doing. Things are being weird right now, though. I see someone has a problem importing matplotlib and is asking how to install it. I would recommend, if you can,
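A small sketch of what pandas' default linear interpolation does at the edges of a series (made-up values): interior gaps are filled on a straight line, trailing NaNs are filled with the last observed value rather than extrapolated, and leading NaNs are only filled when `limit_direction="both"` (or `"backward"`) is set. Interpolation only does anything useful on numeric columns, so if the country columns are still strings at this point, that is a likely reason the calls appear to do nothing.

```python
import numpy as np
import pandas as pd

# Made-up series with a gap in the middle and missing values at both ends.
s = pd.Series([np.nan, 10.0, np.nan, 30.0, np.nan, np.nan],
              index=[2012, 2013, 2014, 2015, 2016, 2017])

# Default (method="linear", limit_direction="forward"): fills the interior gap,
# repeats the last observed value at the end, leaves the leading NaN alone.
forward_only = s.interpolate()
print(forward_only.tolist())  # [nan, 10.0, 20.0, 30.0, 30.0, 30.0]

# limit_direction="both" also fills the leading NaN with the first observed value.
both_ways = s.interpolate(limit_direction="both")
print(both_ways.tolist())  # [10.0, 10.0, 20.0, 30.0, 30.0, 30.0]
```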

### [1:15:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=4500s) Segment 16 (75:00 - 80:00)

just running `pip install matplotlib` in your terminal. Maybe after this live stream I can attach some more concrete resources, but that would install it. I feel like everything has gotten very messy in this code, which is frustrating, but that's just the life of coding sometimes. What I might check: I'm trying to interpolate three different columns at once, and I think what's tricky is that I'm calling it on all of my data. So what I might do is take the Germany, Greenland and Turkey columns specifically, call interpolate on just those, and save the result back to those columns. Let's see if this works... and I spelled Turkey wrong again. It's really getting mad at the data types. Let me try this one more time with the Germany and Greenland columns (I could just use the selected countries list if I wanted to be more precise). I think there's an `errors="coerce"` option I can pass to `astype`? Let's see if this works. Okay, we're going to debug: "convert columns to float type pandas and coerce errors if NaNs exist, or ignore." Let's see what Google Gemini says. Okay, we could call `pd.to_numeric`; that will work. I like this, though I'm making this way more sophisticated than it needs to be. Let's see if this works. Come on... I'm sure there's a much more elegant way to do this, but okay, that finally worked, I believe. Let's look at our DF `info()` now. Okay, now Germany is a float; that's good. I'm going to repeat this code for Greenland and Turkey, and now we've got all float columns. There's definitely a better way to handle the NaNs. Now I'm curious to look at the full data frame: that looks pretty good, and we see our NaNs, which is great.

Now let's try to interpolate on Greenland. I just want to fill in data that makes sense for these last few values, as a proof that you can do it. What happens if I just do this? Okay, we see that if you don't give it anything and use the default `interpolate` method, it just takes the last value and copies it. For something a little better, I'm going to make IDF equal to DF `.copy()`, and then IDF is the interpolated data frame, where Greenland (all the other
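`DataFrame.astype` accepts `errors="raise"` or `errors="ignore"`, not `"coerce"`; coercing unparseable strings to NaN is what `pd.to_numeric(..., errors="coerce")` does. A sketch of the per-column loop, with placeholder column names and made-up values; the commented `apply` line is an equivalent one-liner.

```python
import pandas as pd

# Hypothetical country columns holding numbers as text plus the ".." marker.
selected_countries = ["Country A", "Country B", "Country C"]
df = pd.DataFrame({
    "Country A": ["1.5", "2.5"],
    "Country B": ["4.0", ".."],
    "Country C": ["6.0", "7.5"],
})

# pd.to_numeric(errors="coerce") parses what it can and turns the rest into NaN.
for country in selected_countries:
    df[country] = pd.to_numeric(df[country], errors="coerce")

# Equivalent one-liner for several columns at once:
# df[selected_countries] = df[selected_countries].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)  # every column is now float64
```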

### [1:20:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=4800s) Segment 17 (80:00 - 85:00)

countries have data) equals IDF's Greenland column `.interpolate()`. So I'm going to go back and get our DF again, and now if we look at IDF, we see the same values repeated at the bottom. There are other interpolation methods, which is a cool thing. Oops, I still have the matplotlib comment on. So if I set the method to, how about, splines (look up spline interpolation, it's pretty cool), I'm going to do a quadratic spline, and I think by default this should work. Real quick, I'm going to copy this and move it to the next line, because I want to show how this changes things. Interpolation is very helpful; basically it's like an inline regression. You can find NaN values and fill them with other methods: you could just use `fillna` and fill with an average value, but with a time series progression you want to use a linear or quadratic curve, some sort of curve, to fill that data in. So I'll look at IDF, then do the interpolation, then look at the result, to show you what the two look like; maybe I just do `tail(10)`. Running this line gives an error saying the index column must be numeric... okay, let's see how this worked. Initially our values for Greenland looked like this and stopped here at the end. I called the interpolate method on it, and it tries to fill in data for those. As you can see, our last data point, I guess, was 2017, and based on this data it thinks... for whatever reason... maybe I need `limit_direction="both"`, which basically makes it go in both directions. Let's try this one more time. Interesting: this method is giving us values for these last few. Maybe looking at a bit more data would be more helpful. Do we ever have a decrease in Greenland? Yeah, we actually... let's do `tail(20)`. Now I'm getting into the weeds of the data; I'm trying to see if we ever had decreases up to 2017, which is what we actually know in our data. Maybe quadratic is just not the best choice here. Yeah, interesting, but there are different ways to interpolate; you could just do a linear one, but the issue is that it just fills in the last value. I'm spending maybe a bit too much time on interpolation, but it's helpful for filling in information that you need.

Getting to our actual line chart using Plotly: if I now pass in IDF, we get a view of what these countries' internet usage looks like over time. I can zoom in and look at things in a little more detail, and zoom out; I like Plotly because it's interactive. Zoom out all, maybe reset the axes. As we can see, both Greenland and Germany shot up in the 90s, but since then Turkey has caught up, and the Greenland resident here in the chat, if he's still here, might be able
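The error about the index needing to be numeric most likely comes from pandas' index-aware methods, which use the index values themselves as x-coordinates. A sketch with an integer year index and made-up values: `method="linear"` treats rows as equally spaced, while `method="index"` uses the actual years. `method="spline"` also needs a numeric index and additionally requires SciPy, so it only appears in a comment here.

```python
import numpy as np
import pandas as pd

# Made-up values on an unevenly spaced, integer year index.
s = pd.Series([10.0, np.nan, 50.0], index=[2000, 2001, 2005])

# method="linear" ignores the index and treats the points as equally spaced.
by_position = s.interpolate(method="linear")
print(by_position.tolist())  # [10.0, 30.0, 50.0]

# method="index" uses the year values: 2001 is 1/5 of the way from 2000 to 2005.
by_year = s.interpolate(method="index")
print(by_year.tolist())  # [10.0, 18.0, 50.0]

# If the years arrive as strings, convert first: s.index = s.index.astype(int).
# method="spline" with order=2 also needs a numeric index, plus SciPy installed.
```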

### [1:25:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=5100s) Segment 18 (85:00 - 90:00)

to answer this better, but I'm guessing this levels off around 70% because some of the native communities in Greenland may not even want to access the internet. So there are some interesting things you can start playing around with here, and that showed a full process from the initial data source to a result like this for some specific selected countries. Again, this Jupyter notebook on the screen is quite messy because it has a bunch of cells; we were figuring things out as we went. If I wanted to make it neater, I might turn it into a function, and maybe it's worthwhile doing that real quick.

I might call the file, I don't know what's a good name, `internet_usage.py` or something like that, and define a function called `compare_countries` that takes in a list of countries and uses the code we had in our Jupyter notebook. I'm trying to think if there's an easy way I could convince AI to do all this. We'd probably want to break it up into a couple of functions. We'd want a function for cleaning our raw data, so I might have a function called `get_cleaned_data` that takes in our World Bank raw data, the series code we want to look at, and maybe also the countries. So maybe we just do a `get_cleaned_data` that takes these three parameters, and then we take all of the code from above and repeat the same process. I'm debating whether it's worthwhile doing this... `import janitor`... I'll share all this code too. Okay, basically just paste all the code from our messy Jupyter notebook; I think it is worthwhile doing this, just so you can see what's useful here. We want to take this stuff, and we don't need the print statement. So now we have our selected years and selected columns. What else do we need to do here? Maybe we take the selected countries from the countries parameter, so country DF would be equal to something like this. It wasn't country code, it was country name here, but you could use either one. Then series DF equals this, and I think this might work out of the box: return series DF. I'm curious what this will give us if we run this function. I'm just showing this as a process you might use when you're doing some analysis, you get things working, and you want more repeatable ways to process
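One way a `get_cleaned_data`-style helper could look, written as a sketch: the column names, the `..` handling and the toy input are assumptions, not the stream's actual script. It filters to one series and a list of countries, drops the metadata columns, transposes to years-by-countries, and converts everything to numbers.

```python
import numpy as np
import pandas as pd


def get_cleaned_data(raw: pd.DataFrame, series_code: str, countries: list) -> pd.DataFrame:
    """Return one World Bank series as a years-by-countries table of floats.

    Hypothetical sketch: assumes columns country_name, country_code,
    series_name, series_code plus one plain-year column per year.
    """
    rows = raw[(raw["series_code"] == series_code) & raw["country_name"].isin(countries)]
    wide = rows.drop(columns=["country_code", "series_name", "series_code"])
    table = wide.set_index("country_name").T              # years become the row index
    table = table.replace("..", np.nan)                   # World Bank "no data" marker
    table = table.apply(pd.to_numeric, errors="coerce")   # strings -> floats
    table.index = table.index.astype(int)                 # "2020" -> 2020
    table.columns.name = None
    return table


# Usage on a made-up raw frame.
raw = pd.DataFrame({
    "country_name": ["Germany", "Greenland", "Germany"],
    "country_code": ["DEU", "GRL", "DEU"],
    "series_name": ["Internet users (%)", "Internet users (%)", "Other series"],
    "series_code": ["IT.NET.USER.ZS", "IT.NET.USER.ZS", "OTHER"],
    "2019": ["1.0", "..", "9.0"],
    "2020": ["5.0", "7.0", "9.0"],
})
result = get_cleaned_data(raw, "IT.NET.USER.ZS", ["Germany", "Greenland"])
print(result)
```

Keeping the cleaning in one function means a notebook only has to import and call it, instead of re-running a chain of cells in the right order.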

### [1:30:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=5400s) Segment 19 (90:00 - 95:00)

things. So our data is the World Bank country internet usage file in the data folder; this looks good. I might use the same countries we had, just to check that it works as expected: Turkey, Germany, Greenland. Let's see what happens if we run this file. It's saying we don't have a file called World Bank Country internet data; I think we do, but I guess it needs the .csv extension. Let's try running this file, and look at that: it gives us the output we see here. Run that one more time. Then, if we really wanted to, we could get rid of the columns we don't need. You could name this whatever you want, but maybe I set final DF equal to series DF, copy in the drop line we had, and then we could even do the whole transpose step. I would probably get this function working and then break it up into even more functions, but I might not go into all of that in the live stream. I feel like I'm rambling on about a bunch of different things, but my goal is to show real-world ways to approach things. My typical real-world approach to any data science or data analysis task is to get things working, then say, "okay, it works, now let's make it reusable," and start making functions out of the code that works. Maybe it makes more sense to do it the other way and write the functions first; different people have different preferences, but that's what I'm trying to show. I might play around with the naming. Final DF `.set_index(...)`, and then the last step was to transpose it. There are a couple of final DFs here; I think `.transpose()` works, or you can use `.T`. Okay, there's a lot here, and I would probably clean this up, maybe even pass it into a cleanup function, but if I run this file, we now get exactly the kind of data we wanted, and we see the dot-dots here, so everything is a lot easier to work with now.

I might also want to make sure we do... this is what I hate: there are a lot of steps here. There's that method-chaining syntax in pandas, which would probably make sense for me to highlight in a video. Right now I'm writing these all as single lines, but there are ways to chain these commands together so you don't have to do all these intermediate steps. I might also want to make sure I repeat some of the last few steps, which were replacing the dot-dots and making the... I don't necessarily need the year column; I didn't copy that code, and I'm totally fine with that. If I run this again, this is what our data frame looks like now. What else do we need to do? I do want to copy the code that makes
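A sketch of this kind of cleanup written with pandas method chaining, on hypothetical columns and made-up values. Each step returns a new frame, so there are no intermediate variables to overwrite when a cell is rerun.

```python
import numpy as np
import pandas as pd

# Made-up long-format input with hypothetical column names.
raw = pd.DataFrame({
    "country_name": ["Germany", "Greenland", "Germany"],
    "country_code": ["DEU", "GRL", "DEU"],
    "series_name": ["Internet users (%)", "Internet users (%)", "Other series"],
    "series_code": ["IT.NET.USER.ZS", "IT.NET.USER.ZS", "OTHER"],
    "2019": ["1.0", "..", "9.0"],
    "2020": ["5.0", "7.0", "9.0"],
})
countries = ["Germany", "Greenland"]

internet_stats = (
    raw
    .loc[lambda d: (d["series_code"] == "IT.NET.USER.ZS") & d["country_name"].isin(countries)]
    .drop(columns=["country_code", "series_name", "series_code"])
    .set_index("country_name")
    .T                                        # years become the row index
    .replace("..", np.nan)                    # World Bank "no data" marker
    .apply(pd.to_numeric, errors="coerce")    # strings -> floats
    .rename(index=int)                        # "2020" -> 2020
    .rename_axis("year")
)
print(internet_stats)
```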

### [1:35:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=5700s) Segment 20 (95:00 - 100:00)

these columns numeric, so I might tell Copilot: "for each country in countries, execute the following numeric conversion." Okay, it's not quite this, because... oh, actually this would work. Yeah, let's do this. Okay, this looks good, this is all floats, and now we could pass in basically any country we want. I saw Iceland in the chat before, so I could run this (Alt+Shift+N), and now we see Iceland in the results. Oh, Iceland has great internet coverage. I guess we could take this function into our notebook. Hopefully some of this is making sense; I definitely struggle sometimes with showing things neatly when I'm working in a live stream setting. So we import the module we just created, which was called `internet_usage`, and call `internet_usage.get_cleaned_data`. We need to provide the World Bank data, which we loaded way at the top, so I'll repeat that line. I'm just showing another way to do basically what we did with the whole notebook. I'm going to call this WB, for World Bank, and now, as you can see, we get internet usage data for other countries; I could pass in the United States or India. Another thing you'd want to do is make things more robust, and then we could start visualizing all of this. If I copy the line we had before and pass in the internet usage data here... oh, I guess I should name this something else, so internet stats. The year column doesn't work; we want to use the index. All right, we're very close to having this function work, and I'll make sure all this code is uploaded. Okay, so we have this. Cool. If we look at the info on it, they're all float types. I might set internet stats equal to internet stats `.interpolate()`. Does this work now? If we look at internet stats, does that give us values? Okay, this gives us at least values for everyone; we
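The stream plots with Plotly Express; as a lighter-weight alternative, pandas' built-in `DataFrame.plot` draws one matplotlib line per column against the index. A sketch with placeholder country labels and made-up values; the `Agg` backend line only exists so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up, already cleaned table: years as the index, one float column per country.
internet_stats = pd.DataFrame(
    {"Country A": [40.0, 45.0, 52.0], "Country B": [85.0, 90.0, 93.0]},
    index=pd.Index([2018, 2019, 2020], name="year"),
)

# DataFrame.plot draws one line per column against the index.
ax = internet_stats.plot(marker="o", title="Internet usage over time")
ax.set_ylabel("Individuals using the internet (% of population)")
ax.set_ylim(0, 100)
plt.tight_layout()
# Use plt.show() in a notebook, or plt.savefig("internet_usage.png") to keep a copy.
```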

### [1:40:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=6000s) Segment 21 (100:00 - 105:00)

don't have any NaNs, it does kind of just repeat the end instead of filling in anything useful. I'm wondering now if this `px.line` function will work. Oops, that should be `countries`; this is weird. Pass in the countries, and I think this should work now. There we go, cool. You can play around with this: I think there's a button here to download the plot as a PNG, and I can zoom in on things. India is still at 43%, which kind of surprises me given how highly populous India is, but that's part of the reason Give Internet, the fundraiser we're supporting, is working in India. As we can see, India's line kind of just levels out, and I think it should increase a bit. So what if we used a spline interpolation, `method="spline"` with `order=2`? Let's see if this works; I'm curious whether it changes the look of the graph. I'm going to run this again, resetting `internet_stats.index` first. One thing that's interesting is that with the spline, India's line now increases a bunch. Maybe that's right, but if you actually look at the values I don't think India looks like this, and it also pushed the United States above 100%, so this method didn't work the best. You can play around with different regression or interpolation methods to fill in the data in a way that makes sense; this one kind of just cuts off the last value. Any other countries we should look at? This is internet usage over time, and we could very easily switch to one of the other indicators we have, like "Individuals using the Internet, female". I'll copy this code in, and let's look at female usage in these places. If I remember the data correctly, some of these series don't have nearly as much information. India, I don't think, has much information on the female percentage, and as you can see, it
kind of just doesn't give us a lot to work with; it might be only a single value in the whole data set. Iceland's is a little more complete. We could also look at male. I think... maybe I typed that in wrong. Let's see, what's the male version? Oh, it's MA. That makes sense: FE for female, MA for male. Okay, so also not a ton of information. I'm very surprised; this seems off, since 74% of United States males seems like it should be higher. I think the best accuracy is going to come from the overall percent of population, not the percent of males or females. As you can see, with the overall figure both Iceland and the United States are close to 100%, which makes much more sense. There are certain segments of the United States population that won't have internet access, maybe by cultural choice, maybe as members of a Native tribe, etc. I want to pause there, though. I'm going to do some other analysis afterward, because I took way too much time cleaning up this data and we didn't actually answer many of the questions I intended to, but hopefully it was still educational. Any questions on anything we've covered, or questions about Give Internet, who we're raising money for?
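This sketch runs two of the checks from this part of the stream on made-up numbers. The country names and values are placeholders, not World Bank data.

- **Completeness check:** count how many real observations each country has before trusting a line built from sparse data.
- **Flat tail:** show why pandas' default linear interpolation flattens the end of a series. `limit_area="inside"` is one way to stop that.

The `method="spline"`, `order=2` variant tried on stream also needs SciPy installed, and as seen above it can overshoot past 100%.

```python
import numpy as np
import pandas as pd

# Made-up wide table: one row per year, one column per country (% of population).
# A numeric year index lets method="index" respect the actual year spacing.
usage = pd.DataFrame(
    {
        "Country A": [10.0, np.nan, 30.0, np.nan, np.nan],
        "Country B": [np.nan, 50.0, np.nan, 70.0, np.nan],
    },
    index=[2016, 2017, 2018, 2019, 2020],
)

# Number of real (non-NaN) observations per country: sparse series deserve less trust.
print(usage.notna().sum())

# Linear interpolation fills interior gaps, but trailing NaNs get the last
# observed value repeated, which is the flat line seen at the end of the plot.
linear = usage.interpolate(method="index")

# limit_area="inside" only fills NaNs that have real values on both sides,
# so the line stops where the data stops instead of going flat.
inside_only = usage.interpolate(method="index", limit_area="inside")

print(linear)
print(inside_only)
```

Leading NaNs (Country B in 2016) stay empty in both versions, because the default `limit_direction="forward"` never fills backwards.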

### [1:45:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=6300s) Segment 22 (105:00 - 110:00)

Today I mentioned this very briefly. "Can I donate?" Okay, I'm going to try something; hopefully you can still see the screen. I lost where my live stream is coming from... okay, hopefully you can see this now; I'm sharing the stream. What I said at the start didn't work at all. Let's see, how do I donate? I'm going to add another $25 donation on my side. Good, it hides all my information. What I was trying to say at the start is that when you donate, you're free to type in a request like "do some push-ups". That didn't really happen, but I'm going to do 25 push-ups right now just because I want to do something. How's the view? Good enough; you probably can't see these at all. I was thinking I could take breaks throughout the stream to do something like this, but I didn't really make that clear. All right, there were my 25. Now I'm ready to go, and we can do some more analysis. We could have dove into the questions more, but I think the last thing I want to do is look at some high-level information across all countries. This `get_clean_data` function maybe isn't the most useful for that; I guess I could change it so that if I don't filter by countries, it just uses all of them. I'm trying to think how to do this. What I want to do real quick is look at some trends, like finding the countries that have the least access. A quick thing we might do is go back to our original data: if I look up `world_bank`, we could look at a specific year, like 2020. So I might do `world_bank_2020 = world_bank` and maybe just grab the series code I want... how did this work? Some of this is already cleaned; it's interesting how memory works in a notebook. Okay, so for some other analysis: we spent a
lot of time on that transpose and stuff, but what I might do is just get the series code for internet usage and look at a single year like 2020, grabbing only the columns `Country Name` (thank you to whoever liked my hairstyle) and `2020`. Let's save this as `world_bank_2020`.
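This is a minimal sketch of the row-and-column selection described above, on a made-up long-format table with one row per country and indicator. It rests on a few assumptions:

- `IT.NET.USER.ZS` is the World Bank code for internet users as a % of population.
- `IT.NET.USER.FE.ZS` is assumed here as the female variant, following the FE/MA naming seen on stream.
- Year columns are plain strings like `"2020"`.
- Missing values appear as `".."`.

```python
import pandas as pd

# Made-up stand-in for a World Bank export in long format:
# one row per (country, indicator), one column per year.
# Assumptions: year columns are plain strings and ".." marks a missing value.
world_bank = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country A", "Country B", "Country B"],
        "Series Code": ["IT.NET.USER.ZS", "IT.NET.USER.FE.ZS"] * 2,
        "2020": ["45.5", "..", "88.1", "87.0"],
    }
)

# Overall internet-usage indicator (% of population).
INTERNET_USAGE_CODE = "IT.NET.USER.ZS"

# The boolean mask picks the rows; the list picks the columns.
world_bank_2020 = world_bank.loc[
    world_bank["Series Code"] == INTERNET_USAGE_CODE, ["Country Name", "2020"]
]
print(world_bank_2020)
```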

### [1:50:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=6600s) Segment 23 (110:00 - 115:00)

Let's look at `world_bank_2020`. As another interesting thing to look at, you could just do `sort_values` by `2020`. Do we want ascending order? `ascending=False` makes sense, so these are the top countries in 2020. Maybe we even make this 2023, but some countries don't have data yet. Let's see if this works: in 2023 these are the top countries for internet usage, but some of them are missing, and I feel like in 2020 we actually saw more values. A good practice, if I have a value like this in two places, is to define `SELECTED_YEAR = 2020`. I might even mark it as a constant, a value that isn't meant to change, by using uppercase. It doesn't change any functionality, but it makes it clearer to me that this value isn't changing. Then I use `SELECTED_YEAR` in both places. If I run these now with 2020, we actually see more values, with 99% near the top; I think some values just haven't been filled in for the more recent years. If we look at `ascending=True`, which I think is the default, we can see the lowest values, which are all NaN, so maybe we go to 2016 or something. Or we could add to our filter: `Series Code` equals this code, and `wb[SELECTED_YEAR]` is not equal to `".."`, with each condition surrounded by parentheses. Not equal to `".."` is basically equivalent to non-NaN, or not null, here. Does this work? There we go. WB stands for World Bank, since our data comes from the World Bank; as an abbreviation, instead of `df` I'm using `wb` right now. So in the selected year of 2016, we have Eritrea, Somalia, and the Democratic Republic of the Congo among the lowest places. If we move this to
2020... I don't know what happened there; let's try this again. Did I screw something up? I don't know why it says 100; maybe something weird happened with this UAE number. But in 2020, among the values that aren't null, we have Yemen, Niger, and Somalia as some of the lowest. So I'm showing some techniques to get answers to different questions, which are all very useful. One last thing I think would be cool is to see who is increasing the most, at least in recent years. We seem to be missing a lot of data, so I might only go up to 2020, but we need our cleaned data for all countries. If I really wanted all countries, I could do something like... I'll show what I mean here. I'm going to create a new cell: what countries are increasing the most in recent years?
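Here is a minimal sketch of the single-year ranking step, on made-up rows. It makes a few assumptions:

- `SELECTED_YEAR` follows the uppercase-constant habit mentioned on stream.
- `".."` is the missing-value placeholder in the raw export.
- Year columns are plain strings.

The sketch also adds one step: converting the column to numbers before sorting, because as text `"9.8"` sorts after `"40.2"`.

```python
import pandas as pd

# Made-up rows in the raw World Bank shape; ".." marks a missing value.
world_bank = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B", "Country C", "Country D"],
        "Series Code": ["IT.NET.USER.ZS"] * 4,
        "2016": ["12.0", "..", "95.3", "3.1"],
        "2020": ["40.2", "99.0", "..", "9.8"],
    }
)

SELECTED_YEAR = "2020"  # uppercase by convention: we don't reassign it

# Parenthesize each condition when combining with &.
mask = (world_bank["Series Code"] == "IT.NET.USER.ZS") & (world_bank[SELECTED_YEAR] != "..")
year_view = world_bank.loc[mask, ["Country Name", SELECTED_YEAR]].copy()

# The export stores numbers as text; convert so sorting is numeric, not alphabetical.
year_view[SELECTED_YEAR] = pd.to_numeric(year_view[SELECTED_YEAR])

lowest = year_view.sort_values(SELECTED_YEAR)                    # ascending=True is the default
highest = year_view.sort_values(SELECTED_YEAR, ascending=False)  # top countries first
print(lowest)
print(highest)
```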

### [1:55:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=6900s) Segment 24 (115:00 - 120:00)

So if I wanted all countries, I probably should change how this function works. I'm not great at writing very clean code while doing a live stream, but as a simple thing, I could get the unique country name values with `world_bank["Country Name"].unique()`. The data cleaning is being a bit weird here; I should have made it clearer what I was cleaning when. If I did this... shoot, this might mess something up. Ignore this; this is one spot where we still have the original name. If I look at `internet_stats`, we have all the different countries in that nicely formatted way we set up before. So one cool thing we could do... I'm trying to think of the best way to do this. Basically, I want to use the `shift` function. I'm going to create a sample DataFrame with columns `a` and `c` and numbers as values, just to show an example of what I'm talking about, because this could be useful if you wanted to see what is growing the most in this data using pandas. If we look at `sample_df`, I can create a new column, `sample_df["shifted_c"]`, equal to `sample_df["c"].shift(2)`, and now if we look at this DataFrame, `c` has been shifted two spots. Now if I wanted the percent change of `c`... oh shoot, does this `pct_change` function just work? Maybe that would be a better way to do it. I was going to do this manually, but maybe we just learned a new function that I didn't actually know. I don't really know how `pct_change` works; I'd have to look at the documentation. What I was thinking we could do is compute the percent change as `sample_df["c"]` divided by `shifted_c`: the new value divided by the original value. And maybe it makes more sense if I
just shift it by one, so we go from each value to the value right below it. From that row we went up about 9%, and from 12 to 13 we went up about 8%. To turn that ratio into a percent change you subtract one: it's the new value divided by the old value, minus one, not one minus the ratio. So I was thinking it'd be cool to see what's growing the most, and maybe I should just use this `pct_change`. I'm going to look this up on Google real quick. I'm looking at the chat right now; shout-out to Michael, who has an interview.
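A small check that the manual shift-based calculation and pandas' built-in `pct_change` give the same numbers. The `sample_df` here is made up; its values are chosen so the first changes come out near 9% and 8%, like the ones read out above.

```python
import pandas as pd

sample_df = pd.DataFrame({"a": [1, 2, 3, 4], "c": [11, 12, 13, 14]})

# shift(1) moves every value down one row, so each row sees the previous value.
sample_df["shifted_c"] = sample_df["c"].shift(1)

# Manual percent change: new / old - 1 (not 1 - new / old).
sample_df["manual_pct"] = sample_df["c"] / sample_df["shifted_c"] - 1

# pandas computes the same quantity directly (first row has no predecessor -> NaN).
sample_df["pct_change"] = sample_df["c"].pct_change()
print(sample_df.round(4))
```

Going from 11 to 12 is 12 / 11 - 1 ≈ 0.0909, about 9%; from 12 to 13 it is about 8.3%.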

### [2:00:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=7200s) Segment 25 (120:00 - 125:00)

Good luck with your interview; hopefully you crush it with the pandas and NumPy stuff. I also see a question about my games: "What do you use to plan all your games?" I don't know what games we're talking about. I'm going to look up this percent change function real quick, because I feel like it could be very useful: "pandas percent change docs". It gives the percentage difference between the values in the current row and the previous row. Okay, so here's what we want to do. This will be the last thing we cover in this tutorial, and I'll also show how I'd use AI to help me in a situation like this. We have this DataFrame we've been looking at, with all of these countries and the percentage of internet usage for each year. What I want is an equivalent DataFrame that's just the percent differences from row to row. To figure this out, I might grab `internet_stats` and use `iloc` to get all of the rows and some of the columns, like the first five, and take the tail of that. Okay, so we have this. I don't need the tail; I could get the full thing, though the tail is fine. I think I'll use `pct_change` without shifting. Sometimes it's as easy as writing a simple comment and seeing what happens. If we look at `percent_changes`... wow, the shifting was way overly complex; we could have just used `pct_change`. This is exactly what I wanted, so thank you for the recommendation, Michael. What we see here is the percent change from each row to the next. If you got more creative, you
could lump this into groups of five years or so, take the mean of the past five years, and then compute the percent change. But with this `percent_changes` DataFrame, we can pick a given year with something like `.loc[2020]` and look at that row. This is now a Series for 2020, and I could take the max value. I want not only the max value but also the column that has it, so I searched "given a Series in pandas, return the column that has the max value and the value". Oh, okay, I'm just looking at some comments right now. Yes: `idxmax` for the column with the max and `max` for the value. So, for example, in 2020 Somalia increased the most year-over-year, with a percentage increase of 6%. We could use the selected-year approach again and try a different year.
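A sketch of the year-over-year lookup on a tiny made-up wide table. It uses the same layout as `internet_stats` in the stream: years as the index and countries as columns. The country names and numbers are placeholders, not World Bank data.

- `pct_change` works down the rows.
- `.loc[year]` pulls one year out as a Series.
- `idxmax` and `max` return the winning column label and its value.

```python
import pandas as pd

# Made-up wide table: years as rows, placeholder countries as columns (% online).
internet_stats = pd.DataFrame(
    {
        "Country A": [10.0, 12.0, 15.0],
        "Country B": [50.0, 52.0, 53.0],
        "Country C": [2.0, 2.5, 2.4],
    },
    index=[2018, 2019, 2020],
)

# Row-over-row (year-over-year) change for every country at once.
percent_changes = internet_stats.pct_change()

SELECTED_YEAR = 2020
year_changes = percent_changes.loc[SELECTED_YEAR]  # Series indexed by country

best_country = year_changes.idxmax()  # label of the largest change
best_change = year_changes.max()      # the largest change itself
print(f"{SELECTED_YEAR}: {best_country} grew the most ({best_change:.1%})")
```

Because `pct_change` returns fractions, a value of 0.06 reads as 6%. Formatting with `:.1%` avoids mixing the two up.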

### [2:05:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=7500s) Segment 26 (125:00 - 130:00)

Switching to 2023: largely, `idxmin` probably isn't going to matter much, though maybe it could, because you might see negative values. "Low income" comes up, which isn't very helpful. If we look at 2015, I can't pronounce this country. Then you could also get the minimum with `idxmin` and `min`. I'm curious: for 2015, it looks like Cambodia went down in internet usage percentage, and in 2020 Yemen went down just a bit, it's saying. So there are a lot of cool things we can do with this percent change; if you got more creative, you could do some fun stuff here. Instead of `idxmax`, I'm trying to think: "pandas: given a Series, return the max five indices and the max five values". I'm curious if we could easily expand this to the top five. `nlargest`, yeah, that's another good function. If we look at the `nlargest` indices and values, that's also cool, and you could even pair them up nicely in a DataFrame if you wanted to. So I could do something like `results = pd.DataFrame(...)`, where the data is the top five values and the columns are the indices.
Now if I look at `results`, I think that should work; maybe I have to pass these in as lists. One sec, I'll answer some questions in a second; I just want to finish this up. Why does this not work? I expected this to work. I must have copied something weirdly, some weird blank space after it. I just want to get this last value... okay, I use Copilot a lot to do things. I guess I could do it this way instead: I was trying to make these the columns, but this works too. All right, I see. I'll have to check out Polars. As far as whiteboards go, one platform I've used a bunch recently to generate slideshows and stuff, and that I've done a partnership with, is pait. If you create a slideshow via pait in the reveal.js format (reveal.js is a JavaScript-based framework for slideshows), it has a whiteboard feature built in, so I sometimes use that. I'm trying to think of other ways... I don't know; I guess I haven't done a lot of drawing on the screen recently.
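A sketch of the `idxmin`/`min` and `nlargest` steps on made-up values. On stream, the first attempt passed the values as `data` and the country names as `columns`. With a one-dimensional array, that most likely fails on shape, because pandas reads the five values as a single column while five column names are supplied. Building the frame from a dict of names and values sidesteps this.

```python
import pandas as pd

# Made-up year-over-year changes for one year, indexed by placeholder countries.
year_changes = pd.Series(
    {"Country A": 0.25, "Country B": 0.02, "Country C": -0.04,
     "Country D": 0.10, "Country E": 0.07, "Country F": 0.15},
    name="pct_change",
)

# Biggest drop: the mirror image of idxmax/max.
worst_country, worst_change = year_changes.idxmin(), year_changes.min()

# Top five growers, largest first, as a tidy two-column DataFrame.
top_five = year_changes.nlargest(5)
results = pd.DataFrame({"country": top_five.index, "pct_change": top_five.values})
print(worst_country, worst_change)
print(results)
```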

### [2:10:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=7800s) Segment 27 (130:00 - 135:00)

I have this Wacom tablet (I guess you can't see it, and it's not working on my Mac anymore) that I used to have software for, where I could draw with the pen and it would show up on the screen. I used to tutor a lot of math on the computer, and having a tablet like this was super helpful because I could just write out the math equations, but I haven't done much with it recently. I am analyzing the data: what I was checking right now was which countries had the most growth in 2020. We see Somalia leading the way, with Libya, the Republic of the Congo, Ghana, etc. I'm curious how that would change if we looked at 2018. I'm going to try something real quick with `internet_stats`: what if I took only every fifth row? I'll use `iloc[::5]`; watch what happens if I print this. This gives us every fifth row. Now if I set `internet_stats` equal to `internet_stats` with only every fifth row, run this, and then run the percent change for 2020, we're seeing the overall biggest increase from 2015 to 2020, which might be more significant. Michael, what I'm showing right now is the five-year period: Eritrea, from 2015 to 2020, increased 20%, and Sudan 20%. So some cool stuff to see. You can even skip to every 10th year and see a 1,000% increase in Sudan over that period. If 0.1% of your population is online and that grows to 10%, that's a hundredfold jump, a 9,900% increase,
so huge numbers like that probably mean the starting value was listed as basically zero in 2000 or 2010. Any other questions? I apologize that some of this was kind of scattered. I showed the analysis process, but in an ideal world I would take all this analysis and package it in a much more meaningful way, so you'd have clear answers, like "okay, this is what we looked at, and this is what we found", whereas this was more of a scattered look at a bunch of different things. My preferred language is Python. Okay, I'm going to pause there and stop sharing my screen. Does anyone have any questions on any of this? I will upload all the code we dealt with here to the GitHub repo that's linked in the description, if you're curious to check it out. Last chance, too, to donate: feel free to donate like my last donation, where I donated and then commented "push-ups". If you make a donation, I will do any number of any exercise you put in there, within reason, in these last 10 to 15 minutes. But I'm also happy to just answer some questions right now.
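A sketch of the every-fifth-row idea on made-up numbers, plus the arithmetic behind those huge percentages.

- `iloc[::5]` keeps rows 0, 5, 10, ..., so `pct_change` then compares five-year steps.
- `pct_change(periods=5)` is a standard alternative. It compares each row with the row five positions earlier without dropping any rows.

Both approaches count rows, not calendar years, so they assume one row per year with no gaps.

```python
import pandas as pd

# Made-up yearly values (% of population online) for two placeholder countries.
internet_stats = pd.DataFrame(
    {
        "Country A": [0.1, 0.2, 0.3, 0.5, 0.7, 1.0, 1.5, 2.5, 4.0, 6.5, 10.0],
        "Country B": [60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80],
    },
    index=range(2010, 2021),
)

# Keep every fifth row (2010, 2015, 2020), then compare each kept row to the previous one.
every_fifth = internet_stats.iloc[::5]
five_year_change = every_fifth.pct_change()

# Alternative that keeps all rows: compare each year to five rows earlier.
also_five_year = internet_stats.pct_change(periods=5)

# Small baselines produce huge percentages: 0.1% -> 10% is 10 / 0.1 - 1 = 99, i.e. 9,900%.
overall = internet_stats.iloc[-1] / internet_stats.iloc[0] - 1
print(five_year_change)
print(overall)
```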

### [2:15:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=8100s) Segment 28 (135:00 - 140:00)

This is a good question. We kind of answered it a little earlier, but what I'd recommend is that the first step is to just explore the data a bit and see what's in it. Once you've done the basic data exploration, ask yourself what you find interesting, or, if you're working in a company setting, ask them what they want to know from this data. Most people at companies, even if they're not Python data scientists or data analysts, are comfortable in Excel; they can poke around the data there and will have some questions about it. I'd write down all of those questions and then pick them off one by one. For example, one of our questions was trends over time, so I'd break that down into simpler steps: maybe I have to manipulate the DataFrame a bit to get to a point where I can easily show trends over time. So I basically gather questions, break each one down into what I need to do with the data, and ultimately package all those results into a nice report or something, which we skipped given we're already two hours into this live stream. There's definitely no single answer for how many columns should be in your data. I just find that with tons and tons of columns, data gets hard to work with, so I typically start from a big DataFrame or data source and narrow it down to a point where it's more manageable to work with, but there's no one right answer here. Am I able to make a timer with sounds? You could definitely do that in Python. I'm trying to think how I'd do it: maybe I'd use tkinter to make a little graphical user interface timer. PyQt, which is a fun name (PyQt5 specifically), is also a popular way to do it.
I probably have some videos you could look at. On building models: my background is in the natural language processing space, and honestly, right now with NLP, most of the work I do piggybacks off the large language model companies, maybe with a little fine-tuning on top. But for the scikit-learn library, I actually did a recent logistic regression video, so if you check my recent videos (maybe I'll find it and link it in the chat) you can see a very basic, kind of trivial example of a scikit-learn model. Let me share a couple of links; one sec. Okay, I shared a link; hopefully that goes through. I'm going to check the live stream to see if the link went into the chat; it should, since it's sending from me. Yeah, check out that link if you want to see some modeling. I see a question about real-world projects. I've done a bunch, and some I'm still working on, so I might not go into super detailed information. One cool project, probably from a couple of years ago now, was very YouTube-specific. In addition to making these YouTube videos, I spend a lot of my time freelancing and doing different types of projects for various companies. A company reached out to me to help them find good YouTube creators that might fit their target demographic. Not to get into too many specifics,

### [2:20:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=8400s) Segment 29 (140:00 - 145:00)

but they were basically looking for YouTube content creators that had a team. Imagine you're a big YouTube creator: you probably have an editor, and maybe a dedicated videographer. They wanted to build a payment platform where these creators could easily split their revenue between, say, the YouTube creator, the editor, the videographer, and any number of other people on their team. What I did for this company was use the YouTube API to look through a list of potential creators they wanted to target who met certain criteria, say 100,000+ subscribers. I programmatically looked through those creators' YouTube videos to see which of them consistently mentioned an editor, a videographer, etc. What I returned to the client was basically: these are the creators who mention an editor and a videographer in, say, 80% or more of their videos, and I think they'd be good for you to target. So it was a combination of understanding how people write their YouTube descriptions and writing a program to automatically find information in the video descriptions suggesting that a creator is part of a big team, then filtering the list down to just the creators they're most interested in. I haven't really scraped YouTube, and the reason is that YouTube has an API. I would say web scraping is super powerful, but if you're ever trying to scrape a website that already has an API, you're usually going to have better success using its built-in application programming interface. Because YouTube has an API, I did things via the API rather than
doing special Beautiful Soup or Selenium scraping. So the API is the better option for YouTube specifically. I didn't try scraping it; I basically collected the data programmatically through their API. I found a bunch of creators and their channels, but it was all done in ways YouTube allows; I didn't want to anger YouTube. I'm not sure if it has an API for personal history, like which videos you've watched. Another cool project, just because it's documented on YouTube, was a historical research analysis project I worked on over roughly the past two years. We had a bunch of historical documents from what's known as the Freedmen's Bureau. The Freedmen's Bureau was set up in the United States at the end of the Civil War to aid formerly enslaved people and help them transition into post-Civil War American society, and there are a lot of documents from that Bureau. I helped manage a team to analyze hundreds of thousands of these documents and see what insights we could get from them. This was a paid freelance project, but one cool thing is that I made a video going into detail on how we approached it, so I'm also going to share in the chat the "Solving Real-World Data Science Problems with LLMs" video I made previously. Honestly, the AI models are probably even better now than when I used them, but it shows some cool ways you can be creative and leverage large language models in a real-world use case. So I recommend checking out the video I shared in the chat, or you can just look through my channel. I see another question: "Have you ever done projects in IoT
and smart homes, or do you know something about it?" I feel like this goes way back, and I always end up referencing what I've already done on my YouTube channel, but I have programmed Amazon Alexas to do special things. I think some of my earliest videos ever on my channel cover this; I'm going to shamelessly promote myself right now.
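The stream doesn't say exactly how the description matching was done. This is a minimal sketch of the filtering step only, and it assumes the video descriptions have already been fetched. Fetching them (for example via the YouTube Data API) is deliberately left out.

- `mention_rate` and `creators_with_teams` are hypothetical helper names.
- The channel data is made up.
- Plain case-insensitive substring matching stands in for whatever matching the real project used.

```python
def mention_rate(descriptions, keywords):
    """Fraction of descriptions that mention every keyword (case-insensitive)."""
    if not descriptions:
        return 0.0
    hits = sum(
        all(keyword.lower() in text.lower() for keyword in keywords)
        for text in descriptions
    )
    return hits / len(descriptions)


def creators_with_teams(channels, keywords=("editor", "videographer"), threshold=0.8):
    """Channel names whose descriptions mention all keywords at least `threshold` of the time."""
    return sorted(
        name for name, descriptions in channels.items()
        if mention_rate(descriptions, keywords) >= threshold
    )


# Made-up input: channel name -> list of its video descriptions.
channels = {
    "Channel A": ["Editor: Sam. Videographer: Lee."] * 5,
    "Channel B": ["Editor: Sam. Videographer: Lee."] * 3 + ["Thanks for watching!"] * 2,
    "Channel C": ["Shot by our videographer, cut by our editor"] * 4 + ["Q&A stream"],
}

print(creators_with_teams(channels))  # Channel A (5/5) and Channel C (4/5) pass the 80% bar
```

Substring matching will also count words like "editorial", so a real version would likely need word-boundary matching or a list of phrasings.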

### [2:25:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=8700s) Segment 30 (145:00 - 150:00)

Some of my oldest videos on this channel show how to program Alexa. It's probably outdated at this point, but if you're curious, I can share this video from six years ago; maybe there are still some useful tidbits of information. I haven't done a lot recently. I feel like it would be cool to do more IoT things and document, say, using Philips Hue lights with Alexa, integrating all these different components and having really cool voice commands that trigger a bunch of different things all at once. That would be really cool to document, because there's the educational component but also a pretty entertaining one: if I say "code red" and my entire house turns red, that would be fun. So definitely something to maybe explore in the future. This is an interesting question: what courses or certifications do you recommend for data science? I don't have as formed a perspective here as I could, because I kind of went down the traditional path and got a degree in computer science, so I'm not the most educated on certifications or courses that are accessible to anyone. There's no actual certification that comes out of it, I think, but there's a lot of great MIT OpenCourseWare material (I guess I'm biased on the MIT side); look up MIT OCW and you can find a lot of good courses. But in a lot of regards, for data science I would specifically focus on learning Python first, and honestly, watching YouTube videos is a great way to do that. Once you get to a certain point with your Python knowledge, think of something cool you can build. I feel like this is always the advice, "build a cool portfolio project", so I'm trying to think of a simple example that I
might do. Here's a project idea I might make a video on at some point, just to give you an example: think about things in your daily life that you could use Python or code to improve or do something cool with. For me, I've been going to the gym a lot recently, and I document my workouts in a little notebook because I don't want to use my phone while I'm at the gym. A cool Python project building off this concept would be converting my written notebook: take pictures of each page, write a Python program to automatically convert them into text, then format the text and store it in a database or something like that. That full project would be an excellent project. If I were trying to get a job in data science and documented a project like that well (this is a more complex one; yours doesn't have to be as complex), and then shared it with an employer (I have videos on how to make your GitHub more interesting and whatnot), I think that oftentimes goes as far as showing a certification, or sometimes even a degree. So build cool projects. You want to get to the point where, if you have an idea, you can logically think through the steps needed to execute on it and convert that into code. You want to be comfortable building projects, starting from a blank slate, and turning logic into code. I rambled a bit there, but hopefully that makes sense. I don't know if I understand exactly what you're saying here... oh, maybe teaching courses? Feel free to
reach out to me on LinkedIn regarding stuff like that. A good question here, though: how do you get freelance work? I've probably said this before, but just to reiterate: I'd recommend figuring out a skill that's very much needed in the freelance world. I do a lot of freelance work on Upwork, and a ton of people need help scraping websites, so my first goal would be to get good at directly scraping websites.
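This is a rough sketch of the "format the text and store it in a database" half of the workout-notebook idea. It makes several assumptions:

- OCR has already turned a page into plain text. The OCR step itself is not shown.
- Each line follows a hypothetical format like `Bench press 3x8 135`, meaning exercise, sets x reps, weight.
- `parse_workout`, `store` and the `workout_sets` table are illustrative names.
- SQLite from Python's standard library stands in for "a database".

```python
import re
import sqlite3

# Hypothetical line format: "<exercise> <sets>x<reps> <weight>".
LINE_PATTERN = re.compile(
    r"^(?P<exercise>.+?)\s+(?P<sets>\d+)x(?P<reps>\d+)\s+(?P<weight>\d+(?:\.\d+)?)$"
)


def parse_workout(text):
    """Split OCR text into (exercise, sets, reps, weight) rows plus unparsed lines."""
    rows, skipped = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_PATTERN.match(line)
        if match:
            rows.append((match["exercise"], int(match["sets"]),
                         int(match["reps"]), float(match["weight"])))
        else:
            skipped.append(line)  # notes, OCR noise, or formats not handled yet
    return rows, skipped


def store(rows, conn):
    """Create the table if needed and insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workout_sets "
        "(exercise TEXT, sets INTEGER, reps INTEGER, weight REAL)"
    )
    conn.executemany("INSERT INTO workout_sets VALUES (?, ?, ?, ?)", rows)
    conn.commit()


ocr_text = """Bench press 3x8 135
Squat 5x5 185
felt great today
Overhead press 3x10 75.5"""

rows, skipped = parse_workout(ocr_text)
conn = sqlite3.connect(":memory:")
store(rows, conn)
print(conn.execute("SELECT exercise, sets * reps AS total_reps FROM workout_sets").fetchall())
```

Keeping the skipped lines instead of silently dropping them makes it easy to see where the OCR output or the assumed format breaks down.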

### [2:30:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=9000s) Segment 31 (150:00 - 155:00)

On Upwork, say, I would start marketing my skills pretty cheaply, maybe below what I'm worth, because what I want is to get a couple of projects under my belt that I can get reviews for. If you can earn a couple of five-star reviews doing projects for very little money, those reviews go a long way, and you can start doubling how much you charge per project. Basically, start slow with a skill like web scraping, get really good at it, and each time you get a new client, charge a bit more than the last time. With this slow buildup you can accumulate reviews on a platform like Upwork, then take that and run with it into bigger and better opportunities. I don't know if that made sense at all; I'm getting a little tired, so I might jump off in a sec. Thank you, Michael. Obviously no pressure to donate, but if anyone has the ability to, we're ultimately just trying to get more people access to laptops, internet access, etc. Anyone else have questions? I might head out in a second. If anyone has been on the fence about a final donation, now is the time, though the donation page will still be there after this live stream ends; I'll leave it up, and I think right now it's set to stay up for 50 days. "Have I developed an agentic AI app?" I've done a little of that at a startup I was working at. I wasn't directly writing a lot of the agent code myself, but we had an AI coach, and (this makes it a little more trivial than it is) sometimes we just wanted an LLM like ChatGPT to respond to the query directly, and other times it was something like "What is the weather today?", where we knew ChatGPT
wouldn't know the answer without going out to scrape the web. So the agentic component was figuring out which tool to use when: sometimes a search, sometimes plain ChatGPT, and sometimes ChatGPT with extra context we sprinkled in. So I've done a little bit of that; it might be something to explore more in a future video. I've solved many LeetCode questions over the years. One strategy I like: if you're learning Python and still developing, LeetCode is a great way to get your logic skills down. What I recommend to some people I teach Python to is building the habit of doing one LeetCode problem a day: first thing when you wake up, then carry on with your day, or maybe as the last thing before bed. If you can keep that practice consistent, LeetCode can go a long way, because you build up the muscle memory for how to think logically about problems in Python. There's no one right answer for how many you should solve, but if you're learning Python, doing one or a few each day and trying to stay consistent is often better than trying to power through a ton in one day and cram. I'd also add that you want to vary the types of problems you practice on LeetCode. They do a good job of breaking problems into categories, so maybe do some data structures one day and dynamic programming the next, etc. All right, any last things? I'm going to check the stream; I'm curious what the donations are at. I think it's still $252 on the stream, though I do see some people donated via the website. $252, that's awesome. I love to see any amount, and all the donations are getting matched, so
even $1 or $2, anything, helps. I encourage anyone to donate, I appreciate everyone who has, and I also know that not everyone can. Hopefully this was educational. I had fun, though I definitely got tripped up a few times during the session. This session will stay up as

### [2:35:00](https://www.youtube.com/watch?v=m6v7a3sZlL8&t=9300s) Segment 32 (155:00 - 157:00)

if you look at the live stream tab of my YouTube channel, this video will remain up for the entirety of my YouTube history. I might also try to go back and retroactively type up some timestamps for what we did so people can navigate it a bit more easily, because it is 3 hours. But it will be up; it's here for you to watch. If you just joined, I'm sorry, but at least you know that it will be up. I think that's about it. Maybe I'll do one more question if there's one more in the chat. Thank you, Michael, you have a good day, month, and year as well. All right, thank you everyone for tuning in; this was fun. Hopefully I will share some new YouTube videos soon. I'm finishing up a couple of projects, and then I'll get back to being a bit more disciplined here on YouTube and try to do more live streams as well. Till next time, everyone, thanks for watching. Thank you to GiveInternet for reaching out to me and providing the opportunity to do something positive and try to raise some money. That's one thing I would like to do more of with my channel: just try to give back a bit. I think I sometimes get siloed and only think about uploading the next video, and forget the human aspect of having a platform like YouTube and being able to share content with people from all around the world. So I hope to do more stuff like this in the future. All right, take care, everyone. Peace.

---
*Source: https://ekstraktznaniy.ru/video/44518*