Data Science SQL Interview Question Walkthrough | SQL Sundays #9
13:53

Data Science SQL Interview Question Walkthrough | SQL Sundays #9

Tina Huang 01.06.2021 64 575 просмотров 1 842 лайков обн. 18.02.2026
Поделиться Telegram VK Бот
Транскрипт Скачать .md
Анализ с AI
Описание видео
Hi friends! Here's a SQL Sunday as part of the SQL Sundays series in celebration of the launch of my SQL for tech and data science interviews course! I know I know it's Tuesday :x Can we let this one slide? SQL for tech and data science interviews course with 10 mock interviews: https://www.udemy.com/course/sql-for-tech-and-data-science-interviews/?couponCode=DF947818F15A561AE84E ($12.99 promotion until July 18th, 2021!) [expired] https://www.udemy.com/course/sql-for-tech-and-data-science-interviews/?couponCode=A92E9C5BFB043A8FF039 ($9.99 promotion until June 6th, 2021!) discord: https://discord.gg/5mMAtprshX livestreaming google calendar: https://calendar.google.com/calendar/u/1?cid=cDBtOGgxOG1waW92bTJxYzdpZmkzNmgwODhAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ ______________________________________________________________________ Timestamps 00:00 intro 00:08 question 01:51 assumptions 02:47 answer 10:43 check answer 11:53 announcement ______________________________________________________________________ Other videos you might be interested in How to learn data science in 2021: https://www.youtube.com/watch?v=Axu4tJl8gbM The resume that got me into FAANG as a data scientist: https://www.youtube.com/watch?v=vx-x-yXXE9I ______________________________________________________________________ Subscribe: https://www.youtube.com/channel/UC2UXDak6o7rBm23k3Vv5dww/?sub_confirmation=1 ______________________________________________________________________ SQL for tech and data science interviews course with 10 mock interviews: https://www.udemy.com/course/sql-for-tech-and-data-science-interviews/?couponCode=A92E9C5BFB043A8FF039 ($9.99 promotion until June 6th, 2021!) Real SQL interview question walkthrough series: https://www.youtube.com/watch?v=Td-cmLfQ7uU&list=PLVD3APpfd1tuXrXBWAntLx4tNaONro5dA Check out StrataScratch for SQL interview prep: https://stratascratch.com/?via=tina ______________________________________________________________________ About me Hi, my name is Tina and I'm a data scientist at a FAANG company. I was pre-med studying pharmacology at the University of Toronto until I finally accepted that I would make a terrible doctor. I didn't know what to do with myself so I worked for a year as a research assistant for a bioinformatics lab where I learned how to code and became interested in data science. I then did a masters in computer science (MCIT) at the University of Pennsylvania before ending up at my current job in tech :) ______________________________________________________________________ Contact youtube: youtube comments are by far the best way to get a response from me! linkedin: https://www.linkedin.com/in/tinaw-h/ email for business inquiries only: hellotinah@gmail.com *If you're reaching out through linkedin, please leave a youtube comment just letting me know that you reached out :) ______________________________________________________________________ *The StrataScratch affiliate program give me a small portion of the sales price at no cost to you. I really appreciate your support in helping improve this channel! :) #SQLSundays #DataScience #SQL #TinaHuang

Оглавление (6 сегментов)

  1. 0:00 intro 25 сл.
  2. 0:08 question 263 сл.
  3. 1:51 assumptions 189 сл.
  4. 2:47 answer 982 сл.
  5. 10:43 check answer 162 сл.
  6. 11:53 announcement 395 сл.
0:00

intro

hi welcome to another sql sunday where i go through a walkthrough of a real data science interview question let's get started all right so
0:08

question

this question over here is growth of airbnb so let's read the question estimate the growth of airbnb each year using the number of hosts registered as the growth metric the rate of growth is calculated by taking the number of hosts registered in the current year minus previous year divided by times 100. okay i'll put the year number of hosts in the current previous year and the rate of growth round area grew to the nearest percent and ordered the result in the ascending order based on the view assume that the data set consists only of unique hosts meaning that there are no duplicate hosts listed okay so let's actually check out the table so this is a really big table um a lot of stuff going on over here but really though like what we actually want to find ultimately is the year the number of hosts in the current year number of posts in the previous year and the growth rate so we actually don't need most of the columns over here so this is what the table looks like um it has like a bunch of stuff in it so really what we need here is just the ids and we know that a data cell only can some unique hosts so each of these ids would be one single host and we also need the year that they registered posted this column that's pretty much it yeah we actually don't need any of the other columns here even though there's a lot that's being shown okay
1:51

assumptions

so some of the assumptions that i have um looking at this table now and just looking at the question um so my first assumption was answered already it was like the data sentence is only unique host okay so they are only unique hosts um the other assumption that i want to make here is because we're looking at rate of growth that means in the first year the rate of growth there would not be rate of growth because there would be no previous year so if i were an actual interview i would probably ask the interviewer like what would you like me to do about that scenario like should i just leave it as no leave it as is or should i you know get rid of that rule or do something like that so the assumption that i'll be making here is just that i will have a row um that will have some blanks in it and just leave it as it is that's pretty much the only assumption that i have so before i get started and actually dive into the query
2:47

answer

i'm gonna go through my general strategy of first writing out exactly how i'm going to approach this question in english and then go and write the query let's first write down what it is that we ultimately want so we want the ear and the number of hosts i guess i call it num host current and the num hosts previous and we also want the growth rate so that's the ultimate table that we want okay so now how i would approach this question okay so first i'm gonna have a cte over here um that is gonna just get the year and the number of posts for that year um and then after that i'm gonna do a self point of this table over here and that's going to be based on so self join on um so it's like the previous year is equal to current year minus one right um and it's this is gonna be like a left join because we want that first row as well okay and then after that i do the join and then what i would do is get the current year um and the number current num what did i call it num hosts current and the num hosts previous and i also want to get the growth rate which would be calculated by um current year so it would be like num host current minus the num host previous uh divided by the num host sees around that multiply by 100 so that would mean a growth rate and then ultimately i want to get the iran to round the growth rate two nearest percent um and order by the current year okay so this looks like the correct approach to me so i'm just going to go ahead and actually write the query oh i spelled this wrong order okay so to actually write this create quick query um i would do like from this table airbnb search details um and i want to select the year which is not called year it is host hostess it's called host since and i want to actually extract the year from that because this is like a date um so i want to extract oops ear from post-its i'm just going to call that ear um and then i want to get the number post so that would be i think it's id i'm pretty sure it's id yeah id so and then i want to get the um count id as numbers register that year and we don't need to do like distinct over here um because it says that it's unique already okay and then we need to group by one which is the year over here uh this is going to be cte we're going to call this table registers let's call it registers as like that um and then we're going to do the joins now so from registers let's call that current year um join left join registers previous year um and we are going to be joining on it's left join previous zero is equal to occurring or minus one so oops current year dot year is equal to previous year dot here minus one no plus one wait oh no did it did the other way around oh okay i'm just going to do it the way that i wrote it although it doesn't really matter that much previous year is equal to current year dot year minus one okay so i did that and then i want to get the current year numpost current number closed previous as well as the growth rate okay um so for that i would get the current year dot year and then i want to get the current year dot num host uh we can call that what did i call it num post current and then we can get the previous year dot num host as num hosts pbs and we want to get the growth rate so that is the current year dot num hosts minus the previous year dot numbers divided by multiply by 100 did i put a space in here no i didn't okay um and then we want to put parentheses around this and then also just multiply that by 1. 00 so that it won't be integer division um and then we're going to call that um okay and then next we have to round the growth rate and an order by current year okay so to do that we can just do round over here that should do by the percent and then we want to order by the year so order by one which is the current year so let me just go over this query and make sure that i did everything correctly pull this down a little bit uh let's see so registers here okay from airbnb search details i want to do extract the year from host since the year county ids okay so i get year and number of hosts and then um i'm doing the join over here so from register's current year left my registers previous year on previous year is equal to current year minus one that makes sense and then i'm going to get the current the year of the current year and then get the number of hosts for the current year and number host for the previous year and then finally the um growth rate which is current year number of host minus previous year times 1. 00 divided by the previous year um and then multiply that by 100 as the growth rate and then we round by the growth rate okay so we did that and then finally we order by the current year so ordered by one okay so that looks correct to me
10:43

check answer

fingers crossed let's see if it works hey nice okay yep this is what it looks like just to double check look at their expected output um 200 2009 uh navajo's current previous year growth rate cool i mean that looks correct awesome so in terms of efficiency and optimization here uh so we do have a self join over here you know what i think we can actually also just do this with a window function yeah i think we can actually do that with a window function as well why do i actually leave that to you guys with homework write in the comments below um how you would approach this question using a window function and also if you think that it'll be more efficient or less efficient than the way that i've done it here using a self join so before i wrap up this video i do have an announcement my sequel for data science and tech
11:53

announcement

interviews course in partnership with 365 data science has launched this week it's a step-by-step guide on how to ace the sql portion of the interview complete with 10 full mock interviews so you may actually be wondering what's the difference between that course and the sequel sundays and all the other free content i already have on this channel let me explain if you follow this free video where i outline how i pass my own fang interview and practice the sequel sundays in the style that i explained in that video you would be mostly there i mean that's how i did it myself and i didn't even have the sequel sundays or anything like that and i was clearly fine this course though is for people who are looking for a little bit more and a resource that i personally wish that i had when i was interviewing it outlines the exact steps to do to learn enough sql to pass the interviews but i think the biggest value is my guided walkthrough and coaching through 10 full mock interviews unlike the sequel sunday mock interview walkthroughs the 10 full mock interviews has an actual interviewer that asks follow questions gives clarifications and provides feedback i also walk you through how to install pg admin which is for developing and using a postgresql database and then i provide you the data and sql script that adds the interview questions data into the database so you can actually answer the interview questions yourself it's basically as close to the real interview as you can get my goal is for you to feel like you have done 10 full sql interviews before your actual interview so that your actual one will feel just like another mock and finally you will also get my direct support if you have any questions i honestly wish i could answer every single question on youtube of course but unfortunately if i did that i would literally not do anything else in my day so i will be prioritizing answering all questions from the course i've linked a coupon code in the description so you can get it for 9. 99 on udemy for a limited amount of time if you're interested please do check it out and i'll see you guys in the next live stream or video

Ещё от Tina Huang

Ctrl+V

Экстракт Знаний в Telegram

Транскрипты, идеи, методички — всё самое полезное из лучших YouTube-каналов.

Подписаться