Data Science SQL Interview Question Walkthrough | SQL Sundays #5
13:34

Data Science SQL Interview Question Walkthrough | SQL Sundays #5

Tina Huang 05.10.2020 14 697 просмотров 349 лайков обн. 18.02.2026
Поделиться Telegram VK Бот
Транскрипт Скачать .md
Анализ с AI
Описание видео
Hi friends! Welcome back to SQL Sundays! A weekly SQL data science interview questions and answers walkthrough. This is another data science SQL interview question and answer done from beginning to end in a real interview style. 🔗Affiliates ======================== My SQL for data science interviews course (10 full interviews): https://365datascience.com/learn-sql-for-data-science-interviews/ 365 Data Science: https://365datascience.pxf.io/WD0za3 (link for 57% discount for their complete data science training) Check out StrataScratch for data science interview prep: https://stratascratch.com/?via=tina 📲Socials ======================== instagram: https://www.instagram.com/hellotinah/ linkedin: https://www.linkedin.com/in/tinaw-h/ discord: https://discord.gg/5mMAtprshX 🤯Study with Tina ======================== Study with Tina channel: https://www.youtube.com/channel/UCI8JpGrDmtggrryhml8kFGw How to make a studying scoreboard: https://www.youtube.com/watch?v=KAVw910mIrI Scoreboard website: scoreboardswithtina.com livestreaming google calendar: https://bit.ly/3wvPzHB 🎥Other videos you might be interested in ======================== SQL Sundays Playlist: https://www.youtube.com/playlist?list=PLVD3APpfd1tuXrXBWAntLx4tNaONro5dA How I learned SQL from Scratch in 11 Days to Pass my FANNG SQL Interview: https://www.youtube.com/watch?v=vaD3ZFFNwhM How I consistently study with a full-time job: https://www.youtube.com/watch?v=INymz5VwLmk How I would learn to code (if I could start over): https://www.youtube.com/watch?v=MHPGeQD8TvI&t=84s 🐈‍⬛🐈‍⬛About me ======================== Hi, my name is Tina and I'm a data scientist at a FAANG company. I was pre-med studying pharmacology at the University of Toronto until I finally accepted that I would make a terrible doctor. I didn't know what to do with myself so I worked for a year as a research assistant for a bioinformatics lab where I learned how to code and became interested in data science. I then did a masters in computer science (MCIT) at the University of Pennsylvania before ending up at my current job in tech :) 📧Contact ======================== youtube: youtube comments are by far the best way to get a response from me! linkedin: https://www.linkedin.com/in/tinaw-h/ email for business inquiries only: hellotinah@gmail.com ======================== Some links are affiliate links and I may receive a small portion of the sales price at no cost to you. I really appreciate your support in helping improve this channel! :) Music Credit: Unity - TheFatRat

Оглавление (3 сегментов)

  1. 0:00 Question Walkthrough 465 сл.
  2. 3:16 Solution Walkthrough 613 сл.
  3. 9:21 Analysis 605 сл.
0:00

Question Walkthrough

hello my friends it is time for another sql sunday all right let's jump straight into it alright so today's question is from microsoft uh it is called download fax find the total number of downloads for paying and non-paying users by date include all records where non-paying customers have more downloads than paying customers the output should be sorted by earliest date first and contain 3 column three columns date non-paying downloads and paying downloads okay so sorry about the noise i am clearing the space on this table all right so for ms user dimension we have user id account id and we have ms account dimensions account id paying customers and we have ms download fax which is date user id and downloads let us look at these tables so ms user dimensions okay it has user id indeed and their account id okay so i'm curious if the user id one user only has one account or it is not the case that's a question that i'm going to be keeping in the back of my head um and another one see let's see ms account dimensions account id and paying customer so i'm assuming paying customer will show me some sort of data um either identifying that it's a paying customer or not or it's a table that only has paying customers so let us see okay cool so this is a table that um has account id and it tells you if it's a paying customer or not so i assume that it's either no or yes and for ms download fax this is a fact table that tells you date user id and the downloads okay it is look at the expected output um so it's going to be date non-paying and it's going to be paying and it's sorted by date okay cool this question has quite a few steps to it so let's try to break this down um so let's see find a total number of downloads from paying and non-paying users by date so we only use we only want to include records where non-paying customers have more downloads than paying customers so that's going to be a filter over there and then the output should be sorted by earlier state first and contain three columns okay so some of the assumptions i'm going to make here as i was saying earlier the um actually okay so for it for use i was talking about how like user each user would only have one account id um i don't know if that actually matters here because even if they did have multiple account ids and then they used it on every single like multiple times a day we're like you know using different accounts
3:16

Solution Walkthrough

since we're just summing up the different downloads they're doing i don't think it actually matters in this case all right let us tackle this question okay let's see so first we want to do some joins so we're going to be joining together these tables ms user dimension ms account dimension and ms download fax okay and then after that we will be we'll be grouping by date and for each date we want to use a case one so we want to sum all downloads for non-paying customers and also for paying customers and finally uh oh yeah we want to sort by earliest date so we want to order by date and then finally include only records where non-paying customers have more downloads than paying customers so this is going to be another filter so okay so non-paying customers downloads is greater than paying customers cool let's get started on the query okay let us see so from ms user dimension join ms account dimension join ms download fax so for this joint oh no okay for this drawing we are joining on uh let's give this name so user account dimensions call that a and download fast call that f so on you dot what are we joining on um account id is equal to a dot account id and for this one we are joining on you don't user id is equal to f dot user id oops cool um group by date so i'm going to write the select name in first select so we want the date and then so we want to sum all downloads from non-paying customers so here we're going to need to use the case one statement so we can do case one what's it called paying customer is equal to no then downloads and as uh is it called non-paying so this will for each time that we have a paying customer uh not paying customer then we're gonna get the number of downloads and what we want to do here is that we actually want to sum this together and that will give us all the downloads for that day for the non-paying customer and we want to do the same for the paying customer so in case when paying customers equals yes then downloads and as paying okay so we are going to be grouping by the date here and then we want to order by date as well um we want it ascending right yeah so we can just leave that and finally um so we need to do this filter so there's like a few ways of doing this i generally prefer using like a cte um because i think it's just a little bit clearer to read for the interviewer and for myself so let's do that um it's kind of like with temp as not the best name but whatever it is okay we attempt as that and then we will say from temp we will select all and where what is it again non-paying customer is greater than paying customer so non-paying is greater than pay this is a longer query it's also a little bit harder so let's just make sure that we actually got everything right um let's see from ms user dimension okay we're joining these together that looks fine okay group by order by temp as okay we're finding a sum of it and then we're also finding some case when paying customers equal to yes then downloads cool um okay i think this should be correct shall we check the solution all right okay yay you got it right
9:21

Analysis

hey okay um i want to actually see here because this is like um i wonder what they did in terms of their solution um let's see okay so this is their inner query so let's also look at that some case when okay so this looks about the same um group by date order by date okay so and then select here yeah so they over here they did like a group by statement and then a having statement um yeah i think that's yeah that's that's the same here i don't know if you need to order it again over here um this is already in order i don't know i don't think you actually need to order again but i'm not sure um yeah i mean i guess like having it in there is better so better safe than sorry i can definitely see like there's some uh servers in which if you don't explicitly order it again then it would not it would be out of order but i think it's like a pretty minor detail so it's okay if you didn't catch that i mean i didn't catch that um so i think that's correct oh they did a left joint here so okay join on both of these why did they do a left join um ms user dimension okay so i would actually argue that maybe you shouldn't do a left join yeah so i think joining here my reasoning is because if you're joining these together and say like you're joining together the user dimensions and then the account dimensions and it doesn't exist on account dimensions then you could get um that you wouldn't know if they're a paying customer or not you might just you would just get like a bunch of nulls in that case and the same over here so i think um i guess like it doesn't really matter um that much in this case because it seems like all the um what is it called the dummy data as well as like the checks that they have are really just not really looking into that case um and also because you are explicitly saying like when paying customers you go like yes but when paying customers equal to no um so i guess like it's okay and you're also summing it yeah so i don't think it actually makes that much of a difference um in this case okay cool enough of my ramblings um yeah in terms of like what we can do to optimize or improve it you know this is like this is a good question i like this question i would love to see your solution for this one because i feel like there's a lot of ways of approaching this one um and i will not doubt that you guys come up with much more optimal ways of doing things than i have um even yeah so i would love to see that leave it as a comment below um but currently like off the top of my head i can't really think of anything that will make him more optimized maybe i don't know if there's a way of doing it so that you can only have one pass through but if you guys i can figure out a way of doing that please do let me know that's it for me and that's it for this episode of sequel sunday if you guys want to check out strata scratch check it out in the link below alright see you guys in the next video

Ещё от Tina Huang

Ctrl+V

Экстракт Знаний в Telegram

Транскрипты, идеи, методички — всё самое полезное из лучших YouTube-каналов.

Подписаться