# Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #8

## Метаданные

- **Канал:** Tina Huang
- **YouTube:** https://www.youtube.com/watch?v=ZSXjj0HW6Kg
- **Дата:** 28.10.2020
- **Длительность:** 7:07
- **Просмотры:** 10,723
- **Источник:** https://ekstraktznaniy.ru/video/3586

## Описание

Hi friends! Welcome back to SQL Sundays! A weekly SQL data science interview questions and answers walkthrough.

This is another data science SQL interview question and answer done from beginning to end in a real interview style.

🔗Affiliates
========================
My SQL for data science interviews course (10 full interviews):
https://365datascience.com/learn-sql-for-data-science-interviews/ 

365 Data Science: 
https://365datascience.pxf.io/WD0za3 (link for 57% discount for their complete data science training)

Check out StrataScratch for data science interview prep: 
https://stratascratch.com/?via=tina

📲Socials 
========================
instagram: https://www.instagram.com/hellotinah/

linkedin: https://www.linkedin.com/in/tinaw-h/ 

discord: https://discord.gg/5mMAtprshX

🤯Study with Tina 
========================
Study with Tina channel:
https://www.youtube.com/channel/UCI8JpGrDmtggrryhml8kFGw

How to make a studying scoreboard: 
https://www.youtube.com/watch?v=KAVw910mIrI

Sc

## Транскрипт

### Introduction []

hey guys this is sql sunday number eight let's jump into it all right so today's question is from google called activity percentile find the email activity percentile for each user email activity percentiles defined by the total number of emails set the user with the highest number of emails sent will have percent help one and so on i'll put the user total emails and their activity percentile and order records by the total emails in descending order okay let's take a look table so google gmail emails id from user to user as well as the day okay and the expected output here is going to look like user total emails and activity percentile okay let's take a look at the table here okay so it seems pretty standard this is pretty much what i expected so it's zero from user to user and we have the day over here so we want to get the user the number of emails that they sent and as well what percentile it is that they rank cool so let's look at the expected output okay so this is a little unexpected on my part um when they said activity percentile i assume that the end tile here would be you know what percentage or what percent how that person is in terms of their total emails but actually seems like it's just the ranking in terms of total emails um let's see here okay it looks like it's just total ranking and also the ranking so you have 1919 but ranked one two so it's you know ranked like where you want each number and just because they have the total number of emails is not ranked the same cool all right so it's assumptions up

### Approach [2:07]

front uh i kind of stated it already it's the fact that if you have total number of emails then you would actually be ranked different ones as we can see over here um i don't think i don't think i have any other assumptions to make um let's see yeah looks about right all right let us write our approach so we're gonna say here we're going to group by from user and then we're going to count number of emails so that would just be count and then after that we will get the rank so it says activity percentile here but we can actually get the row number window so we're going to use a little nifty little window function here all right so let's actually approach this question so from google gmail emails so i had our select statement from user i want to do the count oops count as total emails and for our function a window function so i'm just going to write this down i'll explain it a little bit as well what this means so row number over order by um count here and we want it in descending order and we'll call that and tiles okay so row number here actually means um we're ranking it we're ordering it by the count which is the total number of emails by descending because we want the total emails that are the highest to show up first and we're getting the row number of that so it's just ranking it from one all the way up to however many people that there are and because we want it so that everybody gets a different number despite the fact they have the same total number of emails that's why i want to use row number as opposed to rank functions all right cool and we have to group by one that should work so let's see uh shall we check it i think this looks right to me okay cool yep yep this all looks good to me cool looks like we got it right um so in terms of

### Solution [4:55]

anything i can do to actually make it more efficient oh actually let's check out their solution first okay so they did a sub query from user count and then grouped it over here and they used n tile so n tile also works it's just another function um that you can use to rank it in this case so yeah you should get the same results using end tile both would work cool so it looks pretty much the same here all right yeah so in terms of efficiency um i think that my approach might be a little bit more efficient because i don't do a sub query but except for that i think it's pretty similar i also don't really see how we can improve it that much more as always let me know in the comments below what your approach is especially since we've never seen a window function before so i would love to see how you guys think about what that um and if you also use the window function if you didn't use a window function i'm also really interested to see how you did it then okay so in terms of improvements to make um i don't in terms of like what we can do after this so since we're actually looking at the ranking i think it would actually be useful to go ahead and also find the actual percentiles of these people and grouping it together so what percentile each person is and what they belong to so last homework question for you guys why don't you also modify this function so that we can see the percentiles for each of these um users in terms of their total emails let me know what your solution would be in the comments below all right good luck everyone on their sequel data science interviews if you've been following the series all the way from number one to number eight here if you follow through and approach the questions like i suggested i am very confident that you'll do super well on your actual interview alright see you guys in the next video
