Data scientists be like...
13:47

Tina Huang · 14.03.2022 · 40,416 views · 1,594 likes · updated 18.02.2026
Video description
The first 1000 visitors to https://www.shortform.com/tinahuang will receive 5 days of unlimited access and a 20% discounted annual subscription.

In this video, I talk about what data scientists do these days.

🔗Affiliates
========================
My SQL for data science interviews course (10 full interviews): https://365datascience.com/learn-sql-for-data-science-interviews/
365 Data Science: https://365datascience.pxf.io/WD0za3 (link for 57% discount for their complete data science training)
Check out StrataScratch for data science interview prep: https://stratascratch.com/?via=tina

🎥 My filming setup
========================
📷 camera: https://amzn.to/3LHbi7N
🎤 mic: https://amzn.to/3LqoFJb
🔭 tripod: https://amzn.to/3DkjGHe
💡 lights: https://amzn.to/3LmOhqk

📲Socials
========================
instagram: https://www.instagram.com/hellotinah/
linkedin: https://www.linkedin.com/in/tinaw-h/
discord: https://discord.gg/5mMAtprshX

🤯Study with Tina
========================
Study with Tina channel: https://www.youtube.com/channel/UCI8JpGrDmtggrryhml8kFGw
How to make a studying scoreboard: https://www.youtube.com/watch?v=KAVw910mIrI
Scoreboard website: scoreboardswithtina.com
livestreaming google calendar: https://bit.ly/3wvPzHB

🎥Other videos you might be interested in
========================
How I consistently study with a full time job: https://www.youtube.com/watch?v=INymz5VwLmk
How I would learn to code (if I could start over): https://www.youtube.com/watch?v=MHPGeQD8TvI&t=84s

🐈‍⬛🐈‍⬛About me
========================
Hi, my name is Tina and I'm a data scientist at a FAANG company. I was pre-med studying pharmacology at the University of Toronto until I finally accepted that I would make a terrible doctor. I didn't know what to do with myself so I worked for a year as a research assistant for a bioinformatics lab where I learned how to code and became interested in data science. I then did a masters in computer science (MCIT) at the University of Pennsylvania before ending up at my current job in tech :)

📧Contact
========================
youtube: youtube comments are by far the best way to get a response from me!
linkedin: https://www.linkedin.com/in/tinaw-h/
email for business inquiries only: hellotinah@gmail.com

========================
Some links are affiliate links and I may receive a small portion of sales price at no cost to you. I really appreciate your support in helping improve this channel! :)

Table of contents (22 segments)

  1. 0:00 Intro 235 words
  2. 1:13 Long-Term Projects 158 words
  3. 1:58 1. Form a hypothesis 14 words
  4. 2:01 Begin testing 15 words
  5. 2:04 Report findings to team 14 words
  6. 2:09 Hope it gets passed along 276 words
  7. 3:21 Fixing dashboards (with product manager) 36 words
  8. 3:29 Fixing models (with ML/software engineers) 34 words
  9. 3:37 Improving data quality (with data engineers) 40 words
  10. 3:49 2. Ad-Hoc Requests 684 words
  11. 6:48 Metrics and Measurements 51 words
  12. 7:00 GUARDIANS OF THE 15 words
  13. 7:06 How to measure the success of your team (based on company goals) 45 words
  14. 7:17 Consider possible counter-metrics 127 words
  15. 7:52 Forecasting metrics use time series or ML models 37 words
  16. 8:04 Monitoring is used to check on your progress and spot problems 382 words
  17. 9:57 Rules to Manage AI's Unintended Consequences by Bob Suh, May 21, 2021 108 words
  18. 10:29 Used to make sure that changes to features or products are unbiased and non-discriminatory 69 words
  19. 10:50 Important to know statistical distributions, power analysis, sample sizes... 181 words
  20. 11:44 The statistics you use in big tech are different from the statistics you learn in school 100 words
  21. 12:15 Planning for the Future 96 words
  22. 12:39 Work with a team to scope out projects, figure out timing, and budget 201 words
0:00

Intro

This video is sponsored by Shortform, but more about them later in the video.

"Hey, could you post some quick data on our product's engagement gaps? We have a meeting with leadership in, like, two hours." "Oh yeah, sure. Why don't I just go to our perfectly clean, already conveniently there, uh, product engagement gaps data set?"

"Hey, I noticed that our metric fell today by one percent. Do you know what could be causing it?" "Oh yeah, I did notice that as well. That's quite interesting." "You know that the metric fell recently?" "Yeah." "Oh, the analysis? Yeah, let me take another look today. Oh, I guess it's actually important."

I'm, like, slightly exaggerating here, but, like, not really. So in this video, let's talk about what data scientists actually do. The caveat here, of course, is that data science is a huge field, and the job of a data scientist really varies between different industries, different companies, and even different teams. Ken Jee actually did a video earlier about what data scientists do, and I thought it would be cool to give you guys my perspective as well, specifically as a product data scientist in a big tech MAANG company. I divided this video into what I call five core responsibilities, which I spend most of my time doing. All right, let's go. The first core type of work I do is
1:13

Long-Term Projects

long-term projects. These generally last at least a month, and can go up to several months or even years. I also divide them into two different subcategories: the first one is exploratory projects and the second one is automation/improvement projects. Let's first talk about exploratory projects. As the name suggests, these are projects that are outside the scope of what the current team, or maybe even the company, is doing. They're intended to give direction to the team or even to the company on what it is that we should be doing in the future and what kind of projects we should be looking into. It's kind of like exploring new lands, so you're kind of like Dora the Explorer, but, like, with data. Some examples of exploratory projects are looking at new technologies and seeing how they can be useful in your team or even in your company. Generally, the whole process is that you form a hypothesis
1:58

1. Form a hypothesis

and you kind of, like, start testing things out, poking around a little bit,
2:01

Begin testing

and if you find something useful, you make a presentation and tell your team
2:04

Report findings to team

about it, maybe tell your leadership about it, and if they think it's really
2:09

Hope it gets passed along

useful as well, they usually pass it along to engineers to start implementing. Exploratory projects are definitely my favorite type of project, because I really like looking into things that I don't know about and just kind of, like, discovering things, which I find really exciting. Plus, there are fewer people talking to you and asking you how things are, because it's kind of outside the scope of what your team is already doing, so people generally leave you alone, which is quite nice. It's also a great reason for when someone asks you to do something ad hoc, which we'll talk about later: you can be like, "Well, no, because I'm doing this thing right now." It is a good excuse.

Another type of long-term project is the automation/improvement type of project. Some examples: sometimes your dashboards are a bit [...] and nobody knows what they actually say; maybe your models are just not very good, especially as time passes; maybe the process for reporting to your leadership about the progress of your team is a huge pain in the ass; or maybe your data quality is not great and your data just isn't structured very well. There's also a lot of variety to these types of projects, but it's really just thinking about how to improve or automate processes that the team and the company are already using. For this type of project, you're generally also working with a lot of different members of your team, usually people with different roles. For example, if you're fixing up the dashboards, then you're probably working
3:21

Fixing dashboards (with product manager)

a lot with the product manager, because they're going to be the ones who are looking at these dashboards the most, except for yourself. If your models are not so great, you're probably going to be
3:29

Fixing models (with ML/software engineers)

working mostly with machine learning engineers and software engineers to see how you can improve them. And for work that has to do with data quality, you're usually working with the data engineer to look
3:37

Improving data quality (with data engineers)

at pipelines that can be improved and pipelines that are broken, as well as restructuring the data so that it's more effective to query. So the second core responsibility that a data scientist spends their time on is ad hoc requests.
3:49

2. Ad-Hoc Requests

And ad hoc requests are when people ask you to do things, or, like, find or figure things out, that were not part of your original plan, or when unexpected things happen and you kind of have to drop everything and focus on that. From my experience in product data science, people ask you so many things because they're just generally curious. They'll be like, "How's this product doing?" or, "What's the data say about this thing that I did two days ago? Because I'm really excited about the potential results." Things like that. And it makes sense, because they're curious and they want to see what the data says, and when it comes to what the data says, they're going to come to you, because you're the data scientist.

What I quickly learned after I started working is that the best way to distinguish between what is actually important and what is not so important is the number of times that they ask you to do it. If they ask you one time and they don't ask you again, then it's probably not important and they probably forgot about it. But if they ask you two times, three times, then it's like, "Okay, this is probably important," and then you go and do it. You kind of need to have this type of filter, because if you just did everything that everybody told you to do, you would literally do nothing else, and you wouldn't have time to work on your long-term projects, for example.

The other type of ad hoc work is when something breaks or something just goes terribly wrong and everybody's freaking out. This is when you drop your long-term project and immediately jump on whatever it is that is very urgent at the time. When these things do occur, it's usually very much a team effort to go and investigate and work things out.

Now let's take a moment to talk about our sponsor today, Shortform. Shortform produces non-fiction guides that are so much more than just book summaries. They start off by laying out the structure of the book and the concepts that are being covered, and if you're interested, you can also look more into the details. Shortform has really become my go-to when I'm recommended a book, for example, and I'm not sure if I actually want to buy the book or read it. I go on Shortform, look up the book, see what the concepts and the structure of the book are, and see if I'm interested. Shortform covers a variety of different genres, including philosophy, learning, and productivity, and business is a genre that I've been more interested in recently, especially how to work better in a team. As a data scientist, you spend a lot of time working in teams. Me personally, I'm not naturally a good team player; I tend to be the kind of person who would just be like, "It's probably faster if I just do it myself." A recent book recommendation I got is The Five Dysfunctions of a Team, so naturally I went on Shortform and looked at the concepts and stuff, and the book seemed pretty interesting, so I bought it recently and I'm looking forward to reading it after I finish my current book. Shortform drops new book guides as well as articles every single week, and subscribers can vote for what book they want covered next. To get five days of free unlimited access as well as 20% off the annual subscription, you can join Shortform by going to this link over here, also linked in the description. All right, back to the video.

So the next type of work that I spend a lot of my time doing as a product data scientist in big tech is metrics and measurements.
6:48

Metrics and Measurements

We're very data driven, which means that we always need a way of measuring the success or the impact that we're delivering, and the way that you do this is by developing metrics. Data scientists are kind of like the guardians of the metrics. For the first
7:00

GUARDIANS OF THE

part, in creating these metrics, you think about how you can measure the success of
7:06

How to measure the success of your team (based on company goals)

your team in relation to your overarching company goal, because whatever it is that your team is doing, it should ultimately go up to whatever it is that the company cares about. You're also thinking about what counter-metrics there are, and counter
7:17

Consider possible counter-metrics

metrics are metrics that you want to make sure you're not hurting as you try to drive up your own metrics. For example, say you're on the ads team, and your metric is to increase revenue. So you're like, "Yay, let's put ads everywhere! Yay, look at that, we are increasing revenue." But then you have your counter-metric, which is maybe the number of engagements, like the number of people who are using your product, and you see that dropping really low. That shows that this is probably not a great thing, because even though you're driving up your metric, you're also hurting the company as a whole. For forecasting your metrics, you're usually looking at some
7:52

Forecasting metrics use time series or ML models

sort of time series or machine learning model, while incorporating the factors that you know are important in driving your metrics. And finally, monitoring is a very long-term process. You're essentially just making sure that you're
8:04

Monitoring is used to check on your progress and spot problems

working towards your goal and nothing is going really wrong. Since you're the guardian of the metrics, if something does happen to the metrics, you're generally the first person that people ask, and you're trying to figure out what it is that happened that may have impacted the metric, usually in a negative way. I have figured out that there's actually a process to doing this so you don't end up wasting a lot of time. The easiest and first thing you should do is go ask the software engineers and the data engineers if they perhaps shipped something into production that could have broken the metric pipelines in some fashion; like 95% of the time, that's the case. If you still don't know what's happening, the next step is to sit there for a few days and hope that it normalizes by itself. And finally, if that still doesn't work (and those two steps cover, I would say, 97% of occurrences of your metrics dropping), then you finally launch an investigation: you look at the different components of the metric and where it is that it's dropping, and try to pinpoint and diagnose what the issue is. If you have unexpected increases in your metric, though, people are generally more cool with that, so they're not going to question you as much.

So the next type of work that I spend a lot of my time doing as a data scientist is experiments. I think the importance of experiments is quite specific to big tech companies, mostly because if you're a smaller company, you don't have the data infrastructure or enough data to do more than simple A/B testing of different options. In big tech companies, however, we take experiments very seriously. The reason we care so much is that making changes to products or features has a dramatic impact, just because of how big the reach is to so many different people in the world. So if you're going to be shipping something, you really have to make sure that what you're shipping is actually a good feature or a good product. Also, for ML
9:57

Rules to Manage AI's Unintended Consequences by Bob Suh, May 21, 2021

algorithms, you have to make sure that they don't have unintended side effects. Say your ML algorithm has certain biases and you release that into the world; that's going to hugely impact people, usually in an unfair fashion towards a certain population of people. While all companies should be making sure that the algorithms they're shipping are unbiased and not discriminatory in any way, for big tech companies it's especially important, again because of the wide scope and the wide reach of the product. So pretty much, if you want to do anything new, then you
10:29

Used to make sure that changes to features or products are unbiased and non-discriminatory

have to run an experiment and make sure that it's doing the things that you intend it to do: making sure that it's good, making sure it's not having a bad impact. Data scientists are very involved in, and very responsible for, this experimentation process. As a data scientist, it's really important that your stats are in tip-top shape. You've got to know your statistical distributions, your power analysis, sample
10:50

Important to know statistical distributions, power analysis, sample sizes...

sizes, things like that, and you also have to come up with the correct experimental structures, as well as analyze and interpret the data that comes out of the experiments properly. You're basically the person who tells the engineers and the product managers whether this thing that they did is good or not, and whether they should ship it to production or not. Something that I do want to point out is that school statistics is so different from real-life work statistics. I thought I had a pretty decent grasp of statistics from school, from reviewing for interviews, and from learning it myself, but then I started working and realized that I didn't have a deep enough understanding to be able to come up with specific, custom experimentation structures and analyses. School statistics always gives it to you in a very specific way, right? You just need to make sure that you apply the formulas correctly. But in real life, you can't just assume things, like that there is a
11:44

The statistics you use in big tech are different from the statistics you learn in school

normal distribution, or that certain parameters are satisfied. Oftentimes you need to think about how to transform your data in a way that lets you actually do your statistical analysis. So yeah, the experimental process is really important, and this is definitely the area in which I spend most of my time stressing out, double-checking and triple-checking my work, just to make sure that I'm doing the experimental structures and the analysis correctly. The next area where I spend a lot of my time as a data scientist is figuring out what
12:15

Planning for the Future

projects to do next, where we should be investing more of our time, and when we should be asking for more budget and how to ask for it. You want to make sure that the projects are well scoped out and make sense to leadership, as well as making sure that you have a way of measuring success using metrics. If you and the rest of your team decide that you want a budget increase, you're also working a lot with the product manager to come up with that proposal, showing the data and analysis
12:39

Work with a team to scope out projects, figure out timing, and budget

that demonstrate why having more budget is going to increase your impact a lot. So after you do all the scoping of projects, asking for budget, things like that, and we start implementing, unfortunately you can't just go, "Okay," and then run away. Even after you decide on the projects with the product manager and the rest of the team, you're still working really closely with the team as the projects are being implemented, making sure that we're progressing the way we should be and making adjustments, because unexpected things always occur.

All right, that is all I have for you today. I hope this video was helpful in giving you some idea of what the work of a data scientist is like, especially in the context of big tech. Do let me know in the comments section if you find any aspects of this work appealing, or any aspects of this work not so appealing, and I will see you guys in the next video.

We're live stream... No, bp, no sitting on keyboard, very bad. Why don't you go and sit in bed? Sit in bed.
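
Editor's sketch for the counter-metrics segment (7:17): the ads example, where revenue goes up while engagement drops, can be expressed as a simple guardrail check. This is a minimal illustration and not something shown in the video; the names `MetricChange` and `guardrail_check`, the example numbers, and the 2% tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricChange:
    """A metric measured before (control) and after (treatment) a change."""
    name: str
    control: float
    treatment: float

    @property
    def relative_change(self) -> float:
        # Positive means the metric went up, negative means it dropped.
        return (self.treatment - self.control) / self.control


def guardrail_check(primary, counters, max_drop=0.02):
    """Return (ok, violated_names).

    ok is True only if the primary metric improved and no counter-metric
    fell by more than max_drop (a hypothetical 2% tolerance by default).
    """
    violated = [m.name for m in counters if m.relative_change < -max_drop]
    ok = primary.relative_change > 0 and not violated
    return ok, violated


# Made-up numbers in the spirit of the ads example from the video.
revenue = MetricChange("revenue", control=100.0, treatment=110.0)
engagements = MetricChange("engagements", control=1000.0, treatment=900.0)
print(guardrail_check(revenue, [engagements]))
```

In practice the tolerance would be agreed with the team per counter-metric rather than set to one global value.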
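
Editor's sketch for the monitoring segment (8:04): the last step of the process described there, looking at the components of a metric to see where it is dropping, might start with a breakdown like the one below. It assumes an additive metric (for example a count) split by some segment; the function name `rank_segment_contributions` and the platform numbers are hypothetical and not from the video.

```python
def rank_segment_contributions(before, after):
    """Rank segments by how much each one moved an additive metric.

    before and after map segment name -> metric value for two periods.
    Returns (segment, delta) pairs with the largest drop first.
    """
    segments = set(before) | set(after)
    # A segment missing from one period is treated as contributing 0 there.
    deltas = {s: after.get(s, 0) - before.get(s, 0) for s in segments}
    return sorted(deltas.items(), key=lambda item: item[1])


# Made-up daily engagement counts by platform.
before = {"ios": 500, "android": 400, "web": 100}
after = {"ios": 495, "android": 340, "web": 102}

for segment, delta in rank_segment_contributions(before, after):
    print(f"{segment}: {delta:+d}")
```

A breakdown like this only points at where to look next; it does not explain why the segment moved.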
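
Editor's sketch for the statistics segment (10:50): power analysis and sample sizes are mentioned there without detail. One common textbook approximation, for a two-sided z-test comparing two conversion rates, is shown below. It is not necessarily what the speaker's team uses; the function name `sample_size_per_group` and the example rates are hypothetical. As the 11:44 segment warns, real experiments often violate the assumptions behind a formula like this.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect p_control vs p_treatment.

    Uses the normal approximation for a two-sided two-proportion z-test:
    n = (z_(1-alpha/2) + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to have an effect to detect")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical scenario: detecting a lift from a 10% to a 12% conversion rate.
print(sample_size_per_group(0.10, 0.12))
```

Smaller effects or higher power push the required sample size up quickly, which fits the video's point that only companies with enough data can run experiments beyond simple A/B tests.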
