Biology to Data Science (data professor's tips on how to get a data science research position)
34:26

Tina Huang, 26.09.2020, 5,739 views, 176 likes, updated 18.02.2026
Video description
How did data professor transition from biology into data science? How do you get a research position with a professor? Will machine learning become automated? Where is data science headed in the future? I had the incredible opportunity to interview Chanin, aka Data Professor! He is a bioinformatics research professor who transitioned from biology to data science through his PhD. He's extremely knowledgeable and I think you guys will derive so much value from this video! :D

P.S. data professor actually is a real professor!
P.P.S. check out the collab video we did on data professor's channel: https://www.youtube.com/watch?v=OBsxUjQ7Ua0

Timestamps:
00:00 Intro
01:14 Data professor's background
04:43 Birth of data professor!
11:04 Data science opportunity in biology
18:03 Machine learning is getting automated
18:35 How to future proof yourself
19:16 Data scientist's essential toolkit
22:43 How to get a research position with a professor
30:29 Super Secret Lightning round

Check out data professor's channel and links we mention in the video:
Data professor's channel: https://www.youtube.com/channel/UCV8e2g4IWQqK71bbzGDEI4Q
How to make a bioinformatics web app: https://www.youtube.com/watch?v=iZUH1qlgnys
Strategies of learning data science: https://www.youtube.com/watch?v=7XdoaQYwTeA
Data professor's article about data science process: https://towardsdatascience.com/the-data-science-process-a19eb7ebc41b
Ken Jee's video on different data roles: https://www.youtube.com/watch?v=BZFfNwj7JhE
Best data science project: https://www.youtube.com/watch?v=2goqyY5XBeI&t=3s

Other channels to check out:
Ken Jee: https://www.youtube.com/c/KenJee1/videos
Codebasics: https://www.youtube.com/c/codebasics/videos
Krish Naik: https://www.youtube.com/channel/UCNU_lfiiWBdtULKOw6X0Dig
Andrew from Data Leap: https://www.youtube.com/c/DataLeapTech/videos

My backstory: I was pre-med with a specialist (equivalent of major) in pharmacology at the University of Toronto. I worked in bioinformatics for a year before doing my master's in computer science at the University of Pennsylvania (UPenn MCIT). I interned at Goldman Sachs and am now a full-time data scientist at a FAANG company.

Contact:
linkedin: https://www.linkedin.com/in/tinaw-h/ (preferred)
email: hellotinah@gmail.com
*I'm the fastest to respond on youtube, followed by linkedin (especially if you mention in a youtube comment that you messaged me)! Don't be a stranger ^^

Contents (9 segments)

  1. 0:00 Intro (192 words)
  2. 1:14 Data professor's background (513 words)
  3. 4:43 Birth of data professor! (1035 words)
  4. 11:04 Data science opportunity in biology (1162 words)
  5. 18:03 Machine learning is getting automated (100 words)
  6. 18:35 How to future proof yourself (110 words)
  7. 19:16 Data scientist's essential toolkit (554 words)
  8. 22:43 How to get a research position with a professor (1290 words)
  9. 30:29 Super Secret Lightning round (609 words)
0:00

Intro

...at least Google your potential supervisor. Look at what type of work they're doing, look at the sort of publications they've put out over the years. Are there any trends that you're noticing?

Thank you so much for taking the time to have a chat with me. You know, we've obviously run into each other lots of places online, we've talked to each other, Ken Jee, Andrew, we have a lot of mutual YouTuber friends, but this is actually the first time I'm getting to sit down and chat with you, so I'm super excited.

Yeah, I'm also super excited to have this collaboration video as well. Happy to be here, thanks for having me.

Yeah, thank you. So why don't we start off with just a bit of an intro about yourself?

Yeah, so where should I begin? Well, I have a bachelor's in biology, particularly biological science. And as I also recall from one of your videos, and also Andrew's, I guess we were all pre-med before. So I started as a pre-med as
1:14

Data professor's background

well, and so the first goal was to become a medical doctor. Then in the third year I kind of had a change of heart. I had this interest in computer programming, and at the time I was almost thinking of changing my major completely, from biological science to a computer science degree. But I figured I had maybe a year to go, so I might as well finish up the bachelor's degree, and then I kind of totally forgot about the computer science. And then one day I saw this video on, kind of like, the Discovery Channel, and I saw this engineered ear on the back of a mouse, and I'm like, that is so cool, how did they do that? They engineered an ear-looking organ on the back of the mouse using tissue engineering.

Oh yeah, right.

Yeah, so they use scaffolds, seed them with cells and then add growth factors, and it grows into an ear-like organ on the back of the mouse. That was pretty interesting for me, and so I started looking for opportunities to do a PhD. I met my potential advisor, and he recommended that I do something about protein engineering, so pretty much how to mutate proteins in order to obtain a protein that has altered properties, altered function. Long story short, I attended a conference one day, and during the conference I met a potential co-supervisor who had just graduated, fresh out of a PhD in the US, from RPI, and his PhD was about data mining. This was back in 2004, so at the time the term data science was not yet around and it was pretty much data mining. So I met this potential co-supervisor, we talked, and he gave me a high-level overview of what data mining is about. He talked about neural networks, and then he gave me a book and I kind of self-studied, because he was pretty busy. Over the course of the next year or so I was working on my first project, which was to predict DNA splice junction sites. It was about 13 projects in all, and each project had its own challenges, its own domain expertise. It was pretty fun, though, and I learned a lot of new stuff as well. So not so much into data science at the time.

Right. Well, that is a perfect segue into my next question: how did you become the professor, the Data Professor?

Okay, great question. So that pretty much started from my daughter. She likes
4:43

Birth of data professor!

some cartoon channels on YouTube, and one day she said, "Dad, why don't you create your own channel?" That idea lingered for a couple of months, and over the course of the next couple of months I came across videos from Ken Jee, from Krish Naik, and there were other prominent YouTubers in data science as well. I was kind of intimidated. I mean, they're all professionals, they're all very charismatic. I remember that I procrastinated on the first video. I didn't start by recording; I would just stock up on gear. So as you can see in the background, I would buy all of this gear, I would buy lamps, I would buy the IKEA fake plants. You know, all tech YouTubers should have one of those IKEA fake plants.

What's that again? I don't have one. Maybe I should get one, right? You can recommend one to me later. Sorry, I digress, continue.

Yeah, so I was pretty much procrastinating, and I also stocked up on gear, buying cameras, buying microphones, so pretty much irrelevant things. Then I gathered the courage after about three months or so and finally hit the record button. After that I had more confidence, because the subscribers showed their interest in the videos I was producing. At the time I wasn't sure of the actual direction of the channel, so with each video that I released the subscribers would provide their comments, like, "Hey, why don't you make a video about this?" Before, I would make a very introductory video about R, and then some would say, why don't you make a video about using Python, why don't you make data science projects, practical end-to-end solutions? Before, mine was kind of basic, like how to define variables, that kind of stuff. So it evolved over time, thanks to the early subscribers and also the present subscribers for all of their suggestions. I learned a lot of new stuff as well.

Yeah, for sure. It's like that continuous conversation. I felt that as well: when you start off you actually don't realize what is valuable to other people, because to you it's so normal, right? So people start asking you questions like, "Hey, Data Professor, you are a professor, you do research, you do bioinformatics research, so we want to see stuff that's related to your work," and you're probably like, that makes sense, right?

Yeah, exactly. It's what I call my true self. It actually took less effort to make those kinds of videos, like the recent video I created on how to make a bioinformatics web app. It might be a bit complicated, so I try to make it as simplified as possible. I'm looking to make more bioinformatics-related tutorial videos, and also trying to simplify them, trying to require as little biology domain knowledge from the viewer as possible. Maybe I'll also make some infographics, draw some cartoons.

Okay, I'm so excited for that. We should talk about that.

Awesome, yeah, we should do that.

I actually wanted to tell you, when I was transitioning from bioinformatics, at that time I don't think you had started your channel yet, around 2018. Did you start yet?

I started in August 2019.

Okay, because I went on YouTube and tried to look up biology to data science or biology to computer science, and there was nothing. Nothing. But then I started my master's degree, and I was like, I'm curious, is there something now? I looked it up again, and that's actually how I found you.

Oh, okay, cool.

Yeah, I think I found you sometime in late 2019.

Okay, so that's like my early months.

Yeah, your early months. I was very shy at that time, so I didn't reach out [unclear]. I should comment more now, though, because I was like, this is so interesting, this is so cool, but I was also slightly intimidated because you're a professor, you know, and I was like, why would you want to talk to me? So I just didn't say anything.

Okay.

But that's actually really cool, because I think there are a lot of people from my background as well: you're working in the sciences, you're coming from a biology background, but you don't really know how to get there. So I think you're a huge inspiration to people who are looking into going into that field, because you did it yourself. I just wanted to put that out there as well. So, let's see, I have another question that we kind of touched on already. You talked about your journey, how you transitioned and did your PhD on a lot of different projects, some of them related to bioinformatics and data science. So how did you end up going into the research that you're doing now? And briefly, what kind of research are you doing now?

Right. So continuing the earlier discussion on my wet lab journey, I was working on
11:04

Data science opportunity in biology

protein engineering of the green fluorescent protein, and I figured there's so much data about green fluorescent protein mutants. They have a lot of data about the mutations: this particular mutation at this particular position will give rise to a change in color, in spectral properties. Like, if you mutate a serine residue, one of the amino acids... no, not serine, I mean tyrosine: if you change tyrosine to histidine it becomes blue, and if you change it to tryptophan it becomes yellow. So I figured someone should be gathering all of that data, analyzing it and building a model to predict the color. I searched the literature, and to my surprise there was nothing about that. So I thought, wow, that's a great idea to pursue. With minimal knowledge of data science I collected the data manually and put it into Microsoft Excel. I used all of the primitive tools like text editors, and then I used this data mining software I mentioned earlier called Weka. Weka is data mining software that lets you point and click to import data, select your classifier, build the model, and then export the prediction performance, which I did and then manually entered into Microsoft Excel. So that is how I got started in data mining, and it eventually became data science.

Project by project I would learn something new. I would learn by publishing, actually. With the first project we submitted the work and I learned from the reviewers. The peer reviewers who reviewed the work before we could get published would provide a lot of useful advice, like, why did you do this, why don't you do that? For example, one of the peer reviewers said, your data is highly imbalanced, why don't you perform some data balancing? And I was like, yeah, sure, I could do that, and I looked into what methods were available. Okay, there's oversampling and undersampling for balancing the data. Another situation is if I want to do a data split: how can I do it? Obviously there's the random data split, but is there a better way, a rational way? How can you make an 80 percent split and a 20 percent split resemble one another? So I stumbled upon this thing called the Kennard-Stone algorithm, which lets you make roughly similar representations in the two data splits for your project. So as you can see, I was learning bit by bit from the peer reviewers, and also by going down several rabbit holes while reading the papers. So it kind of occurred spontaneously over time.

Yeah, that's really interesting. So this was before the sklearn days?

It was, yeah, there was no sklearn. At the time MATLAB was pretty popular, and R also.

Okay, wow. So you really learned it from the bottom up, not by choice but because you had to, because there was no sklearn to do it for you. Gotcha. So you kind of run into a problem where someone says something to you, and you're like, oh, that's something I should do, and then you go do your research and do the thing, and that's how you accumulate knowledge throughout.

Right. So I know the pain point. I know how painful it is to manually collect the data, and I know how long it takes if you don't do coding, if you're doing it manually. Imagine that you have to optimize your parameters and set the random seed manually. Imagine doing that by pointing and clicking, imagine how many thousands of clicks you would have to do. In scikit-learn you could just say, okay, I want 10 random splits. But I had to do it manually: change the seed number manually, and then for each model select it, click train, wait one minute, two minutes, five minutes, and after the training is finished copy the result and put it into a text file, doing this about a thousand times, because I had to do that when optimizing the parameters. And then I would use macros in the text editor. At the time I was using a text editor called UltraEdit, so I would record the macro and the macro would pretty much do it automatically for me and aggregate the data. Then I would select the column in the text editor, copy it, paste it into Microsoft Excel, manually make the plots in Excel, put them into PowerPoint, and then tweak them so that they looked nice. At the time there was no matplotlib either.

Wow, so much respect. I have so much respect. I think I would go insane if I had to do that.

It was pretty fun at the time. I mean, we had no choice. At the time that was the best choice we had.

The only option, actually.

That's true, yeah.

Because people are probably going to look at us one day and be like, oh my god, you had to import a package and put stuff in? We just press one button and everything goes through and does the machine learning for you.

Right, and now there's AutoML, right? You just import the data, you click, you run like two lines of code and it essentially does everything for you. Before, that would take me about a month to do manually.

Oh man.

And you can do that now in one minute or two minutes. So it's pretty cool to see all of that progress.

Right, and it didn't even take that long. It really isn't
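
The Kennard-Stone split and the seeded random splits described in this segment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code or tooling used in the research discussed: the function names `kennard_stone_split` and `random_split`, the choice of Euclidean distance, and the toy one-dimensional data are all picked here for illustration. Kennard-Stone starts from the two samples that are farthest apart and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away, so the selected subset spreads across the whole data set.

```python
import math
import random


def kennard_stone_split(points, n_train):
    """Return (train_indices, test_indices) chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    # nearest[k] = distance from sample k to its closest selected sample.
    nearest = {k: min(dist[k][first], dist[k][second]) for k in remaining}
    while len(selected) < n_train:
        # Add the sample that is farthest from everything selected so far.
        pick = max(remaining, key=lambda k: nearest[k])
        selected.append(pick)
        remaining.remove(pick)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[k][pick])
    return selected, remaining


def random_split(n, n_train, seed):
    """Return a reproducible random (train_indices, test_indices) split."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# Toy data: five one-dimensional samples (hypothetical values).
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train, test = kennard_stone_split(points, 3)
print("Kennard-Stone:", train, test)  # train -> [0, 4, 3], test -> [1, 2]

# "I want 10 random splits": one line per seed instead of a click per seed.
for seed in range(10):
    print("seed", seed, random_split(len(points), 3, seed))
```

In scikit-learn the random variant corresponds to calling `train_test_split` with a different `random_state` for each repetition. The pairwise-distance table above grows quadratically with the number of samples, so this sketch is only meant for small data sets.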
18:03

Machine learning is getting automated

like, it's just a few years, and it's crazy how much progress there is. That's also why I keep telling people that machine learning is going to become automated one day, so what's much more important is that you understand what you're analyzing, understand the problem, and know how to apply the learning algorithm. Because there will come a day when the machine learning algorithm... I mean, it's already here, right? It's like two clicks, essentially. But that's going to scale hugely, to where you can essentially plug and chug any data set into a machine learning algorithm, and what's much more
18:35

How to future proof yourself

valuable is understanding why you're using the machine learning algorithm, what your problem is and why it's suitable, and being able to interpret the results afterwards. That's so much more important.

Exactly, I totally agree. And I would never remember the syntax of any programming language that I use; it comes naturally when you use it repetitively over time. So what's important, as you also mentioned, is the logic at the high level. I also mentioned this in one of my early videos about strategies for learning data science. So you would focus on the high level, like, for example,
19:16

Data scientist's essential toolkit

well, you should obviously know, for those aspiring data scientists starting out, what the essential steps for doing data science are, what the data science life cycle is. I wrote an article looking at the data science process, and Ken Jee also had a video where he talked about the data science life cycle and compared the data science roles of data scientist, data analyst and machine learning engineer. So you should know what is required for a data science project: data collection, data munging, cleaning, pre-processing, model building, parameter optimization, model deployment, communication. If you have that, I mean, that is the most important aspect of data science. And what's also most important is the desire to learn new things, to be excited by, you know, the joy of making a custom plot that takes you an entire week. I did that. I made one plot and customized it, and I thought it looked so good. It took me a week to make just one plot, and I was so happy, I was proud of myself. Sometimes you jump into this rabbit hole and you get immersed in there, and everything, the time and space continuum, kind of stops, and you're in that matrix, you're in that zone, and it's so awesome. Honestly, it's not an easy path. I see a lot of articles saying, like, you should not be a data scientist if, dot dot dot. I think one thing is that being a data scientist is all about learning all the time, right? The field is advancing very rapidly and it's impossible to know everything, but still, we should have this open mindset and be open to learning new things. For example, the greatest resource for learning new things is YouTube, like Ken's channel, Dhaval, Krish Naik, your channel, Tina, and Andrew as well, he's very entertaining. So there's so much to learn here.

Yes, I completely agree. You just talked about so many amazing things. I'm going to link everything we talked about above, and if I cannot link it above I will link it below, so you guys should definitely check those out. Like what you were saying about that mindset, how to actually approach an analysis: that is just so fundamental and so much more important than just trying to do algorithms. It's cool to be like, oh, I did random forest, right? But that's something that's going to become automated one day, so it's so much more important to understand how to approach things and how to go through that entire life cycle. I highly encourage everyone to go check that out. And another thing that I think was very valuable about what you said is that joy of doing the thing, right? Like with data science, a
22:43

How to get a research position with a professor

lot of people want to go into this field because they're like, oh, you make six figures, you make good money, and you have a chill lifestyle, right? But when you're actually working on a problem, none of those things matter at that moment. You have to really enjoy doing it. And what you described about the matrix, that's how you know you've found a field that you truly love and enjoy. Then you want to learn more, you want to improve your skills, and that is the thing that's going to drive you to become better and better.

Yeah, exactly.

Thank you so much for touching on that. Okay, so I have another question for you, and this is related to your position as a professor right now. On my channel I talk a lot about how doing research with a professor is actually the best data science project. I'll link the video above, but basically it's because you're doing something super impactful (why would the professor be looking into it if it wasn't impactful?), it's unique, and you get to learn so many skills that you just wouldn't be able to learn otherwise, because you don't know what you don't know at that time. You get that guidance, that mentorship, and if you do well you could get a referral, and you could even get a publication if you get lucky. So with all these things combined, this is why I tell people, if you're in school right now, this is an amazing opportunity and you should totally take advantage of it. With that being said, people have been asking me questions like, "Professors just don't respond to me, they ignore me." So do you have any tips on that, seeing as you are one?

Oh, okay, like how to attract the attention of professors, for those who want to do a research assistantship? Yeah, I also get reached out to quite often. What would impress me is if you would go through my website, or at least my publications. I have a research lab website where I discuss what research areas my group works on. Or at least Google your potential supervisor. Look at what type of work they're doing, look at the sort of publications they've put out over the years. Are there any trends that you're noticing? If you find a gap in the knowledge, a potential area your potential professor might be interested in that you could fill, I would say that would pretty much catch the attention of the potential supervisor. Another scenario is that the student is reaching out but they're pretty much copying and pasting, for example "Dear Sir, Dear Madam" and then a generic email. For the most part, when I'm busy, I would just overlook that email. It's not impressive, not eye-catching. When your inbox is filled with a lot of emails, it's quite hard to find a noticeable or eye-catching one. Or at least say "Dear Professor" and then the name of the person, right? Not just "Dear Professor", but with the name as well, so that the professor knows you have at least copy-pasted the name of the potential advisor instead of copy-pasting the whole message as a bulk email to a hundred or a thousand potential advisors.

Yeah, for sure. I've been told that a lot as well. When I was working in the lab we had a lab manager, so that person was filtering through emails. I don't know if you have one as well?

No, I manually look at the email.

Okay, so that seems even harder, because then you just have so many emails coming through. So what is a good title where you'll be like, oh, let me click?

Ah, like a clickbait title? Let's see. Well, a normal title would be something like "Looking for research opportunities" or "Interest in pursuing a research project" and so on. The title is hard to craft. I would say any title would do, but try to make it a bit more catchy. I'm not sure how to say this, I wouldn't know. But I would look at the content, though. If the content of the email suggests a potential project, and the student has obviously spent some time looking over the publications the potential advisor is working on and is suggesting some ideas, I think that would pretty much catch the attention.

Okay, so you actually open each of your emails?

Exactly, yeah. The student would send their CV or résumé as well. And also postdoctoral fellows, recent PhD graduates who email about a potential postdoctoral fellowship in the lab.

Okay, so that's really nice of you that you actually open the emails. I know a lot of professors skip over everything that's related to "I want to work at your lab." That's really great advice: you really need to at least be invested in this. Don't copy-paste, and it's so much better if you actually look at someone's work and show genuine interest, right?

Right, exactly.

So, you guys, this is how you structure an email so that a professor will actually look at it and even respond to you. Okay, I have a quick tangent. I don't know if this is something that's good or not, but when professors didn't respond to me, this is what I did: I would go and talk to everybody in that person's lab, and after that I would ask for an introduction.

Yeah, that sounds okay.

Because I feel like if I did that, the person is much more likely to actually talk to me. So maybe that's another option to consider.

Yeah, so that would mean maybe you're at the same university, right? That would also show that you have genuine interest. You're doing your homework, you're talking to peers about potential ideas, what kind of projects the lab is working on. So I would say that would also work.

Awesome, awesome. So lots of ideas, guys. All right, super fun lightning round, are you excited?

All right, okay, lightning round. Cool.

Like I said, I'm literally just thinking of these questions in my head right now. Okay, what's your favorite protein?

Green fluorescent protein.

Okay.

Yeah, the first protein I worked with.

Cool. What are you most excited about in the next two to three years in your field?

Huh. AutoML,
30:29

Super Secret Lightning round

no coding, and then [unclear] biology, and GPT-3.

GPT-3?

Yeah, it's this natural language processing thing that allows machine learning models to comprehend text. It'll learn the entire internet, like Harry Potter for example, and it could write its own version of Harry Potter. Imagine the computer, the AI, writing a research paper for you. Actually, I'm working on that as well. I have this crazy idea, because when you're writing a research paper you essentially write similar things, right? You paraphrase what you would normally write. I've seen some promising libraries in Python where you can paraphrase sentences, so I might explore this area as well. Like, first of all, if I upload a figure, an image, a plot, then tell me what I should say: look at the plot, analyze the plot, let me know the trend, write it in simple terms, a human-friendly version.

That is so interesting. You'll have to link that for me, I need to check this out, I haven't heard about it. If you couldn't be a data professor and do research as a professor, what would you do?

Ah, if I couldn't, what would I do? Never thought about that.

You were just like, I have decided this is my life's path?

Yeah, this is me. I'm happy where I am right now. If I wasn't working here... oh, how about this: if I had not entered the PhD. Actually, after graduating from the PhD I wanted to pursue entrepreneurship. I would be a businessman, I would have a startup.

Interesting. Well, you can have both, the best of both worlds now, right?

Right, yeah. Being a YouTuber is kind of like starting your own brand, right? And the brand is you. And it's also very fun as well.

Yeah, full control. That's something that is very difficult to get if you're trying to get funding from people. Then all the VCs are going to tell you what to do.

You're right, right.

Yeah, so that is it in terms of questions. I thought your answers were so interesting that I literally forgot to think about questions, so thank you so much for that, and in general for taking the time to chat with me today. It was such a cool conversation, and your background in bioinformatics brings back so many memories for me as well. For anybody that's watching, check out all those links I'm going to be posting, and check out Data Professor, the channel. He posts amazing content, and I go and creep on all his videos all the time as well. Every time you watch there's so much more, because I feel like you just have so much experience in this field, and in how to structure things, how to approach different questions. I get value even if I've watched the video two or three times already. So yes, go and check it out, guys, linked above and below.

All right, yeah, thank you so much for having me here. It was very fun doing this.

And remember to always minimize effort and maximize outcome. I'll see you guys in the next video.
