The comments sections are WILD | YouTube sentiment analysis - Data science project for beginners
13:50

Tina Huang 11.12.2020 14,302 views 324 likes updated 18.02.2026
Video description
Full code: https://github.com/hellotinah/youtube_sentiment_analysis/

Behold, my first data science project! This is a data science project for beginners (guaranteed because I am also a beginner at NLP lol). I did some sentiment analysis on YouTube comments and quickly went down the rabbit hole and did other NLP things and eventually got to the US presidential debates. OH DEAR

Videos mentioned:
@KenJee_ds 's super cool leaderboard: https://www.youtube.com/watch?v=myhoWUrSP7o&t=50s

You might also be interested in these videos:
Day in the life of a FAANG Data Scientist: https://www.youtube.com/watch?v=lCi6fWuI8r4
How I learned SQL from Scratch in 11 Days to Pass my FANG SQL Interview: https://www.youtube.com/watch?v=vaD3ZFFNwhM

Subscribe: https://www.youtube.com/channel/UC2UXDak6o7rBm23k3Vv5dww/?sub_confirmation=1

Check out StrataScratch for SQL interview prep: https://stratascratch.com/?via=tina

Contact:
youtube: youtube comments are by far the best way to get a response from me! I answer every single comment!
linkedin: https://www.linkedin.com/in/tinaw-h/ (second preferred but I might suck at responding)
email: hellotinah@gmail.com
*If you're reaching out through linkedin or email, I can get back to you the fastest if you leave a youtube comment just letting me know that you reached out :)

*The StrataScratch affiliate program gives me a small portion of the sales price at no cost to you. I'm currently not monetized and really appreciate your support in helping improve this channel! :)

#DataScience #DataScienceProject #TinaHuang

Contents (4 segments)

  1. 0:00 <Untitled Chapter 1> 132 words
  2. 1:18 Sentiment Analysis 1433 words
  3. 10:01 Positive Comments 83 words
  4. 10:39 K-Means Clustering 588 words
0:00

<Untitled Chapter 1>

"...the Coursera course. Spend 15 bloody minutes trying to find the cost of certification..." "...he sits, he sits in his golf course, and I mean, literally, think about it." "You probably play more than I do, Jim." "America sent a fool on an errand and he has cut her legs off and displayed the stumps."

So this is how it all started. For those of you that reach out to me asking about beginner data science projects, I highly encourage you to also stay until the very end of this video, where I'll give a rundown of a notebook and how to pull YouTube comments yourself, so you can start your own analysis on your favorite YouTube videos or channels. Let's get down to business. Okay, so I started off doing some
1:18

Sentiment Analysis

sentiment analysis, but here's the problem: you guys are so nice to me. You guys are too nice to me. I mean, the worst comment I've gotten when doing the sentiment analysis is "I've got mad yellow fever." Not saying that you guys should be mean to me, though, but for the sake of this analysis, let's also explore another YouTuber's comments, and I know just the guy.

Six and a half hours later... Don't make me laugh. Okay, you're literally making me frightened at all. Okay, Ken, you have to be normal. Hey Ken, how's it going? Great, how about you? Good. How are your papayas? They're pretty good, I only had three today.

All right, I called you for a very specific reason, Ken. I actually did some sentiment analysis on your channel using two different sentiment analysis modules: one is called TextBlob, the other is called VADER. Are you ready to hear some results? So basically this is going to be like, are you ready to react to some wholesome and mean comments? Yeah, absolutely, let's see. I mean, I have read every single one of my comments, so hopefully there are no surprises, but you know, I've forgotten some by design, so we'll see what happens when you unearth them. All right, so what do you want to do, wholesome ones first or mean ones first? To rip the band-aid off, let's go for the mean ones. Let us see. All right, first one from VADER, are you ready?

"Abandoned video at 0:38. Want to know why? Read first... visit video to get a quick subject and topic list of math required for data science. Video suggests articles. Read on topic. Stop video. Article offers link to Coursera course on topic. Visit Coursera course. Spend 15 bloody minutes trying to find the cost of certification. Do not find it. Give up search. Frustrated, return here to leave comment. Abandon watching the rest of the video. Conjure up images of wanting to punch the wall in frustration on time wasting with results still unfound. Comment down. Now, 30 minutes later, I still need a list of subjects and topics."

Well, I think if that person had just watched the whole video, I would have actually given all the subjects and topics. I feel bad for that person, but I also wish they had taken just maybe a little more time to watch past like 40 seconds into the video.

Here's one from TextBlob: "Very annoying background music, also in podcasts. Otherwise perfect." That one's not mad, that's like constructive criticism. "My biggest problem is communicating. I'm very bad at communicating." What do you think? I feel like this person is actually pretty good at communicating, but they're bad... again, maybe they're a very effective communicator via the written language but not the spoken language.

So I actually did notice, and I gave that example because, both VADER and TextBlob tended to classify things as negative just because there are negative words in them, right? So I feel like that's something they're not very good at: working out where the sentiment is directed. And I wonder if this is something that's kind of ubiquitous in classification. I think it probably is. I mean, that's clearly a negative sentiment, like "I'm not good at something." It's really hard, especially in this, to get direction. I think of sentiment as kind of a blunt force tool, where it's great at, let's say, evaluating a hundred thousand tweets: you can tell that on a topic most people are positive or negative about it. But on a comment-by-comment basis, I think we can both agree after looking at this data that it's a little shaky.

Let's move on to some wholesome ones. "It's beautiful how you're thankful for us, but actually we are happier and more thankful for you. Truly, you educated and inspired me and led me to great resources. Also thank you because your work is not just useful but also beautiful and well made. I can't wait to reach my fifth 'not bad' project and 10th 'nice' project and 15th 'amazing' project and look back at the journey, a journey full of learning and growing and amazing experience."

Well, I'm legitimately turning red. That's something that, if you're putting stuff out there into the world... I mean, those are some of the nicest things that someone could say about your work. You just have so many wholesome comments: "Excellent advice, always the best tips, thanks man." And there's another one that's like, "These are great tips for me since I myself work from home."

Let's also play a game. What do you think are the 10 words that most differentiate you, based on your comments? Let me give you some context first on what it is that I did. I used Scattertext with spaCy's English core web large NLP model, and I created a corpus with a collection of words in it. Then I found the 10 words that are the most defining of your channel. I won't lie to you, everything you just said sounded like a foreign language to me, but probably I would expect "data" and "science" to be in there, probably "project", maybe "Kaggle" tends... a lot of words. "Papaya"? I know that's not in there. You wish, you wish. No chance. Not bad. So the ones that came out: Kaggle isn't there. For the top ten: tweets, YouTube, G, your last name, Coursera, Jupyter, Udemy, Twitter, and LinkedIn. Interesting. But when I expand that to 20, though, you do get data science coming up, you get DataCamp coming up, NumPy, Twitter scraper, bootcamp, COVID, Colab as well. I wonder what G is. Maybe it's because some people call me Mr. G or something like that, or I get a lot of "sirs". You can just call me Ken, guys, honestly. But they call me "sir" too: Sir Tina.

There are actually so many ways of improving this, like using themes as opposed to words. I have a lot of ideas to improve this in the future. Let me know in the comments below, too, what you think I can do to improve this. I also sent over some of the stuff that I found, and seeing as, you know, the scoreboard that you're building, Ken, I think we can really expand that out and just have a whole slew of things that could be useful. Heck yeah, I think that there's tons of opportunity here. I'm pretty excited about the potential to do more projects around YouTube. Let's take this to the next level.

Okay, so Ken's sentiments are generally very positive, although he's had some variations over time. It makes sense, though, because he's a really nice guy and has really useful content that's not really controversial, and he hasn't been involved in any drama, that we know of at least. Okay, I don't think he's going to do anything heinous. Which, unfortunately for this sentiment analysis, means that he doesn't have any mean comments or polarizing opinions. So I thought to myself: what is the single most polarizing thing in the past few months or so? The pandemic? Yeah, probably the pandemic, but that's been going on for like a year now and COVID's already taken over our lives, so I was like, let's do something else: the U.S. election.

So I grabbed the comments from this video, which covers the first presidential debate. "...they have a plan, he won't even meet with him, the Republicans won't meet with the Senate, but he... and he sits in his golf course, and I mean, literally, think about it." "You probably play more than I do, Jim."

Here are some of the comments that got the most likes: "Gordon Ramsay will be a much better moderator in the next debate." "This isn't a presidential debate, this is an emergency meeting discussion, Among Us." "The debate is hilarious as hell until you realize one of them will be president." Completely agree with that one, and many more. I like the last one too: "They need an Italian grandma as the moderator." Ah yes, much better. As you can see, this is much more polarizing, and both VADER and TextBlob do a much better job at classifying sentiment.
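The "blunt force tool" point in this chapter can be made concrete with a small sketch. This is not how VADER or TextBlob are implemented; it is a toy lexicon scorer, assuming a handful of made-up word weights (`TOY_LEXICON`) and a hypothetical neutral band (`threshold`), that only illustrates why a word-level approach labels "I'm very bad at communicating" as negative even though the negativity is aimed at the commenter and not at the video.

```python
import re

# Hypothetical word weights for illustration only; these are NOT the
# lexicons that VADER or TextBlob actually ship with.
TOY_LEXICON = {
    "excellent": 2.0, "best": 1.5, "great": 1.5, "beautiful": 1.5,
    "thankful": 1.0, "bad": -1.5, "annoying": -1.5, "terrible": -2.0,
    "pathetic": -2.0, "frustrated": -1.5,
}


def polarity(comment: str) -> float:
    """Average the weights of the lexicon words found in the comment.

    Returns 0.0 when no word of the comment is in the lexicon.
    """
    words = re.findall(r"[a-z']+", comment.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(score: float, threshold: float = 0.05) -> str:
    """Map a polarity score to a coarse label using a small neutral band."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


comments = [
    "Excellent advice, always the best tips, thanks man",
    "My biggest problem is communicating. I'm very bad at communicating",
    "This isn't a presidential debate, this is an emergency meeting",
]
for c in comments:
    # The scorer sees only individual words, never who or what they target.
    print(f"{label(polarity(c)):>8}  {c}")
```

Because the score depends only on which words appear, the second comment lands in the negative bucket regardless of its target. Averaged over thousands of comments that noise may wash out, which matches the observation that sentiment works better in aggregate than comment by comment.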
10:01

Positive Comments

So here are some of the most positive comments: "Biden 2020", a lot of emojis, hearts, really long soliloquies, stuff about salvation of souls, "best pizza night", "make Russia great again". Okay, that makes sense. "America sent a fool on an errand and he has cut her legs off and displayed the stumps." "Chris Wallace, a terrible moderator." Yep, makes sense. "Pathetic." Yeah, I can see why that would definitely be classified as negative. I also attempted to do some k-means
10:39

K-Means Clustering

clustering to see if I could find clusters of comments. I first did more pre-processing. As you can see, this data set is really kind of unclean, in the sense that there are a lot of spelling errors, a lot of issues in terms of sentence structure, and comments that literally don't make any sense at all. So I tried to do as much pre-processing as I could. I played around with stemming versus tokenizing, also removing emojis and trying to check spelling.

After all of that, I then used an elbow graph to show the optimal number of clusters, which honestly didn't show a super clear elbow, but I guess two seems promising. And I was like, oh, maybe one is Trump and one is Biden. But these are actually the clusters that we ended up with. You get Trump, Biden, debate (because I used stemming in this case, the words are not full words), Joe, like, president, this, all these things. And then for the other one you get Trump, word, go, vote, Biden, you know, savage. But I did notice that towards the end here you get beautiful, I don't know, clear, clearly, peace, bless, heard, love. So I actually feel like what the clusters ended up being is: the first one is much more negative, and the second one was, I mean, kind of mixed, but it seems to have more positivity involved in it. I thought that was interesting, because I was really expecting to see the Biden-Trump split. But yeah, I don't think these are really good, though. So I tried to do some more processing, but unfortunately to no avail. I think the biggest challenge for this data set is just sufficiently cleaning the comments, since you get some pretty wild stuff. What do you guys think?

I hope this video was insightful on how I went about my first ever project as a total NLP beginner. It's nowhere near perfect, and I would also argue that projects are never finished. I do want to say lots of thanks to Ken Jee for unearthing how to scrape YouTube comments, and the Colab.

To everyone that wants to do a super beginner-friendly project, where I also provide you guys with most of the code already: you can find links to the notebook in the description below. All you have to do is go to the YouTube API, get your developer key, and replace the developer key with yours, and then you can run that code. You can then switch out the channel ID and YouTube IDs with those of the other channels and videos that you want. I uploaded it to Colab since that was the easiest for me, but you can also play around with the data in any way that you like. And there you go: try out the analysis I've done and take it away. I would love to see any analysis that you guys do, so please let me know if you do, and improve my analysis too, possibly by trying out some of the suggestions that I mentioned earlier.

Next up: I'm really enjoying learning about NLP, and as I was lying in bed I had some other ideas, so do let me know what you guys think about this type of video, and if you guys like watching, and if you do enjoy watching this type of video
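For anyone who wants to see what the clustering step in this chapter involves before opening the notebook, here is a minimal sketch of k-means with an elbow check. It is not the code from the video: it assumes plain bag-of-words counts, skips the stemming, emoji removal and spell-checking mentioned above, and uses only the standard library. The function names (`vectorize`, `kmeans`, `elbow`) and the four sample comments are made up for illustration.

```python
import random
import re
from collections import Counter


def vectorize(comments):
    """Turn comments into bag-of-words count vectors over a shared vocabulary."""
    docs = [Counter(re.findall(r"[a-z]+", c.lower())) for c in comments]
    vocab = sorted(set().union(*docs))
    return vocab, [[d[w] for w in vocab] for d in docs]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Index of the nearest centroid for every vector."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (labels, inertia).

    Inertia is the sum of squared distances from each point to its centroid.
    k must not exceed the number of vectors.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = assign(vectors, centroids)
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(vectors, centroids)
    inertia = sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, inertia


def elbow(vectors, max_k):
    """Inertia for k = 1..max_k; the 'elbow' is where the drop flattens out."""
    return [kmeans(vectors, k)[1] for k in range(1, max_k + 1)]


# Made-up sample comments, already lowercased and cleaned.
sample = ["trump trump debate", "trump debate debate",
          "love peace bless", "love love peace"]
vocab, vectors = vectorize(sample)
print("labels for k=2:", kmeans(vectors, 2)[0])
print("inertia by k:", elbow(vectors, 4))
```

On real comments the inertia curve is rarely this clean, which fits the remark that the elbow graph did not show a clear elbow. Swapping raw counts for TF-IDF weights, or cleaning the text more aggressively first, are the usual next things to try.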
