# Auditing Radicalization Pathways on YouTube

## Metadata

- **Channel:** Yannic Kilcher
- **YouTube:** https://www.youtube.com/watch?v=AR3W-nfcDe4
- **Date:** 28.08.2019
- **Duration:** 43:13
- **Views:** 1,326
- **Source:** https://ekstraktznaniy.ru/video/13933

## Description

This paper claims that there is a radicalization pipeline on YouTube pushing people towards the Alt-Right, backing up their claims with empirical analysis of channel recommendations and commenting behavior. I suggest that there is a much simpler explanation of this data: A basic diffusion process.

Abstract:
Non-profits and the media claim there is a radicalization pipeline on YouTube. Its content creators would sponsor fringe ideas, and its recommender system would steer users towards edgier content. Yet, the supporting evidence for this claim is mostly anecdotal, and there are no proper measurements of the influence of YouTube's recommender system. In this work, we conduct a large scale audit of user radicalization on YouTube. We analyze 331,849 videos of 360 channels which we broadly classify into: control, the Alt-lite, the Intellectual Dark Web (I.D.W.), and the Alt-right ---channels in the I.D.W. and the Alt-lite would be gateways to fringe far-right ideology, here represented by

## Transcript

### Intro

Hi there. Today we're going to look at "Auditing Radicalization Pathways on YouTube" by Manoel Horta Ribeiro et al. This paper is a bit different from the ones we usually look at, but since I'm a YouTuber and this is in the data science realm, I thought it fits neatly. We'll have a look, and this is mostly going to be an analysis and my opinion on it, so take that for what it is.

This is, in my opinion, a paper where you can see very well what it looks like when you deceive yourself: you have a hypothesis, you only collect data that matches it, you don't think of simpler explanations for that data, and therefore you don't think of experiments that could differentiate the simple explanations from what you propose. So it's a good example of how you can trick yourself into believing you found something. And this isn't about YouTube specifically; this has happened to me many times, and it always pays off to take a step back and ask: is there a simpler explanation for what's happening? That is exactly what I think is happening here. I'll present their hypothesis and then what I think is going on, a model that explains the data much more simply and, actually, better. So let's dive in.

The paper basically claims the following. On YouTube there are channels; channels are independent creators that make videos, and you can arrange these channels in a network where each node is a channel. Channels can be considered connected, possibly with some connection strength, in a number of ways: for example if their topics are similar, if they reference each other, if YouTube recommends one from the other, or if the same users watch the same channels or the same videos. There are a number of metrics by which you could connect channels, but all of them will give you a similar structure of which channels are connected to which.

So you can build a graph of how these channels are connected, and then you can cluster them (you don't strictly need the graph to cluster them). What will emerge are parts of the graph that are well connected internally and only sparsely connected to the rest, with a larger distance between them. If you start at one channel and stroll along recommended videos and recommended channels, you will reach the nearby, well-connected channels much faster than the far-away ones. These clusters are usually called communities in social network analysis. On YouTube there's a community for makeup and one for sports; within sports there's a community for soccer and one for basketball, and so on. These are the communities you can discover by clustering. This paper mainly deals with three communities.
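Just to make the clustering idea concrete, here is a minimal sketch of that kind of community detection on a channel graph. This is my own illustration, not the paper's code: the channel names, similarity scores, and the 0.3 threshold are all made up.

```python
# Minimal sketch (not from the paper): build a channel graph from pairwise
# similarities and extract communities. Channel names and edge weights are
# made-up placeholders; any similarity (shared commenters, recommendations,
# topic overlap) could serve as the edge weight.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical similarity scores between channels (0..1).
similarities = [
    ("makeup_guru_a", "makeup_guru_b", 0.9),
    ("soccer_clips", "basketball_clips", 0.6),
    ("soccer_clips", "makeup_guru_a", 0.05),
    ("news_channel", "soccer_clips", 0.1),
]

G = nx.Graph()
for a, b, w in similarities:
    if w > 0.3:  # keep only reasonably strong connections
        G.add_edge(a, b, weight=w)

# Communities = densely connected groups of channels.
communities = greedy_modularity_communities(G, weight="weight")
for i, group in enumerate(communities):
    print(f"community {i}: {sorted(group)}")
```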

### Three Communities [4:13]

The first is the I.D.W., the Intellectual Dark Web. They describe it as a group of individuals in a rolling conversation with each other about topics that are, let's say, usually difficult to talk about, such as gender differences, intelligence research in certain areas, or even regular politics. The Intellectual Dark Web is a wide variety of people who are basically conversing with each other about such topics. The description is a bit vague, but the main aspect is conversation, often about topics that are on the edge of what's acceptable to talk about, while the opinions on these topics range widely.

The second group is the Alt-right, which is defined here as ethno-nationalists, with fringe ideas such as a white ethno-state or white supremacist ideology. So specifically ethno-nationalists, who think nations should be organized along the lines of ethnicity. The goal of the paper is to show that there is a dangerous pipeline on YouTube that drives people towards the Alt-right and into these radical ideas.

Kind of in between is the Alt-lite, which is defined here as civic nationalists, which as I understand it means that people should organize themselves into nations, into sovereign communities, but not along ethnicity. These would be more your libertarian, classically liberal people, whereas the Alt-right would be more your, let's say, authoritarian right-wing people.

Besides these three communities they have a fourth, which they call the control group. The control group consists of what they say are mainstream channels on YouTube, simply to differentiate them from the other three and to see whether there is a difference.

### Hypothesis [7:05]

So this is the setup, and as I said, the hypothesis is the following. People come to YouTube, they explore a bit, and at some point they find IDW videos. These are recommended by YouTube on a fairly regular basis, and people find them interesting. From the IDW there are then recommendations and links to the Alt-lite. As I read this paper, there is an undertone that the IDW and the Alt-lite are still okay: they discuss ideas that are sometimes political and edgy, but the real worry is the Alt-right, the radical right-wing ethno-nationalists, and that formulation I can agree with. The claim is that from the IDW you find recommendations and links to the Alt-lite, and from the Alt-lite, and to a certain degree also from the IDW, you can then find the Alt-right. So even though a user who comes to YouTube at first isn't likely to find Alt-right videos, because they're fringe and extreme, through the YouTube recommendation algorithm, by going to the IDW, from there to the Alt-lite, and from there (and from the IDW) to the Alt-right, they claim there is a pathway of radicalization that pushes people towards the Alt-right. That's their hypothesis, and they claim they have evidence to support it.

I claim there is a simpler explanation. First of all, let me state: I don't like the Alt-right, I think their ideas are despicable. That should go without saying, but I have said it now, so just as a disclaimer, I'm not defending anyone here. I'm simply saying this paper's data has a simpler explanation. What I think is happening is this. YouTube, again, is channels; each dot is a channel, and channels can be clustered, as we saw before. There is what they call the control group, which is large: a bunch of channels, basically the mainstream media. Then over here there is, let's say, alternative media, where all of these three groups belong. At some point you have the IDW, further away from the control group; very close to the IDW you have the Alt-lite; and close to those two, but a bit further out, the Alt-right. So notably, in my model the IDW and the Alt-lite are close together in terms of comparative distance. If you cluster these channels by, say, topics and audience, it will turn out that all three are far away from the control group, the IDW and the Alt-lite are very close to each other, and the Alt-right is at some distance from them, but of course a smaller distance than the distance to the control group. The exact details of the picture don't matter; what matters is that the distance from the IDW and the Alt-lite to the Alt-right is smaller than the distance to the control group.

In this model a second thing is also important: the Alt-right is much smaller than the IDW and the Alt-lite, and these again are much smaller than the control group. And I think this accounts for most of the data: the distance relations between these clusters and their sizes, where by size I mean mainly the number of channels and also the audience, explain the data better than their model. Just keep this in mind. And my model, of course, doesn't include any kind of pipeline.

### Data Collection [12:47]

So first of all, they go ahead and collect channels; they collect data for this. We could go over how they collect the data and criticize it (there's a lot of human annotation, and they start from already-published reports, which themselves can be criticized), but I'm not going to go into their data collection methodology. It can have mistakes, but any collection methodology can have mistakes. What they end up with is a number of channels, and here are the top channels from each category: Alt-right, Alt-lite, Intellectual Dark Web, and control.

Already here you can see pretty clearly the model I have in mind, and they acknowledge all of this, by the way. Look at the size of the biggest Alt-right channels compared to the size of the Alt-lite and the Intellectual Dark Web: they're much smaller in number of views. Then compare this to the size of the control group, which is again larger than the other two groups. So keep that in mind. The second thing to keep in mind: look at these channels. Maybe you know some of them: Joe Rogan, Sargon of Akkad, Paul Joseph Watson, Styxhexenhammer666. These are YouTubers, individuals making YouTube clips, creating content for YouTube, being on this platform. Whereas if you compare that with the control group, what's in there? Vox, GQ, Wired, Business Insider. These aren't YouTubers; these are websites, traditional media companies, or their own kind of blogs that have a YouTube channel, where YouTube is just one of the outlets of the media company. So I think there's a giant discrepancy here in the control group that can also explain some of the data you see.

They say they don't try to capture the user dynamics with the control group, but I think there are many problems with this control group, including the fact that these are traditional mainstream media that just have YouTube as an outlet. Moreover, a lot of these, like Vox or Vice, are clickbait and rage-bait media. That has worked as a business model for a number of years, but the algorithms are becoming more attuned to clickbait, and these channels are crashing fast, whereas the more YouTuber-type creators aren't as susceptible to the crackdown on clickbait. Alright, so this is the data they have: all these channels and videos. First of all, they give some stats.

### Statistics [16:10]

Alright, here you see that on the bottom is always the year, so they do this over time. You see the active channels, which are channels that have uploaded videos within some time window. The control group is again larger, but it has started to flatten out in the last few years, whereas these three communities are relatively flourishing. Another interesting point is that the paper somehow tries to tie this to the election of Donald Trump in 2016, but I think this is just in there to gain relevance: a lot of these trends visibly start before that. The start of the rise, these bumps, often comes before 2016. So as we go through this, make up your own mind about how much this is actually tied to the election. I think it's much more about the years when clickbait started to go down as a business model. Never mind, though. So the active channels are growing, with the control group not growing as much. Videos published: even though the control group isn't growing as much, they still publish the most videos, but you can see that generally the site is growing; YouTube as a whole is growing.

With like counts you start to see something interesting, namely that these communities, especially the Alt-lite and the Intellectual Dark Web, are starting to catch up. And this is one of the things the paper also states: if you look at, for example, comments per video, the Alt-lite and the Intellectual Dark Web outperform the control group vastly. The same holds for views per video and likes per video. The control group simply doesn't have an engaged audience, which I think is, first of all, because they produce clickbait; second, they're just not that interesting; and third, they're not YouTubers, this isn't their thing, YouTube is simply one of their outlets. So that's just a bunch of metrics they show here. The next table is a bit more interesting.
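To make the engagement comparison concrete, here is a small sketch of how you could compute those per-video metrics. The dataframe, its column names, and the numbers in it are invented placeholders, not the paper's data.

```python
# Sketch of the per-video engagement comparison described above.
# The layout (columns "community", "views", "likes", "comments") is a
# hypothetical stand-in for the paper's actual video statistics.
import pandas as pd

videos = pd.DataFrame({
    "community": ["control", "control", "Alt-lite", "IDW", "Alt-right"],
    "views":     [120_000,   90_000,    60_000,     75_000, 8_000],
    "likes":     [800,       500,       4_000,      5_200,  600],
    "comments":  [90,        60,        1_100,      1_400,  250],
})

per_video = videos.groupby("community")[["views", "likes", "comments"]].mean()
per_video["likes_per_1k_views"] = 1000 * per_video["likes"] / per_video["views"]
per_video["comments_per_1k_views"] = 1000 * per_video["comments"] / per_video["views"]
print(per_video.round(1))
```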

### User Intersections [19:06]

In the next table they do user intersections. What they do is collect all these videos and then all the comments on these videos. A comment, of course, always comes with a username, since you need to be logged in to YouTube to comment. They then see which users comment on multiple videos, or on videos of multiple categories, so they can ask: how many users of category A also comment on category B, and vice versa. They have two metrics for this. The Jaccard similarity, for two communities A and B, is the number of users commenting on both A and B divided by the number of users commenting on A or B. The overlap coefficient is the number of users commenting on both A and B divided by the minimum of the sizes of A and B. They say the overlap coefficient is more useful for comparing communities of different sizes, so we'll look at that. In the figures, the top graphs are always the Jaccard similarity and the bottom ones the overlap coefficient.

The first graph, though, is the number of commenting users per year, and you already see that even though the control group has many more views and probably many more videos, the comments don't reflect that. Again, the users of the Alt-lite and the Intellectual Dark Web are much more engaged. The same goes for comments per user, shown as a cumulative distribution function: most people who comment on control group videos comment maybe once, whereas the users of these other communities comment more. Self-similarity means each year is compared to the year before: how many of the users are the same, i.e., how well do these communities retain users? You can already see that the control group is actually quite bad at retaining users. It has a high overlap coefficient but a low Jaccard self-similarity, which, if you think of the formula for the Jaccard similarity, means the intersection is small while the union is large, so the two years' user sets are largely disjoint: last year's users aren't this year's users. They constantly have to appeal to new users because they're losing the old ones, because, well, I guess they're boring. The Alt-lite and the Intellectual Dark Web, on the other hand, are much better at retaining users. Interestingly, the Alt-right is not as good at retaining users as the other two. This could also be an effect of size: if your community is smaller, users might wander away more quickly. But I think this already speaks against the radicalization pipeline: if YouTube were radicalizing people towards the Alt-right, I think we would see the Alt-right on top in user retention.

Then they have intersections between communities. Green here is Alt-lite and IDW, while the blue and the remaining curve are Alt-right with Alt-lite and Alt-right with IDW. We see, both in terms of overlap coefficient and Jaccard similarity, that the Alt-lite and the IDW share users much more.
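Written out as code, the two metrics used above look like this; a minimal sketch with made-up user sets, not the authors' implementation.

```python
# Jaccard similarity and overlap coefficient between the commenting user
# sets of two communities, as defined above. The user sets are made up.
def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|: users commenting on both, over users commenting on either."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def overlap_coefficient(a: set, b: set) -> float:
    """|A intersect B| / min(|A|, |B|): less sensitive to a large size difference."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

alt_lite_users = {"u1", "u2", "u3", "u4"}
idw_users      = {"u3", "u4", "u5", "u6", "u7"}

print(jaccard(alt_lite_users, idw_users))              # 2 / 7 ~= 0.29
print(overlap_coefficient(alt_lite_users, idw_users))  # 2 / 4 = 0.5
```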

### Similarity [23:37]

This makes sense in the picture I painted. My model explains it very well: these two communities are quite close together and therefore share a similar user base, while the Alt-right is smaller and a bit further apart, and therefore not as similar, though more similar to them than to the control group. The last graph shows how similar these communities are to the control group, and here we see the IDW and the Alt-lite are somewhat similar to it, the Alt-right not as similar, though in the overlap coefficient they're about the same.

So the paper claims: look at the similarity. They don't yet claim this is a radicalization pipeline, but they do claim there is a higher similarity between these communities. If you actually look at the numbers, though, it's not so clear-cut. Within these groups you're at around 50% similarity, and at the end of the control-group curve you're also around 50% similarity. Likewise, if I look at the means here, you're at maybe 18 to 20%, and with the control group you're a bit lower but also heading towards that. What it looks like to me, rather than a radicalization pathway, is this: look at the shape of the within-group curve, which starts to go up around 2013 or 2014, and then look at the shape of the control-group curve; it's simply the same shape, delayed. There's no reason this graph wouldn't go up in the future and reach the exact same numbers. It seems the graph is simply shifted in time, which makes total sense if you think these communities are laid out the way I drew them: IDW, Alt-lite, and Alt-right over here, and the control group over there.

If you simply think of YouTube as growing, with users starting somewhere and spreading out pretty much randomly, then there's a diffusion process going on, not a movement in a particular direction like they claim. And if it's just a diffusion process, what would you expect? You would expect users who start near these communities to reach the IDW and the Alt-right much sooner than they reach the control group. But ultimately, as the diffusion continues, all users will have commented on most videos if you run YouTube infinitely, and these numbers will go up. It simply takes a longer time to diffuse all the way over to the control group than between these nearby communities. So to me, we're looking at a simple diffusion process that is shifted in time, and that explains both the discrepancy in the numbers and the shape of the curves, which is exactly the same but shifted. Their model does not explain the shape of the curve; they simply say, well, here it's 75% and here it's only 50%, so these communities must be shipping users towards each other.
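To show what I mean by a plain diffusion process, here is a toy simulation. It's my own sketch, not from the paper, and the community positions, sizes, and hop rule are all invented: users random-walk between communities that differ only in size and distance, and the fraction that has reached the far-away control group rises with the same shape as the fraction that has reached the nearby Alt-right, just later.

```python
# Toy diffusion: users hop between communities; the hop probability depends
# only on community size and distance, with no directed "pipeline".
# All positions, sizes, and rates here are invented for illustration.
import math
import random

random.seed(0)

communities = {
    # name: (position on a 1-D "topic axis", relative size)
    "control":   (0.0, 100.0),
    "IDW":       (6.0, 10.0),
    "Alt-lite":  (6.5, 10.0),
    "Alt-right": (7.5, 1.0),
}

def hop(current: str) -> str:
    """Pick the next community: closer and bigger communities are more likely."""
    pos, _ = communities[current]
    names = list(communities)
    weights = [communities[n][1] * math.exp(-abs(pos - communities[n][0]))
               for n in names]
    return random.choices(names, weights=weights)[0]

# All users start in the Alt-lite; track how many have reached each community by step t.
n_users, n_steps = 2000, 30
reached = {name: [0] * n_steps for name in communities}
for _ in range(n_users):
    current, seen = "Alt-lite", set()
    for t in range(n_steps):
        current = hop(current)
        seen.add(current)
        for name in seen:
            reached[name][t] += 1

for name in ("Alt-right", "control"):
    frac = [round(c / n_users, 2) for c in reached[name]]
    print(name, frac)  # same rising shape; the control curve just lags behind
```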

### Pipeline [27:47]

So I think the explanation is simpler. They admit this alone doesn't show that there is a pipeline, but they claim that what they do next does; this is the experiment that supposedly really shows the pipeline. What they do is define what they call an infection. For example, in this row they take users who are Alt-lite users at the beginning, in a base period: users who only comment on Alt-lite videos during that time (all users who comment on anything else are discarded). Then they follow those users over time and count how many of them have at least one comment on an Alt-right video. So this is only directed from the community in question towards the Alt-right. A user is called lightly infected if they comment on one or two Alt-right videos, mildly infected if they comment on three to five, and severely infected if they comment on more than that. As you can see, users starting from the Alt-lite, from the IDW, or from both will become infected over time. Since the tendencies between the groups are similar, I'll simply look at the light infections here. They say that by 2018 about eight to ten percent of the users in these groups become infected, with about the same trajectories, whereas in the control group it's less. Though honestly, I don't think it's that much less, and again, I think there is a normal diffusion process here. They do this similarly with the other communities.

To them this makes total sense: users who start in these communities migrate, they get infected by the Alt-right, they move towards the Alt-right because you can find it so easily. To me this simply looks like a normal diffusion process, and by the way, the control group isn't that much different. Here's what you need if you want to show that there is a pipeline in this direction: you need this exact same graph in the other direction. You need to show that people who started in the Alt-right do not go back in the same fashion towards the Alt-lite or the IDW, and especially do not go to the control group. You need to show this between each pair of these communities, and you need to show that the direction of infection is only in a single direction, namely towards radicalization. Otherwise you're just looking at a normal diffusion process between differently distanced and differently sized groups.
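Here's a sketch of that infection bookkeeping as I understand it; this is my reconstruction with made-up comment records, not the authors' code, and the "more than five" threshold for severe infection is inferred from the three-to-five range for mild.

```python
# Sketch of the infection definition: take users who, in a base period,
# commented only on Alt-lite videos, then grade them by how many Alt-right
# videos they later commented on. The comment records are made up.
from collections import Counter

# (user, community of the video, year of the comment)
comments = [
    ("alice", "Alt-lite", 2016), ("alice", "Alt-lite", 2017), ("alice", "Alt-right", 2018),
    ("bob",   "Alt-lite", 2016), ("bob",   "control",  2016),            # mixed start: excluded
    ("carol", "Alt-lite", 2016), ("carol", "Alt-right", 2017), ("carol", "Alt-right", 2018),
]

BASE_YEAR = 2016

# Users whose base-year comments are exclusively on Alt-lite videos.
base_users = {u for u, c, y in comments if y == BASE_YEAR and c == "Alt-lite"}
base_users -= {u for u, c, y in comments if y == BASE_YEAR and c != "Alt-lite"}

# Count later Alt-right comments per tracked user.
alt_right_counts = Counter(u for u, c, y in comments
                           if u in base_users and y > BASE_YEAR and c == "Alt-right")

def infection_level(n: int) -> str:
    if n == 0:
        return "not infected"
    if n <= 2:
        return "lightly infected"
    if n <= 5:
        return "mildly infected"
    return "severely infected"

for user in sorted(base_users):
    print(user, infection_level(alt_right_counts[user]))
```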

### Analysis [31:42]

So they go on to analyze how much of the Alt-right audience is made up of people who have been radicalized, that is, infected. This infection is their proxy for what they call radicalization: if you become infected, you're basically counted towards the Alt-right, even though you might have commented something negative, you might engage with their ideas and call them crap, but in any case you're now infected. They ask how much of the Alt-right audience consists of such infected users, i.e., people who in the past were not Alt-righters and exclusively commented on Alt-lite or IDW videos. They find, for example, that about 23% of the Alt-right commenting audience are former Alt-liters who have now made at least one comment on an Alt-right video. Their claim is that a sizable portion of the Alt-right wasn't Alt-right at the beginning but became infected, and that this shows the radicalization pipeline, because the Alt-right audience largely consists of people who were not Alt-right previously but became so.

To me, again, this is simply a function of the size of these communities. If you start randomly somewhere on YouTube, what's the probability that you start in the Alt-right? Very small. So the natural size of the Alt-right, before users migrate, is tiny; not many users are going to be what you would call originally Alt-righters. What this measurement basically captures is: where is your first comment, and are any of your subsequent comments on the Alt-right? If your first comment is not on the Alt-right, you become a potential candidate for infection, and if any later comment is on the Alt-right, you're infected. So what's the probability that your first comment is not on the Alt-right? You're going to land somewhere on YouTube, which is huge, and the Alt-right is very small, so that probability is extremely high. Then you simply let people diffuse; some will end up in the Alt-right, and since the Alt-right is so small to begin with, most people who at some point comment on an Alt-right video will have had their first comment somewhere outside the Alt-right. It's simply a numbers game: the Alt-right is so small that this is virtually guaranteed. So what they find here is again simply evidence of a regular diffusion process between differently sized groups, and the claims they make from it are over the top. Also, in their comparison to the control group, the numbers are actually not that different from the IDW numbers; they are substantially different from the Alt-lite, but again, in my opinion, that is simply a function of distance between these clusters.

Lastly, they look at the YouTube recommender system. They take these videos and channels and look at which other videos and channels are recommended from them. On a video page you have the video itself and next to it the recommended videos. Similarly, on a channel, the creator can have featured channels, where they say, look, these are channels I find cool, go check them out, and then there are also recommended channels that are given by YouTube. So for video recommendations YouTube controls basically everything, while for channel recommendations the creator controls part and YouTube controls the other part.

They first look at channel recommendations, that is, both of those sections together, and ask: if you start at an Alt-lite channel and do a random walk, how likely are you to end up in the Alt-right, the Intellectual Dark Web, or the control group after one, two, three, four steps? The solid line is the random walker, and the dashed line is the distance if you were to head towards such a channel deliberately, i.e., the minimum number of clicks you need. You can see that if you start at the Alt-lite, after one or two steps the random walker has about a 2% chance of ending up at an Alt-right channel, about a 25% chance of ending up at an Intellectual Dark Web channel, and about a 50% chance of ending up at an Alt-lite channel again. The scales are really different, so it's difficult to judge how this compares to the control group, which is basically at zero here. To me, again, this is a reflection of the size of these communities, and I think it's a bit weird to then claim that these communities are "reachable" based on a 2% chance of landing on an Alt-right channel. But if you start from the control group, there's almost no chance you'll end up at an Alt-right channel, so I guess the comparison to the control group is okay.

If you look at video recommendations, however, the picture changes. If you start at the Alt-lite, after one step you're approximately 25% likely to be at an IDW video and a bit over 50% likely to stay at an Alt-lite video. But compare this to the channels: with channel recommendations you're extremely unlikely to end up at a control channel if you start at an Alt-lite channel, whereas with video recommendations you actually also have about a 25% chance of ending up at a control group video, and, look at the scale, only about a 0.03% chance of ending up at an Alt-right video. Also, if you start at an IDW video, the chance that you end up in the control group is very high, much higher than ending up at an Alt-lite video, whereas with the channel recommendations this was completely turned around. So the Alt-right completely loses when it comes to video recommendations, and it's mainly the control group that gains compared to the channel recommendations.
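Here is a minimal sketch of that kind of random-walk measurement; the recommendation graph, channel names, and resulting probabilities are entirely invented stand-ins, not YouTube's real recommendations or the paper's data.

```python
# Toy random walk over a recommendation graph. Nodes are channels labelled
# with a community; edges are "recommended from -> to". Everything here is
# an invented stand-in for YouTube's real recommendations.
import random
from collections import Counter

random.seed(0)

community = {
    "joe_talk": "Alt-lite", "edgy_vlogs": "Alt-lite",
    "debate_pod": "IDW", "longform_ideas": "IDW",
    "fringe_tv": "Alt-right",
    "big_news": "control", "glossy_mag": "control",
}
recommendations = {
    "joe_talk":       ["edgy_vlogs", "debate_pod", "big_news"],
    "edgy_vlogs":     ["joe_talk", "debate_pod", "fringe_tv"],
    "debate_pod":     ["longform_ideas", "joe_talk", "big_news"],
    "longform_ideas": ["debate_pod", "big_news"],
    "fringe_tv":      ["edgy_vlogs"],
    "big_news":       ["glossy_mag", "big_news"],
    "glossy_mag":     ["big_news"],
}

def where_after(start: str, steps: int, n_walks: int = 10_000) -> Counter:
    """Distribution over communities after `steps` random recommendation clicks."""
    ends = Counter()
    for _ in range(n_walks):
        node = start
        for _ in range(steps):
            node = random.choice(recommendations[node])
        ends[community[node]] += 1
    return ends

for k in (1, 2, 3):
    dist = where_after("joe_talk", k)
    total = sum(dist.values())
    print(k, {c: round(v / total, 3) for c, v in dist.items()})
```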
Here's what I think this is due to. There's the section where the creators have power, the featured channels, and the section where YouTube recommends. I think YouTube puts a lot of work into the video recommendations and not that much work into the channel recommendations, and by work I mean actually manually intervening and deciding what are good and bad videos to recommend. The control group probably also has big advertisement money behind it, so they might be pushed up a bit in the video recommendations, since most people go by video recommendations; I've actually never used the channel recommendations feature. With the channel recommendations, the creator has power over part of them, and YouTube may not put as much work into the related channels. Both effects mean that, first of all, this data doesn't convince me of a radicalization pipeline; it simply convinces me that some communities are larger or smaller and closer together or further apart. But second of all, if you forget about the Alt-right for a moment (they're too small to matter here), the video recommendation results, compared to the channel recommendation results, maybe show a bit of evidence of an algorithmic promotion of these mainstream media channels relative to how the communities actually cluster, and I think the channel picture might be the more accurate picture of the underlying communities. So that's just kind of a funky thing in the data.

This is my take on it: is there a radicalization pipeline? I don't think so. You've now heard my idea and you've heard their idea; decide for yourself. But I think it's a good example of how, if you are convinced of an underlying mechanism, you're going to collect evidence in support of that mechanism, and if you catch yourself doing that, really think: isn't there an easier explanation for this? Alright, that was it for me. Have fun.
