Identity and Well-Being — Faculty Talk


Steven Skiena · 14.04.2026 · 266 views · 9 likes

Video description
Take our survey here: https://self-identity.me Recorded on April 7, 2026 at the SBU Laufer Center

Contents (15 segments)

Segment 1 (00:00 - 05:00)

Okay. So, first of all, it's good to be back here, and I'm going to talk about some stuff we've been working on with my grad student Rohan, who's over there. So that's why he's here. But you guys at the Laufer Center, as far as I can tell, you work with cells, you work with genes, you work with molecules; you don't work with people, to a first approximation. This is a talk that's going to be about people, or about data and people. It's about a project where we're interested in self-identity. Identity is something about how you see yourself. A person can have many different kinds of identities, ways that they see themselves. Hypothetically, imagine there might be someone who's a man, who might think of themselves as a father or a husband, an American, or they might think of themselves as Canadian, or as a scientist, or as an educator, or as a soccer player. Okay? There are different kinds of identities. Everybody's got a self-conception of who they are. And the question is, does it matter? I claim it does matter, and that how you think of yourself and how you see yourself has an impact on health and happiness and success. And I'd like to study this by looking at data. That's the question. Now, again, I'm a computer scientist. I'm not a social scientist. Social scientists are the people who usually study identity, and the way that they do it, there are usually data limitations. The standard way in sociology to study people's self-identity is to give them a Twenty Statements Test. They will say, "Write down 20 things that, for you, answer 'I am blank.'" Okay?
And there have been a large number of studies where they give people these surveys, analyze them, and try to get some insight into how people think about themselves. The problem with surveys, to a first approximation, is that you can't give them to too many people. And if you have free-form text, like in a survey like this, it's hard to analyze. How do I know the meaning of what you wrote down? So I'd like us to study identity, and it would be great if we could find some way for 420 million people to tell us what's important to their self-identity, and ideally have a phenotype on a lot of these people. If we could get that, then we would be able to understand whether identity means something or not. So this is the outline of my talk, and this is the first time I've given it, so this is a new thing we've been doing. I'm going to talk about how you measure identity at scale. I'm then going to talk about how we measure well-being for these people. I'm then going to give a sense of what kinds of things we learn or see from this methodology, then go into more detail on the methodology if people have questions, to give some ideas of it. That's my plan. Okay. So how do I get data on people's identity? The answer is social media. Who here has, or had at some point, a Twitter account? A bunch of you. Okay. When you had your Twitter account, you listed a 160-character string describing what your identity is. These are representative strings that I collected from people. You can see some people describe their family roles or political beliefs, some people describe their job roles, some people use emojis, some people describe their identity in terms of where they live, their gender, all kinds of different things. Okay?
So, my claim here is that these social media profiles are people expressing their identity. Okay, these bios are

Segment 2 (05:00 - 10:00)

personal. The person wrote it. The person who wrote it is the one telling you how they describe themselves, and the reason they are writing it is to specify what their identity is. So my claim is that if you take a look at a data set of all these social media biographies, you will be able to learn interesting things about how people see themselves. Okay? Any questions so far, or comments? Okay. So you can see hundreds of millions of people have Twitter accounts where they list their identity. This is my Twitter account. I never bothered to write an identity string, so you won't know anything about me. Roughly 10% of the people on Twitter are like me, but most people have self-descriptions. And you guys may now be thinking, if you had an account, what did you write? But in principle, it tells us something. And I will tell you, people like me who don't list their identity actually show less well-being than people who do have biographies. So if you wrote your biography, that's going to be a good thing. Okay. So the data set I have here comes from, I have a picture here of, Jason Jones. He's a professor in the sociology department here and in IACS, and he is the one who assembled the data set that we have here, so we need a shout-out for Jason. In computer science, analyzing tweets has been a cottage industry for a long time. Jason had the observation that whenever anybody downloaded a tweet, Twitter also gave you the author's identity string, and people like me for many years threw that identity string away. He saved them, and therefore had assembled basically a data set of 420 million people who he observed over a 10-year period. Okay.
and um you know ultimately stopping in 2023 when Elon Musk bought Twitter and you know kind of access to that data went away but given this I claim there's all kinds of interesting things you can learn about identity if you do that so what's an example of — yes data he has to participate — okay what what you have to what what Twitter would do did was they had an API meaning that you could go over write a program that could fetch okay a certain fraction of the Twitter feed historically 1% and so you could constantly go over this API and say give me the hundred less tweets that were available — and then I'm trying to ask a person might put an identity — but not tweet is that identity — okay if they If you have an identity and you do not tweet, you don't exist in our data set. Okay? — So, this requires that 420 million different people at least — have been observed tweeting over the course of this data set. That's what it means. I mean, that the population of Twitter was presumably larger than this. — Yeah. Again, in to think about it, we're seeing roughly 1% of all tweets that are done in a semi- random way. And so, you know, unless you have made a if hundred tweets over the course of your life, we almost certainly have seen you. Okay. But if you have made less than 100 tweets, you know that it's unlikely that we will have seen you. — Yeah. So this is about 5% of people on the planet. — This is something like about 5% of the people on the planet which is a large number. Okay. — There anyone here to figure out that the Twitter accounts represent people because — okay so we'll talk we'll I'll talk about that a little bit later. Um what I will say is one thing that we will have done uh so I'll talk about that later but um one thing that we will have done is we did do some kind of in our data set pruning for people things that we didn't think were people. So we eliminated about 10% of our feed for people we things we didn't think are people. Okay. 
And this is not an exact science, but I'll tell you what we did and did not do. Okay? That's a fair question.
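
A back-of-the-envelope check on the 1% sampling argument above can be sketched in a few lines. This assumes each tweet is kept independently with probability 0.01, which is a simplification of the "semi-random" sampling described in the talk; the function name `prob_seen` is made up for illustration.

```python
def prob_seen(num_tweets: int, sample_rate: float = 0.01) -> float:
    """Probability that at least one of num_tweets lands in the sample,
    assuming every tweet is sampled independently at sample_rate."""
    return 1.0 - (1.0 - sample_rate) ** num_tweets


# Show how quickly an account becomes likely to appear in the data set.
for n in (10, 100, 300, 500):
    print(f"{n:>4} tweets -> P(seen) = {prob_seen(n):.3f}")
```

Under that independence assumption, an account with 100 tweets is seen with probability of roughly 63%, so "almost certainly" is closer to accounts with several hundred tweets, while accounts with only a handful of tweets are mostly missed.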

Segment 3 (10:00 - 15:00)

Okay. So, what is an example of what you can learn just by reading tweets and counting? This is an example of frequency analysis. This plots the prevalence of Trump and Jesus in people's self-descriptions. The x-axis is time. The y-axis is prevalence, which means frequency per 10,000 bios. And what do you see? Jesus has been drifting downward in a steady way, and at least at some point in our data, Trump became bigger than Jesus. Okay. Is this good? Is this bad? That you can decide. But it is clearly there in the biographies. It's not like you complaining. I could imagine many people in here might have opinions about Trump and might express them on social media. That's one thing. It's a separate thing if it rises to the top 160 characters of your self-description. This means it has risen to the top 160 characters of their self-description. Okay. Yeah. — Say that over Jesus in their — Okay. I couldn't hear that for some reason. — So, right. So, on this axis 100 is 1%. This says less than one percent, you know, 0.8%, were talking about Jesus at one point, and a similar amount were talking about Trump in another year. Okay. And in other work that I'm not really going to talk about, we have done a lot of analysis. We have built this data set up so you can start to look at trends in how much people talk about anything in their biography, and we have coded it by things like what country the person is from and what state they are from. And so you can see all kinds of things that are interesting. If you look at where the emoji for the Mexican flag is more prevalent in the United States, for some reason it is down here. Why is that? It's probably because you have more Mexicans there.
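
The prevalence measure described here (how many bios out of every 10,000 mention a term, tracked over time) can be sketched as follows. This is a minimal illustration on toy data, not the project's actual pipeline; the function name `prevalence_per_10k`, the whole-word matching rule, and the input layout are all assumptions.

```python
import re


def prevalence_per_10k(bios_by_year, keyword):
    """Return {year: number of bios mentioning keyword per 10,000 bios}.

    bios_by_year maps a year to a list of bio strings (hypothetical layout).
    Matching is case-insensitive on whole words, which is an assumption.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    result = {}
    for year, bios in sorted(bios_by_year.items()):
        if not bios:
            continue  # no bios observed that year, so prevalence is undefined
        hits = sum(1 for bio in bios if pattern.search(bio))
        result[year] = 10_000 * hits / len(bios)
    return result


# Toy data only; real bios would come from the observed tweet stream.
toy = {
    2015: ["Jesus follower", "dad", "coffee addict", "I love jesus"],
    2020: ["Trump 2020", "mom of three"],
}
print(prevalence_per_10k(toy, "jesus"))
print(prevalence_per_10k(toy, "trump"))
```

On this scale a value of 100 corresponds to 1% of bios, which matches how the axis is read in the talk.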
And where is the Canadian flag emoji over-represented? It's near the northern part of the United States. So you can pick up a lot of things in these. I want to convince you that there's some signal in this, that there is some stuff you can do. There are all kinds of things you can do that I don't want to talk about. But just one thing to take a look at: if you look at national flag emojis, sometimes people put their national flag in as an emoji in their bio. In every country on earth that we checked, that country's flag emoji was the most common of all flag emojis. So the geolocation means something. What they're saying means something. Okay, you can do other kinds of analysis just based on counting. One thing you might ask is how long an identity lasts. Again, we see somebody's identity every time we see a tweet from them. So for some people, you can monitor whether their identity has changed since the last tweet that we saw, and based on this you can compute a Kaplan-Meier curve for any term: what is the survival of that identity? If you list yourself as a scientist, what fraction of the people, from the moment we observe them as scientists, keep this scientist identity after one day, 10 days, 100 days? And you can start to see that certain kinds of identities last longer than others. If you take a look at family tokens or expressions of religion, these last a lot longer than political tokens or something about their job or career. So this is another dimension on which you can analyze this kind of stuff. Okay. So I want to convince you I can measure identity over a large number of people. All that involved counting

Segment 4 (15:00 - 20:00)

and it didn't tell you anything about the individuals: are people doing well or not doing well? It just counted how much of what they said was in their biography. Now, you can use machine learning methods to take text and try to infer psychological variables and other variables from it. Remember, I said computer scientists have long analyzed tweets. So you can imagine a machine learning method that tries to predict people's psychological traits from their social media posts. Imagine a world where I had hundreds of tweets from each of thousands of people to whom I gave a survey instrument to measure how happy they were, what their Big Five psychological profile was, what their propensity to depression or anxiety was. I would now have a data set where I have a lot of tweets and, for these people, measurements of their psychological health or their psychological profile. And from this you can use machine learning to try to come up with a way to convert tweets to estimates of psychological health. Okay. One of the Big Five traits is something called agreeableness. As an example, what they did is try to learn, for each word that could appear in tweets, whether it tends to have a favorable or unfavorable correlation with agreeableness. If you use words like happy, wonderful, friends, family, you're agreeable. If you use words like these, you are not agreeable. Okay? So you should believe that, by machine learning, you can take a bunch of tweets and a bunch of examples of people where you do know the ground truth of their mental state and come up with lexicons, so you can get a score to estimate this. This isn't something we did, okay? This is another group that did this.
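
To make the lexicon idea concrete, here is a minimal sketch of scoring a person's tweets against a per-word weight table. The weights below are invented for illustration (only the positive example words come from the talk; the negative words and all numbers are assumptions), and a real lexicon would be learned from survey-labeled users rather than written by hand.

```python
import re

# Hypothetical toy lexicon: positive weights for words the talk cites as
# agreeable; the negative entries and every number here are made up.
AGREEABLENESS_WEIGHTS = {
    "happy": 0.8,
    "wonderful": 0.7,
    "friends": 0.6,
    "family": 0.6,
    "hate": -0.9,
    "stupid": -0.7,
}


def lexicon_score(tweets, weights):
    """Average lexicon weight per token across all of a person's tweets.

    Words missing from the lexicon contribute 0, so the score is diluted
    by neutral text. Returns 0.0 when there are no tokens at all.
    """
    tokens = [tok for tweet in tweets for tok in re.findall(r"[a-z']+", tweet.lower())]
    if not tokens:
        return 0.0
    return sum(weights.get(tok, 0.0) for tok in tokens) / len(tokens)


print(lexicon_score(["Wonderful day with friends and family"], AGREEABLENESS_WEIGHTS))
print(lexicon_score(["I hate this stupid traffic"], AGREEABLENESS_WEIGHTS))
```

Averaging per token is one simple design choice; weighting by word frequency or fitting a regression over word counts would be equally plausible readings of "come up with lexicons".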
But we're going to use this kind of data in what we want to do. In particular, we're going to use two data sets that were developed by others. In one, they took three-quarters of a million people on Twitter and tried to predict their Big Five personality measures, using data up to 2016. In another, more recent one, they tried to predict depression and anxiety scores from tweets from 2020 to 2023. So we're going to use these data sets to provide estimates of personality for these people. Okay? And the great thing about it is this is not me. You may complain about my methodology, or you may not trust me. The estimates of personality and anxiety and depression and happiness come from another group. So I'm not trying to fake anything here. I'm just going to tabulate it, not based on the kinds of things that they were interested in, but based on identity. In particular, we took all the people who we have identity strings for and matched them with those that these other groups developed personality assessments for. We eliminate the 10% of the accounts that sound like businesses, to make it more likely that they are people. Then, for every keyword, we're going to average the personality assessments of all the people who have that word in their bio. So if you mention scientist in your bio, you may ask, what's the psychological profile of scientists? I could take a look at every one of the people who had scientist in their bio and who were among these million-ish people we have personality estimates on. I'll average those scores over all the people who have scientist in their bio

Segment 5 (20:00 - 25:00)

and now I can get a sense as to what scientist means. And so basically we have about 8,000 different keywords where at least one in 10,000 people put that word in their bio, so that we have enough evidence to get some kind of meaningful personality guess about what that keyword means. Okay? And you can then play all kinds of games. If I want to understand people who like pets, I can average cat people and dog people. Once I can do individual words, I can do higher-order concepts. So, how does this work? Here are some example words, just to give you some idea of the methodology. The five variables that I have induced from the data sets are these. Happiness: do these people report themselves being happy? Stability is related to one of the Big Five psychological traits, something about how likely it is that you would want this person to work for you. Then introversion versus extroversion, propensity to anxiety, and propensity to depression. Okay. So if you put down words like CEO and church and parent, what is the score here? This shows a percentile on the normal distribution of that variable. So what does this mean? This says that if you listed yourself as a parent in your biography, and we average all the happiness scores of people who listed parent in their biography, they would sit at about the 66th percentile of happiness of all the people we have measurements for. Okay. So red means reasonably above the median. Blue means below the median. We have statistical significance scores if the term occurred often enough. Okay. So what does this mean? If you look at someone who says they're a CEO, they are happy. They are stable. They are generally more introverted than extroverted.
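
The keyword-averaging step can be sketched roughly as below: group people by bio keyword, keep keywords that clear a minimum frequency, average the score within each group, and report where that average falls on a normal distribution fitted to the whole population. The data layout, the function name `keyword_percentiles`, and the use of the population standard deviation are assumptions for illustration, not the speaker's implementation.

```python
from statistics import NormalDist, mean, pstdev


def keyword_percentiles(people, min_rate=1 / 10_000):
    """Map each sufficiently common keyword to the percentile of its group mean.

    people: list of (set_of_bio_keywords, score) pairs (hypothetical layout).
    The percentile is read off a normal distribution fitted to all scores,
    so the scores must not all be identical.
    """
    scores = [score for _, score in people]
    population = NormalDist(mean(scores), pstdev(scores))

    by_keyword = {}
    for keywords, score in people:
        for kw in keywords:
            by_keyword.setdefault(kw, []).append(score)

    percentiles = {}
    for kw, group in by_keyword.items():
        # Keep only keywords used by at least min_rate of all people.
        if len(group) / len(people) >= min_rate:
            percentiles[kw] = 100 * population.cdf(mean(group))
    return percentiles


# Toy happiness scores, invented for the example.
toy_people = [
    ({"parent"}, 0.6),
    ({"parent", "ceo"}, 0.8),
    ({"hate"}, -0.5),
    ({"scientist"}, 0.1),
]
for kw, pct in sorted(keyword_percentiles(toy_people).items()):
    print(f"{kw:>10}: {pct:.0f}th percentile")
```

A higher-order concept such as "pet people" could then be handled by pooling the groups for several keywords before averaging.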
They have lower than average anxiety and lower propensity to depression. If you put any of these words on, what do you say? These works look less favorable. Okay. Uh these people are less happy than uh than you than median, less stable, less introverted, higher propensity of towards anxiety and depression. Okay. Yes. Okay. So let us think what honesty means and where this comes in. So first of all these scores are being reflective of these scores. The score the estimates of your personality variables come from your tweets. This has nothing to do with what you are actually specifying in your bio. Okay. So to a certain extent I'm observing your tweet behavior and to compute these estimates of your happiness and your anxiety and stuff like that. So there's no lying in the scores particularly. You may say um it's easier to lie in your biography, okay, to say that you know if I said I'm I could write down in my biography that I'm the pope, okay? And uh but you know so if you want to if you're worried about lying worrying about what people put in their self-description and there's then becomes a philosophical description of do people lie in their self-description or not. — Okay. It is fair to say — okay so we okay so these are reasonable questions okay but what people use Twitter in different ways okay that's one reason it would be good to I separate real people from people who are functioning in professional ways so it's fair to keep that thought in mind I have some comments on this later

Segment 6 (25:00 - 30:00)

but that's a fair thing to keep in mind. But again, maybe more interesting: if you have mentioned hate in your bio, hate is important enough to you that it's one of your 160 characters. We don't think well of you. Okay? If you have love in your bio, sure enough, you score statistically significantly happier, less anxious, and less depressed. Okay. You're correlating some fact with some other fact, and whether it's a lie or not is kind of the interpretation. — Exactly right. So there is a question of, yeah, I may slip and make a cause-and-effect statement, but I'm not making cause-and-effect statements. Everything here is an algorithmic number, and it's subject to interpretation. Okay. So it's fair to have these methodological ideas, but I'm presenting real numbers, and you can think about what the interpretation is. That said, if someone tells me they're psychotic, I'm going to believe them. That's what I would say about this. Okay. So what are these numbers that I'm showing you? Again, I think I kind of said this. Some of my data is from these biographies. Some of it is from survey instruments. To make all these numbers clear: I'm going to assume that my variables are normally distributed. That's not necessarily exactly true. But this way I can describe where any group sits in my distribution. And again, you're going to see numbers in red if they are higher than the 55th percentile, blue if they're below the 45th percentile. I'm going to present p-values on numbers if they achieve significance, where, according to a t-test, that particular word on that particular variable differs significantly from the population at large. That's the goal. Okay. So, let's take a look now. That's the methodology. Let's look at it one last time.
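
The color rule and the significance test just described could look something like the sketch below. The talk only says "a t-test" against the population at large, so the one-sample form used here is an assumption, and the p-value uses a normal approximation to the t distribution that is reasonable only for large groups. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def color_for(percentile):
    """Red above the 55th percentile, blue below the 45th, neutral between."""
    if percentile > 55:
        return "red"
    if percentile < 45:
        return "blue"
    return "neutral"


def one_sample_t(group_scores, population_mean):
    """t statistic for a keyword group's mean versus the population mean,
    with a two-sided p-value from a normal approximation (large n only)."""
    n = len(group_scores)
    standard_error = stdev(group_scores) / math.sqrt(n)
    t = (mean(group_scores) - population_mean) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Toy group whose scores sit well above a population mean of 0.
t_stat, p_value = one_sample_t([0.4, 0.5, 0.6, 0.5, 0.45, 0.55], 0.0)
print(color_for(66), round(t_stat, 2), p_value < 0.05)
```

Because the standard error shrinks with the square root of the group size, the same small shift in the mean gives a larger t statistic for a common keyword than for a rare one.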
Who here knows their astrology symbol? Okay. So, we've got a team of crack scientists here. All of whom identify with this astrology. Okay. — What? — Well, you identify with it to some extent. Okay. I would say it's probably not very important to you, but if you listed it in your top 160 characters of your bio, it probably is important to you, right? And if you take a look at this thing, if you put the emoji for a zodiac symbol in your uh biorank, you can see first of all, all the zodiac symbols perform similarly. Okay? and you will see that this is not a a very healthy profile. Okay, we you could also specify your zodiac symbol by a name, you know, like you know, I don't know if you knew what your symbol was, but you probably knew what your sign was. If you specify your zodiac symbol by a keyword, you explicitly spell out your zodiac symbol for almost all zodiac signs. Okay, you get the exact same profile as if you know if we had talked about it. The one interesting difference was cancer. Now, why does cancer have a different thing? Cancer mean word means two different things, right? this emoji symbol doesn't. If you have cancer in your bio, you do well much better according to these metrics if you then certainly than any of the zodiac symbols. Okay. Now, why is this? Okay. Part of it is that it being having the identity of a cancer survivor is a good thing. Okay. It's a sign of strength. part of it is that you some of you may be cancer researchers or something like this. Okay. But again to give you an so you have to worry about words having overloaded meanings but you should be convinced that there is some consistency here in what this keyword stuff means. Okay. Any questions? — It does make sense that if you list your zodiac you would be classified as someone with anxiety or depression. You list your in your bio. — Okay. — To me, I just as a big factor. — Well, what I would say is that somehow

Segment 7 (30:00 - 35:00)

Okay. So, first of all, the right answer from Bruce is that these are numbers, and you have to interpret them the way that you want to think about it. My interpretation is there are a couple of reasons why you're going to see this. One reason may be that if you're putting your zodiac symbol in here, you don't have other sources of identity that are stronger and more important. That's my personal guess: if you're bragging about your zodiac symbol, you're not bragging about being married. You're not bragging about being faculty at the Laufer Center. You're not bragging about a lot of other things. — Okay? You might feel things are out of your control. — Maybe you do. So again, there are different ways you can interpret it, but I want to claim I've got a system that's measuring something, and now it becomes a question of interpretation. Okay. To help interpret this stuff, we've put together a self-identity survey, which we administered to 800 people. Some of the results I'll show will come from our identity survey. Most will come from the keyword analysis. If you want to figure out what your self-identity is, go to our website, self-identity.me. There we try to isolate which are the most important identity categories for people and what that means. So if you want to check out your identity, go to self-identity.me. So this was based on keywords — not individuals. So a given person may have multiple different keywords. — An individual person will have several different keywords, and it could be that some of them are going to be healthier than others by our measure. For every individual keyword, I will have a measurement of the psychological profile of this keyword.
— And then for an individual who lists a number of keywords, you have a score too — for the people who are in my data sets, I have on the order of a million people for whom I have these numerical scores from the tweets from their tweets. Okay? So you could later go back and look more detailed at these people. Maybe someone said I am the pope and an astrology and I am a zodiac, you know, and an Aquarius. Okay? And figure out which was the dominant one for those. That's not what I'm interested in right now. I'm interested in bringing it down by individual keywords. See what I see. Okay. So what any other questions methodology — cancer was not — cancer the word the cancer symbol emoji almost certainly was zodiac I don't know what it could have been the disease this — it could be all kinds of things okay now then there's a questions of how do you interpret it let's look at this thing let's look at some observ observation. So, I'm going to claim that our keyword thing gives us a way to look at uh identity in several different dimensions. So, what can I tell you about family? Okay, everything that sociologists will tell you and our survey data tells us is that if you identify with family roles, you will be exhibit higher well-being. Okay? If your keywords say father or mother, husband or wife, okay, you will be you know so exhibit higher levels of wealth of happiness, well-being, lower levels of in generally of anxiety and depression. But in the case of mother for example right there higher than average of all — oh I'm saying higher than average for mother here remember here I'm talking about a keyword by categories we're talking keywords so I have — so mother is at the 68% of happiness and also the 56% of the depression. 
So it seems that those — the these are orthogonal — maybe not — these are orthogonal measures okay to a certain extent one is a question of how happy do you report yourself one is your propensity towards depression okay and one you know one can imagine these being different okay and in fact you know we do see these being different they're measuring different things

Segment 8 (35:00 - 40:00)

Right. So, so somehow does that mean that the same person is tweeting along the entire penalty of these mental states at different times? — The claim is that if I read your tweets, if you could imagine a wise psychologist reading a hundred tweets from you and now saying, "Is this guy happy? Is this guy introverted? Is this guy showing signs of anxiety? depression? Okay. — The principal one can be depressed. — You can be. Yes. — He's not categorizing all mothers. He's only characterizing the fraction of people that have mothers in their identity. So, so there's no problem if 1% people have mothers in their identity. It's no problem for some of those way above average unhappiness. Yeah, we don't know it by model or just the average. — Okay. Again, we don't know it, — but again, I'd like to just say, okay, I think these are useful summary statistics. We can talk about methodology later, but I think it's interesting to see what it is. What if you list pets in your biography? Oh, I am a dog. What am I doing? I'm a parent of my dog or my cat. Okay. Is this good or bad? Well, pet people do not seem to be as happy. Pet parents are not as happy as human parents. Okay, looks like cat people seem to do worse than dog people. Okay, and this is kind of from the keyword analysis. This is from our poll thing. This seems to be consistent with the literature. Okay, it's not bad to have a pet, but for the pet to be that important doesn't seem to be particularly good. Okay. What pronouns do you use? Personal pronouns do you use in your thing? Do you describe yourself as a as I you describe us as plural or do you talking about them and they? Okay. People who use us and we in their profiles seem to be healthier and happier than people who talk about I and me. And they are happier than people who are busy complaining about other people. Okay? That's what they is about, right? And you start to see that — exactly the reason I just it would be nice to know um what fraction of identity. 
Okay. So you have a word like or us. Okay. So let's cap or let's say the word cap occurs in one in 50,000 identity profiles. Yeah. So it's very rare they cap into their identity profile. Let's say the word class occurs in 50% of identity profile. — Yeah. — Now it's very common thing and you're not really selecting for any special group because so many people use it. Um, do you have a measure of you understand what I'm asking? I mean — well okay so the what I will say is to a certain extent the if you have a big group the average of a big group would presumably tend to be towards you know towards the mean towards the center. Okay. So one measure of you know sort of group size if you wish is going to be on the significance levels you know when you have a t test of whether we have a large group or a small group okay a smaller shift in the average happiness level on a bigger group can be highly significant okay so that's the main way obviously we — um so with the astrology science Yeah, — we didn't think of all people were Pisces because of all people, — right? — We thought that people were Pisces who elevated it to a thing on their identity profiles might be. Okay. And so when you talk about pet cat people or mothers or whatever, are we talking more about — this is about identity. It's not saying that the cat or the dog is the is the issue. It's a question of whether this is a this has risen to the point of being in the top 160 characters of your

Segment 9 (40:00 - 45:00)

selfidentity. Right. But it's easier for some words like us or we or I or me or they to rise to the top than other words like capital. — Yeah. Right. — And so depending on what kind of word it is, whether it's a word likely to be at the top or a rare word, you know, reged up from the dead. — Okay. So you're saying that it might be interesting to look at the frequency of words in bios versus the English language — or in the tweets or something something. — Okay. So that might be interesting. I don't have data on that but uh but that I can see now I understand what you're interested in. Obviously the data is there if you want to do it. Okay. US supposed to mean two different things because — could mean us. It could mean that's why I have we here also which probably doesn't mean okay let's keep going religion okay you know how does religious identity make a difference okay and again generally speaking religious oriented words seem to so healthy psychological profiles from this when you break this down a little bit more here we tried to break down by these are denominational words okay And what do I find interesting? You know, again, the, you know, the Christian denominations seem to be doing very well. The non-Christian denominations generally seem to be doing well among less formal kind of belief systems. One thing that's true, if you list yourself as atheist or agnostic, this does not correspond to as healthy a profile as the religious profile. Okay? So that is something that you see in this data that I think is interesting. Okay. Now there can be a couple of reasons why but I mean you know but generally speaking you know as part of this you know I've been trying to read some of this literature from sociology and stuff like that. This seems basically consistent with that. Um you can imagine that values are part of your your identity. Sometimes certain aspects of character things that you would value you know could be part of what you see. 
Now, there is a framework called Values in Action that tries to come up with a taxonomy of values. They say that there are six different categories of values, each of which has several subcategories, and generally speaking, the identities we see that match up with senses of values tend to show favorable profiles, associated with better well-being. This is a detailed breakdown of that chart. Here we've got the six major categories with the 24 subcategories. Almost all the subcategories show healthy profiles. Recognizing someone's values from the keywords is a trickier thing than counting the word agnostic or atheist or something like that. So in order to get these estimates of value traits, we tried to get a larger lexicon of keywords. To make sure we had no bias, we used ChatGPT to suggest words that would be associated with the trait: given the trait name and the definition of the trait, what would be some identity words to use? But again, generally speaking, we get healthy profiles. And when you cross this with other data sets, and this is probably going to get too deep in the weeds, you see things that correlate with what we would like to see. If you cross our observed data on values with data sets that linked values to certain physical and mental health questions, you end up seeing that, again, if you do well by our psychological variables, they can predict positive physical health and mental health activities. Maybe that's too deep in the weeds. Okay, what about age? One thing that is true is that with age, people in general get happier as they

Segment 10 (45:00 - 50:00)

get older. This is one of the surprises that you may not believe. But this is from our survey data, and sure enough, if you look at how well they self-report themselves, older people are happier than younger people. And in fact, a lot of our phenomena may very well be simply reflecting the ages of people on social media. But it's interesting: if you describe yourself as old or young, that doesn't really do very much for you. But if you think of yourself in your biography as the oldest or the youngest, these seem to be associated with positive profiles. So distinctions seem to matter. Now, we don't get direct information about people's age. This is from their social media profiles. This is unfortunate. But we can get a proxy for it, a little bit, by looking at the identity form "class of X." If Ken had been listing his social identity on Twitter, what would it be? There was a time he would have been proud to be a member of the Oklahoma City high school class of 1939 or something like that. — 1839. — 1839. Okay. We can look at how often people identified with the class of 2010, and you can see that there were people that went away. So there's a very short period of time when people's identity was associated with the class of a particular year. These are presumably young people. If you say you're proud to be a member of a class, this gives you some kind of a proxy for age, and we can reduce that to some kind of a measure of what your well-being is. If we track somebody over this 10-year period and we know that in 2010 they were a member of the class of 2010, then when we see them in 2023, they're probably much older, right? So, what does this show us? Four years before your class of X, you were reasonably happy.
You were young and naive. Then something happens called adolescence or whatever this is, and your well-being values drop through the table. And then a certain number of years after your class of X, well-being starts to go up. This kind of stuff is visible in our data. — Might be survivorship. — What? — Might be survivorship. — Could be a survivorship bias. Okay. You get similar things with colleges. People list colleges in their biographies. Proud to be a member of SBU. Proud to be a member of Harvard. Proud to be something or other. What does this show? We took all biographies that mentioned a college, trying to be careful about what we were doing, and we grouped colleges by rank: the top schools, top-decile schools, second decile, third decile. People who are associated with more prestigious schools seem to show better well-being. Now, maybe this is a function of real well-being. Maybe they're better at faking it. But this is something that we see in our data. Let's move on here a little bit, because I know I'm running long. Political identities: what if you're associated with politics? One thing that's true is that in the sociological literature, conservatives are happier than liberals. Republicans are happier than Democrats. That's what you read. And sure enough, we saw that in our poll. But we also see in our social media data that Republicans report being happier than Democrats, and conservatives report being happier than liberals. Why this is, I don't know. There are theories why this is, but just to keep going.
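The "class of X" age proxy described in this segment can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual pipeline: the regular expression, the function names, and the record layout `(bio, observed_year, wellbeing_score)` are all hypothetical, and only four-digit years are recognized.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical pattern: matches phrases such as "Class of 2010" in a bio string.
CLASS_OF = re.compile(r"\bclass of (\d{4})\b", re.IGNORECASE)


def class_year(bio: str) -> Optional[int]:
    """Return the graduation year named in the bio, or None if there is none."""
    match = CLASS_OF.search(bio)
    return int(match.group(1)) if match else None


def years_since_class(bio: str, observed_year: int) -> Optional[int]:
    """Offset between the year the bio was observed and the class year.

    Negative values mean the bio was observed before graduation.
    """
    year = class_year(bio)
    return None if year is None else observed_year - year


def mean_wellbeing_by_offset(records):
    """Average a well-being score per offset from the class year.

    `records` is an iterable of (bio, observed_year, wellbeing_score) tuples;
    bios without a class year are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for bio, observed_year, score in records:
        offset = years_since_class(bio, observed_year)
        if offset is None:
            continue
        totals[offset][0] += score
        totals[offset][1] += 1
    return {offset: total / count for offset, (total, count) in totals.items()}
```

Plotting the resulting averages against the offset would give the kind of curve described above, with the caveat raised from the audience that people who keep a class year in their bio may not be a representative group.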

Segment 11 (50:00 - 55:00)

Okay, let me move on, because I think that I am running near the end of my time. What about the methodologies here? The people on the left are supposed to represent angry reviewers. You probably see these, dealing with it. I've heard various methodological complaints about working with social media data over the time we've done this. Some of these are better than others. And I want to go through them so that we can air any concerns we might have. One is that some people will complain that Twitter has things that aren't people. Some people say it's bots, that they're just programs. Why should we listen to these bots? Well, as I said, social media companies try to eliminate things that aren't people. We had a filter where we eliminated things that sounded like businesses. But in many ways, if you look at our results on things that you can measure, generally speaking, we're consistent with what the social science literature says, and we're consistent with observable standards like population size. People from California are more numerous than people from Oklahoma, and that's what we see here. You may say that Twitter users are not normal, not the full population. And that's true to a certain extent. It's known that Twitter people are younger and better educated than most Americans. But when we were analyzing our data, there were studies that said 23% of all US adults use Twitter. So it's a biased sample, but it's not an unbelievably biased sample. That's the way that I would like to think of it. You may complain that public display is not self-identity. This is something that people think: just because you say it on there doesn't mean anything.
One thing you'll be amazed by is how many CEOs and presidents and founders there are on Twitter. Why do people describe themselves as CEOs and presidents and founders? Well, they're all CEOs and founders and presidents of single-proprietor organizations. You see a lot more people bragging about things than maybe they should. But on the other hand, your bio string is public. You want other people to see it, and generally speaking, the theory in the social sciences is that these group memberships are public and observable. The other thing is that there has been a lot of research over the years on how people present themselves on social media. It's not wildly inconsistent with what's real. That's kind of my theory. There might be concerns about whether we interpret language properly. We saw the thing about cancer. We're counting individual keywords. If somebody says that they are not happy, we would score them under happy, because we're not looking at negation in the individual keyword. And if someone uses Trump because they want to talk about bridge, we don't know that. But you hear what terms we're using. We try to avoid ambiguous terms, and we try to default to lexicons and language models so we're not cherry-picking terms that we want. We use these lexicons to estimate people's well-being. You may or may not like that methodology, but generally speaking, the interesting thing about our assessments is that we were using other people's personality profiles. This is something we didn't cook. These came from peer-reviewed journal studies. I think there's a certain level of validity there. The final thing that I want to clarify is, again, when I tabulate these things, I'm tabulating people's identity based on their identity strings. Now our estimates of their personality

Segment 12 (55:00 - 60:00)

came from their tweet behavior. Now, the same person wrote the tweets as wrote their biography string. So there is some level of correlation. You could expect that if you wrote a nasty word in your biography, you are probably a lot more likely to write nasty words in your tweets. So there is some level where you might be concerned that the text is influencing the profile and vice versa. But I'm not really that concerned about this, for a couple of reasons. One is that the biography text is not used in inferring the psychological profiles. That's all from the tweets. We are partitioning people into groups based on only a single keyword of their identity. There's not a lot of context here. If you say you're a scientist, that's all that we're using to group you. And in general, when we report a number, we're grouping at least a hundred people, and all we know about them is that they mention that word in their biography. Okay, that's the methodological stuff. Let me end here and just say that I think social media provides an interesting way to analyze well-being and identity. We see consistency with survey results, published research, and sociological theory. Plus, when you look at it, it usually just makes sense. We have this self-identity survey. If you want to take it and find out who you are, there is that. And I am trying to write a book about this stuff, because I think that you learn interesting things about how people work. So, who do I thank? First, Rohan, my grad student, has been doing a lot of the statistics here on the well-being analysis. Jason Jones is the one who developed this data set on identity that's been critical to us.
I've had other students of mine who I've worked with, Dakota and Zing Jing, and Andy Schwarz and Sal and Sid. These are the people that actually prepared the data sets on well-being, and that's where I leave it. Okay. So the message is we want to be fundamentalist mothers and heart. — Well, if you can do that, that's great. So if you could absorb that identity, that sounds good. Now, that may be harder for you to pull off than — Yeah. — than you could. — So this is extremely interesting. Thank you very much. And I wonder, you mentioned that you are sharing this with psychologists, because they need this type of tools. — So, first of all, I have worked with Jason, who's a sociologist. Sociologists and psychologists, they're different beasts, but we've worked with Jason on this for a long time. — Where is Jason? — Jason is in sociology here and he's also in Ajax. And I do think that this stuff has more meaning and more pull and more richness than I think might be properly appreciated. One reason for all the slides with the angry reviewers is that when we did send some of this stuff to social science reviews, a lot of it got beaten down for what were often really kind of dumb reasons. There could be smart reasons to turn me down. But I do think that this kind of a data collection and this kind of an assessment is very interesting. And I'm hoping that now, with the stuff we're doing, it makes clearer to me, and I think it will make clear to other people, just how interesting it is. — I wonder if you could have sharper classifications if you were using more than a single keyword at a time, like using combinations of keywords.
So when you use combinations, what happens is the more precise the combinations you make, the fewer people have that combination. — That's right. — So you lose statistical power when you start looking at combinations. That's one reason why keywords are a good dimension to look at this.
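The trade-off between single keywords and keyword combinations can be illustrated with a short sketch. It assumes, hypothetically, that each user's bio has already been reduced to a set of tokens; the function name and data layout are inventions for illustration, and only the floor of a hundred people per reported group comes from the talk.

```python
from collections import defaultdict
from itertools import combinations

MIN_GROUP_SIZE = 100  # the talk reports numbers only for groups of at least a hundred


def group_by_keywords(bios, keywords, size=1, min_size=MIN_GROUP_SIZE):
    """Group users by keyword combinations found in their bios.

    `bios` maps a user id to the set of tokens in that user's bio (assumed layout).
    `keywords` is the set of identity keywords of interest.
    `size` is how many keywords form one group key (1 = single keywords).
    Groups smaller than `min_size` are dropped.
    """
    groups = defaultdict(set)
    for user, tokens in bios.items():
        present = sorted(tokens & keywords)
        for combo in combinations(present, size):
            groups[combo].add(user)
    return {combo: users for combo, users in groups.items() if len(users) >= min_size}
```

With single keywords, most common identities clear the floor. Moving to pairs shrinks every group to the intersection of two memberships, so many combinations fall below the floor and drop out, which is the loss of statistical power described above.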

Segment 13 (60:00 - 65:00)

— It seemed like a lot of the review complaints, I don't get, maybe you're saying this. They're not negatives, because this is about who people think they are, not who they are, — right? Yeah. Yeah. — Well, they might not be this, but this is how people think they — right. So, to say that again, what I'm analyzing is if you think you are like this, or you report this. This is what I see about you, and I see it about people who report themselves like this. And usually if they're reporting themselves like this, it probably is how they see themselves, even if it may not be 100% true. Psychiatrist, psychologist is evaluating the data coming from the individual self-ident... — Yeah, I don't understand that. So there is this concern that there is a separation: if you report that you're a scientist in your identity, that doesn't mean that you necessarily are. We're not checking your scientific credentials. And so there's a question of whether you are really what you claim to be. That I can't tell. Do you feel you are? That would be kind of more important. What I'm really measuring is whether you are reporting it. And you can then interpret that as you wish. But I think in general it captures how you see yourself. — Okay. Thank you very much. You gave a link. — Yes. — self-identity.me. — Go check it out. Well, you can check it out. Tell us what you are, and I'll then know what your identity is and — think about. — Okay. Thanks a lot. That's very interesting. Very interesting. — Okay. Thank you. — Are there ways to turn this into companies like you did before with general — So I don't... Right. We were doing tweet analysis then. — Yes. — So at that point we were doing a... I don't see how to turn this into a company. I see it as a book, but not as a company.
— You know, part of it is, one of the things that's great about the identity data is that there's a lot of it, but not too much — because you don't give new identities every day. You give new tweets every day. — Yeah. — And the scale of this is that it's not something that changes on a daily basis, to the point where you need to know the latest here. — No, it seems like a very clever choice to focus on those things. That's a very nice way to do a correlation here. It's very interesting. — Okay, fair enough. Now, you just need to find out why it is the fundamentalists are so happy and the Republicans are so happy. — You know, there is a literature. So there is speculation. Part of it is that if you don't believe things can change, to a certain extent you are content. If you believe that the world is the way it is because that's how it should be, you are content. There are also some studies to the effect that when they actually measure real happiness, things like how often you smile as opposed to — what you check as your happiness on a form? — There's some evidence that this goes away. — What goes away? — The difference between liberals and conservatives. — Oh, I see. — That it could be that liberals report themselves as happier than they actually are. I don't know if I completely believe this, but there are some — conservatives. — What? — Conservatives. — Yeah. — Okay. Or maybe equivalently, liberals report themselves as more miserable — than they actually are. — Yeah. — I wonder if that depends on, you know, if we're living in Biden years versus Trump years. Maybe, you know, there's probably some of that

Segment 14 (65:00 - 70:00)

but the claim that conservatives are happier than liberals is a stable thing that's been around for 20 years or something. — No, I guess I can believe that. Yeah. — Very interesting. — Put together with some other kind of similarly sourced data set, I don't know, Facebook... but there was a book that you read, how the internet intelligence really... you remember this? — Not sure I do. — Okay. — I'm not match the individual but it might have to match. Yeah. Okay. This was frequency of Google searches. — Maybe. — Okay. That's a separate data set. — I mean, it might be impossible to tell, but you must get something using this and something like Google search. — Okay. So with Google searches in general, you get prevalence, you get frequencies of searches, and you get some geolocation or maybe some stratification by gender or something like that, but not much more than that. What made Google searches interesting was that they were not public — that, you know, you could search for pornography on your own, right? — So I'm happy my identity string is public. I'm not so happy if my Google searches become public. — So it's not quite the same kind of a data set. I would kill, not quite kill, but almost kill, to have a similar kind of data from Facebook. Facebook and Instagram would have it in principle, so Mark Zuckerberg could do this kind of a study better than I could, and it's possible he is. It's not clear he can monetize it that quickly, so I'm not sure about that. And now Elon Musk could do this much better than I do, because he still has access to that data and would have access to it in a greater quantity. Remember, I'm seeing 1% of all the tweets.
I think for the personality estimates, the guys who were doing that had 10% of all tweets, but Elon Musk would have 100% of all tweets. So given enough interest from him and enough compute, one could do an even better study of this, but — once the numbers get really big, making them even bigger doesn't help that much, does it? — If I really had access to everything, I would be able to start asking for more detailed breakdowns. — So, one thing I would like to do if I had enough of this data, which I don't, is things like making age and gender inferences of people: how much of these effects are due to age, how much is due to gender, stuff like that. And you could do these kinds of cross tabs, break them down in a statistically robust way, much better if you had more data. So I would like more data. It's not enough to keep me from working on it. I have enough to do some interesting things. But — so that was pretty interesting. Do you really have a publication problem? Are you not worried about that? — What do you mean by that? — Well, you said you have all these angry reviewers. — So, we've published a few papers on this so far, not on the personality stuff, but the earlier stuff. There was one paper that's still been in limbo where we were looking at politics. One graph I flashed up measured something about how much did

Segment 15 (70:00 - 72:00)

you talk about politics in each country, and it argued that politics had become an increasing part of people's identity in almost all countries of the world, and we could never get this thing published. There were always people who were fussing with the same kind of methodological problems and methodological concerns, and you couldn't convince them otherwise. So there's one manuscript that is kind of dead because of this that I think is actually very good. And that's one reason why I want to write this as a book. I've been writing this up as a book more than I've been worrying about getting papers out on it, because first, I think that looking at it holistically is more interesting and you get a better sense of this thing, and part of it is that if I have it as a book, I don't need to get into a fight with these people. So that's kind of it. I'm planning on writing the book and then we're writing the papers, as opposed to the normal view, which would be that you write papers and then you collect them and that becomes a book. — This would be a book sort of like, um, like Who's Bigger. This is in the Who's Bigger family with the interview. — No problem. — Actually, you know what? I know. — I was actually hoping you were going to be here, because I had an example where I was going to have a hypothetical person's identity — and I had all set to work out though maybe — you could imagine maybe you have somebody who's a male, maybe somebody who's a father or a husband, maybe somebody who is German. Maybe someone who thinks of themselves as a scientist or an engineer, or German. Yeah. Or a ping-pong player. So that was where — So anyway
