Warning Bias in Information Transmission
Principal Investigator(s): Ilya Altshteyn, UCLA; H Clark Barrett, UCLA
Version: V2
| Name | File Type | Size | Last Modified |
| --- | --- | --- | --- |
|  | text/x-r-syntax | 2.2 KB | 02/24/2017 07:27 AM |
|  | text/csv | 207.3 KB | 02/26/2017 08:27 AM |
|  | text/csv | 47.6 KB | 02/24/2017 07:27 AM |
Project Citation:
Altshteyn, Ilya, and Barrett, H Clark. Warning Bias in Information Transmission. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2017-02-26. https://doi.org/10.3886/E100334V2
Project Description
Summary:
A growing body of theoretical work suggests that some kinds of information may be propagated more rapidly via cultural transmission than others. Laboratory studies of cultural transmission have documented a variety of such biases, but demonstrations of biased transmission outside the lab are rarer. Here we report two studies that investigate the differential transmission of information about danger, or a warning bias, on the social networking site Twitter. In Study 1, two coders rated each of 9,388 tweets (publicly shared 140-character utterances) from police and fire department Twitter feeds for whether or not each tweet contained information about danger. In Study 2, the same procedure was applied to 3,815 tweets from parenting magazine, local news service, bank, and weather service Twitter feeds. To estimate the magnitude of the warning bias in cultural transmission, we computed the retweet rates for tweets containing danger information and compared them to retweet rates for non-danger control tweets from the same sources at approximately the same time. In Study 1 and Study 2, danger tweets were 1.49 times and 3.53 times as likely to be retweeted as non-danger control tweets, respectively. We discuss the implications of these findings for the propagation of danger information in the real world and provide suggestions for future studies of cultural transmission outside of the laboratory.
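To make the comparison concrete, the short R sketch below illustrates the rate-ratio calculation described above (the deposit includes an R syntax file and CSV data). The file name and column names are hypothetical placeholders, not necessarily those of the deposited files.

```r
# Minimal sketch of the retweet rate-ratio comparison described above.
# "tweets.csv", "retweet_count", and "danger" (coded 0/1) are hypothetical
# names, not necessarily those used in the deposited files.
tweets <- read.csv("tweets.csv", stringsAsFactors = FALSE)

# Mean retweet count for non-danger (0) and danger (1) tweets
mean_retweets <- tapply(tweets$retweet_count, tweets$danger, mean)

# Ratio of retweet rates (danger / non-danger), analogous to the
# 1.49x (Study 1) and 3.53x (Study 2) figures reported above
mean_retweets["1"] / mean_retweets["0"]
```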
Funding Sources:
NSF (GRFP DGE-1144087)
Scope of Project
Subject Terms:
social transmission;
content bias;
cultural evolution;
communication;
social networks;
negativity bias
Geographic Coverage:
internet
Data Type(s):
event/transaction data
Methodology
Sampling:
Study 1
In this study, we collected tweets from the Twitter feeds of 10 police departments and 3 fire departments from the top 10 US cities by population. Three cities (Chicago, Houston, and Philadelphia) were represented by both a police department account and a fire department account. The average Twitter account included in this study had 22,488 followers, with a range of 2,998 to 103,000 followers. Twitter limits how far back in a given account's feed a viewer can look at any one time, so we collected all tweets between the date of collection (in May and June 2013) and the earliest tweet we could view from each feed. In total, we collected 10,435 tweets. Coders did not rate 1,047 of these tweets because of time constraints, leaving 9,388 rated tweets, which is the number we report in the main text.

Two coders rated each tweet for whether or not it was about danger. All coders knew the study's hypothesis. Coders were given the following guidance for deciding whether a tweet was about danger: "If you lived in the place where the tweet is coming from, would the information in the tweet be (at least theoretically) useful to you in avoiding being harmed?" They were told that harm could be economic or bodily, and to oneself or to friends or family. In this study, all tweets came from sources located in a specific city. After completing the danger coding, one coder recorded how many retweets each tweet had received and coded each tweet for several content types: whether it referenced another Twitter user (in which case it would be part of a Twitter conversation and could appear not only on the original tweeter's page but also on the recipient's page), whether it contained an image or a video, and whether it contained a link. Importantly, coders did not have access to a tweet's retweet count while they were coding it for danger content.

Only tweets on which both danger-content coders agreed, and for which neither coder used an "unsure" code, were used in the analysis. We tested coder agreement only for tweets for which neither coder used the "unsure" code because different coders likely have different thresholds for using it: where one coder might indicate that a tweet is about danger, another might indicate that they are unsure. Similarly, if both coders use the "unsure" code for a given tweet, it does not mean that they agree about the tweet's content. Such cases therefore do not tell us about coder agreement. Cohen's kappa, a test of inter-rater agreement for categorical items, takes into account the base rates at which coders assign different codes and estimates the proportion of inter-rater agreement beyond what would be expected by chance given those base rates. Cohen's kappa in our data was low, κ = .33, but significant, p < .0001. Most tweets in this dataset are not about danger, so codes indicating that a tweet does not contain danger are far more common than codes indicating that it does. This drives the chance that two coders agree on a given tweet up to 88%, leaving very little room for coders to agree beyond that chance level. Our coders actually agreed on 92% of tweets, meaning that in 8% of cases one coder indicated that a tweet was not about danger while the other indicated that it was.

This disagreement rate is unsurprising given that coders had to detect a rare signal (only ~2% of tweets in this dataset were rated by both coders as being about danger) in a large dataset, and that the operational definition of danger content was not always easy to apply (see Appendix A for a sample of tweets from the Study 1 dataset). Excluding tweets on which coders disagreed, or for which either coder used the "unsure" code, reduced the number of tweets available for analysis from 9,388 to 8,963. Of these, 203 were agreed by both coders to be danger tweets and 8,760 to be non-danger tweets. Tweets that were part of a conversation with another user, identified by an @ symbol preceding another Twitter user's username (the syntax the Twitter platform uses for tweets directed at another user, who is then privately notified of the tweet), were removed, leaving 8,177 tweets (198 danger and 7,979 non-danger). We removed conversation tweets because it is unclear how being part of a conversation between users could influence the likelihood of a tweet being retweeted in our dataset. This removal shifted the data slightly against our hypothesis: the mean retweet count of danger tweets was reduced by .20, while that of non-danger tweets was reduced by only .07.
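As a worked illustration of how skewed base rates inflate chance agreement, the following R sketch computes Cohen's kappa from a 2x2 agreement table. The counts are hypothetical, chosen only to mirror the roughly 2% danger rate, 88% chance agreement, and 92% observed agreement described above; they are not the deposited data.

```r
# Hypothetical coder-by-coder agreement table (rows: coder 1, columns: coder 2).
# Counts are illustrative only, chosen to mirror the rates described above;
# they are not the deposited data.
tab <- matrix(c(8200, 350,
                 380, 200),
              nrow = 2, byrow = TRUE,
              dimnames = list(coder1 = c("non-danger", "danger"),
                              coder2 = c("non-danger", "danger")))

n  <- sum(tab)
po <- sum(diag(tab)) / n                      # observed agreement (~0.92)
pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from base rates (~0.88)
kappa <- (po - pe) / (1 - pe)                 # Cohen's kappa (~0.31)
round(c(observed = po, chance = pe, kappa = kappa), 2)
```

With nearly all tweets coded non-danger by both coders, chance agreement is already close to .88, so even 92% raw agreement yields only a modest kappa.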
Study 2
In this study we used a different sampling method that increased the number and proportion of danger tweets in our dataset. Tweets were collected from a total of 25 Twitter feeds belonging to banks, parenting magazines, local news sources, and weather services. We chose these four types of accounts because we reasoned that they would be likely to tweet about danger occasionally, but that their followers would not be following them specifically for danger content. The accounts in each category (banks, parenting magazines, etc.) were chosen because they had relatively many followers. Twitter accounts included in this study had a mean of 172,236 followers, approximately eight times the mean of 22,488 for the accounts included in Study 1, and follower counts ranged from approximately 15,200 to 830,000. Because the number of followers is a measure of the size of the audience a tweeter reaches, this dataset has the potential to provide a more accurate estimate of the effect of danger content on retweet rates, minimizing the influence of variation in retweet rates that is unrelated to danger content (some tweets have unusually large retweet counts for idiosyncratic reasons and can have dramatic effects on the outcome of a negative binomial regression in a smaller sample, as exemplified by the change in the estimated effect of a photo on retweet rate when a single outlier tweet was removed from the Study 1 analysis, model m2a vs. m2b).

As described in the main text, the sampling method in this study included three steps. In the first step, coders read through the Twitter feeds of the 25 accounts and picked out tweets that they thought could possibly be about danger. They used the same operational definition as coders in Study 1, but were instructed to apply it loosely in this step, including tweets liberally. The purpose of this step was to create a dataset more heavily populated with danger tweets than the Twitter feeds they came from, decreasing the number of coder hours required to gather a sizeable collection of danger tweets. Each tweet that might be about danger, along with the tweet immediately preceding it (whether it was about danger or not), was included in the dataset used in the second step. The coder in this step did not include information about which tweets might be about danger in the dataset she created.

In the second step, we added a pseudo-random sample (copy-pasted clusters) of tweets from each Twitter feed, 15% to 100% as large as that feed's step-one sample (mean: 39%; each of the 25 feeds had a mean of 110 original tweets collected, with a mean of 43 added in this second step, and the number of original tweets ranged from 12 to 484 across the 25 feeds). We added these tweets to make the proportion of danger to non-danger tweets less transparent to coders. At this point the dataset contained 3,815 tweets.

In the third step, coders implemented the same procedure as in Study 1, now applying the operational definition of danger more strictly than in step one. The coder who had collected the tweets for a given account in step one was never one of the two coders who rated those tweets in this third step. Both coders rated 828 tweets as being about danger and 1,828 as not being about danger. After excluding tweets that were part of a conversation between users (75 danger and 391 non-danger) and 4 tweets with missing data, 2,186 tweets remained (752 danger and 1,434 non-danger).

In this dataset, 20% of the tweets were about danger, making danger content an easier signal for coders to detect than in the Study 1 dataset, and Cohen's kappa increased to .86, p < .0001. Expected (chance) agreement in this dataset was 56%, and coders agreed in 94% of cases where neither coder used the "unsure" code. This inter-rater agreement percentage is similar to the 92% in Study 1, suggesting that it may be at a ceiling imposed by coder error. This supports the conjecture that Cohen's kappa was low in Study 1 because of the high expected inter-rater agreement induced by the skewed base rates.
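For reference, a negative binomial model of retweet counts of the general kind referred to above (models m2a/m2b) might be fit as in the sketch below. The data frame and variable names are hypothetical placeholders, and the formula is simplified relative to the deposited analysis scripts.

```r
# Sketch of a negative binomial regression of retweet counts on danger content.
# "tweets.csv" and the variable names (danger coded 0/1, has_photo, has_link)
# are hypothetical placeholders, not necessarily the deposited names, and the
# formula is simplified relative to the deposited analysis scripts.
library(MASS)

tweets <- read.csv("tweets.csv", stringsAsFactors = FALSE)
m <- glm.nb(retweet_count ~ danger + has_photo + has_link, data = tweets)
summary(m)

# Exponentiating the danger coefficient gives the retweet rate ratio for
# danger vs. non-danger tweets (cf. the 1.49x and 3.53x estimates above).
exp(coef(m)["danger"])
```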
Data Source:
Twitter feeds; see "Sampling" for details.
Collection Mode(s):
mixed mode
Unit(s) of Observation:
Retweet count
Related Publications
Published Versions
This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.