Social Influence Bias: A Randomized Experiment

Science  09 Aug 2013:
Vol. 341, Issue 6146, pp. 647-651
DOI: 10.1126/science.1240466

Follow the Leader?

The Internet has increased the likelihood that our decisions will be influenced by those being made around us. On the one hand, group decision-making can lead to better decisions, but it can also lead to “herding effects” that have resulted in financial disasters. Muchnik et al. (p. 647) examined the effect of collective information via a randomized experiment conducted in collaboration with a social news aggregation Web site on which readers vote and comment on posted comments. Data were collected and analyzed after the Web site administrators arbitrarily cast a positive or negative first vote (or none at all) on more than 100,000 comments. Artificial positive initial votes led to inflated subsequent scores, whereas artificial negative initial votes had small long-term effects. Both the topic being commented upon and the relationship between the poster and commenter were important. Future efforts will be needed to sort out how to correct for such effects in polls or other collective intelligence systems in order to counter social biases.


Our society is increasingly relying on the digitized, aggregated opinions of others to make decisions. We therefore designed and analyzed a large-scale randomized experiment on a social news aggregation Web site to investigate whether knowledge of such aggregates distorts decision-making. Prior ratings created significant bias in individual rating behavior, and positive and negative social influences created asymmetric herding effects. Whereas negative social influence inspired users to correct manipulated ratings, positive social influence increased the likelihood of positive ratings by 32% and created accumulating positive herding that increased final ratings by 25% on average. This positive herding was topic-dependent and affected by whether individuals were viewing the opinions of friends or enemies. A mixture of changing opinion and greater turnout under both manipulations together with a natural tendency to up-vote on the site combined to create the herding effects. Such findings will help interpret collective judgment accurately and avoid social influence bias in collective intelligence in the future.

We rely on ratings contributed by others to make decisions about which hotels, books, movies, political candidates, news, comments, and stories are worth our time and money (1). Given the widespread use and economic value of rating systems (2–4), it is important to consider whether they can successfully harness the wisdom of crowds to accurately aggregate individual information. Do they produce useful, unbiased, aggregate information about the quality of the item being rated? Or, as suggested by the experiments of Salganik et al. (5), are outcomes path dependent, yielding different aggregate ratings for items of equivalent quality?

Collective intelligence has recently been heralded as a harbinger of accelerated human potential (6). But social influence on individuals’ perceptions of quality and value could create herding effects that lead to suboptimal market outcomes (7, 8); rich-get-richer dynamics that exaggerate inequality (9–12); a groupthink mentality that distorts the truth (13); and measurable disruptions in the wisdom of crowds (14). If perceptions of quality are biased by social influence, attempts to aggregate collective judgment and socialize choice could be easily manipulated, with dramatic consequences for our markets, our politics, and our health.

The recent availability of population-scale data sets on rating behavior and social communication enables novel investigations of social influence (1, 15–20). Unfortunately, our understanding of the impact of social influence on collective judgment is limited because empirically distinguishing influence from uninfluenced agreement on true quality is nearly impossible in observational data (21–27). For example, popular products may be popular because of the irrational effect of past positive ratings; alternatively, the best products may become popular simply because they are of the highest quality. We must distinguish these explanations to determine the extent to which social influence creates irrational herding.

We therefore designed and analyzed a large-scale randomized experiment to quantify the effects of social influence on users’ ratings and discourse on a social news aggregation Web site, where users contribute news articles and discuss them. Users of the site that we studied write comments in response to posted articles, and other users can then “up-vote” or “down-vote” these comments, yielding an aggregate current rating for each posted comment equal to the number of up-votes minus the number of down-votes. Users do not observe the comment scores before clicking through to comments—each impression of a comment is always accompanied by that comment’s current score, tying the comment to the score during users’ evaluation—and comments are not ordered by their popularity, mitigating selection bias on high (or low) rated comments. Similar scoring mechanisms are widely used on the Web to reward users for supplying insightful or interesting analysis, while penalizing those posting irrelevant, redundant, or low-quality comments. The vast majority of interuser relations occur on the Web site, in contrast to Web sites whose members also interact offline. The data therefore provide a unique opportunity to comprehensively study social influence bias in rating behavior.

Over 5 months, 101,281 comments submitted on the site were randomly assigned to one of three treatment groups: up-treated, down-treated, or control. Up-treated comments were artificially given an up-vote (a +1 rating) upon the comment’s creation, whereas down-treated comments were given a down-vote (a –1 rating) upon the comment’s creation. Users were unaware of the manipulation and unable to trace votes to any particular user. As a result of the randomization, comments in the control and treatment groups were identical in expectation along all dimensions that could affect users’ rating behavior except for the current rating. This manipulation created a small random signal of positive or negative judgment by prior raters for randomly selected comments that have the same quality in expectation, enabling estimates of the effects of social influence holding comment quality and all other factors constant. The 101,281 experimental comments (of which 4049 were positively treated and 1942 were negatively treated to reflect the natural proportions of up- and down-votes on the site) were viewed more than 10 million times and rated 308,515 times by subsequent users.
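As a rough sketch, the assignment scheme above can be reproduced in a few lines. The function name, seed, and use of Python’s random module are illustrative assumptions; the paper does not publish its randomization code, only the resulting group sizes.

```python
import random

def assign_treatments(comment_ids, p_up, p_down, seed=0):
    """Randomly assign each comment to 'up', 'down', or 'control'.

    p_up and p_down are the target treatment proportions; the
    remainder stays in the control group.
    """
    rng = random.Random(seed)
    groups = {}
    for cid in comment_ids:
        r = rng.random()
        if r < p_up:
            groups[cid] = "up"        # artificial +1 at creation
        elif r < p_up + p_down:
            groups[cid] = "down"      # artificial -1 at creation
        else:
            groups[cid] = "control"   # no manipulation
    return groups

# Proportions mirroring the study: 4049 up-treated and 1942
# down-treated comments out of 101,281 total.
groups = assign_treatments(range(101_281), 4049 / 101_281, 1942 / 101_281)
```

Because assignment is independent of everything but the random draw, treatment and control comments are identical in expectation, which is what licenses the causal comparisons that follow.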

We may sample users multiple times, and nonrandom heterogeneity may exist in users’ comment-generating process, in their selection of comments to vote on, and in relationships between commenters and raters. We therefore estimated hierarchical Bayesian models of voting behavior, specifying commenter, rater, and commenter-rater pair random effects; the 95% confidence bands are derived from the distribution of parameter estimates obtained by repeated sampling (see materials and methods in the supplementary materials).
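The actual models are hierarchical Bayesian regressions with random effects, but the core idea of deriving an interval from a distribution of resampled estimates can be sketched with an ordinary bootstrap. This is a much-simplified stand-in, not the paper’s method, and the data below are invented for illustration.

```python
import random

def bootstrap_ci(votes, n_boot=2000, alpha=0.05, seed=0):
    """95% interval for a rating probability via resampling.

    `votes` is a list of 0/1 indicators (did the first viewer
    up-vote?). Unlike the paper's hierarchical models, this
    ignores commenter, rater, and commenter-rater effects.
    """
    rng = random.Random(seed)
    n = len(votes)
    estimates = sorted(
        sum(rng.choice(votes) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2))]
    return lo, hi

# Illustrative data: a ~5% first-viewer up-vote rate.
votes = [1] * 51 + [0] * 949
lo, hi = bootstrap_ci(votes)
```

The interval comes straight from the empirical quantiles of the resampled estimates, mirroring how the paper’s confidence bands come from a distribution of parameter estimates.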

We first compared the probabilities that comments in each group would be up-voted or down-voted by the first viewer after the manipulation. These probabilities measure the immediate effect of current ratings on users’ rating behavior. We then analyzed comments’ long-run ratings distributions and final mean scores by aggregating all users’ ratings for comments in the three groups over time.

Figure 1A shows the immediate up-vote and down-vote probabilities for the first viewer of comments in each of the three categories. Up-votes were 4.6 times as common as down-votes on this site, with 5.13% of all comments receiving an up-vote by the first viewer of the comment and only 0.82% of comments receiving a down-vote by the first viewer. The up-vote treatment significantly increased the probability of up-voting by the first viewer by 32% over the control group (P = 1.0 × 10⁻⁶) (Fig. 1A). Up-treated comments were not down-voted significantly more or less frequently than the control group, so users did not tend to correct the upward manipulation. In the absence of a correction, positive herding accumulated over time.
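A lift of this kind can be checked with a standard two-proportion comparison. The counts below are purely illustrative assumptions (the paper reports rates and P values, not raw contingency tables), so this is a sketch of the style of test, not a reproduction of the published analysis.

```python
from math import sqrt, erfc

def two_proportion_test(k1, n1, k2, n2):
    """Two-sided pooled z-test comparing two up-vote rates.

    Returns (relative lift of group 1 over group 2, p-value).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail
    return p1 / p2 - 1, p_value

# Hypothetical counts: a ~32% lift in first-viewer up-votes
# (treated ~6.8% vs. control ~5.1%) is easily detectable at
# sample sizes like these.
lift, p = two_proportion_test(272, 4000, 2052, 40000)
```

With no difference between groups the test returns a lift of zero and a p-value of one, as expected for a two-sided normal tail.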

Fig. 1 Effect of manipulation on voting behavior.

The positively manipulated treatment group (up-treated), the negatively manipulated treatment group (down-treated), and the control group (dotted line) are shown. The probabilities to up-vote (A) and down-vote (B) positively manipulated, negatively manipulated, and control group comments are shown by the first unique viewer; 95% confidence intervals are inferred from Bayesian logistic regression with commenter, rater, and commenter-rater random effects. (C) The mean final scores of positively manipulated, negatively manipulated, and control group comments are shown with 95% confidence intervals inferred from Bayesian linear regression of the final comment score with commenter random effects. Final mean scores on this Web site are measured as the number of up-votes minus the number of down-votes. We discuss the implications of this measurement in greater detail in the supplementary materials.

The positive manipulation created a positive social influence bias that persisted over our 5-month observation window, generating accumulating herding effects that increased comments’ final mean ratings by 25% relative to the final mean ratings of control group comments (χ² test; P = 2.3 × 10⁻¹¹) (Fig. 1C), and Kolmogorov-Smirnov (K-S) tests showed that the final score distribution of up-treated comments was significantly shifted toward higher scores (K-S test statistic: 0.083; P = 1.2 × 10⁻²³). Comments in the up-treated group were also significantly more likely than those in the control group to accumulate exceptionally high scores. Up-treated comments were 30% more likely to reach or exceed a score of 10 (6.4% versus 4.9% in the control group; χ² test; P = 2.0 × 10⁻⁵). The small manipulation of a single random up-vote when the comment was created resulted in significantly higher accumulated ratings due to social influence.
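The K-S comparison measures the largest vertical gap between two empirical score distributions. A minimal pure-Python version of the statistic looks like this; the score lists are made up for illustration and are far smaller than the study’s samples.

```python
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest
    vertical gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect.bisect_right(sa, x) / len(sa)
        fb = bisect.bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d

# Illustrative final scores only; the study reports a K-S
# distance of 0.083 between much larger distributions.
control_scores = [0, 0, 1, 1, 2, 3]
treated_scores = [0, 1, 1, 2, 3, 5]
d = ks_statistic(control_scores, treated_scores)
```

Identical samples yield a statistic of zero; a distribution shifted toward higher scores yields a positive gap, which is what the up-treated group shows.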

Positive and negative social influence created asymmetric herding effects. The probability of down-treated comments receiving subsequent down-votes was 0.014, whereas the probability of control comments receiving a down-vote was 0.007, a significant difference (χ² test; P = 1.1 × 10⁻³) (Fig. 1B). However, this effect was offset by a larger correction effect. The probability that a down-treated comment would subsequently be up-voted was 0.099, whereas the probability that a control comment would be up-voted was only 0.054, a significant difference (χ² test; P = 1.0 × 10⁻³⁰) (Fig. 1A). This correction neutralized social influence in the ratings of negatively manipulated comments, and their final mean ratings were not statistically different from the control group’s final mean ratings (Fig. 1C).

We next estimated changes in the final mean score for up-treated comments compared to control comments in the seven most active topic categories on the Web site. We found significant positive herding effects for comment ratings in “politics,” “culture and society,” and “business,” but no detectable herding behavior for comments in “economics,” “IT,” “fun,” and “general news” (Fig. 2). These differences are not due to the frequency of commenting in these categories, as categories with significant differences in control and treatment ratings and those with no significant differences had similar levels of activity. There was no significant negative herding in any category.

Fig. 2 Effects of topic on herding.

Mean final scores of positively manipulated and control group comments are shown with 95% confidence intervals inferred from Bayesian linear regression of the final comment score with commenter random effects across the seven most active topic categories on the site, ordered by the magnitude of the difference between the mean final score of positively manipulated comments and the mean final score of control comments in each category.

Friendship also moderated the impact of social influence on rating behavior (Fig. 3, A and B). The Web site has a feature whereby users can indicate that they “like” or “dislike” other users, forming “friends” and “enemies” social preference graphs. Unsurprisingly, friends of the commenter were more likely to up-vote a comment than those who disliked him or her (9.2% versus 2.7%; χ² test; P = 2.7 × 10⁻⁴⁹) [compare the average (dotted line) in Fig. 3A to the average (dotted line) in Fig. 3B]. Friends also tended to herd on current positive ratings (friends’ probability to up-vote a positively manipulated comment: 0.122 versus friends’ probability to up-vote a control comment: 0.092; χ² test; P = 1.4 × 10⁻²) and to correct comments with negatively manipulated ratings (friends’ probability to up-vote a negatively manipulated comment: 0.176 versus friends’ probability to up-vote a control comment: 0.092; χ² test; P = 4.0 × 10⁻¹²) (Fig. 3A), mirroring the cooperation found in human social networks (28). By contrast, enemies of the commenter were not susceptible to social influence. Enemies’ ratings were unaffected by our treatments, possibly because of the small sample of potential first ratings by enemies (though there are a substantial number of enemies in the community, they are less active) (Fig. 3B).

Fig. 3 Effects of friendship on rating behavior.

The figure shows the probability of a friend (A) and enemy (B) of the commenter to up-vote positively manipulated, negatively manipulated, and control group comments. Friends and enemies are defined as users who had previously clicked a button on the Web site labeling the commenter as someone they “liked” or “disliked,” respectively.

Finally, social influence in ratings behavior did not affect discourse in our setting during the 5-month observation period. Neither the positive nor the negative manipulation affected the average number of replies (Fig. 4A) or the average depth of the discussion tree created in response to a comment (Fig. 4B).

Fig. 4 Effects on subsequent discourse.

The figure displays the average number of responses (A) and the average depth of the discussion tree (B) that developed in response to positively manipulated, negatively manipulated, and control group comments; 95% confidence intervals are inferred from Bayesian linear regressions with author random effects.

Several data-generating processes could explain our findings. A selection effect could inspire different populations of voters to turn out to rate the item (selective turnout)—for example, if the negative manipulation inspired voters who tend to down-vote (negative voters) to vote in higher proportion. Alternatively, prior ratings could bias users’ voting behavior by changing their opinions about comment quality and therefore their votes (opinion change). We analyzed changes in turnout (the likelihood of rating rather than just viewing comments) and changes in positivity (the proportion of positive ratings) across subgroups in our study population to identify variance in our results explained by turnout effects and opinion change, respectively. We divided raters and commenters, on the basis of their rating history, along four subgroup dimensions by their positivity (proportion of positive votes), the commenters’ quality (prior scores of their comments), the frequency with which a rater rated a particular commenter, and whether raters were friends or enemies with commenters. We then compared treatment effects with expected voting behavior in these subgroups. Our analysis revealed several behavioral mechanisms that together explain the experimental results (see “Estimating Behavioral Mechanisms” in the supplementary text).

First, both treatments increased turnout (by 82% and 28% for first viewers of down- and up-treated comments, respectively), but neither created differential turnout for different types of voters. This suggests that selection, or differential turnout by voter type (e.g., selecting different proportions of positive or negative voters, or frequent or infrequent voters), cannot explain our results. Second, we found statistically significant opinion change in two of the four subgroup dimensions. The positive treatment created a systematic increase in the proportion of positive ratings for raters with little prior experience rating the particular commenter whose comment was manipulated (a 7% increase in the ratio of positive to negative votes and a 50% increase in the probability of up-votes for these raters) and no decrease in positivity in any subgroup. This implies that positive opinion change explains part of the variation in positive herding. The negative treatment, by contrast, created countervailing opinion change for positive and negative raters, cancelling out any aggregate-level evidence of opinion change (Fig. 5). This in part explains why we find no aggregate trend in either direction for the negative treatment. Third, both treatments created a uniform increase in turnout across voter types. This overall increase in turnout, combined with a general preference for positivity on the site, created a tendency toward positive ratings under both treatments. Prior work has shown a preference for positivity in ratings in other contexts (1), which suggests that these results will generalize. Together, these findings suggest that a mixture of opinion change and the site’s natural tendency to up-vote, combined with greater turnout under both manipulations, creates the herding effects we observe (see “Estimating Behavioral Mechanisms” in the supplementary text).

Fig. 5 Effects on turnout versus positivity.

The figure displays treatment effects of the negative and positive manipulation on turnout (the likelihood of rating) and positivity (the proportion of positive votes) for negative (positive) raters, defined as active raters (with at least 100 votes on control comments) who display less than (greater than) the median proportion of positive votes on control comments. Results displayed in red are statistically significant at the 95% level, whereas results displayed in gray are not. (A) Treatment effect of the negative manipulation on positivity; (B) treatment effect of the positive manipulation on positivity; (C) treatment effect of the negative manipulation on turnout; and (D) treatment effect of the positive manipulation on turnout. In each panel, the first estimate is the treatment effect on negative raters, the second estimate is the treatment effect on positive raters, and the third estimate is the ratio of the treatment effects on negative/positive raters. The treatment effects displayed are odds ratios of the treatment effect on the treated compared to the control for negative and positive raters. The negative manipulation created countervailing opinion change for positive and negative raters, increasing the positivity of negative raters and decreasing the positivity of positive raters, whereas both treatments increased turnout uniformly across both subgroups.
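The treatment effects in Fig. 5 are reported as odds ratios, whose computation is simple. The counts below are hypothetical, chosen only to show the arithmetic, not taken from the study.

```python
def odds_ratio(k_treated, n_treated, k_control, n_control):
    """Odds ratio of an outcome (e.g., casting an up-vote) in the
    treated group relative to the control group; 1.0 means no effect."""
    odds_t = k_treated / (n_treated - k_treated)
    odds_c = k_control / (n_control - k_control)
    return odds_t / odds_c

# Hypothetical counts: 122 of 1,000 treated impressions were
# up-voted versus 92 of 1,000 control impressions.
effect = odds_ratio(122, 1000, 92, 1000)
```

An odds ratio above 1 indicates the treatment raised the odds of the outcome; equal counts in both groups return exactly 1.0.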

Our results demonstrate that whereas positive social influence accumulates, creating a tendency toward ratings bubbles, negative social influence is neutralized by crowd correction (29). Our findings suggest that social influence substantially biases rating dynamics in systems designed to harness collective intelligence. Future research that further explores the mechanisms driving individual and aggregate ratings—especially in in vivo social environments—will be essential to our ability to interpret collective judgment accurately and to avoid social influence bias in collective intelligence. We anticipate that our experiment will inspire more sophisticated analyses of social influence bias in electoral polling, stock market prediction, and product recommendation and that these results will be used to adapt rating and review technologies to account for social influence bias in their outputs.

Supplementary Materials

Materials and Methods

Supplementary Text

Tables S1 to S21

Figs. S1 to S16

References (30–57)

References and Notes

  1. Acknowledgments: We thank D. Eckles and D. Watts for invaluable discussions. This work was supported by a Microsoft Research faculty fellowship (S.A.) and by NSF Career Award 0953832 (S.A.). The research was approved by the New York University institutional review board. There are legal obstacles to making the data available and revealing the name of the Web site, but code is available upon request. All of the user data that we analyzed are publicly available, except the treatment assignment and the random identifier that links a deidentified user to a vote; the data therefore contain no information that cannot be obtained by crawling the Web site. The randomized testing that the Web site performed is covered by the Web site’s terms of service. Opt-in permissions were granted by the users when they registered for the Web site. No data on nonregistered users were collected.

