Research Article

Fake news on Twitter during the 2016 U.S. presidential election


Science  25 Jan 2019:
Vol. 363, Issue 6425, pp. 374-378
DOI: 10.1126/science.aau2706

Finding facts about fake news

There was a proliferation of fake news during the 2016 election cycle. Grinberg et al. analyzed Twitter data by matching Twitter accounts to specific voters to determine who was exposed to fake news, who spread fake news, and how fake news interacted with factual news (see the Perspective by Ruths). Fake news accounted for nearly 6% of all news consumption, but it was heavily concentrated—only 1% of users were exposed to 80% of fake news, and 0.1% of users were responsible for sharing 80% of fake news. Interestingly, fake news was most concentrated among conservative voters.

Science, this issue p. 374; see also p. 348

Abstract

The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets.

In 1925, Harper’s Magazine published an article titled “Fake news and the public,” decrying the ways in which emerging technologies had made it increasingly difficult to separate rumor from fact (1). Nearly a century later, fake news has again found its way into the limelight, particularly with regard to the veracity of information on social media and its impact on voters in the 2016 U.S. presidential election. At the heart of these concerns is the notion that a well-functioning democracy depends on its citizens being factually informed (2). To understand the scope and scale of misinformation today and most effectively curtail it going forward, we need to examine how ordinary citizens experience misinformation on social media platforms.

To this end, we leveraged a panel of Twitter accounts linked to public voter registration records to study how Americans on Twitter interacted with fake news during the 2016 election season. Of primary interest are three simple but largely unanswered questions: (i) How many stories from fake news sources did individuals see and share on social media? (ii) What were the characteristics of those who engaged with these sources? (iii) How did these individuals interact with the broader political news ecosystem? Initial reports were alarming, showing that the most popular fake news stories in the last 3 months of the presidential campaign generated more shares, reactions, and comments on Facebook than the top real news stories (3). However, we do not yet know the scope of the phenomenon, in part because of the difficulty of reliably measuring human behavior from social media data (4). Existing studies of fake news on social media have described its spread within platforms (5, 6) and highlighted the disproportionate role played by automated accounts (7), but they have been unable to make inferences about the experiences of ordinary citizens.

Outside of social media, fake news has been examined among U.S. voters via surveys and web browsing data (8, 9). These methods suggest that the average American adult saw and remembered one or perhaps several fake news stories about the 2016 election (8), that 27% of people visited a fake news source in the final weeks before the election, and that visits to these sources constituted only 2.6% of hard news site visits (9). They also show a persistent trend of conservatives consuming more fake news content, with 60% of fake news source visits coming from the most conservative 10% of Americans (9). However, because social media platforms have been implicated as a key vector for the transmission of fake news (8, 9), it is critical to study what people saw and shared directly on social media.

Finally, social media data also provide a lens for understanding viewership patterns. Previous studies of the online media ecosystem have found evidence of insulated clusters of far-right content (10), rabbit holes of conspiratorial content (11), and tight clusters of geographically dispersed content (12). We wish to understand how fake news sources were positioned within this ecosystem. In particular, if people who saw content from fake news sources were isolated from mainstream content, they may have been at greater risk of adopting misinformed beliefs.

Data and definitions

Fake news sources

We follow Lazer et al. (13), who defined fake news outlets as those that have the trappings of legitimately produced news but “lack the news media’s editorial norms and processes for ensuring the accuracy and credibility of information.” The attribution of “fakeness” is thus not at the level of the story but at that of the publisher [similar to (9)].

We distinguished among three classes of fake news sources to allow comparisons of different operational definitions of fake news. The three classes correspond to differences in methods of generating lists of sources as well as perceived differences in the sites’ likelihoods of publishing misinformation. We labeled as “black” a set of websites taken from preexisting lists of fake news sources constructed by fact-checkers, journalists, and academics (8, 9) who identified sites that published almost exclusively fabricated stories [see supplementary materials (SM) section S.5 for details]. To measure fake news more comprehensively, we labeled additional websites as “red” or “orange” via a manual annotation process of sites identified by Snopes.com as sources of questionable claims. Sites with a red label (e.g., Infowars.com) spread falsehoods that clearly reflected a flawed editorial process, and sites with an orange label represented cases where annotators were less certain that the falsehoods stemmed from a systematically flawed process. There were 171 black, 64 red, and 65 orange fake news sources appearing at least once in our data.

Voters on Twitter

To focus on the experiences of real people on Twitter, we linked a sample of U.S. voter registration records to Twitter accounts to form a panel (see SM S.1). We collected tweets sent by the 16,442 accounts in our panel that were active during the 2016 election season (1 August to 6 December 2016) and obtained lists of their followers and followees (accounts they followed). We compared the panel to a representative sample of U.S. voters on Twitter obtained by Pew Research Center (14) and found that the panel is largely reflective of this sample in terms of age, gender, race, and political affiliation (see SM S.2).

We estimated the composition of each panel member’s news feed from a random sample of the tweets posted by their followees. We called these tweets, to which an individual was potentially exposed, their “exposures.” We also analyzed the panel’s aggregate exposures, in which, for example, a tweet from an account followed by five panel members was counted five times. We restricted our analysis to political tweets that contained a URL for a web page outside of Twitter (SM S.3 and S.4). Because we expected political ideology to play a role in engagement with fake news sources, we estimated the similarity of each person’s feed to those of registered Democrats or Republicans. We discretized the resulting scores to assign people to one of five political affinity subgroups: extreme left (L*), left (L), center (C), right (R), and extreme right (R*). Individuals with fewer than 100 exposures to political URLs were assigned to a separate “apolitical” subgroup (SM S.10).
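
For concreteness, this discretization step can be sketched as a simple thresholding of the feed-similarity score. This is an illustration only: the score construction and cutoffs are defined in SM S.10, and the numeric thresholds below are assumptions, not the published values.

```python
# Illustrative sketch of assigning panel members to political affinity subgroups.
# The feed-similarity score is assumed to lie in [-1, 1], with negative values
# indicating feeds similar to registered Democrats' and positive values similar
# to registered Republicans'; the cutoffs are hypothetical.

def affinity_subgroup(score: float, n_political_exposures: int) -> str:
    """Map a continuous feed-similarity score to one of the affinity subgroups."""
    if n_political_exposures < 100:   # too few political exposures to classify
        return "apolitical"
    if score < -0.6:
        return "L*"   # extreme left
    if score < -0.2:
        return "L"    # left
    if score <= 0.2:
        return "C"    # center
    if score <= 0.6:
        return "R"    # right
    return "R*"       # extreme right

# Example: a panel member with 450 political exposures and a score of 0.35
print(affinity_subgroup(0.35, 450))  # -> "R"
```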

Results

Prevalence and concentration

When totaled across all panel members and the entire 2016 U.S. election season, 5.0% of aggregate exposures to political URLs were from fake news sources. The fraction of content from fake news sources varied by day (Fig. 1A), increasing (in all categories) during the final weeks of the campaign (SM S.7). Similar trends were observed in content sharing, with 6.7% of political URLs shared by the panel coming from fake news sources.

Fig. 1 Prevalence over time and concentration of fake news sources.

(A) Daily percentage of exposures to black, red, and orange fake news sources, relative to all exposures to political URLs. Exposures were summed across all panel members. (B to D) Empirical cumulative distribution functions showing the distribution of exposures among websites (B), the distribution of shares by panel members (C), and the distribution of exposures among panel members (D). The x axis represents the percentage of websites or panel members responsible for a given percentage (y axis) of all exposures or shares. Black, red, and orange lines represent fake news sources; the blue line denotes all other sources. The distribution for all other sources is not shown in (B) because of the much larger number of sources in its tail and the fundamentally different selection process involved.

However, these aggregate volumes mask the fact that content from fake news sources was highly concentrated, both among a small number of websites and a small number of panel members. Within each category of fake news, 5% of sources accounted for more than 50% of exposures (Fig. 1B). There were far more exposures to red and orange sources than to black sources (2.4, 1.9, and 0.7% of aggregate exposures, respectively), and these differences were largely driven by a handful of popular red and orange sources. The top seven fake news sources—all red and orange—accounted for more than 50% of fake news exposures (SM S.5).

Figure 1, C and D, shows that content was also concentrated among a small fraction of panel members for all categories of fake news sources. A mere 0.1% of the panel accounted for 79.8% of shares from fake news sources, and 1% of panel members consumed 80.0% of the volume from fake news sources. These levels of concentration were not only high in absolute terms, they were also unusually high relative to multiple baselines both within and beyond politics on Twitter (SM S.15).
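
The concentration statistics reported here can be computed directly from per-account volumes. The sketch below is not the authors' code and uses hypothetical toy data; it only shows the basic calculation behind Fig. 1, C and D: sort accounts by volume and find the smallest fraction that covers 80% of the total.

```python
import numpy as np

def fraction_covering(counts, target=0.80):
    """Smallest fraction of accounts (sorted by volume) whose combined volume
    reaches `target` of the total, i.e., the concentration measure behind
    Fig. 1, C and D."""
    counts = np.sort(np.asarray(counts, dtype=float))[::-1]   # largest first
    cumulative = np.cumsum(counts) / counts.sum()
    n_needed = np.searchsorted(cumulative, target) + 1        # accounts required
    return n_needed / len(counts)

# Toy data: per-member counts of shares from fake news sources (heavy-tailed, hypothetical)
rng = np.random.default_rng(0)
fake_shares = rng.pareto(a=0.8, size=10_000)
print(f"{fraction_covering(fake_shares):.2%} of accounts cover 80% of the volume")
```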

The “supersharers” and “superconsumers” of fake news sources (those responsible for 80% of fake news sharing or exposure) dwarfed typical users in their affinity for fake news sources and in most measures of activity. For example, on average per day, the median supersharer of fake news (SS-F) tweeted 71.0 times, whereas the median panel member tweeted only 0.1 times. The median SS-F also shared an average of 7.6 political URLs per day, of which 1.7 were from fake news sources. Similarly, the median superconsumer of fake news sources had almost 4700 daily exposures to political URLs, as compared with only 49 for the median panel member (additional statistics in SM S.9). The SS-F members even stood out among the overall supersharers and superconsumers, the most politically active accounts in the panel (Fig. 2). Given the high volume of posts shared or consumed by these supersharers and superconsumers of fake news, as well as indicators that some tweets were authored by apps, we find it likely that many of these accounts were cyborgs: partially automated accounts controlled by humans (15) (SM S.8 and S.9). Their tweets included some self-authored content, such as personal commentary or photos, but also a large volume of political retweets. For subsequent analyses, we set aside the supersharer and superconsumer outlier accounts and focused on the remaining 99% of the panel.

Fig. 2 Shares and exposures of political URLs by outlier accounts, many of which were also SS-F accounts.

(A) Overall supersharers: top 1% among panelists sharing any political URLs, accounting for 49% of all shares and 82% of fake news shares. Letters above bars indicate political affinities. (B) Overall superconsumers: top 1% among panelists exposed to any political URLs, accounting for 12% of all exposures and 74% of fake news exposures. Black, red, and orange bars represent content from fake news sources; yellow or gray bars denote nonfake content (SS-F accounts are shown in yellow). The rightmost bar shows, for scale, the remainder of the panel’s fake news shares (A) or exposures (B).

Who was exposed to fake news sources?

Excluding outliers, panel members averaged 204 potential exposures [95% confidence interval (CI): 185 to 224] to fake news sources during the last month of the campaign. If 5% of potential exposures were actually seen (16), this would translate to, on average, about 10 exposures (95% CI: 9 to 11) to fake news sources during that month. The average proportion of fake news sources (among political URLs) in an individual’s feed was 1.18% (95% CI: 1.12 to 1.24%). However, there was a large and significant discrepancy between left and right (P < 0.0001). For example, people who had 5% or more of their political exposures from fake news sources constituted 2.5% of individuals on the left (L and L*) but 16.3% of individuals on the right (R and R*). See Fig. 3 for the distribution among all political affinity groups and SM S.10 for additional statistics.
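
As a worked check of that back-of-the-envelope conversion (the 5% viewing rate is the assumption cited in the text, not a new estimate):

$$0.05 \times 204 \approx 10.2, \qquad 0.05 \times 185 \approx 9.3, \qquad 0.05 \times 224 \approx 11.2,$$

matching the reported average of about 10 exposures (95% CI: 9 to 11).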

Fig. 3 Probability density estimates for the percentage of content from fake news sources in people’s news feeds (for people with any fake news exposures).

The number of individuals in each subgroup (N) and the percent with nonzero exposures to fake news sources are shown.

According to binomial regressions fit separately to each political affinity group, the strongest predictors of the proportion of fake news sources in an individual’s feed were the individual’s age and the number of political URLs in their feed (Fig. 4, A and B, and SM S.11). A 10-fold increase in overall political exposures was associated with a doubling of the proportion of exposures to fake news sources (Fig. 4A); that is, a 20-fold increase in the absolute number of exposures to fake news sources. This superlinear relationship holds for all five political affinity groups and suggests that a stronger selective exposure process exists for individuals with greater interest in politics. Figure 4B shows that age was positively and significantly associated with increased levels of exposure to fake news sources across all political groups.
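
The general form of such a regression can be illustrated with a small sketch. This is not the published model (the full specification and covariates are in SM S.11); the column names and toy data below are assumptions, but the structure, a binomial model of fake news exposures out of total political exposures with age and log-scaled exposure volume as predictors, follows the description above.

```python
# Hedged sketch of a binomial regression of the proportion of an individual's
# political exposures coming from fake news sources. Toy data and variable
# names are hypothetical; see SM S.11 for the actual model.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "fake_exposures":  [2, 0, 15, 40, 1, 120, 6, 3],          # exposures to fake news sources
    "other_exposures": [200, 90, 1500, 800, 60, 5000, 340, 150],  # non-fake political exposures
    "age":             [24, 35, 61, 70, 29, 55, 44, 38],
})
df["log10_political"] = np.log10(df["other_exposures"])

# Binomial GLM: successes = fake exposures, failures = non-fake political exposures
endog = df[["fake_exposures", "other_exposures"]]
exog = sm.add_constant(df[["log10_political", "age"]])
model = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
print(model.summary())
```

With a logit link and small proportions, a coefficient of roughly ln 2 ≈ 0.69 on the log10-exposure term would correspond to the doubling per 10-fold increase described above.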

Fig. 4 Key individual characteristics associated with exposure to and sharing of fake news sources.

The proportion of an individual’s political exposures coming from fake news sources as a function of (A) number of political exposures, excluding fake news sources, and (B) age. Estimates are based on binomial regression models fitted separately to each political affinity subgroup. Blue, liberal; black, center; red, conservative. (C to E) An individual’s likelihood of sharing one or more URLs from fake news sources as a function of (C) number of shares of political URLs, (D) number of exposures to fake news sources, and (E) political affinity. (F to I) Likelihood of a liberal (L) or conservative (R) individual sharing a political URL to which they have been exposed, depending on the political congruency and veracity of the source: (F) congruent and fake, (G) incongruent and fake, (H) congruent and nonfake, and (I) incongruent and nonfake. Brackets indicate significantly different pairs: **P < 0.01, ***P < 0.001. All estimates and 95% CIs [gray shaded regions in (A) to (D); line segments in (E) to (I)] are based on regression models specified in SM S.11 to S.13, with the remaining model variables held constant at their median or most common level.

Other factors were also associated with small increases in exposures to fake news sources: Men and whites had slightly higher rates, as did voters in swing states and voters who sent more tweets (excluding political URLs analyzed here). These findings are in line with previous work that showed concentration of polarizing content in swing states (17) and among older white men (18). However, effects for the above groups were small (less than one percentage point increase in proportion of exposures) and somewhat inconsistent across political groups.

Who shared fake news sources?

Political affinity was also associated with the sharing of content from fake news sources. Among those who shared any political content on Twitter during the election, fewer than 5% of people on the left or in the center ever shared any fake news content, yet 11 and 21% of people on the right and extreme right did, respectively (P < 0.0001) (see SM S.10). A logistic regression model showed that the sharing of content from fake news sources (as a binary variable) was positively associated with tweeting about politics, exposure to fake news sources, and political affinity, although the disparity across the political spectrum was smaller than suggested by univariate statistics (Fig. 4, C to E, and SM S.12). Other factors such as age and low ratio of followers to followees were also positively associated with sharing fake news sources, but effect sizes were small.
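
A minimal sketch of such a model is shown below, assuming a standard logistic regression fit to simulated data; the actual specification and covariates are in SM S.12, and the variable names and effect sizes here are illustrative assumptions.

```python
# Illustrative logistic regression of whether a panel member shared any content
# from fake news sources. Data are simulated with toy effect sizes; this is not
# the published model (SM S.12).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "political_tweets": rng.lognormal(mean=2.0, sigma=1.5, size=n).astype(int),
    "fake_exposures":   rng.lognormal(mean=1.0, sigma=1.5, size=n).astype(int),
    "affinity":         rng.choice(["L*", "L", "C", "R", "R*"], size=n),
})
# Simulated outcome: sharing is more likely with more political tweeting,
# more exposure to fake news sources, and a right-leaning affinity.
right = df["affinity"].isin(["R", "R*"]).astype(float)
log_odds = (-4 + 0.4 * np.log1p(df["political_tweets"])
            + 0.5 * np.log1p(df["fake_exposures"]) + 1.0 * right)
df["shared_fake"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

model = smf.logit(
    "shared_fake ~ np.log1p(political_tweets) + np.log1p(fake_exposures) + C(affinity)",
    data=df,
).fit(disp=False)
print(model.params)
```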

Next, we examined rates of sharing per exposure, modeling the likelihood that an individual would share a URL after being exposed to it (SM S.13). Conditioned on exposure to a politically congruent source, there were no significant differences in sharing rates between liberals and conservatives and across fake and nonfake sources (Fig. 4, F to I). Incongruent sources were shared at significantly lower rates than congruent sources (P < 0.01), with two exceptions. First, conservatives shared congruent and incongruent nonfake sources at similar rates. Second, we lacked the statistical power to assess sharing rates of conservatives exposed to liberal fake news, owing to the rarity of these events.

These findings highlight congruency as the dominant factor in sharing decisions for political news. This is consistent with an extensive body of work showing that individuals evaluate belief-incongruent information more critically than belief-congruent information (19). Our results suggest that fake news may not be more viral than real news (5).

Fake news and the media ecosystem

We extracted relationships between news websites according to a measure of shared audience and identified distinct group structure in a network of these relationships (Fig. 5). In a manner similar to other analyses of media coconsumption (12), we constructed this coexposure network by using a technique that identifies statistically significant connections between sites (20) (SM S.14). We identified four groups of websites in this network: Groups 1 to 3 were large subsets of nodes consistently grouped together by three different clustering algorithms, whereas Group 4 comprised the remaining nodes.
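
While the specific backbone-extraction method is given in (20) and SM S.14, the general construction can be sketched as follows: link two sites when the number of panel members exposed to both is unusually large given each site's audience size, then group the resulting network. The sketch below uses a hypergeometric overlap test and a standard modularity-based clustering as stand-ins; both are assumptions for illustration, not the published procedure.

```python
# Illustrative coexposure backbone: connect two sites when their audience overlap
# exceeds what equally popular but independent sites would produce by chance,
# then group sites by shared audience.
from itertools import combinations
from scipy.stats import hypergeom
import networkx as nx

# Toy data: site -> set of panel members exposed to at least one URL from it
exposures = {
    "siteA": set(range(0, 20)),
    "siteB": set(range(10, 25)),
    "siteC": set(range(100, 115)),
    "siteD": set(range(105, 121)),
}
n_panel = 200     # total nonoutlier panel members (hypothetical)
alpha = 0.01      # significance threshold (illustrative)

g = nx.Graph()
g.add_nodes_from(exposures)
for a, b in combinations(exposures, 2):
    overlap = len(exposures[a] & exposures[b])
    # P(overlap >= observed) if the two audiences were drawn independently
    p_value = hypergeom.sf(overlap - 1, n_panel, len(exposures[a]), len(exposures[b]))
    if p_value < alpha:
        g.add_edge(a, b, weight=overlap)

groups = nx.algorithms.community.greedy_modularity_communities(g)
print([sorted(c) for c in groups])   # e.g., [['siteA', 'siteB'], ['siteC', 'siteD']]
```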

Fig. 5 Coexposure network.

Each node is a political news, blog, or fact-checking website. Edges link pairs of sites where an unusually high number of (nonoutlier) panel members were exposed to content from both sites, controlling for the popularity of each site. Filled nodes represent fake news sources. Node colors indicate groups (1, green; 2, orange; 3, purple; 4, gray) identified via an ensemble of clustering algorithms. Sites with the highest exposures are sized slightly larger. See fig. S10 for node labels.

Group 1, a collection of nationally relevant mainstream media outlets, contained only 18.4% of sites but was responsible for the vast majority of individuals’ political URL exposures, ranging from an average of 72% (extreme right) to 86% (extreme left) across political affinity subgroups. These sites were mostly centrist in political leaning, whereas Group 2 was significantly more conservative, and Group 3 was significantly more liberal. Many sites in Group 2 were fake news sources (68.8%), substantially more than in Group 1 (3.6%), Group 3 (2.0%), or Group 4 (11.4%). Exposure to Group 2, unlike exposure to Group 1, varied considerably by political affinity. For individuals on the extreme right, Group 2 generated the majority (64.2%) of exposures outside of Group 1, as compared with 38.6, 22.2, 13.9, and 8.0% for those on the right, center, left, and extreme left, respectively.

Further, the high network density within Group 2 (Fig. 5) reflects that consumers of content from fake news sources were often exposed to multiple fake news sources. Of the 7484 nonoutlier panel members exposed to at least two fake news URLs, 95.6% of them saw URLs from at least two fake news sources, and 56.4% encountered URLs from at least five. In summary, fake news sources seem to have been a niche interest: Group 2 accounted for only a fraction of the content seen by even the most avid consumers of fake news but nonetheless formed a distinct cluster of sites, many of them fake, consumed by a heavily overlapping audience of individuals mostly on the right.

Discussion

This study estimated the extent to which people on Twitter shared and were exposed to content from fake news sources during the 2016 election season. Although 6% of people who shared URLs with political content shared content from fake news sources, the vast majority of fake news shares and exposures were attributable to tiny fractions of the population. Though previous work has shown concentration of volume both in political conversations on Twitter (21) and in fake news consumption (9), the extreme levels we observed are notable. For the average panel member, content from fake news sources constituted only 1.18% of political exposures, or about 10 URLs during the last month of the election campaign. These averages are of similar magnitude to estimates from previous work (8, 9), which is noteworthy given the vastly different study populations and methodologies. As in these studies, we found that the vast majority of political exposures, across all political groups, still came from popular nonfake news sources. This is reassuring in contrast to claims of political echo chambers (22) and fake news garnering more engagement than real news during the election (3).

We identified several subpopulations that deserve particular attention when devising interventions. Within human populations, it will be important to understand the mechanisms that lead different groups to engage with fake news sources. For example, heightened engagement by older adults could result from cognitive decline (23), lower digital media literacy, stronger motivated reasoning, or cohort effects. Among Twitter accounts, despite stringent measures to exclude bot accounts, our remaining sample still included partially automated cyborgs. Unlike bots, which may struggle to attract human audiences, these cyborg accounts are embedded within human social networks. Given the increasing sophistication of automation tools available to the average user and the increasing volume and coordination of online campaigns, it will be important to study cyborgs as a distinct category of automated activity (14, 22). Regarding limitations to this study, note that our findings derive from a particular sample on Twitter, so they might not generalize to other platforms (24). Additionally, although our sample roughly reflects the demographics of registered voters on Twitter, it might systematically differ from that population in other ways.

Our findings suggest immediate points of leverage to reduce the spread of misinformation. Social media platforms could discourage users from following or sharing content from the handful of established fake news sources that are most pervasive. They could also adopt policies that disincentivize frequent posting, which would be effective against flooding techniques (25) while affecting only a small number of accounts. For example, platforms could algorithmically demote content from frequent posters or prioritize users who have not posted that day. For illustrative purposes, a simulation of capping shared political URLs at 20 per account per day reduced content from fake news sources by 32% while affecting only 1% of content posted by nonsupersharers (SM S.15). Finally, because fake news sources have shared audiences, platforms could establish closer partnerships with fact-checking organizations to proactively watch top producers of misinformation and to examine content from new sites that emerge in the vicinity of fake news sources in a coexposure network. Such interventions raise the question of what role platforms should play in constraining the information people consume. Nonetheless, the proposed interventions could help deliver corrective information to affected populations, increase the effectiveness of corrections, foster a more equal balance of voice and attention on social media, and more broadly enhance the resiliency of information systems to misinformation campaigns during key moments of the democratic process.
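
A rough version of the daily-cap simulation mentioned above can be sketched as follows. This is an assumption about the general setup, not the published analysis in SM S.15; the data and the cap-handling rule (drop posts beyond the daily cap) are illustrative.

```python
# Illustrative simulation of capping each account at 20 shared political URLs per
# day: what fraction of fake news content would the cap remove, and how much
# content from non-supersharers would it touch? Not the published simulation.
from collections import defaultdict

CAP = 20

def simulate_cap(shares, supersharers):
    """shares: iterable of (user, day, is_fake) tuples in posting order."""
    kept = defaultdict(int)                      # (user, day) -> posts kept so far
    removed_fake = total_fake = 0
    removed_nonsuper = total_nonsuper = 0
    for user, day, is_fake in shares:
        total_fake += is_fake
        total_nonsuper += user not in supersharers
        if kept[(user, day)] < CAP:
            kept[(user, day)] += 1               # post survives the cap
        else:
            removed_fake += is_fake              # post dropped by the cap
            removed_nonsuper += user not in supersharers
    return removed_fake / total_fake, removed_nonsuper / total_nonsuper

# Toy data: one supersharer posting 100 URLs/day (half from fake news sources)
# plus 200 light users each posting one non-fake URL per day, over five days.
shares = [("ss1", d, i % 2 == 0) for d in range(5) for i in range(100)]
shares += [(f"u{k}", d, False) for k in range(200) for d in range(5)]
print(simulate_cap(shares, supersharers={"ss1"}))   # -> (0.8, 0.0)
```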

Supplementary Materials

www.sciencemag.org/content/363/6425/374/suppl/DC1

Supplementary Text

Figs. S1 to S14

Tables S1 to S7

References (28–68)

References and Notes

Acknowledgments: We thank L. Adamic, Y. Benkler, S. McCabe, and the three anonymous reviewers for thoughtful feedback on the manuscript and TargetSmart for access to voter data. The research was approved by Northeastern University’s Institutional Review Board. All opinions expressed in this article are those of the authors alone. Funding: D.L. acknowledges support by the ESRC ES N012283/1 and ARO W911NF-12-1-0556. Author contributions: D.L. conceived of the study. N.G., K.J., and L.F. collected and processed data, carried out statistical modeling, and produced visualizations. N.G., K.J., L.F., and B.S.-T. performed literature review and annotated data. All authors devised analyses and wrote and revised the paper. Competing interests: The authors declare no competing interests. Data and materials availability: Aggregate data and code from this study are freely available at Zenodo (26). Deidentified individual-level data are also available at Zenodo (27) upon signing a usage agreement stating that: (i) you shall not attempt to identify, reidentify, or otherwise deanonymize the dataset and (ii) you shall not further share, distribute, publish, or otherwise disseminate the dataset without Northeastern University’s prior written approval.