Superspreading genomes


Science  05 Feb 2021:
Vol. 371, Issue 6529, pp. 574-575
DOI: 10.1126/science.abg0100

Individual contributions to epidemic spread vary. Although some infections may not cause any secondary cases, others are associated with so-called “superspreading” events in which numerous infections result from the same case. These events can shape the course of an epidemic, but their detection remains challenging. On page 588 of this issue, Lemieux et al. (1) show that phylogenetic analyses of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequence data help quantify the prevalence and impact of superspreading events on COVID-19 outbreaks.

Superspreading gained attention during the 2002–2004 SARS epidemic, and mathematical models highlighted its contrasting effects: It is predicted to increase the probability that an outbreak will go extinct by chance but also to fuel the growth of outbreaks that evade extinction (2). Superspreading could also be involved in the adaptation of emerging infectious diseases to new hosts (3). Such events have since been identified in measles (2), Middle East respiratory syndrome (4), and Ebola (5) outbreaks. Their presence can be detected from cluster sizes (4) or spatial incidence data (5), but they appear most clearly in contact tracing data. However, studies to generate these data are expensive, invasive, and time-consuming. Their quality is also limited because many people are unreliable respondents and list either no potential source of infection or several. Digital tracking could increase data quality but comes with substantial privacy risks. Most of these limitations matter far less for virus genome analyses.
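The contrasting effects predicted by such models can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes a branching process in which the number of secondary cases per infection follows a negative binomial distribution with mean `r0` and dispersion `k` (small `k` meaning strong superspreading), which is a common modeling choice. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
def offspring_pgf(s, r0, k):
    """Probability generating function of a negative binomial offspring
    distribution with mean r0 and dispersion k (assumed model)."""
    return (1.0 + (r0 / k) * (1.0 - s)) ** (-k)


def prob_no_secondary_cases(r0, k):
    """Probability that a single infection causes no secondary case."""
    return offspring_pgf(0.0, r0, k)


def extinction_probability(r0, k, iterations=1000):
    """Probability that an outbreak started by one case goes extinct by chance.

    This is the smallest fixed point q = G(q) of the offspring PGF,
    reached by iterating G starting from q = 0.
    """
    q = 0.0
    for _ in range(iterations):
        q = offspring_pgf(q, r0, k)
    return q


if __name__ == "__main__":
    r0 = 2.5  # illustrative mean number of secondary cases, not an estimate
    for k in (0.1, 1.0, 10.0):  # small k = strong individual variation
        print(
            f"k={k}: P(no secondary case)={prob_no_secondary_cases(r0, k):.3f}, "
            f"P(extinction)={extinction_probability(r0, k):.3f}"
        )
```

With the mean held fixed, lowering `k` raises both the share of infections that cause no secondary case and the chance that an outbreak dies out, so the outbreaks that persist have to be carried by a few highly infectious cases.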

A common motivation to monitor the diversity of circulating viral strains is that some genetic variations may threaten the efficacy of treatments or the long-term efficacy of vaccines (6). But genetic evolution can also be harnessed to track infections as they spread, informing public health decisions. Assuming a constant mutation rate, viruses originating from infections that are close in the transmission chain should be more alike, from a genetic standpoint, than viruses from infections that are temporally or geographically distant. Using sequence data, it is possible to infer phylogenetic trees, which bear many similarities to dated genealogies of infections.
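The link between genetic similarity and proximity in the transmission chain can be made concrete with a toy example. The sketch below rests on deliberately strong assumptions that are not from the article: a single linear transmission chain, exactly one substitution per transmission, and each substitution at a previously unmutated site. Real analyses rely on statistical models of sequence evolution; the names `hamming_distance` and `simulate_chain` and the ten-letter genome are hypothetical.

```python
def hamming_distance(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def simulate_chain(root, n_infections):
    """Toy linear transmission chain under a strict clock (assumption):
    infection i carries its infector's genome plus one new substitution
    at site i - 1, so no site mutates twice."""
    if n_infections - 1 > len(root):
        raise ValueError("genome too short for this many transmissions")
    substitute = {"A": "G", "G": "A", "C": "T", "T": "C"}
    genomes = [root]
    for i in range(1, n_infections):
        parent = genomes[-1]
        site = i - 1
        child = parent[:site] + substitute[parent[site]] + parent[site + 1:]
        genomes.append(child)
    return genomes


if __name__ == "__main__":
    chain = simulate_chain("ACGTACGTAC", 5)
    for j in range(1, len(chain)):
        # Distance from the index case grows with the number of transmissions.
        print(j, hamming_distance(chain[0], chain[j]))
```

Under these assumptions the genetic distance between two infections equals the number of transmissions separating them, which is the intuition that phylogenetic inference builds on in a much more general setting.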

Figure: The effect of superspreading events. Virus genome sequences are arranged in a phylogeny to track transmission over time. Each node corresponds to a transmission event (“birth”), and each tip corresponds to the end of an infection (“death”). Superspreading generates many secondary infections that cluster in the phylogenetic tree. In epidemics with superspreading, many infections do not cause any secondary case.

Phylogenetic analyses readily provide insights regarding epidemic structure, including the presence of superspreading events, but also temporal aspects such as the date of origin of an epidemic wave (7) (see the figure). The Ebola virus (EBOV) epidemic in West Africa in 2013–2016 marked a shift in the production and sharing of virus genomic sequence data (8). These data made it possible to detect a potentially important role of superspreading events (9), which was later confirmed with spatial incidence data (5). The number of EBOV genome sequences published during that time, large by the standards of the day, now seems limited. Owing to technological progress, rapid and affordable full genome sequencing protocols (10), and the involvement of many teams across the world, more than 300,000 open-licensed SARS-CoV-2 genomes were shared in 2020 (11).

The impressive number of SARS-CoV-2 genomes should not mask the strong sampling heterogeneity. Half of the genomes are from the United Kingdom. Within the United States, more than half originate from only five states (California, Texas, Washington, Michigan, and New York). Timing also matters. For example, the fact that the oldest SARS-CoV-2 genome dates from December 2019 limits our ability to accurately estimate the date of onset of the pandemic. The study by Lemieux et al. stands out because it uses dense and early sampling of the local epidemic in Boston, Massachusetts. This offers a detailed view of the structure and history of the epidemic in the city area. The study also confirms and analyzes two contrasting superspreading events. One of these occurred in a skilled nursing facility and had a substantial impact locally in terms of mortality. The second event was associated with a business conference. Thanks to their understanding of the local epidemics, the Global Initiative on Sharing All Influenza Data (GISAID) collaborative effort, and two key mutations in the virus genome, Lemieux et al. show that this event generated transmission chains all over the world that may account for hundreds of thousands of infections.

Phylogenetic analyses may interpret any type of heterogeneity not included in the underlying model, especially uneven sampling or spatial structure, as superspreading (7). A promising research avenue to address this issue is called “data integration.” Its premise is that combining different types of data, for example genome sequences and incidence data (12), should provide more information than any single type. This could improve the detection of individual heterogeneity in transmission (13). It could also help answer a longstanding question: Are superspreading events due to individual biological (such as virus load) or behavioral (such as contact rate) properties, or to the environment (such as poorly ventilated meeting rooms)? This is illustrated in two ways in the study by Lemieux et al. They use their knowledge of the sampling location to distinguish superspreading events from clusters generated by uneven sampling. Also, they show that two virus introductions in the skilled nursing facility had opposite trajectories, with one leading to a handful of secondary cases and the other to a massive outbreak.

The unfolding of an epidemic depends on the underlying causes of superspreading events. If transmissibility is correlated with susceptibility and if natural immunity after recovery is strong, the effect of superspreaders should decrease as the epidemic unfolds, decreasing herd immunity thresholds (14). In terms of control, identifying risk factors associated with superspreading events—whether they relate to individuals, activities, or locations—opens the possibility for targeted interventions that can disproportionately hamper epidemic spread (2, 15).

The evolutionary rate of SARS-CoV-2 has, so far, been relatively slow, making multiple sources of data particularly complementary. The ease with which virus genomes can now be generated and the importance of monitoring virus evolution are opportunities for phylodynamics to become a routine tool in outbreak management.

References and Notes

Acknowledgments: I thank CNRS, Institut pour la Recherche et le Développement, and Région Occitanie for financial support and T. Kamiya, M. T. Sofonea, M. van Baalen, and the Experimental and Theoretical Evolution team for discussion.