Report

Cryptic transmission of SARS-CoV-2 in Washington state

Science  10 Sep 2020:
eabc0523
DOI: 10.1126/science.abc0523

Abstract

Following its emergence in Wuhan, China, in late November or early December 2019, the SARS-CoV-2 virus has rapidly spread globally. Genome sequencing of SARS-CoV-2 allows reconstruction of its transmission history, although this is contingent on sampling. We have analyzed 453 SARS-CoV-2 genomes collected between 20 February and 15 March 2020 from infected patients in Washington State, USA. We find that most SARS-CoV-2 infections sampled during this time derive from a single introduction in late January or early February 2020 which subsequently spread locally before active community surveillance was implemented.

The novel coronavirus, referred to alternately as SARS-CoV-2 (1) or hCoV-19 (2), emerged in Wuhan, Hubei, China, in late November or early December 2019 (3). As of 18 May 2020, there have been >4 million confirmed cases of COVID-19, the disease caused by SARS-CoV-2, that have resulted in >300,000 deaths (4). After its initial emergence in China, travel-associated cases with travel histories related to Wuhan appeared in other parts of the world (5). The first confirmed case in the United States was travel-associated and was detected in Snohomish County, Washington State, on 19 January 2020. Until 27 February 2020, the US Centers for Disease Control and Prevention (CDC) guidance recommended prioritizing testing on persons with direct travel history from an affected area or exposure to a known case. Cases of respiratory disease with no known risk factors were not routinely tested. In the 6 weeks between 19 January and 27 February, 59 confirmed cases were reported in the United States (6), all outside of Washington State and with either direct travel history or exposure to a known confirmed case. On 28 February 2020, a community case was identified in Snohomish County (7). One month later, on 25 March, as a result of increased testing and ongoing transmission, Washington State reported 2580 confirmed cases and 132 deaths (8). Here we report on the putative history of early community transmission in Washington State as revealed by genomic epidemiology. We conclude that SARS-CoV-2 had been circulating in Washington State for several weeks, undetected by the surveillance apparatus, from late January or early February 2020.

Although publicly available SARS-CoV-2 genomes (9, 10) are not sampled in strict proportion to the burden of infections through time and across geography, their genetic relationships can still shed light on underlying patterns of spread. SARS-CoV-2 genomes sampled between December 2019 and 15 March 2020 appear closely related, with between 0 and 12 mutations relative to a common ancestor estimated to have existed in Wuhan between late November and early December 2019 (Fig. 1). This pattern is consistent with a reported rate of molecular evolution of ~0.8 × 10⁻³ substitutions per site per year or ~2 substitutions per genome per month (3). After its initial zoonotic emergence in Wuhan (11), SARS-CoV-2 viral genomes began to accumulate substitutions and spread from Wuhan to other regions in the world (3). During December 2019, the Wuhan outbreak was too small to seed many introductions outside of China, but by January 2020, it had grown large enough to begin seeding cases elsewhere (12).
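The two rates quoted above can be related with a short calculation. The sketch below is illustrative only: the genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference) is an assumption not stated in the text, and the function name is hypothetical.

```python
# Per-site rate quoted in the text (substitutions per site per year).
RATE_PER_SITE_PER_YEAR = 0.8e-3

# Assumption (not in the text): length of the Wuhan-Hu-1 reference genome.
GENOME_LENGTH_NT = 29_903


def substitutions_per_genome(rate_per_site_per_year, genome_length_nt, days):
    """Expected substitutions across the whole genome over a span of days."""
    return rate_per_site_per_year * genome_length_nt * days / 365.0


# One average month (a twelfth of a year).
per_month = substitutions_per_genome(
    RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT, 365.0 / 12
)

# Average waiting time between substitutions on a single lineage.
days_per_substitution = 365.0 / (RATE_PER_SITE_PER_YEAR * GENOME_LENGTH_NT)

print(f"~{per_month:.1f} substitutions per genome per month")
print(f"~{days_per_substitution:.0f} days per substitution")
```

Under that assumed genome length, the per-site rate works out to roughly two substitutions per genome per month, or about one substitution every 15 days.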

Fig. 1 Maximum-likelihood phylogeny of 455 SARS-CoV-2 viruses collected from Washington State (red circles) on a background of 493 globally collected viruses.

Tips and branches are colored based on location, branch lengths are proportional to number of mutations along a branch and the x-axis is labeled with the number of substitutions relative to the root of the phylogeny, here equivalent to basal Wuhan outbreak viruses. Clustering of related viruses indicates community transmission after an introduction event. Branch locations are estimated based on a discrete traits model. We observe a single introduction leading to a large outbreak clade of 384 sampled viruses from Washington State (marked by the larger arrow). We observe a second introduction leading to a smaller outbreak clade of 39 viruses (marked by the smaller arrow). An interactive version of this figure is available at https://nextstrain.org/community/blab/ncov-cryptic-transmission/introductions.

Sequencing of viruses from the Washington State outbreak began on 28 February 2020 and has continued since then. We analyzed the sequences of 455 SARS-CoV-2 viruses from this outbreak collected between 19 January and 15 March 2020 (Fig. 1). Virus sequences from Washington State are closely related to viruses collected elsewhere. Clusters of closely related viruses indicate separate introduction events followed by local spread. The majority (n = 384, 84%) of these viruses fall into a closely related clade (marked by a larger arrow in Fig. 1), and possess single nucleotide polymorphisms (SNPs) C8782T, C17747T, A17858G, C18060T and T28144C relative to the basal virus at the root of the phylogeny, equivalent to the reference virus Wuhan/Hu-1/2019. This clade derives from viruses circulating in China (Fig. 1, in blue), is closely related to viruses sampled in British Columbia (Fig. 1, in orange) and is labeled as Pangolin lineage A.1 (13). From here forward, we refer to this clade as the “Washington State outbreak clade”. Other viruses (n = 39, 9%) fall into a separate smaller clade (marked by a smaller arrow in Fig. 1) and derive from viruses circulating in Europe. The remaining 32 viruses (7%) from Washington State are distributed across the phylogeny. Thus, we conclude that most early cases descend from a single introduction event followed by local amplification.
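The SNP pattern that defines the Washington State outbreak clade can be expressed as a simple genotype check. This is a toy sketch, not the phylogenetic placement used in the analysis: the input format (a dictionary from 1-based reference position to observed base) and both function names are hypothetical, and a real workflow would rely on alignment and tree building rather than matching a handful of sites.

```python
# SNPs the text lists for the Washington State outbreak clade,
# keyed by 1-based reference position: (reference base, derived base).
OUTBREAK_CLADE_SNPS = {
    8782: ("C", "T"),
    17747: ("C", "T"),
    17858: ("A", "G"),
    18060: ("C", "T"),
    28144: ("T", "C"),
}


def derived_snps(base_at):
    """Return the clade-defining SNPs present in one genome.

    base_at is a hypothetical input: a dict mapping 1-based reference
    positions to the base observed at that position in an aligned genome.
    """
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in sorted(OUTBREAK_CLADE_SNPS.items())
        if base_at.get(pos) == alt
    ]


def carries_outbreak_genotype(base_at):
    """True only if all five clade-defining SNPs are present."""
    return len(derived_snps(base_at)) == len(OUTBREAK_CLADE_SNPS)


# Illustrative genotypes: one carrying three of the five SNPs, one carrying all.
partial = {8782: "T", 17747: "C", 17858: "A", 18060: "T", 28144: "C"}
full = {8782: "T", 17747: "T", 17858: "G", 18060: "T", 28144: "C"}

print(derived_snps(partial))
print(carries_outbreak_genotype(full))
```

A genome carrying only C8782T, C18060T and T28144C would not pass the check, because it lacks C17747T and A17858G.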

The Washington State outbreak clade has a highly “comb-like” structure (Fig. 2A) indicative of rapid exponential growth (14). This clade has a C17747T change relative to viruses sampled in British Columbia, and an A17858G change relative to viruses sampled in Fujian, Chongqing, Hangzhou and Guangdong. Given limited and non-representative sampling of viruses for sequencing, along with the rate of molecular evolution, it is difficult to make detailed assessments of geographic origins. However, we can be confident that this clade represents an introduction from China followed by local spread within the US and Canada. British Columbia may have been the entry point, or the location at which the first virus was sampled.

Fig. 2 Maximum-likelihood phylogeny of the Washington State outbreak clade and immediately ancestral variants containing 448 SARS-CoV-2 viruses (A) and Bayesian estimates of outbreak ancestor and doubling time (B).

(A) Tips are colored based on location, branch lengths are proportional to the number of mutations between viruses and the x-axis is labeled with the number of substitutions relative to the root of the phylogeny, here equivalent to the “WA1” haplotype. This comb-like phylogenetic structure of the Washington State outbreak clade is consistent with rapid exponential growth of the virus population. An interactive version of this figure is available at https://nextstrain.org/community/blab/ncov-cryptic-transmission/wa-clade. (B) Highest posterior density estimates for the date of the common ancestor of viruses from the Washington State outbreak clade as well as the doubling time in days of the growth of this clade.

We analyzed the Washington State outbreak clade in a coalescent analysis to estimate evolutionary dynamics. Here, we assume a prior on evolutionary rate based on analysis of viruses sampled globally between December 2019 and July 2020 (see materials and methods). This analysis uses the degree and pattern of genetic diversity of sampled genomes to estimate the date of a common ancestor and exponential growth rate of the virus population. We obtain a median estimate for the date of the clade’s common ancestor of 2 February 2020 with a 95% Bayesian credible interval of 22 January to 10 February 2020 (Fig. 2B). We note that the initiation of a transmission chain may slightly predate the common ancestor belonging to this chain in sampled viruses, as initial transmission events following introduction may not result in branching of the transmission tree. We calculated a rate of exponential growth from the coalescent analysis for this clade, finding a median doubling time of 3.4 days with a 95% Bayesian credible interval of between 2.6 and 4.6 days (Fig. 2B).
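The doubling times reported above map onto exponential growth rates through the standard relation r = ln(2) / T. The sketch below only applies that relation to the quoted median and interval bounds; it is not the coalescent analysis itself, and the function names are hypothetical.

```python
import math


def growth_rate_from_doubling_time(doubling_time_days):
    """Exponential growth rate per day implied by a doubling time."""
    return math.log(2) / doubling_time_days


def fold_increase(doubling_time_days, elapsed_days):
    """Multiplicative growth over elapsed_days under pure exponential growth."""
    return 2 ** (elapsed_days / doubling_time_days)


# Median and 95% credible interval bounds for the doubling time (from the text).
for label, doubling_time in [("lower", 2.6), ("median", 3.4), ("upper", 4.6)]:
    rate = growth_rate_from_doubling_time(doubling_time)
    weekly = fold_increase(doubling_time, 7)
    print(f"{label}: T = {doubling_time} d, r = {rate:.3f}/d, x{weekly:.1f} per week")
```

Note that a shorter doubling time corresponds to a higher growth rate, so the lower bound on doubling time gives the upper bound on r.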

In addition to the 384 viruses from Washington State identified in the Washington State outbreak clade, we observed 12 viruses from elsewhere, including from California, Connecticut, Minnesota, New York, North Carolina, Virginia, Utah, Australia and the Grand Princess cruise ship (Fig. 2A). Viruses from outside Washington State nest within the diversity found within Washington State. In the case of the Grand Princess, the genetic relationship among these viruses is consistent with a single introduction onto the cruise ship of the basal outbreak variant, possessing C17747T and A17858G, and subsequent transmission and evolution on the ship.

The first confirmed case recorded in the United States was a travel-associated case from an individual returning from Wuhan on 15 January 2020 who presented for care at an outpatient clinic in Snohomish County on 19 January 2020 and tested positive (15). This infection is recorded as strain USA/WA1/2020 (referred to here as WA1 and annotated in Fig. 2A) and appears closely related to viruses from infections in China (Fujian, Hangzhou and Guangdong provinces). Viruses from the Washington State outbreak clade group together as direct descendants of WA1 and its identical relatives (Fig. 2A). This tree structure is consistent with the WA1 strain transmitting locally after arrival into the United States. The rarity of the C8782T, T28144C and C18060T mutations, characteristic of WA1, in viruses sampled from China (found in 6/224 or 3% of sequenced viruses) indicates this is a parsimonious explanation for the origin of the Washington State outbreak clade. However, because the evolution rate for SARS-CoV-2 (one mutation per ~15 days) is slower than the transmission rate (one transmission event every 4-8 days) (16, 17), it is possible that WA1 sits on a side branch of the underlying transmission tree even if it appears as a direct ancestor in the maximum likelihood tree. Indeed, the fact that viruses sampled from British Columbia interdigitate between WA1 and the Washington State outbreak clade indicates that this clade may have been introduced into North America by an infection closely related to, but distinct from, WA1 (Fig. 2A). Additionally, it remains possible, although significantly less likely than a single introduction, that multiple viruses possessing the basal Washington State outbreak clade genotype were introduced, resulting in the local amplification of this clade.
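The argument that evolution is slower than transmission can be made concrete with a rough calculation. The sketch below assumes mutations accumulate as a Poisson process at one per 15 days; that modelling choice is an assumption made for illustration and is not stated in the text, and the function name is hypothetical.

```python
import math

# From the text: roughly one mutation per 15 days along a lineage.
DAYS_PER_MUTATION = 15.0


def prob_no_mutation(interval_days, days_per_mutation=DAYS_PER_MUTATION):
    """Probability of zero mutations over an interval.

    Assumes a Poisson process with mean interval_days / days_per_mutation;
    this is an illustrative assumption, not the authors' model.
    """
    expected_mutations = interval_days / days_per_mutation
    return math.exp(-expected_mutations)


# The text gives one transmission event every 4-8 days.
for interval in (4, 8):
    expected = interval / DAYS_PER_MUTATION
    p_identical = prob_no_mutation(interval)
    print(
        f"{interval} d between transmissions: "
        f"{expected:.2f} expected mutations, "
        f"P(no mutation) = {p_identical:.2f}"
    )
```

Under these assumptions most single transmission events leave the genome unchanged, which is why a sampled virus can appear as a direct ancestor in the tree while actually sitting on a side branch of the transmission chain.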

Given that community transmission was first detected on 28 February 2020 from a transmission chain originating between 22 January and 10 February 2020, we sought to address community prevalence during this period. Here, we analyzed 10,382 acute respiratory specimens collected as part of the Seattle Flu Study between 1 January and 15 March 2020 (Fig. 3A). These represented a mix of residual samples collected as part of routine clinical testing along with samples collected as part of prospective community enrollment of individuals with acute respiratory illness. 5270 samples collected between 1 January and 20 February tested negative. The first positive sample was collected on 21 February (Fig. 3B). From 21 February to 15 March, of 5112 samples collected, 65 samples tested positive. On 1 March a sequential Monte Carlo procedure estimated the proportion of acute respiratory specimens positive for SARS-CoV-2 as 1.1% with a 95% credible interval of 0.5% to 2.0% (Fig. 3C). It is challenging to directly convert this into population prevalence of SARS-CoV-2; however, US HealthWeather data shows a 4.5% prevalence of influenza-like illness on 1 March (18) from which we estimated a 0.05% population prevalence of SARS-CoV-2.
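The conversion from proportion positive to population prevalence can be reproduced as a simple product. The sketch below assumes that the proportion of acute respiratory specimens positive for SARS-CoV-2 applies uniformly to everyone with influenza-like illness and treats the 4.5% figure as fixed; both simplifications are assumptions made for illustration, and the function name is hypothetical.

```python
# Counts from the text.
NEGATIVE_BEFORE_21_FEB = 5270
COLLECTED_FROM_21_FEB = 5112
POSITIVE_FROM_21_FEB = 65

# Estimates for 1 March from the text.
PROPORTION_POSITIVE = 0.011          # 1.1% of acute respiratory specimens
PROPORTION_POSITIVE_CI = (0.005, 0.020)
ILI_PREVALENCE = 0.045               # 4.5% influenza-like illness


def population_prevalence(proportion_positive, ili_prevalence):
    """Share of the population both ill and SARS-CoV-2 positive.

    Assumes specimen positivity applies to everyone with
    influenza-like illness (an illustrative simplification).
    """
    return proportion_positive * ili_prevalence


total_samples = NEGATIVE_BEFORE_21_FEB + COLLECTED_FROM_21_FEB
raw_positivity = POSITIVE_FROM_21_FEB / COLLECTED_FROM_21_FEB
central = population_prevalence(PROPORTION_POSITIVE, ILI_PREVALENCE)
low, high = (population_prevalence(p, ILI_PREVALENCE) for p in PROPORTION_POSITIVE_CI)

print(f"total samples: {total_samples}")
print(f"raw positivity from 21 February to 15 March: {raw_positivity:.2%}")
print(f"population prevalence: {central:.3%} (range {low:.3%} to {high:.3%})")
```

The central value rounds to the 0.05% reported above. The range shown only carries the credible interval of the proportion positive through the product and ignores uncertainty in the influenza-like illness estimate.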

Fig. 3 Acute respiratory samples tested for SARS-CoV-2 collected as part of the Seattle Flu Study between 1 January and 15 March.

(A) Total samples tested per day. 10,382 samples collected between 1 January and 15 March were tested. (B) Number of samples testing positive per day. (C) Estimated proportion positive using a sequential Monte Carlo model to provide day-to-day smoothing. The solid red line is the mean estimate of proportion positive and the gray interval is the 95% credible interval. All dates are dates of sample collection, not date of testing.

In January and February 2020, screening for SARS-CoV-2 in the United States was directed at travelers with fever, cough and shortness of breath, with point of origin broadening as new outbreaks were identified, but specifying travel to China up until 24 February 2020 (19, 20). Our analysis indicates that at least one clade of SARS-CoV-2 had been circulating in the Seattle area for 3–6 weeks by the time the virus was first detected in a non-traveler on 28 February 2020. By then, variants within this clade constituted the majority of confirmed infections in the region (384 of 455; 84%). Several factors could have contributed to the delayed detection of presumptive community spread, including limited testing among non-travelers or the presence of asymptomatic or mild illnesses.
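The period of undetected circulation can be related to the dates reported earlier with simple date arithmetic. The sketch below measures the gap between the estimated common ancestor (median and 95% credible interval bounds) and the first detection in a non-traveler; the variable and function names are hypothetical.

```python
from datetime import date

# First detection of community transmission (from the text).
FIRST_DETECTION = date(2020, 2, 28)

# Estimated date of the clade's common ancestor (from the text).
COMMON_ANCESTOR = {
    "earliest (95% CI)": date(2020, 1, 22),
    "median": date(2020, 2, 2),
    "latest (95% CI)": date(2020, 2, 10),
}


def days_before_detection(ancestor_date, detection_date=FIRST_DETECTION):
    """Whole days between an ancestor date and first detection."""
    return (detection_date - ancestor_date).days


for label, ancestor_date in COMMON_ANCESTOR.items():
    days = days_before_detection(ancestor_date)
    print(f"{label}: {days} days ({days / 7:.1f} weeks) before detection")
```

These gaps are measured from the sampled common ancestor. As noted earlier, the transmission chain itself may slightly predate that ancestor, so the true period of circulation could be somewhat longer.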

Both the WA1 strain sampled in Snohomish County, Washington, on 19 January as well as viruses sampled from British Columbia in early March appear to be phylogenetically ancestral to viruses from the Washington State outbreak clade (Fig. 2A), suggesting a possible route of introduction. However, in both these cases, a lack of comprehensive geographic sampling makes it difficult to rely on phylogenetic structure for transmission inference. Viruses sampled from British Columbia may derive from local spread after a direct introduction event or may be offshoots of an introduction elsewhere and subsequent spread to British Columbia. Refining the time and geographic origin of the introduction into Washington State will require a combination of earlier samples and samples from other geographic locations. Other states in the US have shown different genetic histories from Washington State, with a majority of SARS-CoV-2 sequences from New York (21) and Connecticut (22) clustering within European lineages, indicating repeated introductions from Europe. Indeed, we observe a second cluster of Washington State viruses related to a later introduction from Europe.

Our results highlight the critical need for widespread surveillance for community transmission of SARS-CoV-2 throughout the United States and the rest of the world even after the current pandemic is brought under control. The broad spectrum of disease severity (23) makes surveillance challenging (24). The combination of traditional public health surveillance and genomic epidemiology can provide actionable insights, as happened in this instance: upon sequencing the initial community case on 29 February 2020, results were immediately shared with national, state and local public health, resulting in rapid rollout of social distancing policies as Seattle and Washington State came to grips with the extent of existing COVID-19 spread. The confirmation of local transmission in Seattle prompted a change in testing criteria to emphasize individuals with no travel history. From 29 February onwards, new genomic data was immediately posted to the GISAID EpiCoV sequence database (9, 10) and analyzed alongside other public SARS-CoV-2 genomes via the Nextstrain online platform (25) to provide immediate and public situational awareness. We see the combination of community surveillance, genomic analysis and public real-time sharing of results as empowering new systems for infectious disease surveillance.

Supplementary Materials

science.sciencemag.org/cgi/content/full/science.abc0523/DC1

Materials and Methods

Fig. S1

Seattle Flu Study Investigators List

References (27–40)

MDAR Reproducibility Checklist

Data S1

https://creativecommons.org/licenses/by/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References and Notes

Acknowledgments: We gratefully acknowledge the authors, originating and submitting laboratories of the sequences from GISAID’s EpiFlu Database on which this research is based. A full Acknowledgments table is available as supplementary materials. We have tried our best to avoid any direct analysis of genomic data not submitted as part of this paper and use this genomic data as background. We particularly thank Harrigan, Prystajecky, Krajden, Lee, Kamelian, Lapointe, Choi, Hoang, Sekirov, Levett, Tyson, Snutch, Loman, Quick, Li, Gilmour who shared virus genomes from British Columbia collected by the BCCDC Public Health Laboratory. We thank Niket Thakkar, Joe Felsenstein and Chris Spitters for helpful input and discussion.

Funding: The Seattle Flu Study is run through the Brotman Baty Institute for Precision Medicine and funded by Gates Ventures, the private office of Bill Gates. The funder was not involved in the design of the study and does not have any ownership over the management and conduct of the study, the data, or the rights to publish. JS is an Investigator of the Howard Hughes Medical Institute. TB is a Pew Biomedical Scholar and is supported by NIH R35 GM119774-01. EBH and RAN are supported by University of Basel core funding. Sequencing analyses of SARS-CoV-2 genomes from California was supported by an NIH grant R33-AI129455 and the Charles and Helen Schwab Foundation to CYC, and an NIH grant K08-CA230156 and the Burroughs-Wellcome CAMS Award to WG.

Author contributions: M-LH, AN, GP, AR, HX, LS, TNN, ALG, PR, GSB, KRJ generated sequence and diagnostic data from UW Virology samples. AA, EB, SC, DG, PDH, KF, CDF, MI, KL, JL, AK, MR, TRS, MT, CRW, DAN, MJR, JAE, TB, LMS, MF, HYC, JS collected SFS specimens and generated sequence and diagnostic data. TB, JH, EBH, JH, LHM, NFM, RAN wrote bioinformatic analysis software and performed phylogenetic analyses. XD, WG, SF, CC generated sequence data from UCSF samples. RG, GM, BH, PD, SL collected WA DOH specimens and generated diagnostic data. KQ, YT, AU, ST, DM, GLA generated sequence data for the WA1 specimen. TB, ALG, PR, LMS, MF, JD, GLA, HYC, JS, KRJ interpreted the data and wrote the paper.

Competing interests: Janet A. Englund is a consultant for Sanofi Pasteur and Meissa Vaccines, Inc., and receives research support from GlaxoSmithKline, AstraZeneca, and Novavax. Helen Chu is a consultant for Merck and GlaxoSmithKline. Jay Shendure is a consultant with Guardant Health, Maze Therapeutics, Camp4 Therapeutics, Nanostring, Phase Genomics, Adaptive Biotechnologies, and Stratos Genomics, and has a research collaboration with Illumina. Michael Famulare, Lea Starita, Pavitra Roychoudhury, Amanda Adler, Peter Han, Kirsten Lacombe, Elisabeth Brandstetter, Caitlin R. Wolf, Richard A Neher, James Hadfield, Nicola F. Müller, Jover Lee, Thomas Sibley, Kairsten Fay, Deborah A. Nickerson, Mark J. Rieder, and Trevor Bedford declare no competing interests.

Data and materials availability: Sequencing and analysis of samples from the Seattle Flu Study was approved by the institutional review board at the University of Washington (protocol STUDY00006181). Informed consent was obtained for all community participant samples and survey data. Informed consent for residual sample and clinical data collection was waived. For UW Virology Lab, use of residual clinical specimens was approved by the institutional review board at the University of Washington (protocol STUDY00000408) with a waiver of informed consent. This manuscript represents the opinions of the authors and does not necessarily reflect the position of the U.S. Centers for Disease Control and Prevention. Data and code associated with this work are available at https://github.com/blab/ncov-cryptic-transmission (26). SARS-CoV-2 consensus genome sequences associated with this work have been uploaded to the GISAID EpiFlu database and accession numbers are available in the supplementary materials. Sequencing reads have been deposited to NCBI SRA (Bioproject PRJNA610428). This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.