Cryptic transmission of SARS-CoV-2 in Washington state


Science  30 Oct 2020:
Vol. 370, Issue 6516, pp. 571-575
DOI: 10.1126/science.abc0523

A series of unfortunate events

The history of how severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread around the planet has been far from clear. Several narratives have been propagated by social media and, in some cases, national policies were forged in response. Now that many thousands of virus sequences are available, two studies analyzed some of the key early events in the spread of SARS-CoV-2. Bedford et al. found that the virus arrived in Washington state in late January or early February. The viral genome from the first case detected had mutations similar to those found in Chinese samples and rapidly spread and dominated subsequent undetected community transmission. The other viruses detected had origins in Europe. Worobey et al. found that early introductions into Germany and the west coast of the United States were extinguished by vigorous public health efforts, but these successes were largely unrecognized. Unfortunately, several major travel events occurred in February, including repatriations from China, with lax public health follow-up. Serial, independent introductions triggered the major outbreaks in the United States and Europe that still hold us in the grip of control measures.

Science, this issue p. 571, p. 564


After its emergence in Wuhan, China, in late November or early December 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus rapidly spread globally. Genome sequencing of SARS-CoV-2 allows the reconstruction of its transmission history, although this is contingent on sampling. We analyzed 453 SARS-CoV-2 genomes collected between 20 February and 15 March 2020 from infected patients in Washington state in the United States. We find that most SARS-CoV-2 infections sampled during this time derive from a single introduction in late January or early February 2020, which subsequently spread locally before active community surveillance was implemented.

The novel coronavirus, referred to alternately as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1) or human coronavirus 2019 (hCoV-19) (2), emerged in Wuhan, Hubei, China, in late November or early December 2019 (3). As of 18 May 2020, there have been >4 million confirmed cases of coronavirus disease 2019 (COVID-19), the disease caused by SARS-CoV-2, that have resulted in >300,000 deaths (4). After its initial emergence in China, travel-associated cases with travel histories related to Wuhan appeared in other parts of the world (5). The first confirmed case in the United States was travel associated and was detected in Snohomish County, Washington state, on 19 January 2020. Until 27 February 2020, the U.S. Centers for Disease Control and Prevention (CDC) guidance recommended prioritizing testing for COVID-19 on persons with direct travel history from an affected area or with exposure to a known case. Cases of respiratory disease with no known risk factors were not routinely tested. In the 6 weeks between 19 January and 27 February, 59 confirmed cases were reported in the United States (6), all outside of Washington state and with either direct travel history or exposure to a known, confirmed case. On 28 February 2020, a community case was identified in Snohomish County (7). One month later, on 25 March, as a result of increased testing and ongoing transmission, Washington state reported 2580 confirmed cases and 132 deaths (8). Here, we report on the putative history of early community transmission in Washington state as revealed by genomic epidemiology. We conclude that SARS-CoV-2 was circulating undetected by the surveillance apparatus in Washington state for several weeks, beginning in late January or early February 2020.

Although publicly available SARS-CoV-2 genomes (9, 10) are not sampled in strict proportion to the burden of infections through time and across geography, their genetic relationships can still shed light on underlying patterns of spread. SARS-CoV-2 genomes sampled between December 2019 and 15 March 2020 appear to be closely related, with between 0 and 12 mutations relative to a common ancestor estimated to exist in Wuhan between late November and early December 2019 (Fig. 1). This pattern is consistent with a reported rate of molecular evolution of ~0.8 × 10^-3 substitutions per site per year or approximately two substitutions per genome per month (3). After its initial zoonotic emergence in Wuhan (11), SARS-CoV-2 viral genomes began to accumulate substitutions and spread from Wuhan to other regions in the world (3). During December 2019, the Wuhan outbreak was too small to seed many introductions outside of China, but by January 2020, it had grown large enough to begin seeding cases elsewhere (12).
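As a rough check on the two rates quoted above, the per-site rate can be converted to a per-genome rate by multiplying by the genome length. The sketch below assumes a genome length of 29,903 nucleotides (the length of the Wuhan-Hu-1 reference, which is not stated in the text); the function and variable names are illustrative only.

```python
# Convert a per-site substitution rate into expected substitutions per genome.
GENOME_LENGTH_NT = 29_903        # assumption: length of the Wuhan-Hu-1 reference genome
RATE_PER_SITE_PER_YEAR = 0.8e-3  # approximate rate quoted in the text


def substitutions_per_genome(rate_per_site_per_year: float,
                             genome_length: int,
                             years: float) -> float:
    """Expected number of substitutions across the whole genome over `years`."""
    return rate_per_site_per_year * genome_length * years


per_year = substitutions_per_genome(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_NT, 1.0)
per_month = per_year / 12
days_per_substitution = 365.25 / per_year

print(f"{per_year:.1f} substitutions per genome per year")
print(f"{per_month:.1f} substitutions per genome per month")
print(f"about one substitution every {days_per_substitution:.0f} days")
```

Under these assumptions the result is close to two substitutions per genome per month, or roughly one substitution every 15 days.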

Fig. 1 Maximum-likelihood phylogeny of 455 SARS-CoV-2 viruses collected from Washington state on a background of 493 globally collected viruses.

Viruses collected from Washington state are shown as red circles. Tips and branches are colored on the basis of location, branch lengths are proportional to the number of mutations along a branch, and the x axis is labeled with the number of substitutions relative to the root of the phylogeny—here equivalent to basal Wuhan outbreak viruses. The clustering of related viruses indicates community transmission after an introduction event. Branch locations are estimated on the basis of a discrete traits model. We observe a single introduction leading to a large outbreak clade of 384 sampled viruses from Washington state (marked by the larger arrow), and we observe a second introduction leading to a smaller outbreak clade of 39 viruses (marked by the smaller arrow). An interactive version of this figure is available at

Sequencing of viruses from the Washington state outbreak began on 28 February 2020 and has continued since then. We analyzed the sequences of 455 SARS-CoV-2 viruses from this outbreak collected between 19 January and 15 March 2020 (Fig. 1). Virus sequences from Washington state are closely related to those from viruses collected elsewhere. Clusters of closely related viruses indicate separate introduction events followed by local spread. The majority (n = 384; 84%) of these viruses fall into a closely related clade (marked by the larger arrow in Fig. 1), and these viruses have single-nucleotide polymorphisms (SNPs) C8782T, C17747T, A17858G, C18060T, and T28144C relative to the basal virus at the root of the phylogeny, which is equivalent to the reference virus Wuhan/Hu-1/2019. This clade derives from viruses circulating in China (Fig. 1, in blue), is closely related to viruses sampled in British Columbia (Fig. 1, in orange), and is labeled as Pangolin lineage A.1 (13). Going forward, we refer to this clade as the Washington state outbreak clade. Other viruses (n = 39; 9%) fall into a separate, smaller clade (marked by the smaller arrow in Fig. 1) and derive from viruses circulating in Europe. The remaining 32 viruses (7%) from Washington state are distributed across the phylogeny. Thus, we conclude that most early cases descend from a single introduction event followed by local amplification.
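The SNP labels above follow the pattern reference base, genome position, derived base. As a minimal illustration of that notation (not the phylogenetic method used in this study, which places viruses on a tree rather than matching a fixed SNP list), the sketch below parses the labels and checks whether a sample carries all five clade-defining changes. The names `Snp`, `parse_snp`, and `carries_all`, and the two toy samples, are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple


class Snp(NamedTuple):
    ref: str  # base in the reference genome
    pos: int  # 1-based genome position
    alt: str  # derived (mutant) base


def parse_snp(label: str) -> Snp:
    """Parse a label such as 'C8782T' into (ref, pos, alt)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a SNP label: {label!r}")
    return Snp(match.group(1), int(match.group(2)), match.group(3))


# The five SNPs listed in the text for the Washington state outbreak clade.
OUTBREAK_CLADE_SNPS = [
    parse_snp(s) for s in ["C8782T", "C17747T", "A17858G", "C18060T", "T28144C"]
]


def carries_all(bases_by_position: Dict[int, str], snps: List[Snp]) -> bool:
    """True if the sample shows the derived base at every listed position."""
    return all(bases_by_position.get(s.pos) == s.alt for s in snps)


# Toy samples (hypothetical): only the five positions of interest are recorded.
sample_in_clade = {8782: "T", 17747: "T", 17858: "G", 18060: "T", 28144: "C"}
sample_basal = {8782: "T", 17747: "C", 17858: "A", 18060: "T", 28144: "C"}

print(carries_all(sample_in_clade, OUTBREAK_CLADE_SNPS))
print(carries_all(sample_basal, OUTBREAK_CLADE_SNPS))
```

The second toy sample carries C8782T, C18060T, and T28144C but lacks C17747T and A17858G, so it does not match the full clade signature.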

The Washington state outbreak clade has a highly comb-like structure (Fig. 2A), which is indicative of rapid exponential growth (14). This clade has a C17747T change relative to viruses sampled in British Columbia and an A17858G change relative to viruses sampled in Fujian, Chongqing, Hangzhou, and Guangdong. Given the limited and nonrepresentative sampling of viruses for sequencing, along with the rate of molecular evolution, it is difficult to make detailed assessments of geographic origins. However, we can be confident that this clade represents an introduction from China followed by local spread within the United States and Canada. British Columbia may have been the entry point or the location at which the first virus was sampled.

Fig. 2 Maximum-likelihood phylogeny of the Washington state outbreak clade and immediately ancestral variants containing 448 SARS-CoV-2 viruses and Bayesian estimates of the date of the outbreak common ancestor and outbreak doubling time.

(A) Maximum-likelihood phylogeny. Tips are colored on the basis of location, branch lengths are proportional to the number of mutations between viruses, and the x axis is labeled with the number of substitutions relative to the root of the phylogeny—here equivalent to the WA1 haplotype. This comb-like phylogenetic structure of the Washington state outbreak clade is consistent with rapid exponential growth of the virus population. An interactive version of this figure is available at (B) Highest posterior density estimates for the date of the common ancestor of viruses from the Washington state outbreak clade (top) as well as the doubling time in days of the growth of this clade (bottom).

We analyzed the Washington state outbreak clade in a coalescent analysis to estimate evolutionary dynamics. Here, we assume a prior on evolutionary rate based on analysis of viruses sampled globally between December 2019 and July 2020 (see materials and methods). This analysis uses the degree and pattern of genetic diversity of sampled genomes to estimate the date of a common ancestor and the exponential growth rate of the virus population. We obtained a median estimate for the date of the clade’s common ancestor of 2 February 2020, with a 95% Bayesian credible interval of 22 January to 10 February 2020 (Fig. 2B). We note that the initiation of a transmission chain may slightly predate the common ancestor belonging to this chain in sampled viruses, as initial transmission events after introduction may not result in branching of the transmission tree. We calculated a rate of exponential growth from the coalescent analysis for this clade and found a median doubling time of 3.4 days, with a 95% Bayesian credible interval of 2.6 to 4.6 days (Fig. 2B).
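For readers who prefer growth rates to doubling times, the two are related by r = ln(2) / T_d under pure exponential growth. The sketch below applies that standard identity to the median and interval bounds reported above; it is a back-of-the-envelope conversion, not a reproduction of the coalescent analysis, and the function names are illustrative.

```python
import math


def growth_rate_per_day(doubling_time_days: float) -> float:
    """Exponential growth rate r (per day) implied by a doubling time."""
    return math.log(2) / doubling_time_days


def fold_increase(doubling_time_days: float, elapsed_days: float) -> float:
    """Multiplicative growth over `elapsed_days` under exponential growth."""
    return 2 ** (elapsed_days / doubling_time_days)


# Doubling times (days) reported in the text: median and 95% credible bounds.
reported = [("median", 3.4), ("fast bound", 2.6), ("slow bound", 4.6)]

for label, doubling_time in reported:
    rate = growth_rate_per_day(doubling_time)
    weekly = fold_increase(doubling_time, 7)
    print(f"{label}: r = {rate:.3f} per day, {weekly:.1f}-fold growth per week")
```

At the median doubling time of 3.4 days, this corresponds to a growth rate of about 0.2 per day, or roughly a fourfold increase per week.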

In addition to the 384 viruses from Washington state identified in the Washington state outbreak clade, we observed 12 viruses from elsewhere, including from California, Connecticut, Minnesota, New York, North Carolina, Virginia, Utah, Australia, and the Grand Princess cruise ship (Fig. 2A). Viruses from outside Washington state nest within the diversity found in Washington state. In the case of the Grand Princess, the genetic relationship among these viruses is consistent with a single introduction onto the cruise ship of the basal outbreak variant—having C17747T and A17858G changes—and subsequent transmission and evolution on the ship.

The first confirmed case recorded in the United States was a travel-associated case from an individual returning from Wuhan on 15 January 2020, who presented for care at an outpatient clinic in Snohomish County on 19 January 2020 and tested positive (15). This infection is recorded as strain USA/WA1/2020 (referred to here as WA1 and annotated in Fig. 2A), and it appears to be closely related to viruses from infections in China (Fujian, Hangzhou, and Guangdong provinces). Viruses from the Washington state outbreak clade group together as direct descendants of WA1 and its identical relatives (Fig. 2A). This tree structure is consistent with the WA1 strain transmitting locally after arrival into the United States. The rarity of the C8782T, T28144C, and C18060T mutations, which are characteristic of WA1, in viruses sampled from China (found in 6 of 224 or 3% of sequenced viruses) indicates that this is a parsimonious explanation for the origin of the Washington state outbreak clade. However, because the evolution rate for SARS-CoV-2 (one mutation per ~15 days) is slower than the transmission rate (one transmission event every 4 to 8 days) (16, 17), it is possible that WA1 sits on a side branch of the underlying transmission tree even if it appears as a direct ancestor in the maximum-likelihood tree. The fact that viruses sampled from British Columbia interdigitate between WA1 and the Washington state outbreak clade indicates that this clade may have been introduced into North America by an infection closely related to, but distinct from, WA1 (Fig. 2A). Additionally, it remains possible that multiple viruses with the basal Washington state outbreak clade genotype were introduced, which resulted in the local amplification of this clade; however, this is markedly less likely than a single introduction of the virus.
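The point about evolution being slower than transmission can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that mutations arise as a Poisson process at one per 15 days; under that assumption the chance that no mutation occurs between consecutive infections is exp(-t/15), where t is the interval between transmission events. The Poisson model and the function name are assumptions, not part of the analysis reported here.

```python
import math

DAYS_PER_MUTATION = 15.0  # approximate figure quoted in the text


def prob_no_mutation(interval_days: float,
                     days_per_mutation: float = DAYS_PER_MUTATION) -> float:
    """Poisson probability of zero mutations accumulating over the interval."""
    expected_mutations = interval_days / days_per_mutation
    return math.exp(-expected_mutations)


# Transmission intervals of 4 to 8 days, as quoted in the text.
for interval in (4, 8):
    p = prob_no_mutation(interval)
    print(f"{interval}-day interval: P(identical genomes) = {p:.2f}")
```

Under these assumptions, most single transmission events leave the genome unchanged, which is why a sampled virus can look like a direct ancestor in the tree while actually sitting on a side branch of the transmission chain.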

Given that community transmission was first detected on 28 February 2020 from a transmission chain originating between 22 January and 10 February 2020, we sought to address community prevalence during this period. Here, we analyzed 10,382 acute respiratory specimens collected as part of the Seattle Flu Study between 1 January and 15 March 2020 (Fig. 3A). These specimens represented a mix of residual samples collected as part of routine clinical testing and samples collected as part of prospective community enrollment of individuals with acute respiratory illness. In total, 5270 samples collected between 1 January and 20 February tested negative. The first positive sample was collected on 21 February (Fig. 3B). From 21 February to 15 March, of 5112 samples collected, 65 samples tested positive. On 1 March, a sequential Monte Carlo procedure estimated the proportion of acute respiratory specimens positive for SARS-CoV-2 as 1.1% with a 95% credible interval of 0.5 to 2.0% (Fig. 3C). It is challenging to directly convert this value into population prevalence of SARS-CoV-2; however, U.S. Health Weather data show a 4.5% prevalence of influenza-like illness on 1 March (18), from which we estimated a 0.05% population prevalence of SARS-CoV-2.
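The prevalence estimate above is the product of two proportions: the share of acute respiratory specimens positive for SARS-CoV-2 and the share of the population with influenza-like illness. The sketch below reproduces that arithmetic and the sample counts from the text. Scaling the credible-interval bounds by the same factor is a simplification added here for illustration; it ignores uncertainty in the influenza-like illness figure and is not a result reported by the authors.

```python
# Sample counts reported in the text.
negatives_1jan_to_20feb = 5270
samples_21feb_to_15mar = 5112
positives_21feb_to_15mar = 65

total_samples = negatives_1jan_to_20feb + samples_21feb_to_15mar
crude_positive_fraction = positives_21feb_to_15mar / samples_21feb_to_15mar

# Estimates for 1 March reported in the text.
proportion_positive = 0.011        # 1.1% of acute respiratory specimens
proportion_bounds = (0.005, 0.020) # 95% credible interval
ili_prevalence = 0.045             # 4.5% influenza-like illness prevalence

population_prevalence = proportion_positive * ili_prevalence
# Simplification: scale the interval bounds by the same ILI prevalence.
scaled_bounds = tuple(bound * ili_prevalence for bound in proportion_bounds)

print(f"total samples: {total_samples}")
print(f"crude positive fraction, 21 Feb to 15 Mar: {crude_positive_fraction:.2%}")
print(f"estimated population prevalence on 1 March: {population_prevalence:.2%}")
print(f"scaled bounds: {scaled_bounds[0]:.3%} to {scaled_bounds[1]:.3%}")
```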

Fig. 3 Acute respiratory samples tested for SARS-CoV-2 collected as part of the Seattle Flu Study between 1 January and 15 March 2020.

(A) Total samples tested per day. In total, 10,382 samples collected between 1 January and 15 March were tested. (B) Number of samples testing positive per day. (C) Estimated proportion positive using a sequential Monte Carlo model to provide day-to-day smoothing. The solid red line is the mean estimate of proportion positive, and the gray shaded region is the 95% credible interval. All dates are those of sample collection, not dates of testing.

In January and February 2020, screening for SARS-CoV-2 in the United States was directed at travelers with fever, cough, and shortness of breath, with the point of origin broadening as new outbreaks were identified but continuing to solely specify travel to China up until 24 February 2020 (19, 20). Our analysis indicates that at least one clade of SARS-CoV-2 had been circulating in the Seattle area for 3 to 6 weeks by the time the virus was first detected in a nontraveler on 28 February 2020. By then, variants within this clade constituted the majority of confirmed infections in the region (384 of 455; 84%). Several factors could have contributed to the delayed detection of presumptive community spread, including limited testing among nontravelers or the presence of asymptomatic or mild illnesses.
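The length of undetected circulation can be checked with simple date arithmetic from the dates given in the text: the estimated common ancestor (median 2 February, credible interval 22 January to 10 February) and first detection in a nontraveler on 28 February. The sketch below is only that subtraction; the variable names are illustrative.

```python
from datetime import date

FIRST_COMMUNITY_DETECTION = date(2020, 2, 28)

# Common-ancestor date estimates reported in the text.
common_ancestor_estimates = {
    "earliest (95% CI)": date(2020, 1, 22),
    "median": date(2020, 2, 2),
    "latest (95% CI)": date(2020, 2, 10),
}

weeks_undetected = {}
for label, ancestor_date in common_ancestor_estimates.items():
    days = (FIRST_COMMUNITY_DETECTION - ancestor_date).days
    weeks_undetected[label] = days / 7
    print(f"{label}: {days} days, about {days / 7:.1f} weeks")
```

The bounds work out to roughly 2.6 to 5.3 weeks. One possible reading, offered here as an interpretation rather than a statement from the authors, is that the quoted range of 3 to 6 weeks also allows for the transmission chain slightly predating the sampled common ancestor.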

Both the WA1 strain sampled in Snohomish County, Washington, on 19 January as well as viruses sampled from British Columbia in early March appear to be phylogenetically ancestral to viruses from the Washington state outbreak clade (Fig. 2A), which suggests a possible route of introduction. However, in both of these cases, a lack of comprehensive geographic sampling makes it difficult to rely on phylogenetic structure for transmission inference. Viruses sampled from British Columbia may derive from local spread after a direct introduction event, or they may be offshoots of an introduction elsewhere that subsequently spread to British Columbia. Refining the time and geographic origin of the introduction into Washington state will require a combination of earlier samples and samples from other geographic locations. Other states in the United States have shown different genetic histories from that seen in Washington state, with most SARS-CoV-2 sequences from New York (21) and Connecticut (22) clustering with European lineages, which indicates repeated introductions from Europe. We also observed a second cluster of Washington state viruses related to a later introduction from Europe.

Our results highlight the critical need for widespread surveillance for community transmission of SARS-CoV-2 throughout the United States and the rest of the world, even after the current pandemic is brought under control. The broad spectrum of disease severity (23) makes such surveillance challenging (24). The combination of traditional public health surveillance and genomic epidemiology can provide actionable insights, as happened in this instance: Upon sequencing the initial community case on 29 February 2020, results were immediately shared with national, state, and local public health agencies, which resulted in the rapid rollout of social distancing policies as Seattle and Washington state came to grips with the extent of existing COVID-19 spread. The confirmation of local transmission in Seattle prompted a change in testing criteria to emphasize individuals with no travel history. From 29 February onward, genomic data were immediately posted to the GISAID EpiCoV sequence database (9, 10) and analyzed alongside other public SARS-CoV-2 genomes by means of the Nextstrain online platform (25) to provide immediate and public situational awareness. We see the combination of community surveillance, genomic analysis, and public real-time sharing of results as a pathway to empower infectious disease surveillance systems.

Supplementary Materials

Materials and Methods

Fig. S1

Seattle Flu Study Investigators List

References (27–40)

MDAR Reproducibility Checklist

Data S1

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References and Notes

Acknowledgments: We gratefully acknowledge the authors and the originating and submitting laboratories of the sequences from GISAID’s EpiFlu Database, on which this research is based. A full acknowledgments table is available as supplementary materials. We have tried our best to avoid any direct analysis of genomic data not submitted as part of this paper and use these genomic data as background. We particularly thank R. Harrigan, N. Prystajecky, M. Krajden, G. Lee, K. Kamelian, H. Lapointe, J. Choi, L. Hoang, I. Sekirov, P. Levett, J. Tyson, T. Snutch, N. Loman, J. Quick, K. Li, and J. Gilmour, who shared virus genomes from British Columbia collected by the British Columbia Centre for Disease Control Public Health Laboratory. We thank N. Thakkar, J. Felsenstein, and C. Spitters for helpful input and discussion.

Funding: The Seattle Flu Study is run through the Brotman Baty Institute for Precision Medicine and funded by Gates Ventures, the private office of Bill Gates. The funder was not involved in the design of the study and does not have any ownership over the management and conduct of the study, the data, or the rights to publish. J.S. is an Investigator of the Howard Hughes Medical Institute. T.B. is a Pew Biomedical Scholar and is supported by NIH R35 GM119774-01. E.B.H. and R.A.N. are supported by University of Basel core funding. Sequencing analyses of SARS-CoV-2 genomes from California were supported by an NIH grant R33-AI129455 and the Charles and Helen Schwab Foundation to C.C. and by an NIH grant K08-CA230156 and the Burroughs-Wellcome CAMS Award to W.G.

Author contributions: M.-L.H., A.N., G.P., A.R., H.X., L.S., T.N.N., A.L.G., P.R., G.S.B., and K.R.J. generated sequence and diagnostic data from University of Washington Virology samples. A.A., E.B., S.C., D.G., P.D.H., K.F., C.D.F., M.I., K.L., J.L., A.K., M.R., T.R.S., M.T., C.R.W., D.A.N., M.J.R., J.A.E., T.B., L.M.S., M.F., H.Y.C., and J.S. collected Seattle Flu Study specimens and generated sequence and diagnostic data. T.B., J.Ha., E.B.H., J.Hu., L.H.M., N.F.M., and R.A.N. wrote bioinformatic analysis software and performed phylogenetic analyses. X.D., W.G., S.F., and C.C. generated sequence data from University of California San Francisco samples. R.G., G.M., B.H., P.D., and S.L. collected Washington State Department of Health specimens and generated diagnostic data. K.Q., Y.T., A.U., S.T., D.M., and G.L.A. generated sequence data for the WA1 specimen. T.B., A.L.G., P.R., L.M.S., M.F., J.S.D., G.L.A., H.Y.C., J.S., and K.R.J. interpreted the data and wrote the paper.

Competing interests: J.A.E. is a consultant for Sanofi Pasteur and Meissa Vaccines, Inc., and she receives research support from GlaxoSmithKline, AstraZeneca, and Novavax. H.Y.C. is a consultant for Merck and GlaxoSmithKline. J.S. is a consultant with Guardant Health, Maze Therapeutics, Camp4 Therapeutics, Nanostring, Phase Genomics, Adaptive Biotechnologies, and Stratos Genomics, and he has a research collaboration with Illumina. G.S.B. is a consultant for Avalon Healthcare Solutions. All other authors declare no competing interests.

Data and materials availability: Sequencing and analysis of samples from the Seattle Flu Study was approved by the institutional review board at the University of Washington (protocol STUDY00006181). Informed consent was obtained for all community participant samples and survey data. Informed consent for residual sample and clinical data collection was waived. For the University of Washington Virology Laboratory, use of residual clinical specimens was approved by the institutional review board at the University of Washington (protocol STUDY00000408), with a waiver of informed consent. This manuscript represents the opinions of the authors and does not necessarily reflect the position of the U.S. Centers for Disease Control and Prevention. Data and code associated with this work are available at (26). SARS-CoV-2 consensus genome sequences associated with this work have been uploaded to the GISAID EpiFlu database, and accession numbers are available in the supplementary materials. Sequencing reads have been deposited to NCBI SRA (Bioproject PRJNA610428). This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.
