Research Article

Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil


Science  14 Apr 2021:
DOI: 10.1126/science.abh2644


Cases of SARS-CoV-2 infection in Manaus, Brazil, resurged in late 2020, despite previously high levels of infection. Genome sequencing of viruses sampled in Manaus between November 2020 and January 2021 revealed the emergence and circulation of a novel SARS-CoV-2 variant of concern. Lineage P.1 acquired 17 mutations, including a trio in the spike protein (K417T, E484K and N501Y) associated with increased binding to the human ACE2 receptor. Molecular clock analysis shows that P.1 emergence occurred around mid-November 2020 and was preceded by a period of faster molecular evolution. Using a two-category dynamical model that integrates genomic and mortality data, we estimate that P.1 may be 1.7–2.4-fold more transmissible, and that previous (non-P.1) infection provides 54–79% of the protection against infection with P.1 that it provides against non-P.1 lineages. Enhanced global genomic surveillance of variants of concern, which may exhibit increased transmissibility and/or immune evasion, is critical to accelerate pandemic responsiveness.

Brazil has experienced high mortality during the COVID-19 pandemic, recording >300,000 deaths and >13 million reported cases, as of March 2021. SARS-CoV-2 infection and disease burden have been highly variable across the country, with Amazonas state in north Brazil being the worst-affected region (1). Serological surveillance of blood donors in Manaus, the capital city of Amazonas and the largest city in the Amazon region, has suggested >67% cumulative attack rates by October 2020 (2). Similar but slightly lower seroprevalences have also been reported for cities in neighboring regions (3, 4). However, the level of previous infection in Manaus was clearly not sufficient to prevent a rapid resurgence in SARS-CoV-2 transmission and mortality there during late 2020 and early 2021 (5), which has placed significant pressure on the city’s healthcare system.

Here, we show that the second wave of infection in Manaus was associated with the emergence and rapid spread of a new SARS-CoV-2 lineage of concern, named lineage P.1. The lineage carries a unique constellation of mutations (table S1), including several that have been previously determined to be of virological importance (6–10) and which are located in the spike protein receptor binding domain (RBD), the region of the virus involved in recognition of the angiotensin-converting enzyme-2 cell surface receptor (11). Using genomic data, structure-based mapping of mutations of interest onto the spike protein, and dynamical epidemiology modelling of genomic and mortality data, we investigate the emergence of the P.1 lineage and explore epidemiological explanations for the resurgence of COVID-19 in Manaus.

Identification and nomenclature of a novel P.1 lineage in Manaus

In late 2020, two SARS-CoV-2 lineages of concern were discovered through genomic surveillance, both characterised by sets of significant mutations: lineage B.1.351, first reported in South Africa (12) and lineage B.1.1.7, detected in the United Kingdom (13). Both variants have transmitted rapidly in the countries where they were discovered and spread to other regions (14, 15). Analyses indicate B.1.1.7 has higher transmissibility and causes more severe illness compared with previously circulating lineages in the UK (1, 16, 17).

Following a rapid increase in hospitalizations in Manaus caused by severe acute respiratory infection in December 2020 (Fig. 1A), we focused ongoing SARS-CoV-2 genomic surveillance (2, 18–22) on recently collected samples from the city (supplementary materials, materials and methods, and table S2). Prior to this, only seven SARS-CoV-2 genome sequences from Amazonas state were publicly available (SARS-CoV-2 was first detected in Manaus on 13 March 2020) (19, 23). We sequenced SARS-CoV-2 genomes from 184 samples from patients seeking COVID-19 testing in two diagnostic laboratories in Manaus between November and December 2020, using the ARTIC V3 multiplexed amplicon scheme (24) and the MinION sequencing platform. As partial genome sequences can provide useful epidemiological information, particularly regarding virus genetic diversity and lineage composition (25), we harnessed information from partial (n=41, 25-75% genome coverage), as well as near-complete (n=95, 75-95%) and complete (n=48, ≥95%) sequences from Manaus (figs. S1 to S4), together with other available and published genomes from Brazil for context. Viral lineages were classified using the Pangolin software tool (26), Nextclade, and standard phylogenetic analysis using complete reference genomes.
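The three coverage categories described above can be sketched as a simple binning rule. This is an illustrative sketch only: the function name is hypothetical, and the treatment of values that fall exactly on a boundary (75%) or below 25% is an assumption, since the text does not specify it.

```python
def classify_coverage(coverage_pct: float) -> str:
    """Bin a genome by percent coverage using the thresholds quoted in the text."""
    if not 0.0 <= coverage_pct <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    if coverage_pct >= 95.0:
        return "complete"
    if coverage_pct >= 75.0:       # assumption: exactly 75% counts as near-complete
        return "near-complete"
    if coverage_pct >= 25.0:
        return "partial"
    return "below-threshold"       # assumption: <25% coverage is not analysed


# Category sizes reported in the text; they sum to the 184 sequenced samples.
reported_counts = {"partial": 41, "near-complete": 95, "complete": 48}
total_sequenced = sum(reported_counts.values())
```

The reported category sizes add up to the 184 samples sequenced, which is a quick consistency check on the numbers in the paragraph.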

Fig. 1 SARS-CoV-2 epidemiological, diagnostic, genomic and mobility data from Manaus.

(A) Dark solid line shows the 7-day rolling average of the COVID-19 confirmed and suspected daily time series of hospitalisations in Manaus. Admissions in Manaus are from Fundação de Vigilância em Saúde do Amazonas (66). Green dots represent daily severe acute respiratory mortality records from the SIVEP-Gripe (Sistema de Informação de Vigilância Epidemiológica da Gripe) database (67). SARI = severe acute respiratory infections. Excess burial records based on data from Manaus Mayor’s office are shown in red dots for comparison (see Materials and Methods). The arrow denotes 6 December 2020, the date of the first P.1 case identified in Manaus by our study. (B) Maximum likelihood tree (n=962) with B.1.1.28, P.1 and P.2 sequences, with collapsed views of P.1 and P.2 clusters and highlighting other sequences from Manaus, Brazil. Ancestral branches leading to P.1 and P.2 are shown as dashed lines. See fig. S3 for a more detailed phylogeny. Scale bar is shown in units of nucleotide substitutions per site (s/s). (C) Number of air travel passengers from Manaus to all states in Brazil was obtained from the National Civil Aviation Agency of Brazil. The ISO 3166-2:BR codes of the states with genomic reports of P.1 (GISAID (68), as of 24 Feb 2021) are shown in bold. An updated list of GISAID genomes and reports of P.1 worldwide is available online. (D) Number of genome sequences from Manaus belonging to lineages of interest (see Materials and Methods); spike mutations of interest are denoted.

Our early data indicated the presence of a novel SARS-CoV-2 lineage in Manaus containing 17 amino acid changes (including 10 in the spike protein), three deletions, four synonymous mutations and a four base-pair nucleotide insertion compared to the most closely related available sequence (GISAID ID: EPI_ISL_722052) (27) (Fig. 1B) (lineage-defining mutations can be found in table S1). This lineage was given a new designation, P.1, on the basis that (i) it is phylogenetically and genetically distinct from ancestral viruses, (ii) it is associated with rapid spread in a new area, and (iii) it carries a constellation of mutations that may have phenotypic relevance (26). Phylogenetic analysis indicated that P.1, and another lineage, P.2 (19), were descendants of lineage B.1.1.28 that was first detected in Brazil in early March 2020 (Fig. 1B). Our preliminary results were shared with local teams on 10 Jan 2021 and published online on 12 Jan 2021 (27). Concurrently, cases of SARS-CoV-2 P.1 infection were reported in Japan in travellers from Amazonas (28). As of 24 Feb 2021, P.1 had been confirmed in 6 Brazilian states, which in total received >92,000 air passengers from Manaus in November 2020 (Fig. 1C). Genomic surveillance first detected lineage P.1 on 6 December 2020 (Fig. 1A), after which the frequency of P.1 relative to other lineages increased rapidly in the tested samples from Manaus (Fig. 1D; lineage frequency information can be found in fig. S5). Retrospective genome sequencing might be able to recover earlier P.1 genomes. Between 2 Nov 2020 and 9 Jan 2021, we observed 7,137 SARI cases and 3,144 SARI deaths in Manaus (Fig. 1A). We generated a total of 182 SARS-CoV-2 sequences from Manaus during this period. This corresponds to 1 genome for each 39 SARI cases in Manaus, and this ratio is >100-fold higher compared to the average number of shared genomes per reported case during the same period in Brazil.

Dating the emergence of the P.1 lineage

We used molecular clock phylogenetics to understand the emergence and evolution of lineage P.1 (25). We first regressed root-to-tip genetic distances against sequence sampling dates (29) for the P.1, P.2, and B.1.1.28 lineages separately (figs. S6 to S8). This exploratory analysis revealed similar evolutionary rates within each lineage, but greater root-to-tip distances for P.1 compared to B.1.1.28 (fig. S8), suggesting that the emergence of P.1 was preceded by a period of faster molecular evolution. The B.1.1.7 lineage exhibits similar evolutionary characteristics (13), which was hypothesized to have occurred in a chronically infected or immunocompromised patient (30, 31).
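A root-to-tip regression of the kind described above can be sketched with ordinary least squares: the slope approximates the evolutionary rate and the x-intercept approximates the date of the root. The sketch below is not the tool used in this study (29); the function name and the synthetic data (a rate of 8e-4 substitutions per site per year and a root at 2020.0) are hypothetical and chosen only so the result is easy to check.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    dates: sampling dates as decimal years.
    distances: root-to-tip genetic distances (substitutions per site).
    Returns (rate, intercept, root_date), where root_date is the x-intercept.
    """
    x_bar, y_bar = mean(dates), mean(distances)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = y_bar - rate * x_bar
    root_date = -intercept / rate
    return rate, intercept, root_date


# Synthetic, perfectly clock-like data (hypothetical values, not from the study).
sample_dates = [2020.2, 2020.4, 2020.6, 2020.8, 2021.0]
sample_distances = [8e-4 * (d - 2020.0) for d in sample_dates]
rate, intercept, root_date = root_to_tip_regression(sample_dates, sample_distances)
```

In this framing, two lineages with similar slopes but a vertical offset between their regression lines, as reported for P.1 relative to B.1.1.28, are what one would expect if extra substitutions accumulated on the branch leading to the offset lineage.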

To date the emergence of P.1, while accounting for a faster evolutionary rate along its ancestral branch, we used a local molecular clock model (32) with a flexible non-parametric demographic tree prior (33). Using this approach, we estimate the date of the common ancestor of the P.1 lineage to be around 15 Nov 2020 (median, 95% Bayesian credible interval, BCI, 6 Oct to 24 Nov 2020, mean, 9 Nov 2020) (fig. S9). This is only three to four weeks before the resurgence in SARS-CoV-2 confirmed cases in Manaus (Figs. 1A and 2 and fig. S9). The P.1 sequences formed a single well-supported group (posterior probability=1.00) that clustered most closely with B.1.1.28 sequences from Manaus (“AM” in Fig. 2), suggesting P.1 emerged there. The earliest P.1 samples were detected in Manaus (34). The first known travel-related cases were detected in Japan (28) and São Paulo (table S3) and were both linked to travel from Manaus. Furthermore, the local clock model statistically confirms a higher evolutionary rate for the branch immediately ancestral to lineage P.1 compared with lineage B.1.1.28 as a whole (Bayes factor, BF=6.04).

Fig. 2 Visualization of the time-calibrated maximum clade credibility tree reconstruction for B.1.1.28, P.1 and P.2 lineages (n = 962) in Brazil.

Terminal branches and tips of Amazonas state are colored in brown and those from other locations are colored in green. Nodes with posterior probabilities of <0.5 have been collapsed into polytomies and their range of divergence dates are illustrated as shaded expanses.

Our data indicate multiple introductions of the P.1 lineage from Amazonas to Brazil’s south-eastern states (Fig. 2). We also detected seven small well-supported clusters of P.2 sequences from Amazonas (2–6 sequences, posterior probability=1.00). Virus exchange between Amazonas state and the urban metropolises in southeast Brazil largely follows patterns of national air travel mobility (Fig. 1C and fig. S10).

Infection with P.1 and sample viral loads

We analyzed all SARS-CoV-2 RT-qPCR positive results from a laboratory providing testing in Manaus since May 2020 (Fig. 1A and data file S1) with the aim of exploring trends in sample RT-qPCR cycle threshold (Ct) values, which are inversely related to sample virus loads and transmissibility (35). By focusing on data from a single laboratory, we reduce instrument and process variation that can affect Ct measurements.
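The inverse relationship between Ct and template quantity can be made concrete with a short calculation. The sketch below assumes idealised exponential amplification, in which each cycle multiplies the template by (1 + efficiency); real assays deviate from this, so the result is a rough guide rather than a calibrated viral load. The function name is hypothetical.

```python
def fold_change_from_delta_ct(delta_ct: float, efficiency: float = 1.0) -> float:
    """Approximate fold difference in starting template implied by a Ct difference.

    delta_ct: number of cycles by which one sample's Ct is lower than another's.
    efficiency: per-cycle amplification efficiency (1.0 means perfect doubling;
                this default is an idealising assumption).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return (1.0 + efficiency) ** delta_ct


one_cycle = fold_change_from_delta_ct(1.0)     # one cycle lower: twice the template
three_cycles = fold_change_from_delta_ct(3.0)  # three cycles lower: eight times
```

Under this idealisation, a Ct difference of one to two cycles corresponds to roughly a two- to four-fold difference in starting template.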

We analyzed a set of RT-qPCR positive cases for which virus genome sequencing and lineage classification had been undertaken (n = 147). Using a logistic function (Fig. 3A) we find that the fraction of samples classified as P.1 increased from 0% to 87% in around 7 weeks (table S4), quantifying the trend shown in Fig. 1D. We found a small but statistically significant association between P.1 infection and lower Ct values, for both the E gene (lognormal regression, p = 0.029, n = 128 samples, 65 of which were P.1) and N gene (p = 0.01, n = 129, 65 of which were P.1), with Ct values lowered by 1.43 (0.17-2.60 95% CI) and 1.91 (0.49-3.23) cycles in the P.1 lineage on average, respectively (Fig. 3B).
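Fitting a logistic function to the proportion of sequenced samples assigned to a lineage can be sketched as a binomial logistic regression on time. The example below uses maximum likelihood via Newton-Raphson as a simple stand-in; it is not the fitting procedure of this study, which reports credible intervals. The weekly counts are hypothetical and are not the data in table S4.

```python
import math


def logistic(t, intercept, slope):
    """Logistic curve giving the expected proportion at time t."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * t)))


def fit_logistic(times, positives, totals, iterations=25):
    """Maximum-likelihood binomial logistic regression via Newton-Raphson.

    times: time of each sampling window (for example, weeks since first sample).
    positives: number of sequenced samples assigned to the lineage per window.
    totals: number of sequenced samples per window.
    Returns (intercept, slope) on the logit scale.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for t, k, n in zip(times, positives, totals):
            p = logistic(t, a, b)
            resid = k - n * p          # score contribution
            weight = n * p * (1.0 - p) # information contribution
            g_a += resid
            g_b += resid * t
            h_aa += weight
            h_ab += weight * t
            h_bb += weight * t * t
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical weekly counts: lineage-positive samples out of 20 sequenced per week.
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
lineage_counts = [0, 1, 2, 5, 10, 14, 17, 19]
sequenced = [20] * 8
intercept, slope = fit_logistic(weeks, lineage_counts, sequenced)
```

The fitted slope is the growth rate of the lineage's log-odds per unit time, which is one common way to summarise how quickly a lineage displaces others in sequenced samples.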

Fig. 3 Temporal variation in the proportion of sequenced genomes belonging to P.1, and trends in RT-qPCR Ct values for COVID-19 infections in Manaus.

(A) Logistic function fitting to the proportion of genomes in sequenced infections that have been classified as P.1 (black circles, size indicating number of infections sequenced), divided up into time-periods where the predicted proportion of infections that are due to P.1 is <1/3 (light brown), between 1/3 and 2/3 (green) and greater than 2/3 (grey). For the model fit, darker ribbon represents the 50% credible interval, and lighter ribbon represents the 95% credible interval. For the data points, grey thick line is the 50% exact Binomial confidence interval and the thinner line is the 95% exact Binomial confidence interval. (B) Ct values for genes E and N in a sample of symptomatic cases presenting for testing at a healthcare facility in Manaus (laboratory A), stratified according to the period defined in (A) in which the oropharyngeal and nasal swab collections occurred. (C) Ct values for genes E and N in a subsample of 184 infections included in (B) that had their genomes sequenced (dataset A).

Using a larger sample of 942 Ct values (including an additional 795 samples for which no lineage information was available) we investigated Ct values across three time periods characterised by increasing P.1 relative abundance. Average Ct-values for both the E and N genes decline through time, as both case numbers and the fraction of P.1 infections increased, with Ct values significantly lower in period 3 compared to period 1 (Fig. 3C; E gene p = 0.12 and p<0.001 for comparison of time periods 2 and 3 to period 1; N gene p = 0.14 and p<0.001, respectively). Analyses of Ct values for samples from a different lab, also based in Manaus, showed similarly significant declines between the first and third time-periods defined here (p < 0.0001 for both E and N genes) (fig. S11 and data file S3).

However, population-level Ct distributions are sensitive to changes in the average time since infection when samples are taken, such that median Ct values can decrease during epidemic growth periods and increase during epidemic decline (36). To account for this effect, we assessed the association between P.1 infection and Ct levels while controlling for the delay between symptom onset and sample collection. Statistical significance was lost for both data sets (E gene p=0.15, n = 42, 22 of which were P.1; N gene p = 0.12, n = 42, 22 of which were P.1). Owing to this confounding factor we cannot distinguish if P.1 infection is associated with increased viral loads (37) or a longer duration of infection (38).

Mathematical modelling of lineage P.1 epidemiological characteristics

We next explored epidemiological scenarios that might explain the recent resurgence of transmission in Manaus (39). To do this, we extend a semi-mechanistic Bayesian model of SARS-CoV-2 transmissibility and mortality (40–42) to include two categories of virus (“P.1” and “non-P.1”) and to account for infection severity, transmissibility and propensity for re-infection to vary between the categories. It also integrates information on the timing of P.1 emergence in Manaus using our molecular clock results (Fig. 2). The model explicitly incorporates waning of immune protection following infection, parameterized based on dynamics observed in recent studies (16, 43), to explore the competing hypothesis that waning of prior immunity might explain the observed resurgence (42). We use the model to evaluate the statistical support that P.1 possesses altered epidemiological characteristics compared to local non-P.1 lineages. Epidemiological model details and sensitivity analyses (tables S5 to S10) can be found in Supplementary Materials. The model is fitted to both COVID-19 mortality data (with a correction for systematic reporting delays) (44, 45) and to the estimated increase through time in the proportion of infections due to P.1 derived from genomic data (table S4). We assume within-category immunity wanes over time (50% wane within a year, though sensitivity analyses varying the rapidity of waning are presented in table S7) and that cross-immunity (the degree to which previous infection with a virus belonging to one category protects against subsequent infection with the other) is symmetric between categories.
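The idea of two virus categories linked by symmetric cross-immunity can be illustrated with a deliberately simplified compartmental sketch. This is a toy deterministic model stepped daily, not the semi-mechanistic Bayesian model of this study: it has no waning of within-category immunity, no mortality component and no fitting to data, and every parameter value shown is hypothetical.

```python
def simulate_two_category(days, beta_non_p1, transmissibility_ratio,
                          cross_immunity, gamma=0.2, seed_non_p1=1e-4,
                          seed_p1=1e-6, p1_seed_day=200):
    """Toy two-category SIR model in population fractions, daily Euler steps.

    cross_immunity: fraction of protection against the other category conferred
    by a previous infection (symmetric between categories).
    Returns cumulative infections per category and the conserved total.
    """
    beta_p1 = beta_non_p1 * transmissibility_ratio
    leak = 1.0 - cross_immunity        # share of prior protection that is evaded

    s = 1.0 - seed_non_p1              # never infected
    i_a, i_b = seed_non_p1, 0.0        # first infections: a = non-P.1, b = P.1
    j_a, j_b = 0.0, 0.0                # infections in people recovered from the other category
    r_a, r_b, r_ab = 0.0, 0.0, 0.0     # recovered from a only, b only, both
    cum_a, cum_b = seed_non_p1, 0.0

    for day in range(days):
        if day == p1_seed_day:         # introduce the second category
            s, i_b, cum_b = s - seed_p1, i_b + seed_p1, cum_b + seed_p1
        foi_a = beta_non_p1 * (i_a + j_a)   # daily force of infection, non-P.1
        foi_b = beta_p1 * (i_b + j_b)       # daily force of infection, P.1
        new_a, new_b = foi_a * s, foi_b * s
        re_a = foi_a * leak * r_b           # non-P.1 infection after P.1
        re_b = foi_b * leak * r_a           # P.1 infection after non-P.1
        rec_ia, rec_ib = gamma * i_a, gamma * i_b
        rec_ja, rec_jb = gamma * j_a, gamma * j_b

        s -= new_a + new_b
        i_a += new_a - rec_ia
        i_b += new_b - rec_ib
        j_a += re_a - rec_ja
        j_b += re_b - rec_jb
        r_a += rec_ia - re_b
        r_b += rec_ib - re_a
        r_ab += rec_ja + rec_jb
        cum_a += new_a + re_a
        cum_b += new_b + re_b

    total = s + i_a + i_b + j_a + j_b + r_a + r_b + r_ab
    return {"non_p1": cum_a, "p1": cum_b, "total": total}


# Hypothetical scenarios: a more transmissible, partly immune-evading second
# category versus one that is identical to the first with complete cross-immunity.
second_wave = simulate_two_category(400, 0.5, 2.0, 0.68)
no_change = simulate_two_category(400, 0.5, 1.0, 1.0)
```

In this toy setting, a second category seeded after a large first epidemic spreads widely only when it combines higher transmissibility with some evasion of prior immunity, which mirrors the qualitative question the fitted model is used to address.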

Our results suggest the epidemiological characteristics of P.1 are different to those of previously circulating local SARS-CoV-2 lineages, but also highlight substantial uncertainty in the extent and nature of this difference. Plausible values of transmissibility and cross-immunity exist in a limited area but are correlated (Fig. 4A, with the extent of immune evasion defined as 1 minus the inferred cross-immunity). This is expected, because in the model a higher degree of cross-immunity means that greater transmissibility of P.1 is required to generate a second epidemic. Within this plausible region of parameter space, P.1 can be between 1.7–2.4 (50% BCI, 2.0 median, with a 99% posterior probability of being >1) times more transmissible than local non-P.1 lineages, and can evade 21–46% (50% BCI, 32% median, with a 95% posterior probability of being able to evade at least 10%) of protective immunity elicited by previous infection with non-P.1 lineages, corresponding to 54–79% (50% BCI, 68% median) cross-immunity (Fig. 4A). The joint-posterior distribution is inconsistent with a combination of highly increased transmissibility and low cross-immunity, and conversely, also with near-complete cross-immunity but only a small increase in transmissibility (Fig. 4A). Moreover, our results further show that natural immunity waning alone is unlikely to explain the observed dynamics in Manaus, with support for P.1 possessing altered epidemiological characteristics robust to a range of values assumed for the date of the lineage’s emergence and the rate of natural immunity waning (tables S5 and S7). We caution that these results are not generalisable to other settings; more detailed and direct data are needed to identify the exact degree and nature of the changes to the epidemiological characteristics of P.1 compared with previously circulating lineages.
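The correlation between transmissibility and cross-immunity described above can be illustrated with a back-of-envelope calculation. The sketch assumes a well-mixed population in which a fraction of people were previously infected with non-P.1 lineages, no waning, and no prior P.1 infection; the function names and the baseline reproduction number of 1.5 are hypothetical and are not estimates from this study.

```python
def immune_evasion(cross_immunity: float) -> float:
    """Immune evasion as defined in the text: 1 minus cross-immunity."""
    return 1.0 - cross_immunity


def effective_reproduction_number(r_baseline, transmissibility_ratio,
                                  cross_immunity, prior_attack_rate):
    """Initial reproduction number of the new lineage under the toy assumptions.

    r_baseline: reproduction number of non-P.1 lineages in a fully susceptible
                population under current conditions (hypothetical input).
    prior_attack_rate: fraction previously infected with non-P.1 lineages.
    """
    susceptible_equivalent = 1.0 - cross_immunity * prior_attack_rate
    return r_baseline * transmissibility_ratio * susceptible_equivalent


def minimum_transmissibility_ratio(r_baseline, cross_immunity, prior_attack_rate):
    """Smallest transmissibility ratio for which the new lineage can grow."""
    return 1.0 / (r_baseline * (1.0 - cross_immunity * prior_attack_rate))


evasion_at_median = immune_evasion(0.68)  # median cross-immunity quoted in the text
r_eff = effective_reproduction_number(1.5, 2.0, 0.68, 0.67)
```

Because the threshold ratio rises as cross-immunity rises, data showing a second epidemic are compatible either with lower cross-immunity and a smaller transmissibility gain or with higher cross-immunity and a larger gain, which is the trade-off visible in the joint posterior.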

Fig. 4 Estimates of the epidemiological characteristic of P.1 inferred from a multicategory Bayesian transmission model fitted to data from Manaus, Brazil.

(A) Joint posterior distribution of the cross-immunity and transmissibility increase inferred through fitting the model to mortality and genomic data. Grey contours refer to posterior density intervals ranging from the 95% to the 50% isoclines. Marginal posterior distributions for each parameter shown along each axis. (B) As for (A) but showing the joint-posterior distribution of cross-immunity and the inferred relative risk of mortality in the period following emergence of P.1 compared to the period prior. (C) Daily incidence of COVID-19 mortality. Points show severe acute respiratory mortality records from the SIVEP-Gripe database (67, 69), brown and green ribbons show model fit for COVID-19 mortality incidence, disaggregated by mortality attributable to non-P.1 lineages (brown) and the P.1 lineage (green). (D) Estimate of the proportion of P.1 infections through time in Manaus. Black data points with error bars are the empirical proportion observed in genomically sequenced cases (see Fig. 3A) and green ribbons (dark = 50% BCI, light = 95% BCI) the model fit to the data. (E) Estimated cumulative infection incidence for the P.1 and non-P.1 categories. Black data points with error bars are reversion-corrected estimates of seroprevalence from blood donors in Manaus (2), colored ribbons are the model predictions of cumulative infection incidence for non-P.1 lineages (brown) and P.1 lineages (green). These points are shown for reference only and were not used to fit the model. (F) Bayesian posterior estimates of trends in reproduction number Rt for the P.1 and non-P.1 categories.

We estimate that infections are 1.2–1.9 (50% BCI, median 1.5, 90% posterior probability of being >1) times more likely to result in mortality in the period following the emergence of P.1, compared to before, although posterior estimates of this relative risk are also correlated with inferred cross-immunity (Fig. 4B). More broadly, the recent epidemic in Manaus has strained the city’s healthcare system leading to inadequate access to medical care (46). We therefore cannot determine whether the estimated increase in relative mortality risk is due to P.1 infection, stresses on the Manaus healthcare system, or both. Detailed clinical investigations of P.1 infections are needed. We note that our model makes the assumption of a homogeneously mixed population, and therefore ignores heterogeneities in contact patterns (see for example fig. S13 for differences in private versus public hospitals). This is an important area for future research. The model fits observed time series data from Manaus on COVID-19 mortality (Fig. 4C), the relative frequency of P.1 infections (Fig. 4D) and also captures previously estimated trends in cumulative seropositivity in the city (Fig. 4E). We estimate the reproduction number (Rt) on 07 Feb 2021 to be 0.1 (median, 50% BCI: 0.04-0.2) for non-P.1 and 0.5 (median, 50% BCI: 0.4-0.6) for P.1 (Fig. 4F).

Characterisation and adaptation of a constellation of spike protein mutations

Lineage P.1 contains 10 lineage-defining amino acid mutations in the virus spike protein (L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, H655Y, T1027I) compared with its immediate ancestor (B.1.1.28). In addition to the possible increase in the rate of molecular evolution during the emergence of P.1, we found, using molecular selection analyses (47), evidence that eight of these 10 mutations are under diversifying positive selection (table S1 and fig. S14).

Three key mutations present in P.1, N501Y, K417T and E484K, are in the spike protein RBD. The former two interact with human angiotensin-converting enzyme 2 (hACE2) (11), while E484K is located in a loop region outside the direct hACE2 interface (fig. S14). Notably, the same three residues are mutated in the B.1.351 variant of concern, and N501Y is also present in the B.1.1.7 lineage. The independent emergence of the same constellation of mutations in geographically distinct lineages indicates a process of convergent molecular adaptation. Similar to SARS-CoV-1 (48–50), mutations in the RBD may increase affinity of the virus for host ACE2 and consequently influence host cell entry and virus transmission. Recent molecular analysis of B.1.351 (51) indicates that the three P.1 RBD mutations may similarly enhance hACE2 engagement, providing a plausible hypothesis for an increase in transmissibility of the P.1 lineage. Moreover, E484K is associated with reduced antibody neutralisation (6, 9, 52, 53). RBD-presented epitopes account for ~90% of the neutralising activity of sera from individuals previously infected with SARS-CoV-2 (54), thus tighter binding of P.1 viruses to hACE2 may further reduce the effectiveness of neutralizing antibodies.
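The substitution labels used above follow the pattern reference residue, position, variant residue (for example, E484K replaces glutamate with lysine at position 484). A short sketch shows how the 10 spike mutations listed earlier can be parsed and filtered by position. The residue range used for the RBD (319 to 541) is a commonly cited boundary and is an assumption here, not a value taken from this article; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    reference: str   # one-letter code of the ancestral amino acid
    position: int    # residue number in the spike protein
    variant: str     # one-letter code of the new amino acid


_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'E484K' into its three components."""
    match = _SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {label!r}")
    reference, position, variant = match.groups()
    return Substitution(reference, int(position), variant)


# The 10 lineage-defining spike substitutions listed in the text.
P1_SPIKE = ["L18F", "T20N", "P26S", "D138Y", "R190S",
            "K417T", "E484K", "N501Y", "H655Y", "T1027I"]

# Assumption: commonly cited residue range for the spike receptor binding domain.
RBD_START, RBD_END = 319, 541

rbd_mutations = [label for label in P1_SPIKE
                 if RBD_START <= parse_substitution(label).position <= RBD_END]
```

With that assumed range, the filter returns K417T, E484K and N501Y, matching the three RBD mutations named in the text.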


We show that P.1 most likely emerged in Manaus in mid-November, where high attack rates have been previously reported. High rates of mutation accumulation over short time periods have been reported in chronically infected or immunocompromised patients (13). Given a sustained generalised epidemic in Manaus we believe this is a potential scenario for P.1 emergence. Genomic surveillance and early data sharing by teams worldwide have led to the rapid detection and characterisation of SARS-CoV-2 and new variants of concern (25), yet such surveillance is still limited in many settings. The P.1 lineage is spreading rapidly across Brazil (55), and this lineage has now been detected in >36 countries (56). But existing virus genome sampling strategies are often inadequate for determining the true extent of VOCs in Brazil, and more detailed data are needed to address the impact of different epidemiological and evolutionary processes in their emergence. Sustainable genomic surveillance efforts to track variant frequency [e.g., (57–59)] coupled with analytical tools to quantify lineage dynamics [e.g., (60, 61)] and anonymized epidemiological surveillance data (62, 63) could enable enhanced real-time surveillance of variants of concern worldwide. Studies to evaluate real-world vaccine efficacy in response to P.1 are urgently needed. We note that neutralisation titers represent only one component of the elicited response to vaccines, and that minimal reduction of neutralisation titers relative to earlier circulating strains is not uncommon. Until an equitable allocation and access to effective vaccines is available to all, non-pharmaceutical interventions should continue to play an important role in reducing the emergence of new variants.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S16

Tables S1 to S10

References (70–102)

MDAR Reproducibility Checklist

Data Files S1 to S6

References and Notes

Acknowledgments: We thank Lucy Matkin (University of Oxford), Marcio Oikawa (Universidade Federal do ABC) and Andre Acosta (University of Sao Paulo) for logistic support and Claudio Sachi (Instituto Adolfo Lutz) for agreeing to the use of unpublished sequence data available in GISAID before publication. We thank the anonymous reviewers for their considerations and suggestions. We thank the administrators of the GISAID database for supporting rapid and transparent sharing of genomic data during the COVID-19 pandemic. A full list acknowledging the authors publishing data used in this study can be found in Data S4.

Funding: This work was supported by a Medical Research Council-São Paulo Research Foundation (FAPESP) CADDE partnership award (MR/S0195/1 and FAPESP 18/14389-0); FAPESP (E.C.S.: 18/14389-0; I.M.C: 2018/17176-8 and 2019/12000-1, F.C.S.S.: 2018/25468-9; J.G.J.: 2018/17176-8, 2019/12000-1, 18/14389-0; T.M.C.: 2019/07544-2; C.A.M.S.: 2019/21301-5; W.M.S.: 2017/13981-0, 2019/24251-9; L.M.S.: 2020/04272-9; M.C.P.: 2019/21568-1; V.H.N.: 2018/12579-7; C.A.P.J.: 2019/21858-0; and P.S.P.: 16/18445-7; J.L.P.-M.: 2020/04558-0); Wellcome Trust and Royal Society (N.R.F. Sir Henry Dale Fellowship: 204311/Z/16/Z); Wellcome Trust (Wellcome Centre for Human Genetics: 203141/Z/16/Z); Clarendon Fund and Department of Zoology, University of Oxford (D.S.C.); Medical Research Council (T.A.B, R.J.G.H: MR/S007555/1); European Molecular Biology Organisation (R.J.G.H.:ALTF 869-2019); CNPq (R.S.A.: 312688/2017-2, 439119/2018-9; W.M.S.: 408338/2018-0, 304714/2018-6; V.H.N.: 304714/2018-6); FAPERJ (R.S.A.: 202.922/2018); FFMUSP (M.S.R.: 206.706; C.A.P.J.); Imperial College Covid-19 Research Fund (H.S.; S.F.); CAPES (G.M.F.; C.A.P.J. Code 001); Wellcome Trust Collaborator Award (P.L., A.R., and N.J.L.: 206298/Z/17/Z); European Research Council (P.L and A.R.: 725422 -ReservoirDOCS); European Union’s Horizon 2020 project MOOD (P.L. and M.U.G.K.: 874850); US National Institutes of Health (M.A.S.: U19 AI135995); Oxford Martin School (O.G.P.); Branco Weiss Fellowship (M.U.G.K); Covid-19 Research Fund (S.F.); EPSRC (S.F.: EP/V002910/1; M.M. through the EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning); BMGF (S.B.), UKRI (S.B.), Novo Nordisk Foundation (S.B.); Academy of Medical Sciences (S.B.), BRC (S.B.) and MRC (S.B.); Bill & Melinda Gates foundation (O.R.: OPP1175094). We acknowledge support from the Rede Corona-ômica BR MCTI/FINEP affiliated to RedeVírus/MCTI (FINEP 01.20.0029.000462/20, CNPq 404096/2020-4). FAPESP Proj. no. 2018/12579-7 CNPq Proj. no. 304714/2018-6 (V.N.) EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning at Imperial and Oxford (M.M.). Bill & Melinda Gates foundation (OPP1175094)(O.R.). This work received funding from the UK Medical Research Council under a concordat with the UK Department for International Development. We additionally acknowledge support from Community Jameel and the NIHR Health Protection Research Unit in Modelling Methodology. Last, we also gratefully acknowledge support from Oxford Nanopore Technologies for a donation of sequencing reagents and NVIDIA Corporation and Advanced Micro Devices, Inc., for a donation of parallel computing resources.

Author contributions: Conceptualization: NRF, TAM, CW, ICM, DSC, AR, CD, OGP, SF, SB, ECS. Methodology: NRF, TAM, CW, TAM, ICM, DSC, SM, FCS, IH, MSR, JGJ, LAMF, PSA, TMC, CAMS, ERM, JTM, RHM, PSP, MUGK, RH, TB, OGP, MS, SP, OR, NF, NJL, PL, AR, CD, OGP, SF, SM, ECS. Investigation: NRF, TAM, CW, IMC, DSC, SM, MAEC, FCSS, IH, MSR, JGJ, LAMF, PSA, TMC, CAMS, ERM, JTM, RHMP, PSP, MUGK, RH, NG, WMS, LJTA, CCC, HH, GMF, ECR, LMS, MCP, FSVM, ABL, JPS, DAGZ, ACSF, RPS, DJL, PGTW, HS, ALPS, MSV, CCC, VSDC, RMFF, HMS, RSA, BN, JH, MM, XM, HC, RS, AG, MS, TB, SP, CHW, OR, NMF, CPJ, VHN, NJL, PL, AR, NAF, MPSSC, CD, OGP, SF, SB, ECS. Visualization: NRF, TAM, CW, DSC, IMC, JT, AR, SP, TB, CW, SB. Funding acquisition: NRF, NL, AR, OGP, NF, SF, SB, ECS. Project administration: NRF, ECS. Supervision: NRF, OGP, AR, CD, NL, SB, ECS. Writing – original draft: NRF, TM, CW, IMC, DSC, SF, SB, OGP, CD, ECS. Writing – review and editing: All authors.

Competing interests: S.B. declares he advises on The Scientific Pandemic Influenza Group on Modelling (SPI-M) and advises the FCA on a legal matter regarding COVID-19 infections in England in March 2020. He is not paid for either of these advisory roles, and neither are related to the work in this paper. All other authors declare that they have no competing interests.

Data and materials availability: All data, code, and materials used in the analysis are available in a dedicated GitHub Repository (64, 65). This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.
