Research Article

Inferring the effectiveness of government interventions against COVID-19

Science  15 Dec 2020:
DOI: 10.1126/science.abd9338


Governments are attempting to control the COVID-19 pandemic with nonpharmaceutical interventions (NPIs). However, the effectiveness of different NPIs at reducing transmission is poorly understood. We gathered chronological data on the implementation of NPIs for several European, and other, countries between January and the end of May 2020. We estimate the effectiveness of NPIs, ranging from limiting gathering sizes, business closures, and closure of educational institutions to stay-at-home orders. To do so, we used a Bayesian hierarchical model that links NPI implementation dates to national case and death counts and supported the results with extensive empirical validation. Closing all educational institutions, limiting gatherings to 10 people or less, and closing face-to-face businesses each reduced transmission considerably. The additional effect of stay-at-home orders was comparatively small.

Worldwide, governments have mobilized resources to fight the COVID-19 pandemic. A wide range of nonpharmaceutical interventions (NPIs) has been deployed, including stay-at-home orders and the closure of all nonessential businesses. Recent analyses show that these large-scale NPIs were jointly effective at reducing the virus’ effective reproduction number (1), but it is still largely unknown how effective individual NPIs were. As more data become available, we can move beyond estimating the combined effect of a bundle of NPIs and begin to understand the effects of individual interventions. This can help governments efficiently control the epidemic, by focusing on the most effective NPIs to ease the burden put on the population.

A promising way to estimate NPI effectiveness is data-driven, cross-country modeling: inferring effectiveness by relating the NPIs implemented in different countries to the course of the epidemic in these countries. To disentangle the effects of individual NPIs, we need to leverage data from multiple countries with diverse sets of interventions in place. Previous data-driven studies (table S8) estimate effectiveness for individual countries (2–4) or NPIs, although some exceptions exist [(1, 5–8); summarized in table S7]. In contrast, we evaluated the impact of several NPIs on the epidemic’s growth in 34 European and seven non-European countries. If all countries implemented the same set of NPIs on the same day, the individual effect of each NPI would be unidentifiable. However, the COVID-19 response was far less coordinated: countries implemented different sets of NPIs, at different times, in different orders (Fig. 1).

Fig. 1 Timing of NPI implementations in early 2020.

Crossed-out symbols signify when an NPI was lifted. Detailed definitions of the NPIs are given in Table 1.

Even with diverse data from many countries, estimating NPI effects remains a challenging task. First, models are based on uncertain epidemiological parameters; our NPI effectiveness study incorporates some of this uncertainty directly in the model. Second, the data are retrospective and observational, meaning that unobserved factors could confound the results. Third, NPI effectiveness estimates can be highly sensitive to arbitrary modeling decisions, as shown by two recent replication studies (9, 10). Fourth, large-scale public NPI datasets suffer from frequent inconsistencies (11) and missing data (12). Hence, the data and the model must be carefully validated if they are to be used to guide policy decisions. We have collected a large public dataset on NPI implementation dates that has been validated by independent double entry, and extensively validated our effectiveness estimates. This is a crucial, but often absent or incomplete, element of COVID-19 NPI effectiveness studies (10).

Our results provide insight on the amount of COVID-19 transmission associated with various areas and activities of public life, such as gatherings of different sizes. Therefore, they may inform the packages of interventions that countries implement to control transmission in current and future waves of infections. However, we need to be careful when interpreting this study’s results. We only analyzed the effect NPIs had between January and the end of May 2020, and NPI effectiveness may change over time as circumstances change. Lifting an NPI does not imply that transmission will return to its original level and our window of analysis does not include relaxation of NPIs. These and other limitations are detailed in the Discussion section.

Cross-country NPI effectiveness modeling

We analyzed the effects of seven commonly used NPIs between the 22nd of January and the 30th of May 2020. All NPIs aimed to reduce the number of contacts within the population (Table 1). If a country lifted an NPI before the 30th of May, the window of analysis for that country terminates on the day of the lifting (see Methods). To ensure high data quality, all NPI data were independently entered by two of the authors (independent double entry) using primary sources, and then manually compared with several public datasets. Data on confirmed COVID-19 cases and deaths were taken from the Johns Hopkins CSSE COVID-19 Dataset (13). The data used in this study, including sources, are available online on GitHub (14).

Table 1 NPIs included in the study.

We estimated the effectiveness of NPIs with a Bayesian hierarchical model. We used case and death data from each country to infer the number of new infections at each point in time, which is itself used to infer the (instantaneous) reproduction number Rt over time. NPI effects were then estimated by relating the daily reproduction numbers to the active NPIs, across all days and countries. This relatively simple, data-driven approach allowed us to sidestep assumptions about contact patterns and intensity, infectiousness of different age groups, and so forth, that are typically required in modeling studies. It also allowed us to directly model many sources of uncertainty, such as uncertain epidemiological parameters, differences in NPI effectiveness between countries, unknown changes in testing and infection fatality rates, and the effect of unobserved influences on Rt. The code is available online on GitHub (14).

Effectiveness of individual NPIs

Our model enabled us to estimate the individual effectiveness of each NPI, expressed as a percentage reduction in Rt. We quantified uncertainty with Bayesian prediction intervals, which are wider than standard credible intervals. These reflect differences in NPI effectiveness across countries among several other sources of uncertainty. Bayesian prediction intervals are analogous to the standard deviation of the effectiveness across countries, rather than the standard error of the mean effectiveness. Under the default model settings, the percentage reduction in Rt (with 95% prediction interval; Fig. 2) associated with each NPI was: limiting gatherings to 1000 people or less: 23% (0 to 40%); to 100 people or less: 34% (12 to 52%); to 10 people or less: 42% (17 to 60%); closing some high-risk face-to-face businesses: 18% (−8 to 40%); closing most nonessential face-to-face businesses: 27% (−3 to 49%); closing both schools and universities in conjunction: 38% (16 to 54%); and issuing stay-at-home orders (additional effect on top of all other NPIs): 13% (−5 to 31%). Note that we were not able to robustly disentangle the individual effects of closing schools and closing universities since these NPIs were implemented on the same day or in close succession in most countries [except Iceland and Sweden, where only universities were closed (see also fig. S21)]. We thus reported “schools and universities closed in conjunction” as one NPI.

Fig. 2 NPI effectiveness under default model settings.

Posterior percentage reductions in Rt with median, 50% and 95% prediction intervals shown. Prediction intervals reflect many sources of uncertainty, including NPI effectiveness varying by country and uncertainty in epidemiological parameters. A negative 1% reduction refers to a 1% increase in Rt. “Schools and universities closed” shows the joint effect of closing both schools and universities in conjunction; the individual effect of closing just one will be smaller (see text). Cumulative effects are shown for hierarchical NPIs (gathering bans and business closures) i.e., the result for “Most nonessential businesses closed” shows the cumulative effect of two NPIs with separate parameters and symbols—closing some (high-risk) businesses, and additionally closing most remaining (non-high-risk, but nonessential) businesses given that some businesses are already closed.

Some NPIs frequently co-occurred, i.e., were partly collinear. However, we were able to isolate the effects of individual NPIs because the collinearity was imperfect and our dataset was large. For every pair of NPIs, we observed one without the other for 504 country-days on average (table S5). The minimum number of country-days for any NPI pair is 148 (for limiting gatherings to 1000 or 100 attendees). Additionally, if collinearity were excessive and the data insufficient to overcome it, individual effectiveness estimates would be highly sensitive to variations in the data and model parameters (15). Indeed, high sensitivity prevented Flaxman et al. (1), who had a smaller dataset, from disentangling NPI effects (9). In contrast, our effectiveness estimates are substantially less sensitive (see below). Finally, the posterior correlations between the effectiveness estimates are weak, further suggesting manageable collinearity (fig. S22).
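The country-day count used above as a collinearity check can be illustrated with a short sketch. It assumes NPI activity is stored as 0/1 flags per country and day; the function name, the NPI labels, and the toy data are hypothetical and are not taken from the study's dataset or code.

```python
from itertools import combinations

def exclusive_country_days(active, npi_a, npi_b):
    """Count country-days on which exactly one of the two NPIs is active.

    `active` maps country -> NPI name -> list of 0/1 flags, one per day.
    """
    total = 0
    for flags in active.values():
        for a, b in zip(flags[npi_a], flags[npi_b]):
            if a != b:  # one NPI observed without the other
                total += 1
    return total

# Toy data (hypothetical): two countries, six days each.
toy = {
    "A": {
        "gatherings_1000": [0, 1, 1, 1, 1, 1],
        "gatherings_100":  [0, 0, 0, 1, 1, 1],
        "schools_closed":  [0, 0, 1, 1, 1, 1],
    },
    "B": {
        "gatherings_1000": [0, 0, 1, 1, 1, 1],
        "gatherings_100":  [0, 0, 1, 1, 1, 1],
        "schools_closed":  [1, 1, 1, 1, 1, 1],
    },
}

npi_names = sorted(toy["A"])
pair_counts = {
    pair: exclusive_country_days(toy, *pair)
    for pair in combinations(npi_names, 2)
}
# The pair with the fewest exclusive country-days is the hardest to disentangle.
weakest_pair = min(pair_counts, key=pair_counts.get)
```

A small count for a pair means the data rarely show one NPI without the other, so their individual effects are harder to separate.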

Effectiveness of NPI combinations

Although the correlations between the individual estimates were weak, we took them into account when evaluating combined NPI effectiveness. For example, if two NPIs frequently co-occurred, there may be more certainty about the combined effectiveness than about the effectiveness of each NPI individually. Figure 3 shows the combined effectiveness of the sets of NPIs that are most common in our data. In combination, the NPIs in this study reduced Rt by 77% (67 to 85%). Across countries, the mean Rt without any NPIs (i.e., the R0) was 3.3 (table S4). Starting from this number, the estimated Rt likely could have been brought below 1 by closing schools and universities, high-risk businesses, and limiting gathering sizes to at most 10. Readers can interactively explore the effects of sets of NPIs with our online mitigation calculator (16). A CSV file containing the joint effectiveness of all NPI combinations is available online (14).
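As a rough plausibility check of the combined figures, the median reductions quoted in the text can be multiplied together under the model's multiplicative assumption. This is only an approximation: the study evaluates combinations on the joint posterior, which accounts for correlations between estimates, whereas the sketch below multiplies point estimates. The dictionary keys and function name are hypothetical.

```python
# Median reductions in Rt quoted in the text (cumulative for hierarchical NPIs).
MEDIAN_REDUCTION = {
    "gatherings_le_10": 0.42,
    "some_businesses_closed": 0.18,
    "most_businesses_closed": 0.27,
    "schools_universities_closed": 0.38,
    "stay_at_home": 0.13,
}

def combined_reduction(npis):
    """Combine reductions multiplicatively: 1 - prod(1 - reduction_i)."""
    remaining = 1.0
    for npi in npis:
        remaining *= 1.0 - MEDIAN_REDUCTION[npi]
    return 1.0 - remaining

# All NPIs in the study; cumulative entries already include the weaker tiers.
all_npis = [
    "gatherings_le_10",
    "most_businesses_closed",
    "schools_universities_closed",
    "stay_at_home",
]
total = combined_reduction(all_npis)  # roughly 0.77

# Starting from R0 = 3.3 with schools and universities closed, high-risk
# businesses closed, and gatherings limited to at most 10 people.
r0 = 3.3
rt = r0 * (1.0 - combined_reduction([
    "schools_universities_closed",
    "some_businesses_closed",
    "gatherings_le_10",
]))  # slightly below 1
```

The point estimates land close to the reported 77% and just under Rt = 1, but they carry none of the uncertainty shown in Fig. 3.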

Fig. 3 Combined NPI effectiveness for the 15 most commonly implemented sets of NPIs in our data.

Solid and shaded regions denote 50% and 95% Bayesian prediction intervals. (A) Predicted Rt after implementation of each set of NPIs, assuming R0 = 3.3. (B) Maximum R0 that can be reduced to Rt below 1 by common sets of NPIs. Readers can interactively explore the effects of all sets of NPIs, while setting R0 and adjusting NPI effectiveness to local circumstances, with our online mitigation calculator (16).

Sensitivity and validation

We performed a range of validation and sensitivity experiments (figs. S2 to S19). First, we analyzed how the model extrapolated to countries that did not contribute data for fitting the model, and found that it could generate calibrated forecasts for up to 2 months, with uncertainty increasing over time. Multiple sensitivity analyses showed how the results changed when we modified the priors over epidemiological parameters, excluded countries from the dataset, used only deaths or confirmed cases as observations, varied the data preprocessing, and more. Finally, we tested our key assumptions by showing results for several alternative models [structural sensitivity (10)] and examined possible confounding of our estimates by unobserved factors influencing Rt. In total, we considered NPI effectiveness under 206 alternative experimental conditions (Fig. 4A). Compared with the results obtained under our default settings (Figs. 2 and 3), median NPI effectiveness varied under alternative plausible experimental conditions. However, the trends in the results are robust, and some NPIs outperformed others under all tested conditions. While we tested large ranges of plausible values, our experiments did not include every possible source of uncertainty.

Fig. 4 Median NPI effectiveness across the sensitivity analyses.

(A) Median NPI effectiveness (reduction in Rt) when varying different components of the model or the data in 206 experimental conditions. Results are displayed as violin plots, using kernel density estimation to create the distributions. Inside the violins, the box plots show median and interquartile-range. The vertical lines mark 0%, 17.5%, and 35% (see text). (B to E) Categorized sensitivity analyses. (B) Sensitivity to model structure. Using only cases or only deaths as observations (2 experimental conditions; fig. S7); varying the model structure (3 conditions; fig. S8, left). (C) Sensitivity to data and preprocessing. Leaving out countries from the dataset (42 conditions; figs. S5 and S21); varying the threshold below which cases and deaths are masked (8 conditions; fig. S13); sensitivity to correcting for undocumented cases and to country-level differences in case ascertainment (2 conditions; fig. S6). (D) Sensitivity to epidemiological parameters. Jointly varying the means of the priors over the means of the generation interval, the infection-to-case-confirmation delay, and the infection-to-death delay (125 conditions; fig. S10); varying the prior over R0 (4 conditions; fig. S11); varying the prior over NPI effect parameters (3 conditions; fig. S11); varying the prior over the degree to which NPI effects vary across countries (3 conditions; fig. S12). (E) Sensitivity to unobserved factors influencing Rt. Excluding observed NPIs one at a time (8 conditions; fig. S9); controlling for additional NPIs from a different dataset (6 conditions; fig. S9).

We categorized NPI effects into small, moderate, and large, which we define as a posterior median reduction in Rt of less than 17.5%, between 17.5 and 35%, and more than 35% (vertical lines in Fig. 4). Four of the NPIs fell into the same category across a large fraction of experimental conditions: closing both schools and universities was associated with a large effect in 96% of experimental conditions, and limiting gatherings to 10 people or less had a large effect in 99% of conditions. Closing most nonessential businesses had a moderate effect in 98% of conditions. Issuing stay-at-home orders (i.e., in addition to the other NPIs) fell into the “small effect” category in 96% of experimental conditions. Three NPIs fell less clearly into one category: Limiting gatherings to 1000 people or less had a moderate-to-small effect (moderate in 81% of conditions) while limiting gatherings to 100 people or less had a moderate-to-large effect (moderate in 66% of conditions). Finally, closing some high-risk businesses, including bars, restaurants, and nightclubs had a moderate-to-small effect (moderate in 58% of conditions). Limiting gatherings to 1000 people or less was the NPI with the highest variation in median effectiveness across the experimental conditions (Fig. 4A), which may reflect the NPI’s partial collinearity with limiting gatherings to 100 people or less.
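The categorization rule can be written as a small function. This is a minimal sketch: the function name is hypothetical, and assigning the exact boundary values 17.5% and 35% to the moderate category is an assumption, since the text does not state how ties are handled.

```python
def categorize_effect(median_reduction_pct):
    """Map a posterior median reduction in Rt (in percent) to a category."""
    if median_reduction_pct < 17.5:
        return "small"
    if median_reduction_pct <= 35.0:  # boundary handling is an assumption
        return "moderate"
    return "large"

# Medians under the default model settings, as quoted in the text.
default_medians = {
    "gatherings <= 1000": 23,
    "gatherings <= 100": 34,
    "gatherings <= 10": 42,
    "some businesses closed": 18,
    "most businesses closed": 27,
    "schools and universities closed": 38,
    "stay-at-home order": 13,
}
categories = {npi: categorize_effect(pct) for npi, pct in default_medians.items()}
```

Applying the same function to the median from each of the 206 conditions and counting categories per NPI would give percentages of the kind reported above.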

Aggregating all sensitivity analyses can hide sensitivity to specific assumptions. We display the median NPI effects in four categories of sensitivity analyses (Fig. 4, B to E), and each individual sensitivity analysis is shown in the supplementary materials. The trends in the results are also stable within these categories.


We used a data-driven approach to estimate the effects that seven nonpharmaceutical interventions had on COVID-19 transmission in 41 countries between January and the end of May 2020. We found that several NPIs were associated with a clear reduction in Rt, in line with mounting evidence that NPIs are effective at mitigating and suppressing outbreaks of COVID-19. Furthermore, our results indicate that some NPIs outperformed others. While the exact effectiveness estimates vary with modeling assumptions, the broad conclusions discussed below are largely robust across 206 experimental conditions in 11 sensitivity analyses.

Business closures and gathering bans both seem to have been effective at reducing COVID-19 transmission. Closing most nonessential face-to-face businesses was only somewhat more effective than targeted closures, which only affected businesses with high infection risk, such as bars, restaurants, and nightclubs (see also Table 1). Therefore, targeted business closures can be a promising policy option in some circumstances. Limiting gatherings to 10 people or less was more effective than limits of up to 100 or 1000 people and had a more robust effect estimate. Note that our estimates are derived from data between January and May 2020, a period when most gatherings were likely indoors due to weather.

Whenever countries in our dataset introduced stay-at-home orders, they essentially always also implemented, or already had in place, all other NPIs in this study. We accounted for these other NPIs separately and isolated the effect of ordering the population to stay at home, in addition to the effect of all other NPIs. In accordance with other studies that took this approach (2, 6), we found that issuing a stay-at-home order had a small effect when a country had already closed educational institutions, closed nonessential businesses, and banned gatherings. In contrast, Flaxman et al. (1) and Hsiang et al. (3) included the effect of several NPIs in the effectiveness of their stay-at-home order (or “lockdown”) NPIs and accordingly found a large effect for this NPI. Our finding suggests that some countries may have been able to reduce Rt to below 1 without a stay-at-home order (Fig. 3) by issuing other NPIs.

We found a large effect for closing schools and universities in conjunction, which was remarkably robust across different model structures, variations in the data, and epidemiological assumptions (Fig. 4). It remained robust when controlling for NPIs excluded from our study (fig. S9). Our approach cannot distinguish direct effects on transmission in schools and universities from indirect effects, such as the general population behaving more cautiously after school closures signaled the gravity of the pandemic. Additionally, since school and university closures were implemented on the same day, or in close succession in most of the countries we study, our approach cannot distinguish their individual effects (fig. S21). This limitation likely also holds for other observational studies that do not include data on university closures and estimate only the effect of school closures (1–3, 5–8). Furthermore, our study does not provide evidence on the effect of closing preschools and nurseries.

Previous evidence on the role of pupils and students in transmission is mixed. Although infected young people (aged ca. 12 to 25) are often asymptomatic, they appear to shed similar amounts of virus as older people (17, 18), and might therefore infect higher-risk individuals. Early data suggested that children and young adults had a notably lower observed incidence rate than older adults; whether this was due to school and university closures remains unknown (19–22). In contrast, the recent resurgence of cases in European countries has been concentrated in the age group corresponding to secondary school and higher education (especially the latter), and is now spreading to older age groups as well as primary-school-aged children (23, 24). Primary schools may be generally less affected than secondary schools (20, 25–28), perhaps partly because children under the age of 12 are less susceptible to SARS-CoV-2 (29).

Our study has several limitations. First, NPI effectiveness may depend on the context of implementation, such as the presence of other NPIs, country demographics, and specific implementation details. Our results thus need to be interpreted as the effectiveness in the contexts in which the NPI was implemented in our data (10). For example, in a country with a comparatively old population, the effectiveness of closing schools and universities would likely have been on the lower end of our prediction interval. Expert judgement should thus be used to adjust our estimates to local circumstances. Second, Rt may have been reduced by unobserved NPIs or voluntary behavior changes such as mask-wearing. To investigate whether the effect of these potential confounders could be falsely attributed to the observed NPIs, we performed several additional analyses and found that our results are stable to a range of unobserved factors (fig. S9). However, this sensitivity check cannot provide certainty and investigating the role of unobserved factors is an important topic to explore further. Third, our results cannot be used without qualification to predict the effect of lifting NPIs. For example, closing schools and universities in conjunction seems to have greatly reduced transmission, but this does not mean that reopening them will necessarily cause infections to soar. Educational institutions can implement safety measures such as reduced class sizes as they reopen. However, the nearly 40,000 confirmed cases associated with universities in the UK since they reopened in September 2020 show that educational institutions may still play a large role in transmission, despite safety measures (30). Fourth, we do not have data on some promising interventions, such as testing, tracing, and case isolation. These interventions could become an important part of a cost-effective epidemic response (31), but we did not include them because it is difficult to obtain comprehensive data on their implementation. In addition, although the data are more readily available, it is difficult to estimate the effect of mask-wearing in public spaces because there was limited public life as a result of other NPIs. We discuss further limitations in the supplementary text, section E.

Although our work focused on estimating the impact of NPIs on the reproduction number Rt, the ultimate goal of governments may be to reduce the incidence, prevalence, and excess mortality of COVID-19. For this, controlling Rt is essential, but the contribution of NPIs toward these goals may also be mediated by other factors, such as their duration and timing (32), periodicity and adherence (33, 34), and successful containment (35). While each of these factors addresses transmission within individual countries, it can be crucial to additionally synchronize NPIs between countries, since cases can be imported (36).

Many governments around the world seek to keep Rt below 1 while minimizing the social and economic costs of their interventions. Our work offers insights into which areas of public life are most in need of virus containment measures so that activities can continue as the pandemic develops; however, our estimates should not be taken as the final word on NPI effectiveness.

Materials and methods


We analyzed the effects of NPIs (Table 1) in 41 countries (37) (see Fig. 1). We recorded NPI implementations when the measures were implemented nationally or in most regions of a country (affecting at least three-fourths of the population). We only recorded mandatory restrictions, not recommendations. Supplementary text section G details how edge cases in the data collection were handled. For each country, the window of analysis starts on the 22nd of January and ends after the first lifting of an NPI, or on the 30th of May 2020, whichever was earlier. The reason to end the analysis after the first major reopening (38) was to avoid a distribution shift. For example, when schools reopened, it was often with safety measures, such as smaller class sizes and distancing rules. It is therefore expected that contact patterns in schools will have been different before school closure compared to after reopening. Modeling this difference explicitly is left for future work. Data on confirmed COVID-19 cases and deaths were taken from the Johns Hopkins CSSE COVID-19 Dataset (13). The data used in this study, including sources, are available online on GitHub (14).

Data collection

We collected data on the start and end date of NPI implementations, from the start of the pandemic until the 30th of May 2020. Before collecting the data, we experimented with several public NPI datasets, finding that they were not complete enough for our modeling and contained incorrect dates (39). By focusing on a smaller set of countries and NPIs than these datasets, we were able to enforce strong quality controls: We used independent double entry and manually compared our data to public datasets for cross-checking.

First, two authors independently researched each country and entered the NPI data into separate spreadsheets. The researchers manually researched the dates using internet searches: there was no automatic component in the data gathering process. The average time spent researching each country per researcher was 1.5 hours.

Second, the researchers independently compared their entries to the following public datasets and, if there were conflicts, visited all primary sources to resolve the conflict: the EFGNPI database (40) and the Oxford COVID-19 Government Response Tracker (41).

Third, each country and NPI was again independently entered by one to three paid contractors, who were provided with a detailed description of the NPIs and asked to include primary sources with their data. A researcher then resolved any conflicts between this data and one (but not both) of the spreadsheets.

Finally, the two independent spreadsheets were combined and all conflicts resolved by a researcher. The final dataset contains primary sources (government websites and/or media articles) for each entry.

Data preprocessing

When the case count is small, a large fraction of cases may be imported from other countries and the testing regime may change rapidly. To prevent this from biasing our model, we neglected case numbers before a country has reached 100 confirmed cases and death numbers before a country has reached 10 deaths. We included these thresholds in our sensitivity analysis (fig. S13).
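The thresholding step can be sketched as follows. The function name is hypothetical, and two details are assumptions not stated in the text: the threshold is applied to the cumulative count, and the day on which the threshold is first reached is kept.

```python
def mask_below_threshold(daily_new, threshold):
    """Replace daily counts with None until the cumulative total reaches `threshold`."""
    cumulative = 0
    masked = []
    for count in daily_new:
        cumulative += count
        # Keep the observation only once the cumulative count has reached the threshold.
        masked.append(count if cumulative >= threshold else None)
    return masked

# Toy series (hypothetical numbers).
daily_cases = [10, 30, 50, 40, 80]   # cumulative: 10, 40, 90, 130, 210
daily_deaths = [0, 2, 3, 6, 9]       # cumulative: 0, 2, 5, 11, 20

masked_cases = mask_below_threshold(daily_cases, threshold=100)
masked_deaths = mask_below_threshold(daily_deaths, threshold=10)
```

Masked entries (None) would simply be left out of the likelihood, so early, import-dominated days do not influence the fit.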

Short model description

In this section, we give a short summary of the model (Fig. 5). The detailed model description is given in the supplementary text section A. In short, our model uses case and death data from each country to “backward” infer the number of new infections at each point in time, which is itself used to infer the reproduction numbers. NPI effects are then estimated by relating the daily reproduction numbers to the active NPIs, across all days and countries. This relatively simple, data-driven approach allowed us to sidestep assumptions about contact patterns and intensity, infectiousness of different age groups, and so forth that are typically required in modeling studies. Code is available online on GitHub (14).

Fig. 5 Model overview.

Unshaded, white nodes are observed. We describe the diagram from bottom to top: The mean effect parameter of NPI i is αi, and the country-specific effect parameter is αi,c. On each day t, a country’s daily reproduction number Rt,c depends on the country’s basic reproduction number R0,c and the active NPIs. The active NPIs are encoded by xi,t,c, which is 1 if NPI i is active in country c at time t, and 0 otherwise. Rt,c is transformed into the daily growth rate gt,c using the generation interval parameters, and subsequently is used to compute the new infections Nt,c(C) and Nt,c(D) that will subsequently become confirmed cases and deaths, respectively. Finally, the expected number of daily confirmed cases yt,c(C) and deaths yt,c(D) are computed using discrete convolutions of Nt,c(.) with the relevant delay distributions. Our model uses both case and death data: it splits all nodes above the daily growth rate gt,c into separate branches for deaths and confirmed cases. We account for uncertainty in the generation interval, infection to case confirmation delay, and the infection to death delay by placing priors over the parameters of these distributions.

Our model builds on the semi-mechanistic Bayesian hierarchical model of Flaxman et al. (1), with several additions. First, we allow our model to observe both case and death data. This increases the amount of data from which we can extract NPI effects, reduces distinct biases in case and death reporting, and reduces the bias from including only countries with many deaths. Second, since epidemiological parameters are only known with uncertainty, we place priors over them, following recent recommended practice (42). Third, as we do not aim to infer the total number of COVID-19 infections, we can avoid assuming a specific infection fatality rate (IFR) or ascertainment rate (rate of testing). Fourth, we allow the effects of all NPIs to vary across countries, reflecting differences in NPI implementation and adherence.

We now describe the model by going through Fig. 5 from bottom to top. The growth of the epidemic is determined by the time- and country-specific reproduction number Rt,c, which depends on (i) the (unobserved) basic reproduction number in country c, R0,c, and (ii) the active NPIs at time t. R0,c accounts for all time-invariant factors that affect transmission in country c, such as differences in demographics, population density, culture, and health systems (43).

Following Flaxman et al. and others (1, 6, 8), each NPI is assumed to independently affect $R_{t,c}$ as a multiplicative factor:

$$R_{t,c} = R_{0,c} \prod_{i=1}^{I} \exp(-\alpha_{i,c}\, x_{i,t,c})$$

where $x_{i,t,c} = 1$ indicates that NPI $i$ is active in country $c$ on day $t$ ($x_{i,t,c} = 0$ otherwise), $I$ is the number of NPIs, and $\alpha_{i,c}$ is the “effect parameter” for NPI $i$ in country $c$. The multiplicative effect encodes the plausible assumption that NPIs have a smaller absolute effect when $R_{t,c}$ is already low.
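The multiplicative model can be made concrete with a minimal sketch. The sign convention (a positive effect parameter reduces the reproduction number) is chosen to match the effectiveness definition 1 − exp(−α) used in this article; the function name and the example values are hypothetical.

```python
import math

def reproduction_number(r0, alphas, active):
    """R_t = R_0 * prod_i exp(-alpha_i * x_i) for one country and one day.

    `alphas` are effect parameters; `active` are 0/1 flags for each NPI.
    """
    exponent = sum(alpha * x for alpha, x in zip(alphas, active))
    return r0 * math.exp(-exponent)

# Effect parameters chosen (for illustration only) so that the two NPIs
# correspond to 38% and 42% reductions in R_t.
alphas = [-math.log(1 - 0.38), -math.log(1 - 0.42)]

r_none = reproduction_number(3.3, alphas, [0, 0])   # no NPIs: equals R0
r_first = reproduction_number(3.3, alphas, [0, 1])  # only the 42% NPI
r_both = reproduction_number(3.3, alphas, [1, 1])   # both NPIs active

# The same 38% NPI removes fewer units of R_t when R_t is already lower.
absolute_drop_from_r0 = r_none - reproduction_number(3.3, alphas, [1, 0])
absolute_drop_after_first = r_first - r_both
```

The relative reduction of each NPI is the same in both cases; only the absolute change shrinks, which is the assumption the paragraph above describes.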

We assume that the effect of each NPI on $R_{t,c}$ is stable across time but can vary across countries to some degree. Concretely, the effect parameter of intervention $i$ in country $c$ is defined as $\alpha_{i,c} = \alpha_i + z_{i,c}$, where $\alpha_i$ represents the mean effect parameter, and $z_{i,c} \sim \mathcal{N}(0, \sigma_i^2)$. The variance $\sigma_i^2$ corresponds to the degree of cross-country variation in the effectiveness of NPI $i$ and is inferred from the data. This partial pooling of NPI effect parameters minimizes bias from country-specific sources while also reflecting that NPI effectiveness is likely different across countries. We define the “effectiveness” of NPI $i$ as the percentage reduction in $R_t$ associated with NPI $i$ across countries. This effectiveness, displayed in Figs. 2 to 4, is computed as $1 - \exp(-(\alpha_i + z_i))$, where again $z_i \sim \mathcal{N}(0, \sigma_i^2)$ and $\sigma_i^2$ is drawn from its posterior. We place an asymmetric Laplace prior on $\alpha_i$ that allows for both positive and negative effects but places 80% of its probability mass on positive effects, reflecting that NPIs are more likely to reduce $R_{t,c}$ than to increase it.
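To show how cross-country variation widens the reported intervals, the effectiveness definition can be simulated with a short Monte Carlo sketch. It is a simplification: the mean effect parameter and the cross-country standard deviation are fixed at hypothetical values here, whereas in the study both are drawn from the posterior. Function names are illustrative.

```python
import math
import random

def effectiveness_samples(alpha_mean, sigma, n=10000, seed=0):
    """Sorted draws of 1 - exp(-(alpha + z)) with z ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    return sorted(
        1.0 - math.exp(-(alpha_mean + rng.gauss(0.0, sigma)))
        for _ in range(n)
    )

def quantile(sorted_values, q):
    """Empirical quantile of an already sorted list (nearest-rank style)."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]

# Hypothetical values, not posterior estimates from the study.
samples = effectiveness_samples(alpha_mean=0.5, sigma=0.2)
median = quantile(samples, 0.5)
lower, upper = quantile(samples, 0.025), quantile(samples, 0.975)
```

A larger `sigma` leaves the median roughly unchanged but stretches the interval, which is why prediction intervals of this kind are wider than intervals for the mean effect alone.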

In the early phase of an epidemic, the number of new daily infections grows exponentially. During exponential growth, there is a one-to-one correspondence between the daily growth rate and Rt,c (44). The correspondence depends on the generation interval (the time between successive infections in a chain of transmission), which we assume to have a gamma distribution. The prior on the mean generation interval has a mean of 5.06 days, derived from a meta-analysis (45).
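The correspondence between growth rate and reproduction number can be sketched with a standard result for gamma-distributed generation intervals, R = (1 + θr)^k, where k and θ are the gamma shape and scale and r is the continuous-time growth rate. Whether this matches the exact form in the supplementary text is an assumption, and the standard deviation of 2.11 days is an illustrative value, since this section only gives the prior mean of 5.06 days.

```python
import math

def gamma_shape_scale(mean, sd):
    """Shape and scale of a gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

def daily_growth_rate(r, mean_gi=5.06, sd_gi=2.11):
    """Daily growth rate g (infections multiply by 1 + g per day) implied by R."""
    shape, scale = gamma_shape_scale(mean_gi, sd_gi)
    continuous_rate = (r ** (1.0 / shape) - 1.0) / scale
    return math.exp(continuous_rate) - 1.0

def reproduction_number_from_growth(g, mean_gi=5.06, sd_gi=2.11):
    """Inverse mapping: reproduction number implied by daily growth rate g."""
    shape, scale = gamma_shape_scale(mean_gi, sd_gi)
    continuous_rate = math.log(1.0 + g)
    return (1.0 + scale * continuous_rate) ** shape

g_at_r0 = daily_growth_rate(3.3)                      # growth with no NPIs
r_recovered = reproduction_number_from_growth(g_at_r0)  # round trip back to 3.3
```

The mapping is one-to-one and passes through g = 0 at R = 1, so a reproduction number below 1 corresponds to shrinking daily infections.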

We model the daily new infection count separately for confirmed cases and deaths, representing those infections which are subsequently reported and those which are subsequently fatal. However, both infection numbers are assumed to grow at the same daily rate in expectation, allowing the use of both data sources to estimate each αi. The infection numbers translate into reported confirmed cases and deaths after a delay. The delay is the sum of two independent distributions, assumed to be equal across countries: the incubation period and the delay from onset of symptoms to confirmation. We put priors over the means of both distributions, resulting in a prior over the mean infection-to-confirmation delay with a mean of 10.92 days (45), see supplementary text section A.3. Similarly, the infection-to-death delay is the sum of the incubation period and the delay from onset of symptoms to death, and the prior over its mean has a mean of 21.8 days (45). Finally, as in related models (1, 6), both the reported cases and deaths follow a negative binomial output distribution with separate inferred dispersion parameters for cases and deaths.
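The step from infections to expected observations is a discrete convolution with a delay distribution. The sketch below uses a short, made-up delay distribution for readability; the study's delays have means of about 11 and 22 days and carry priors on their parameters. The function name is hypothetical.

```python
def expected_observations(new_infections, delay_pmf):
    """Convolve daily infections with a delay distribution.

    `delay_pmf[d]` is the probability that an infection is observed
    (confirmed or fatal) exactly d days after it occurred.
    """
    expected = [0.0] * len(new_infections)
    for t in range(len(new_infections)):
        for d, p in enumerate(delay_pmf):
            if t - d >= 0:
                # Infections from day t - d show up on day t with probability p.
                expected[t] += new_infections[t - d] * p
    return expected

# Toy example: a single burst of infections on day 0 and a made-up delay.
infections = [100, 0, 0, 0, 0]
delay_pmf = [0.0, 0.2, 0.5, 0.3]
expected_cases = expected_observations(infections, delay_pmf)
```

In the full model these expected counts are the means of the negative binomial output distributions for cases and deaths.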

Using a Markov chain Monte Carlo (MCMC) sampling algorithm (46), this model infers posterior distributions of each NPI’s effectiveness while accounting for cross-country variations in effectiveness, reporting, and fatality rates as well as uncertainty in the generation interval and delay distributions. To analyze the extent to which modeling assumptions affect the results, our sensitivity analysis included all epidemiological parameters, prior distributions, and many of the structural assumptions introduced above. MCMC convergence statistics are shown in fig. S19.

Supplementary Materials

Supplementary Text

Figs. S1 to S24

Tables S1 to S8

References (47–79)

MDAR Reproducibility Checklist

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References and Notes

  1. The countries were selected for the availability of reliable NPI data at the time when we started data collection and modelling (April 2020); and for their presence in at least one of the public datasets that we used to cross-validate our collected data. We excluded countries with fewer than 100 cases (or 10 deaths) by March 31, as our model neglects new cases and deaths below these thresholds. We also excluded a small number of countries if there were credible media reports casting doubt on the trustworthiness of their reporting of cases and deaths. Finally, we excluded very large countries like China, the United States, and Canada, for ease of data collection, as these would require more locally fine-grained data. Of the 41 included countries, 33 are in Europe. As a result, the NPI effectiveness estimates may be biased toward effects in Europe, and NPI effectiveness may have been different in other parts of the world.
  2. Concretely, the window of analysis extended until 2 days after the first reopening for confirmed cases, and 10 days after the first reopening for deaths. These durations correspond to the 5% quantiles of the infection-to-case-confirmation and infection-to-death distributions, ensuring that less than 5% of the new infections on the reopening day or later were observed in the window of analysis.
  3. We evaluated the following datasets: the Oxford COVID-19 Government Response Tracker (OxCGRT), the Epidemic Forecasting Global NPI Database, and the ACAPS #COVID19 Government Measures Dataset. Note that these datasets are under continuous development. Many of the mistakes found will already have been corrected. We know from our own experience that data collection can be very challenging. We have the fullest respect for the people behind these datasets. In this paper, we focus on a more limited set of countries and NPIs than these datasets contain, allowing us to ensure higher data quality in this subset. Given our experience with public datasets and our data collection, we encourage fellow COVID-19 researchers to independently verify the quality of public data they use, if feasible.
Acknowledgments: We thank J. Lagerros for operational support and for introducing some of the authors to each other. We thank M. Balatsko, M. Pukaj, and T. Witzany for developing the interactive website. We thank T. Groemer, G. Krönke, and M. Herrmann for advice and mentorship. Funding: J.M.B. was supported by the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems (EP/S024050/1) and by Cancer Research UK. S.M.’s funding for graduate studies was from Oxford University and DeepMind. M.S. was supported by the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems (EP/S024050/1). G.L. was supported by the UKRI Centre for Doctoral Training in Interactive Artificial Intelligence (EP/S022937/1). V.M. contributed in his personal time while employed at DeepMind. L.C. acknowledges funding from the MRC Centre for Global Infectious Disease Analysis (reference MR/R015600/1), jointly funded by the UK Medical Research Council (MRC) and the UK Foreign, Commonwealth & Development Office (FCDO), under the MRC/FCDO Concordat agreement and is also part of the EDCTP2 program supported by the European Union; and acknowledges funding by Community Jameel. Y.W.T. is also a principal research scientist at DeepMind. The paid contractor work helping with the data collection, the development of the interactive website, and the costs for cloud compute were funded by the Berkeley Existential Risk Initiative. Author contributions: D.J., J.M.B., J.K., G.A., A.J.N., J.T.M., G.L., and V.M. designed and conducted the NPI data collection. S.M., M.S., J.M.B., A.B.S., H.G., Y.W.T., Y.G., J.K., T.G., J.S., V.M., M.A.H., and L.C. designed the model and modeling experiments. M.S., A.B.S., T.G., and J.S. performed and analyzed the modeling experiments. J.M.B., S.M., M.S., J.K., and T.G. conceived the research. S.M., M.S., J.M.B., L.C., J.K., and T.B. did the literature search. J.M.B., S.M., M.S., G.L., L.C., T.B., and V.M. wrote the manuscript. 
All authors read and gave feedback on the manuscript and approved the final manuscript. J.M.B., S.M., and M.S. contributed equally. L.C., Y.G., and J.K. contributed equally to senior authorship. Competing interests: No conflicts of interests. L.C. has acted as a paid consultant to Pfizer and the Foundation for Innovative New Diagnostics, outside of the submitted work. Y.G. has received a research grant (studentship) from GlaxoSmithKline, outside of the submitted work. J.K. has advised several governmental and nongovernmental entities about interventions against COVID-19. Data and materials availability: All data and code are available in the paper or publicly online at (14). This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.