## Abstract

As COVID-19 is rapidly spreading across the globe, short-term modeling forecasts provide time-critical information for decisions on containment and mitigation strategies. A major challenge for short-term forecasts is the assessment of key epidemiological parameters and how they change when first interventions show an effect. By combining an established epidemiological model with Bayesian inference, we analyze the time dependence of the effective growth rate of new infections. Focusing on COVID-19 spread in Germany, we detect change points in the effective growth rate that correlate well with the times of publicly announced interventions. Thereby, we can quantify the effect of interventions, and we can incorporate the corresponding change points into forecasts of future scenarios and case numbers. Our code is freely available and can be readily adapted to any country or region.

During the initial outbreak of an epidemic, reliable short-term forecasts are key to estimate medical requirements and capacities, and to inform and advise the public and decision makers (*1*). During this initial phase, three tasks are important to provide time-critical information for crisis mitigation: (i) establishing central epidemiological parameters, such as the basic reproduction number, that can be used for short-term forecasting; (ii) simulating the effects of different possible interventions aimed at the mitigation of the outbreak; (iii) estimating the actual effects of the measures taken not only to make rapid adjustments but also to adapt short-term forecasts. Tackling these tasks is challenging due to the large statistical and systematic errors that occur during the initial stages of an epidemic when case numbers are low. This is further complicated by the fact that mitigation measures are taken rapidly while the outbreak unfolds, but they take effect only after an unknown delay. To obtain reasonable parameter estimates for short-term forecasting and policy evaluation despite these complications, any prior knowledge available needs to be integrated into modeling efforts to reduce uncertainties. This includes knowledge about basic mechanisms of disease transmission, recovery, as well as preliminary estimates of epidemiological parameters from other countries, or from closely related pathogens. The integration of prior knowledge, the quantitative assessment of the remaining uncertainties about epidemiological parameters, and the principled propagation of these uncertainties into forecasts is the domain of Bayesian modeling and inference (*2*, *3*).

Here, we draw on an established class of models for epidemic outbreaks: The Susceptible-Infected-Recovered (SIR) model (*4*–*7*) specifies the rates that population compartments change over time, i.e., when susceptible people become infectious, and when infectious people recover. This simple model can be formulated in terms of coupled ordinary differential equations (in mean field), which enable analytical treatment (*8*, *9*) or fast evaluation (ideally suited for Bayesian inference). Accordingly, SIR-like models have been used to model epidemic spreads, from Bayesian Markov Chain Monte Carlo (MCMC) parameter estimation (*10*–*12*) to detailed scenario discussions (*13*–*16*). This family of models has played a dominant role in the analyses of the coronavirus (SARS-CoV-2) pandemic, from inference (*17*–*19*) through scenario forecasting (*20*–*27*) to control strategies (*28*, *29*).

We combine the SIR model (and generalizations thereof) with Bayesian parameter inference and augment the model by a time-dependent spreading rate. The time dependence is implemented via potential change points that reflect changes in the spreading rate driven by governmental interventions. Based on three distinct measures taken in Germany, we detect three corresponding change points in the reported COVID-19 case numbers. by April 1, 2020 we reported evidence for the first two change points, and predicted the third (*30*). Now, with data until April 21, we have evidence for all three change points. First, the spreading rate decreased from 0.43 (with 95% credible interval, CI

Our framework is designed to infer the effectiveness of past measures and to explore potential future scenarios along with propagating the respective uncertainties. In the following, we demonstrate the potential impact of timing and magnitude of change points, and report our inference about the three past governmental interventions in Germany. Our framework can be readily adapted to any other country or region. The code (already including data sources from many other countries), as well as the figures are all available on Github (*31*).

## Basic inference of central epidemiological parameters during the initial phase of the outbreak in Germany

To assess the general effect of different possible interventions on the spread of COVID-19 in Germany, we first focus on the initial phase of the outbreak when no serious mitigation measures were implemented. In the absence of interventions, an epidemic outbreak can be described by SIR models with constant spreading rate (Methods). In Germany, the first serious interventions occurred around March 9, 2020 and affected the case reports after an observation delay (a combination of incubation period with a median 5–6 days (*32*)) and a test delay (time until doctor is visited plus test-evaluation time that we assumed to be both about 2–3 days). Hence, to infer central epidemiological parameters, we considered the initial phase to be March 2 to March 15. The central epidemiological parameters estimated here will also be estimated under the full model with change points on the data records up to April 21, allowing for a consistency check.

We perform Bayesian inference for the central epidemiological parameters of an SIR model using Markov-Chain Monte Carlo (MCMC) sampling (Fig. 1). The central parameters are the spreading rate λ, the recovery rate μ, the reporting delay D and the number of initially infected people

As median estimates, we obtain for the spreading rate *18*, *33*, *34*). Overall, the model shows good agreement with both new cases (Fig. 1 A) and cumulative cases (Fig. 1 B), which show the expected exponential growth (linear in log-lin plot). The observed data are clearly informative about λ,

To quantify the impact of possible interventions, we concentrate on the effective growth of active infections before and after the intervention. As long as the number of infections and recoveries are small compared to the population size, the number of active infections per day can be approximated by an exponential growth (Fig. 1, A and B) with effective growth rate *35*).

## Magnitude and timing of interventions matter for the mitigation of the outbreak

To simulate the effect of different possible interventions, we model the effects of interventions as change points in the spreading rate (Methods). We consider different, hypothetical interventions following the initial phase to show that both, the amount of change in behavior (leading to a change in spreading rate λ; Fig. 2A) and the exact timing of the change (Fig. 2B) determine future development. Hypothetical interventions build on the inferred parameters from the initial phase (in particular median

(i) No social distancing: Public behavior is unaltered and spread continues with the inferred rate (

(ii) Mild social distancing: The spreading rate decreases to

(iii) Strong social distancing: Here, the spreading rate decreases to

In this, the timing of an intervention matters: Apart from the strength of an intervention, its onset time has great impact on the total case number (Fig. 2, B and C). For example, focusing on the strong intervention (iii), by which a stable plateau is reached, the effect of advancing or delaying the change point by just five days leads to more than a three-fold difference in cumulative cases.

While we find that the timing of an intervention has a great effect on case numbers, the duration over which the change takes place has only minor effect if the intervals of change are centered around the same date. In Fig. 2C we illustrate the adjustment of

## Change point detection for the spread of COVID-19 in Germany

To model real-world data, we further refined the SIR model. Most importantly, we account for systematic variations of case reports throughout the week, in particular lower case numbers on the weekend, by explicitly modeling a weekly reporting modulation (see Methods). Indeed, comparisons confirm that models with this correction outperform those without (see table S2). In the supplemental material, we further generalize our model to include an explicit incubation period (as in SEIR models; fig. S3) that yields results consistent with our main model.

We incorporate the effect of governmental interventions into our models by introducing flexible change points in the spreading rate (see Methods). During the COVID-19 outbreak in Germany, governmental interventions occurred in three stages from (i) the cancellation of large events with more than 1000 participants (around March 9), through (ii) closing of schools, childcare facilities and the majority of stores (in effect March 16), to (iii) the contact ban and closing of all non-essential stores (in effect March 23). The aim of all these interventions was to reduce the (effective) growth rate

Detecting change points in the spreading rate and quantifying the amount of change as quickly as possible becomes a central modeling challenge when short-term forecasts are required. To address this challenge, we assume an initial spreading rate

The inferred parameters for the models with change points are consistent with the inferred parameters from the exponential onset phase (Figs. 1 and 3 and figs. S1 and S2). In particular, all estimated

## Models with two or three change points fit the observed data better

The models with three change points describe the data better than models with fewer change points, as indicated by the leave-one-out (LOO) cross-validation-based Bayesian model comparison (*36*) (lowest LOO-score in Table 1). However, the LOO-scores of the model with two and three change points differ by less than one standard error. This originates from an extended duration of the second change point in the two-change-point model, which partially captures the effect of the third intervention. As expected, the models with none or a single change point have LOO-scores that are at least one standard deviation higher (worse) than those of the best models, and we will not consider them further.

When comparing our inference based on three change points to the number of confirmed cases, we find them to largely match (Fig. 3, B and C). The dominant periodic change in the daily new reported cases (Fig. 3B) is well described by the weekday modulation. In addition to the periodic change, the daily new case numbers also reflect the fairly sudden change of the spreading rate at the change points (cf. Fig. 2 and fig. S4 for the effect of change points without the modulation). Most importantly, the cumulative effect of change points manifests in an overarching decay in new case numbers that is visible after April 5 and follows the third change point (with reporting delay).

## Change point detection quantifies the effect of governmental interventions

Ideally, detected changes can be related to specific mitigation measures, so that one gains insights into the effectiveness of different measures (Fig. 3). Indeed, we found clear evidence for three change points in the posterior distributions of the model parameters: First,

## Discussion

We presented a Bayesian approach for monitoring of the effect of non-pharmaceutical governmental interventions on the epidemic spread of an infectious respiratory disease. Using the example of the COVID-19 outbreak in Germany, we applied this approach to infer the central epidemiological parameters and three change points in the spreading rate from the number of reported cases. We showed that change points in the spreading rate affect the confirmed case numbers with a delay of about two weeks (median reporting delay of

Our results indicate that the full extent of interventions was necessary to stop exponential growth. The first two interventions brought a reduction of the growth rate

We used a formal Bayesian model comparison to validate the presence of change points. Our model comparison ruled out models with fewer than two change points (Table 1 and table S2). While this may seem trivial, it has important consequences for making short-term forecasts that decision makers rely on. Demonstrating and quantifying the effect of past change points can be used to formulate priors for the effects of future, similar change points. These priors help to project the effects of more recent change points into future forecasts, even when these change points are not yet apparent in the reported case numbers. Consequently, it is important to look out for and identify potential change points as early as possible to incorporate them into forecasts.

The detection of change points and their interpretation depend crucially on an accurate estimate of the reporting delay D. Therefore, the validity of its estimate should be evaluated. In our model, D contains at least three distinct factors: the biological incubation period (median 5–6 days) (*32*), an additional delay from first symptoms to symptoms motivating a test (1–3 days) and a possible delay before a testing results come in (1–4 days). The sum of these delays seems compatible with our inferred median delay of

We chose to keep our main model comparatively simple, because of the small number of data points initially available during an epidemic outbreak. With few data points, only a limited number of parameters can be effectively constrained. Hence, we chose to approximate a time-dependent spreading rate

Our framework can be easily adapted to other countries and enables incorporation of future developments. For other countries, or for forecasts within smaller communities (e.g., federal states or cities), additional details may become important, such as explicit modeling of incubation time distributions (*17*, *37*) (i.e., as in fig. S3), spatial heterogeneity (*17*, *21*), isolation effects (*20*, *37*), subsampling effects hiding undetected cases even beyond the reporting delay (*38*, *39*), or the age and contact structure of the population (*26*). In countries where major changes in test coverage are expected, this will have to be included as well. The methodology presented here is capable in principle of incorporating such details. It also lends itself to modeling of continuous drifts in the spreading rate, e.g., reflecting reactions of the public to news coverage of a catastrophic situation, or people growing tired of mitigation measures. Such further adaptations, however, can only be performed on a per-country basis by experts with an intimate knowledge of the local situation. Our code provides a solid and extensible base for this. For Germany, several developments in the near future may have to be included in the model. First, people may have transiently changed their behavior over the Easter holidays; second, we expect a series of change points, as well as continuous drifts, with governments trying to ease and calibrate mitigation measures. Third, extensions to hierarchical models will enable regional assessments, e.g., on the level of federal states.

Even after the three major governmental interventions in Germany, effective growth rates remain close to zero and warrant careful consideration of future measures. At present, estimates of effective growth rates dropped to

In conclusion, our Bayesian approach allows detection and quantification of the effect of governmental interventions and, combined with potential subsequent interventions, forecasting future case number scenarios. Our analysis highlights the importance of precise timing and magnitude of interventions for future case numbers. It also stresses the importance of including the reporting delay D between the date of infection and the date of the confirmed case in the model. The reporting delay, D, together with the time required to implement interventions means that changes in our behavior today can only be detected in confirmed cases in two weeks time. Thus, this delay, combined with a current spreading rate that is still close to zero, indicates extremely careful planning of future measures is essential.

## Materials and methods

As a basis for our Bayesian inference and the forecast scenarios, we use the differential equations of the well-established SIR (Susceptible-Infected-Recovered) model. We also test the robustness of our results by means of more sophisticated models, in particular an SEIR-like model that explicitly incorporates an incubation period (fig. S3). While the SIR model-dynamics is well understood in general, here our main challenge is to estimate model parameters specifically for the COVID-19 outbreak, and to use them for forecasting. To that end, we combined a Bayesian approach — to incorporate prior knowledge — with Markov Chain Monte Carlo (MCMC) sampling — to compute the posterior distribution of the parameters and to sample from it for forecasting. Put simply, we first estimate the parameter distribution that best describes the observed situation, and then we use many samples from this parameter distribution to evolve the model equations and thus forecast future developments.

The data used comes from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) dashboard (*40*). The JHU CSSE provides up-to-date data on COVID-19 infections, usually a few days ahead of official German sources. The exact version of the data and code is available at (*31*). Data were incorporated until April 21. Note that after this cutoff date, additional modeling of the effects of behavioral changes over the Easter holidays becomes necessary.

### Simple model: SIR model with stationary spreading rate

We consider a time-discrete version of the standard SIR model. In short, we assume that the disease spreads at rate λ from the infected population compartment (I) to the susceptible compartment (S), and that the infected population compartment recovers (R) at rate μ. This well-established model for disease spreading can be described by the following set of (deterministic) ordinary differential equations (see, e.g., Refs (*5*, *6*, *20*)). Within a population of size N,

In our model we do not explicitly incorporate the inflow of additional infected people by travel for two reasons. First, we implicitly model the initial surge of infections with *41*).

### Full model: SIR model with weekly reporting modulation and change points in spreading rate

Our change point detection builds on a generalization of the simple SIR model with stationary spreading rate. We now assume that the spreading rate

In addition, we include a weekly modulation to account for lower case reports around the weekend which subsequently accumulate during the week. To model the systematic variation of case reports during the week, we adapted the newly reported cases by a reporting fraction

### Estimating model parameters with Bayesian MCMC

We estimate the set of model parameters *42*) with NUTS (No-U-Turn Sampling) (*43*) using multiple, independent Markov chains. The structure of our approach is the following:

### Initialization of the Markov chains via variational inference

The posterior is approximated by Gaussian distributions ignoring correlations between parameters through automatic differentiation variational inference (ADVI) (*44*), which is implemented in PyMC3. From this distribution, four starting points for four chains are sampled.

#### Burn-in phase

Each chain performs 1000 burn-in (tuning) steps using NUTS, which are not recorded. This serves as equilibration in order to sample from an equilibrium distribution in the next step.

#### Sampling phase

Each chain performs 4000 steps, which are used to approximate the posterior distribution. To ensure that the chains are equilibrated and sampled from the whole posterior distribution (ergodicity), we verified that the R-hat statistic is below 1.05, which is implemented in PyMC3. The rank normalized R-hat diagnostic tests for lack of convergence by comparing the variances within chains and between chains: For identical within-chain and between-chain variances R-hat becomes 1, indicating convergence. For well-converged chains the resulting samples will describe the real-world data well, so that consistent forecasts are possible in the forecast phase.

### Forecast using Monte Carlo samples

For the forecast, we take all samples from the MCMC step and continue time integration according to different forecast scenarios explained below. Note that the overall procedure yields an ensemble of forecasts — as opposed to a single forecast that would be solely based on one set of (previously optimized) parameters.

### MCMC sampling details

Each MCMC step requires to propose a new set of parameters θ, to generate a (fully deterministic) time series of new infected cases *45*), and thus reporting noise. The case-number-dependent width is motivated by observation noise through random subsampling (*38*), resulting in a variance proportional to the mean. Our likelihood neglects any noise in the dynamic process, as the SIR model is deterministic, but could be in principle extended to incorporate typical demographic noise from stochastic spreading dynamics (*35*, *46*).

### Priors that constrain model parameters

As short-term forecasts are time-critical at the onset of an epidemic, the available real-world data are typically not informative enough to identify all free parameters, or to empirically find their underlying distributions. We therefore chose informative priors on initial model parameters where possible and complemented them with uninformative priors otherwise. Our choices are summarized in Table 2 for the simple model, i.e., the SIR model with stationary spreading rate for the exponential onset phase, and in Table 3 for the full model with change points, and are discussed in the following.

### Priors for the simple model (Table 2)

In order to constrain our simple model, an SIR model with stationary spreading rate for the exponential onset phase, we chose the following informative priors. Because of the ambiguity between the spreading and recovery rate in the exponential onset phase (see description of simple model), we chose a narrow log-normal prior for the recovery rate *20*). Note that our implementation of μ accounts for the recovery of infected people and isolation measures because it describes the duration during which a person can infect others. For the spreading rate, we assume a broad log-normal prior distribution *18*, *33*, *34*). In addition, we chose a log-normal prior for the reporting delay *32*) plus a delay from infected people waiting to contact the doctor and get tested.

The remaining model parameters are constrained by uninformative priors, in practice the Half-Cauchy distribution (*47*). The half-Cauchy distribution

### Priors for the full model (Table 3)

In order to constrain our full model, an SIR model with weekly reporting modulation and change points in the spreading rate, we chose the same priors as for the simple model but added the required priors associated with the change points. In general, we assume that each set of governmental interventions (together with the increasing awareness) leads to a reduction (and not an increase) of λ at a change point. As we cannot know yet the precise reduction factor, we adhere to assume a reduction by

For the spreading rates, we chose log-normal distributed priors as in the simple model. In particular, we chose for the initial spreading rate the same prior as in the simple model,

For the timing of change points, we chose normally distributed priors. In particular, we chose *Kontaktsperre*), which includes the prohibition of even small public gatherings as well as complete closing of restaurants and non-essential stores was imposed by the government authorities. Further policy changes may occur in future; however, for now, we do not include more change points.

The change points take effect over a certain time period

The number of tests that are performed and reported vary regularly over the course of a week and are especially low during weekends. To account for this periodic variation, we modulated the number of inferred cases by the absolute value of a sine function with in total a period of 7 days. We chose this function as it is a non-symmetric oscillation, fitting the weekly variation of cases on a phenomenological level. For the amplitude of the modulation we chose a weakly informative Beta prior:

### Model comparison

Since change point detection entails evaluating models with different numbers of parameters, some form of fair model comparison is needed. This is necessary to compensate for the higher flexibility of more complex models, as this flexibility carries the risk of overfitting and overconfident forecasts. The standard approach to avoid over-fitting in machine learning is cross-validation, and cross validation has recently also been advocated for Bayesian model comparison (e.g., (*3*, *36*)), especially for models employed for predictions and forecasts. Thus, one would ideally like to compare the models with different numbers of change points by the probability they assign to previously unobserved data points. Technically this is measured by their out-of-sample prediction accuracy, i.e., their log pointwise predictive density (lppd):

## Supplementary Materials

science.sciencemag.org/cgi/content/full/science.abb9789/DC1

Tables S1 and S2

Figs. S1 to S7

MDAR Reproducibility Checklist

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## References and Notes

**Acknowledgments:**We thank Tim Friede, Theo Geisel, Knut Heidemann, Moritz Linkmann, Matthias Loidolt, and Vladimir Zykov for carefully and promptly reviewing our work internally. We thank the Priesemann group - especially Daniel Gonzalez Marx, Fabian Mikulasch, Lucas Rudelt and Andreas Schneider - for exciting discussions and for their valuable comments. We thank the colleagues of the Göttingen Campus, with whom we were discussing the project and the COVID-19 case forecast in the past weeks very intensively: Heike Bickeböller, Eberhard Bodenschatz, Wolfgang Brück, Alexander Ecker, Andreas Leha, Ramin Golestanian, Helmut Grubmüller, Stephan Herminghaus, Reinhard Jahn, Norbert Lossau and Simone Scheithauer. We thank Nils Bertschinger for stimulating discussion on Bayesian modeling and model comparison.

**Funding:**All authors received support from the Max-Planck-Society. JD and PS acknowledge funding by SMARTSTART, the joint training program in computational neuroscience by the VolkswagenStiftung and the Bernstein Network. JZ received financial support from the Joachim Herz Stiftung. MW is employed at the Campus Institute for Dynamics of Biological Networks funded by the VolkswagenStiftung.

**Author contributions:**JD, JZ, MW, MW, VP designed research. JD, JZ, PS conducted research. JD, JZ, PS, JPN, MW, MW, VP analyzed the data. JD, PS, JPN created figures. All authors wrote the paper.

**Competing interests:**VP is currently active in various groups to advise the government for which VP is not receiving payment.

**Data and materials availability:**Both data (

*48*) and analysis code (

*31*) are available online. This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.