Research Article

Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions


Science  10 Jul 2020:
Vol. 369, Issue 6500, eabb9789
DOI: 10.1126/science.abb9789

• Test number adjustment entails important corrections on the curve of new infections and the initial spreading rate
• Johannes Wollbold, data analyst and modeler in systems biology 2005-2015, now teacher and scientific journalist

I refer to my comment of June 6, "RE: Comment by Frank Michler, 29 May 'the elephant in the room – number of tests'"

The number of tests, along with positive results, increased strongly until week 12 (March 14 to 20; cf. [1]). This includes the initial phase of the model (March 2 - 15), in which the stationary spreading rate lambda_0 is calculated from the daily new reported cases I. This is a critical step that dominates the whole model, and it depends strongly on the data fit of I in Fig. 1A.

However, the resulting exponential curve of I is very misleading. As pointed out in my comment of June 6, it would be more realistic to use the weekly or daily ratio of positive to overall tests [2], at least for an alternative calculation. The data fit of I would then resemble a linear rather than an exponentially growing function. Hence, a stationary growth rate lambda_0 = 0.41 would no longer be supported by the data (I suspect lambda was decreasing from the very beginning).
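The adjustment proposed here can be sketched in a few lines of Python. The weekly figures below are hypothetical placeholders, not RKI data; with the published test counts substituted in, the same comparison applies.

```python
# Sketch of the test-number adjustment proposed above.
# HYPOTHETICAL weekly figures, not RKI data; substitute the published
# test counts from the RKI situation reports.
weekly_tests     = [90_000, 130_000, 250_000, 360_000]   # tests per week
weekly_positives = [1_000,   4_000,  18_000,  32_000]    # positive results

# Raw positives suggest roughly exponential growth ...
raw_ratios = [b / a for a, b in zip(weekly_positives, weekly_positives[1:])]

# ... but the positivity rate (positives / tests) grows much more slowly.
positivity = [p / t for p, t in zip(weekly_positives, weekly_tests)]
adj_ratios = [b / a for a, b in zip(positivity, positivity[1:])]

print("raw week-over-week growth:", [round(r, 2) for r in raw_ratios])
print("positivity-adjusted growth:", [round(r, 2) for r in adj_ratios])
```

With these (invented) numbers, every adjusted week-over-week growth factor is smaller than the raw one, which is the direction of the bias argued for in the letter.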

I doubt that in this case you would detect a significant influence of governmental measures on the effective growth rate lambda* = lambda - mu, as suggested by Fig. 3A. Not to mention the supplementary use of official data for the onset of symptoms instead of the reporting date of new infections, which you carried out - without test number adjustment - in the easily overlooked technical notes, Fig. 17A [3]. As Stefan Homburg pointed out on June 8, this figure alone turns your main conclusi...

Competing Interests: None declared.
• Use of inappropriate and unreliable data
• Christof Kuhbandner, Full professor, Chair of Psychology, Department of Human Sciences, University of Regensburg, Germany

To assess the potential effects of interventions on the spread of a virus, it is crucial to determine the date of infection as exactly as possible. With misspecified infection dates, any conclusions about the effect of interventions are meaningless. In their original paper, the authors estimated the date of infection based on the date when a confirmed case was reported, according to the Johns Hopkins University Center for Systems Science and Engineering dashboard.
However, since the intervals between dates of actual infections, diagnostic testing, and reporting differ vastly across people, it is hardly possible to conclude anything meaningful from modeling the spread of infections using reporting dates. Germany’s Robert-Koch Institute (RKI) employs a more sophisticated approach. They model the spread of infections based on dates of symptom onset [1], indicating that the spread of infections was already in decline before the first intervention, and was even negative before the extensive lockdown.
In their recently published technical note on their original paper [2], Dehning et al. reconsider their model, modeling the spread of infections based on the onset of symptoms as well. Their new principal result corroborates the findings of the RKI modeling (see Fig. 17A in their technical notes paper): the effective growth rate started declining sharply on March 7 before the first intervention, reached a value of zero at the time of school closure, and became negative...

Competing Interests: None declared.
• Answers to some recent questions
• Viola Priesemann, corresponding author, on behalf of all authors, Max Planck Institute for Dynamics and Self-Organization

We would like to thank all authors of the past eLetters for their interest and questions. For replies, we refer to our technical notes [1], which address most questions systematically, with figures and equations. Please note that addressing the questions thoroughly may take some time, so we kindly ask for your patience. The technical notes are continuously updated and should at present be considered a non-peer-reviewed internal document. Our main motivation for sharing them at this stage is to enable rapid communication of further analyses and clarification of open points. In addition, you will find below replies to a few specific technical aspects.

Question:
The daily reports from the Robert Koch Institute present smoothed data. Why did you not use smoothed data in your technical notes rather than retaining the week-wise modulation, and how would using smoothed data affect the results?

We would first like to point out the fundamental difference between techniques aimed at a compact description of day-to-day changes, and a modeling of the disease dynamics and the observation process (i.e., weekly modulations) intended to infer spreading parameters; details can be found in our technical report ([1], sections II and III). In the first, descriptive approach, noise is not modeled and therefore directly affects the unsmoothed results; this is then remedied post hoc by data smoothing. In contrast, in our Bayesian m...

Competing Interests: None declared.
• RE: COVID-19 containment policies and voluntary behavioural adaptation
• Moritz A. Drupp, Assistant Professor, Department of Economics, University of Hamburg, Germany
• Other Contributors:
• Björn Bos, Department of Economics, University of Hamburg, Germany
• Eli P. Fenichel, Professor, Yale School of Forestry & Environmental Studies, Yale University, USA
• Jasper N. Meya, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Germany
• Martin F. Quaas, Professor, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Germany
• Till Requate, Professor, Department of Economics, Kiel University, Germany
• Hanna Schenk, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Germany

In their article “Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions” (15 May, 10.1126/science.abb9789), J. Dehning et al. study the development of COVID-19 in Germany using an epidemiological SIR model and Bayesian inference to draw conclusions on the effectiveness of governmental interventions for containing COVID-19, in particular regarding the German contact ban.

We argue that analyses quantifying the effectiveness of regulatory interventions need to consider voluntary behavioural reactions to the pandemic motivated by self-protection and the protection of others, and how individual behaviour interacts with interventions. After all, most interventions are behavioural (1). In an interview with a German newspaper, the corresponding author highlights her own forward-looking voluntary behaviour in reducing the spread of the pandemic (2), an aspect not captured in their study. Research from the 2009 A/H1N1 swine flu and COVID-19 suggests that voluntary behavioural changes are substantial (3-5). Mobility data from Germany and survey evidence from around 3500 residents in Germany show that contacts were already reduced drastically before the contact ban (6). The actual development of infection rates thus reflects a mix of governmental interventions and voluntary behaviour.

Studies evaluating governmental interventions should therefore make use of extended SIR models that consider voluntary, forward-looking decision-mak...

Competing Interests: None declared.
• Calculation with one point of longer duration
• Sergey Litvinov, Computational Science and Engineering Laboratory, ETH Zürich
• Other Contributors:
• Petr Karnakov, Computational Science and Engineering Laboratory, ETH Zürich

The paper does not rule out obvious alternative explanations and makes conclusions based on inadequate statistical evidence.

Using the code shared by the authors, we have found a new model that has only one change point but shows almost the same LOO-CV score as the best model with three change points from the paper [1]. To achieve this, we have only modified the prior distributions by allowing a longer duration of the change point. The scores of the models depending on the number of change points are summarized as (lower is better):

- one, 819 ± 16
- one with longer duration, 790 ± 15
- two, 795 ± 17
- three, 786 ± 17

We disagree with the authors, who find a model with a longer duration “implausible”: people may adjust their behavior gradually rather than responding to specific interventions within the narrow time windows postulated by the authors.
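This point can be illustrated with a toy calculation (not the authors' code): when the growth rate in fact declines gradually, a change point with a long duration fits better than an abrupt step. An elpd-like sum of pointwise Gaussian log densities stands in for the LOO-CV score; all values are invented for illustration.

```python
import math

# Synthetic "true" growth rate: declines gradually between day 5 and day 15.
days = list(range(20))
def true_rate(t):
    return 0.3 - 0.3 * min(max((t - 5) / 10, 0.0), 1.0)

data = [true_rate(t) for t in days]

def step_model(t):   # single abrupt change point at day 10
    return 0.3 if t < 10 else 0.0

def ramp_model(t):   # same change point, but with a long duration
    return 0.3 - 0.3 * min(max((t - 5) / 10, 0.0), 1.0)

def gaussian_score(model, sigma=0.05):
    # Sum of pointwise Gaussian log densities (elpd-like; higher is better).
    return sum(-0.5 * ((d - model(t)) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi))
               for t, d in zip(days, data))

print("step model score:", round(gaussian_score(step_model), 1))
print("ramp model score:", round(gaussian_score(ramp_model), 1))
```

Under gradual behaviour change, the long-duration model scores higher, mirroring the letter's observation that the longer-duration variant is competitive with the abrupt change points.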

Conclusions made purely from the LOO-CV score lack common sense. The LOO-CV (leave-one-out cross validation) score is a measure for model comparison and shows how well the model predicts a single point excluded from the dataset. However, in this paper the authors apply the measure to a time series, and the LOO-CV score is computed given the entire history and the future, which are strongly correlated. The authors compare the LOO-CV score of the models with one and three change points without mentioning that the actual predictions of daily new cases differ by at most 5% and only for about...

Competing Interests: None declared.
• RE: Reply to: "Premature Conclusions on the Effectiveness of Government Interventions"

I appreciate that the authors have provided clarification and additional material to elucidate their approach. One can certainly justify their fitting procedure and their choice of priors if one buys into the two critical assumptions that they make, i.e. that the assumed SIR/SEIR model dynamics is correct, and that it is the government interventions *only* that change the growth rate. But if that is their approach, then the conclusion of the paper is a fallacy because they are assuming the conclusion. This is not an uncommon mistake with Bayesian inference; for this approach to work correctly, it is critical that the choice of competing hypotheses is not unduly constrained by zero priors or exceedingly small priors.

To draw conclusions on the effect of NPIs, the authors would either have to i) use a more agnostic approach or ii) explicitly include a model that incorporates changes in R(t) by natural effects or by self-organised changes in the behavior of the population. The model proposed by the authors would also be strengthened if it displayed an unusual degree of robustness and consistency that cannot be explained by chance. One way (among others) to accomplish this would be to relax the priors and critically examine to what extent their inference actually rests on the data and not on their priors. Considering their responses and the additional material in [1], I do not see that this is the case, quite the contrary.

The arguments stated to support step-wis...

Competing Interests: None declared.
• RE: Still concerns in the "Technical Report", communication of results

Thanks for the clarifying technical notes and the extensive answers to most of the mentioned points in these eLetters.
However there are still some things to discuss.

— 1 —

Fig. 19 in the technical note [1] to this study shows results for an SEIR-like model using the onset of symptoms, referencing the „nowcast from May 22“.
The RKI nowcast sheet [2],[3] contains two columns of case numbers: one for new cases („Punktschätzer der Anzahl Neuerkrankungen (ohne Glättung)“) and a smoothed number of new cases („Punktschätzer der Anzahl Neuerkrankungen“).

The plot in Fig. 19 suggests that the updated results in the technical note were calculated using the first column as the data to fit (one peak vs. two neighbouring peaks for the real „nowcast“).
The RKI's daily reports show the smoothed data as the „nowcast“. It does not show the significant week-wise modulation, and it uses statistical methods to estimate a „virtual“ symptom onset for cases without a date of symptom onset.
As these methods draw on epidemiological and systematic insight from the RKI [4], they would seem preferable to a week-wise modulation.

Why do the authors favour retaining the week-wise modulation over using the smoothed data from the RKI?
How does it affect the results if the smoothed number of cases is used?

References:
[1]
...

Competing Interests: None declared.
• Technical Report

The article's main conclusion "that the full extent of interventions was necessary to stop exponential growth" was based on Fig. 3A. Your technical report replaces this graph with Fig. 17A, which is computed from reliable official data to which you had access before completing your article.

Your Fig. 17A shows: the growth rate fell sharply from 5 March. The prohibition of large gatherings (9 March) did not affect it visibly. Growth of the coronavirus was zero before the school closings (16 March) and negative long before the general lockdown (23 March). This turns your original conclusions upside down: neither intervention was necessary.

Competing Interests: None declared.
• RE: Comment by Frank Michler, 29 May "the elephant in the room – number of tests"
• Johannes Wollbold, data analyst and modeler in systems biology 2005-2015, now teacher and scientific journalist

A very important argument, falsifying the article's premature conclusion about the effectiveness of interventions. Question to the authors: Do you see a chance to correct the heavy bias introduced by not considering the initial strong increase in testing activity?

It would be more realistic to use the weekly or daily ratio positive / overall tests, see Table 8 and Fig. 8 in the RKI situation reports on Wednesdays: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsb..., e.g. https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsb... (in the English version, there are only weekly ratios).

The problem remains that these data refer to the dates of the tests. In order to evaluate temporal dependencies between Covid-19 countermeasures and the development of the disease, one has to consider the unknown time point of infection. The day of symptom onset (situation report Fig. 7) allows statistical estimation of the infection time. These data, however, are not corrected for test numbers.

Hence, detailed new statistical computations from the official raw data are required. Until they are available, fitting a model to heavily biased data does not make any sense, in my opinion.

Competing Interests: None declared.
• Reply to: "Premature Conclusions on the Effectiveness of Government Interventions"
• Viola Priesemann, corresponding author, on behalf of all authors, Max Planck Institute for Dynamics and Self-Organization

Mueller: "Recently, Dehning et al. analyzed COVID-19 case numbers from Germany and claimed evidence for change points in the epidemic curve that reflect and support the effectiveness of government interventions in combating the disease (1). Close scrutiny reveals that these conclusions are premature for various reasons.

Figures S1 and S2 in the Supplementary Materials suggest that the alternative scenarios of one or two steps in the effective growth rate may have been unduly dismissed. The posterior distribution of the change duration peaks in the tail of the prior distribution, which implies that the parameter estimate is heavily affected by the prior and constrained to a bad fit."

Reply: Thank you for your comments on our paper, which we would like to reply to point by point. In addition to our replies here, we have prepared a technical report, which addresses the central points in full, including figures and sketches [1].

Let us stress from the outset that, when it only comes to providing a good fit to the data, a large number of models could be applied. However, especially given the scarce data at the beginning of an epidemic, minimal models based on plausible assumptions are required. This is why we chose an established epidemic model coupled to a Bayesian inference of the model parameters.

To be explicit here once more, our modeling relies on two main assumptions: (1) Disease spread can be described by the epidemiological SIR/SE...
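Assumption (1) can be sketched as a minimal discrete-time SIR model with a time-dependent spreading rate; the parameter values below are illustrative, not those inferred in the paper.

```python
# Minimal discrete-time SIR sketch with one change point in lambda(t).
# All parameter values are illustrative.
N = 83_000_000          # population of Germany (approx.)
mu = 0.125              # recovery rate (1/8 days)

def lam(t):
    # One change point at day 20: lambda drops from 0.41 to 0.10.
    return 0.41 if t < 20 else 0.10

S, I, R = N - 100, 100, 0
new_cases = []
for t in range(60):
    infections = lam(t) * S * I / N
    recoveries = mu * I
    S -= infections
    I += infections - recoveries
    R += recoveries
    new_cases.append(infections)

# Before the change point the effective growth rate lambda - mu is positive
# (exponential growth); afterwards it is negative (decline).
print(new_cases[19] > new_cases[5], new_cases[59] < new_cases[25])
```

The sign of lambda(t) - mu controls whether new cases grow or decline, which is exactly the quantity on which the change-point inference operates.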

Competing Interests: None declared.
• Reply to: "51 data points of an observational study do not allow detailed causal conclusions"
• Viola Priesemann, corresponding author, on behalf of all authors, Max Planck Institute for Dynamics and Self-Organization

Höhle et al.: "In their study, Dehning et al. claim that “In conclusion, our Bayesian approach allows detection and quantification of the effect of governmental interventions …”. As much as we endorse their data and model based approach on this delicate matter, we urge that such highly relevant statements be made with due caution for the following reasons: Their approach is based on observational data, specifically on the one-dimensional time series of the number of newly reported Covid-19 infections in Germany per day (51 data points)."

Reply: Thank you for sharing your comments on our paper, which we would like to address point by point. In addition, we have prepared a paper draft, which addresses the central points in full, including figures and sketches [1].

We would like to start with a small remark on the title of your eLetter, which seems to suggest that (strict) causality could be established using more data points, which is unfortunately not the case, as also detailed below in our discussion of causality. Yet, we assume that the title was actually meant to point to the problem of overfitting. Here, our reply would be (1) that the use of informative priors drastically reduces the effective degrees of freedom of our model to approximately 10, (2) that the uncertainty induced by the small number of datapoints is also reflected in the size of the Bayesian credible intervals, and that (3) Bayesian model selection in addition takes care of the p...

Competing Interests: None declared.
• COVID-19 Overfitted

In ‘Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions’ (1), Jonas Dehning et al. attempt to identify change points in the propagation dynamics of SARS-CoV-2 as measured by reported German case numbers. They arrive at the conclusion "that the full extent of interventions was necessary to stop exponential growth," a result with strong political implications but very weak support from their data.

The authors’ model uses daily German case numbers of positive tests for the novel coronavirus from not quite two months, giving 51 observations. Due to varying reporting delays, these observations are not fully independent of each other, so that the number of effective independent observations is even lower. What is more, when converted into metrics of propagation, such as reproduction numbers, those observations show a steady decline of virus propagation over the time period chosen by the authors to evaluate the effectiveness of interventions in March, and essentially a featureless flat line in April. (2)
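The conversion mentioned above can be sketched as a crude reproduction number, computed as the ratio of case counts in consecutive generation intervals (assuming a 4-day generation interval). The case series below is synthetic, chosen only to show slowing growth followed by a flat decline.

```python
# Crude R estimate: cases in the last generation interval divided by cases
# in the one before it. Synthetic data, for illustration only.
def crude_R(cases, generation_interval=4):
    R = []
    for t in range(2 * generation_interval, len(cases)):
        recent = sum(cases[t - generation_interval:t])
        earlier = sum(cases[t - 2 * generation_interval:t - generation_interval])
        R.append(recent / earlier if earlier else float("nan"))
    return R

# Synthetic series: growth for ten days, then a slow steady decline.
cases = [int(100 * 1.3 ** min(t, 10) * 0.97 ** max(t - 10, 0)) for t in range(40)]
print([round(r, 2) for r in crude_R(cases)])
```

On such a series the crude R starts well above 2, declines steadily, and settles just below 1, i.e. the "featureless flat line" pattern the letter describes.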

To explain the decline in disease propagation activity, the authors used the MCMC method to fit a model, and their favored model by their own reckoning has an effective number of parameters of 13.4. With the three change points proposed by the authors falling on March 9, 16, and 23, this gives one model parameter for about every two relevant observations, less when accounting for observations not being independent. Of cou...

Competing Interests: None declared.
• Covid-19: Effect of Governmental Interventions

In their article “Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions” (1), Dehning et al. present a computational model of the current Covid-19 epidemic in Germany. They find three turning points and attribute all changes in the epidemic to three sets of governmental interventions. They eventually conclude that “lifting restrictions too much will quickly lead to renewed exponential growth”.

This paper has severe issues and is misleading.

First of all, there is no “control group”. As politicians are not scientists, and scientific experiments affecting all citizens are ethically problematic, that fact is not a cause for rejection per se. Scientists must be aware of it and try to find ways to challenge their hypotheses.

They could, for example, compare parts of the country. As Germany is a federation of 16 states, they could check their model against individual states or aggregations of states. Since they state that “within smaller communities (e.g., federal states or cities), additional details may become important”, their model might well fail at that scale.
They do not consider confounding variables at all – their article does not even contain the word “confounding”. Such a confounding variable may be the seasonality of infectious respiratory diseases. Late winter is “flu season” in Germany, and the flu ebbs off in spring without any governmental intervention. The seasonality of Covid-19 is not yet known, but the fa...

Competing Interests: None declared.
• RE: the elephant in the room – number of tests
• Frank Michler, Software Developer, Computational Neuroscientist, Philipps-University Marburg

Why have the authors and peer reviewers ignored the elephant in the room - the influence of the number of tests?

In their article, Dehning et al. use the numbers of official COVID-19 cases (people who tested positive for SARS-CoV-2) to fit the parameters of an SIR-based epidemiological model. Based on these modeling results, the authors draw far-reaching political conclusions, which have already been quoted in the newspapers. Corresponding author Viola Priesemann claims in "Der Spiegel" that only government-mandated contact restrictions have brought down case numbers [Spiegel]. On German television she argued for prolonging government restrictions on public life [AnneWill].

for epidemiological modeling, but also participating in the political discourse about civil liberties.

Conclusions drawn from statistical analysis and modeling critically hinge on the validity of the underlying data. The authors used the number of people who had tested positive for SARS-CoV-2 as a measure of the number of infected people. Obviously, the number of people testing positive depends on the number of tests: more tests lead to more positive results. The rise of the estimated R value - as calculated from the RKI's nowcast - in early March (03.03.2020: R=1.9; 10.03.2020: R=3.3) [RKI-17/2020] is an artef...

Competing Interests: None declared.
• RE: changing number of tests make used data difficult to interpret

The study presented here, “Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions”, uses a data set published by Johns Hopkins University containing a time series of the number of new Covid-19 cases per day in Germany. Despite the statement in the article’s introduction that “Tackling these tasks is challenging due to the large statistical and systematic errors that occur during the initial stages of an epidemic when case numbers are low,” the authors completely ignore the fact that the number of tests increased from fewer than 100,000 to 360,000 per week during weeks 10 to 13 (as published by the German RKI: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Testzahl.html, first link on the page).
At the beginning of March, symptomatic people in Germany were often sent home without testing because of a lack of available tests. Increased test capacities would have detected the truly infected among them and increased the number of newly confirmed cases linearly with the number of tests. The changing number of tests significantly shapes the presented data, and therefore any “change points” within it. This should have been addressed and discussed in the article, and I would be surprised to find valid arguments for ignoring this feature of the data for the purpose presented.
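The linear-scaling argument can be made concrete by rescaling each week's confirmed cases to a constant test volume. The test counts below follow the approximate range quoted above (under 100,000 rising to 360,000 per week); the weekly case numbers are illustrative, not official figures.

```python
# Rescale each week's confirmed cases to the test volume of the first week.
# Illustrative figures only; substitute the published RKI weekly data.
tests = [95_000, 130_000, 350_000, 360_000]   # tests per week, weeks 10-13
cases = [2_000,   8_000,  30_000,  34_000]    # confirmed cases per week

reference = tests[0]
adjusted = [c * reference / t for c, t in zip(cases, tests)]
print("raw cases:     ", cases)
print("adjusted cases:", [round(a) for a in adjusted])
```

Under the letter's assumption that positives scale linearly with tests, the adjusted series grows far less steeply than the raw one, moving any apparent "change points" with it.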

Competing Interests: None declared.
• RE:

I miss the necessary caution in drawing conclusions. The (main) reasons are:

i) The authors don't seem to take voluntary social distancing and decreasing mobility into account. Compare, for instance, the Google Community Mobility Reports for Germany.

ii) The authors don't seem to take the changing numbers of tests and testing procedures into account.

I hope these issues will be addressed and discussed in a future version.

Competing Interests: None declared.
• 51 data points of an observational study do not allow detailed causal conclusions
• Michael Höhle, Department of Mathematics, Stockholm University, Sweden
• Other Contributors:
• Felix Günther, Statistical Consulting Unit StaBLab, LMU Munich and Department of Genetic Epidemiology, University of Regensburg, Germany
• Helmut Küchenhoff, Statistical Consulting Unit StaBLab, LMU Munich, Germany

In their study, Dehning et al. claim that "In conclusion, our Bayesian approach allows detection and quantification of the effect of governmental interventions ...". As much as we endorse their data and model based approach on this delicate matter, we urge that such highly relevant statements be made with due caution for the following reasons: Their approach is based on observational data, specifically on the one-dimensional time series of the number of newly reported Covid-19 infections in Germany per day (51 data points). Furthermore, it is based on many modelling assumptions and the use of strong informative prior distributions for the change-points. Although the authors document their modelling thoroughly, they lack the necessary caution in the interpretation of their results. Finally, the authors do not address the important issue of how possible changes in test procedures and amount of tests affect the number of reported cases.

One major conclusion of the study by Dehning et al. was that the shutdown measures on the 23rd of March were necessary to bring down the effective growth rate below zero. This is an unwarranted interpretation of an association as being causal. Furthermore, this result contradicts the analysis of the German Robert Koch Institute (RKI) based on estimated onset of disease - see Figure 2 of An der Heiden and Hamouda (2020). From the time scale of disease onset in the figure an additional incubation time period (approx. 5 days) h...

Competing Interests: None declared.
• Premature Conclusions on the Effectiveness of Government Interventions

Recently, Dehning et al. analyzed COVID-19 case numbers from Germany and claimed evidence for change points in the epidemic curve that reflect and support the effectiveness of government interventions in combating the disease (1). Close scrutiny reveals that these conclusions are premature for various reasons.

Figures S1 and S2 in the Supplementary Materials suggest that the alternative scenarios of one or two steps in the effective growth rate may have been unduly dismissed. The posterior distribution of the change duration peaks in the tail of the prior distribution, which implies that the parameter estimate is heavily affected by the prior and constrained to a bad fit. Furthermore, a careful reconstruction of the actual infection dates shows no evidence for stepwise reductions in the growth rate (2). The assumed disease model also suffers from a parameter degeneracy; it remains invariant under a shift of the intervention dates with a concomitant change in reporting delay and a rescaling of the initial case number. The data therefore cannot constrain the timing of putative changes in growth rate relative to the government interventions.
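The degeneracy described above is easy to demonstrate in the purely exponential regime: shifting the change point earlier, adding the same shift to the reporting delay, and rescaling the initial case number leaves the observed series unchanged. The parameter values below are illustrative, not those of the paper.

```python
import math

def infections(I0, change_day, lam0=0.3, lam1=0.05, days=40):
    # Exponential growth at rate lam0, switching to lam1 at change_day.
    out, I = [], I0
    for t in range(days):
        I *= math.exp(lam0 if t < change_day else lam1)
        out.append(I)
    return out

# Model A: change point on day 10, no reporting delay.
obs_a = infections(100.0, change_day=10)

# Model B: change point 3 days earlier, 3-day reporting delay,
# initial cases rescaled by exp(lam0 * 3).
shift = 3
series_b = infections(100.0 * math.exp(0.3 * shift), change_day=10 - shift)
obs_b = [None] * shift + series_b[:-shift]   # the delay shifts the series

match = all(abs(a - b) / a < 1e-9
            for a, b in zip(obs_a[shift:], obs_b[shift:]))
print("observed series identical:", match)
```

Both parameterizations produce numerically identical observed case counts, so the data alone cannot pin the change point to a specific calendar date relative to the interventions.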

Finally, the authors do not take into account that multi-scale effects can naturally result in deviations from exponential growth with the basic reproduction number well below the classical herd immunity threshold (3). As the infection progressed to less mobile strata of society during the outbreak (2), such natural chan...

Competing Interests: None declared.
• RE: Statistically Significant Variables

I question their use of case numbers, and I imagine this will be a major point of contention in peer review. Their whole thesis is based on case numbers. They even admit that they had to correct for reduced reporting rates during the weekend. I have to clean up data all the time when assembling statistical models; it is part of the job, but this is a red flag. It means that the public is influencing the dependent variable in a way that does not seem to be accounted for in this analysis. In addition, if you look at Germany's testing rates, they were on an upward trajectory and then plateaued during the data time series. There were not enough tests to go around, and even according to German health authorities they are still not testing enough: another example of an influence on the dependent variable. In short, if the dependent variable (case reporting) is being driven by another, more statistically significant independent variable outside of the analysis, your curve fit, in this case using MCMC, will be wrong. I would be interested to see them apply MCMC to the testing rate and, most importantly, to fatalities.

Competing Interests: None declared.
• RE: Wrong data used

I was quite surprised that the team found an indication for the third intervention on March 23. The German RKI already reported its R-value estimates a month ago, showing that R was stably below 1 some days before the 23rd.
https://edoc.rki.de/bitstream/handle/176904/6650.4/17_2020_2.Artikel.pdf...
The RKI uses corrected data based on the reported or estimated symptom onset date. The plots by symptom onset date can be found in the RKI's daily reports.
https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsb...
The plot by reporting date is in the German version only:
https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsb...
Especially in the critical time frame at the end of March, a big offset is visible. The onset-date curve reached its maximum already on March 17, while the reported-date curve peaks on April 2. The study uses Johns Hopkins data, which is quite similar to the reported-date curve, leading it by only a few days. The difference between these curves is driven by reporting lags. This issue is likely caused by either bureaucratic reason...

Competing Interests: None declared.
• RN

Is it part of German culture to follow the restrictions placed on the people of Germany? Were masks part of the requirements? Similar restrictions have been put in place in the USA, but people are not following them, including our president. Is the USA doomed? So good to see a study that proves science works. As a hospital infection-control nurse, I worked for 17 years trying to get employees just to wash their hands; it was difficult. If everyone in the world stayed home for 2 weeks, the pandemic could be eliminated.

Competing Interests: None declared.
• RE:

Hi, thanks for the article. I am surprised that you didn't adjust the number of confirmed cases to take into account the number of conducted tests. Looking at data from around the world, we see that exponential growth of confirmed cases was only observed during periods of exponential growth in testing. Moreover, even during periods of exponential growth in testing, the growth of confirmed cases quickly becomes linear. See for example here (the short video shows the data for Germany and various other countries, such as Sweden, the UK, the USA, Belarus, etc.): https://youtu.be/N9a7yR8MMOQ

Would it be possible to update your study in general, and your conclusion in particular, to take this bias into account?

Thanks and best regards,

Dirk

Competing Interests: None declared.