Measles and the canonical path to elimination

All World Health Organization regions have set measles elimination goals. We find that as countries progress toward these goals, they undergo predictable changes in the size and frequency of measles outbreaks. A country’s position on this “canonical path” is driven by both measles control activities and demographic factors, which combine to change the effective size of the measles-susceptible population, thereby driving the country through theoretically established dynamic regimes. Further, position on the path to elimination provides critical information for guiding vaccination efforts, such as the age profile of susceptibility, that could only otherwise be obtained through costly field studies or sophisticated analysis. Equipped with this information, countries can gain insight into their current and future measles epidemiology and select appropriate strategies to more quickly achieve elimination goals.


Data
We used data on population, birth rates, vaccination rates and measles cases by country and year from 1980. Population and birth rate data by country and year were available from the World Bank (23,24). Annual measles vaccination and case data by country were available from the WHO (25,26).
Calculation of mean incidence and coefficient of variation of incidence for incidence-space To calculate the mean incidence over a period of years, we did not calculate a moving average of the incidence in the given period, shifting by 1 year each time. This would result in fast, nonsmooth shifts in values of mean incidence as years come into and drop out of our time window after a period of time. Instead we took a weighted mean which never fully discounts a year in the past, though diminishes its impact considerably over time. We began by calculating the incidence of cases per 100,000 people for each year, which is given by the number of cases in the year divided by the population size for that year (Fig. S1A shows this for Nigeria). We wished to calculate the weighted mean incidence over the previous years, and analysis began when we had at least years of data. For our analysis, we set = 10, so the first year we included in the analysis was 1990 for which we calculated the weighted mean incidence for years [1981][1982][1983][1984][1985][1986][1987][1988][1989][1990]. To calculate the mean incidence over the previous years, we multiplied this time series by a truncated Gaussian distribution with a standard deviation of 3 years peaking 2 years before the year in question ( Fig. S1B shows examples of this Gaussian). The result of this example is shown in Fig. S1C.
The coefficient of variation of the incidence (CV) is defined as the ratio of the standard deviation of incidence by the mean of incidence. To calculate the weighted CV, we used the same Gaussian functions to estimate the weighted standard deviation and weighted mean incidence. However, when using the described Gaussian distributions, if we see an extended period of time where cases are far lower than had previously been seen (such as when approaching elimination), will result in a sharp increase in the weighted CV. This does not occur using unweighted CV when there is an unbroken chain of zeros for the entire length of the given period (i.e., 10 years) (see Fig. S2A for a comparison of these two methods for Bolivia as it approached elimination).
To avoid this problem, we instead began by calculating the unweighted CV for each 10-year period (e.g. for the 1990 entry we calculated the CV of cases for years [1981][1982][1983][1984][1985][1986][1987][1988][1989][1990]. We call these the "local CV" for each year (see Fig. S2B for example from Nigeria). These local CV values were then weighted using similar Gaussian distributions to those previously described; however, the Gaussians differ in that we did not wait until we had 10 years' worth of previous points to use (see Fig. S2C for example from Nigeria).
To generate the canonical path (see Fig. 1B & 2A for examples), this process was applied to all countries in Africa and the Americas. Following that the mean point in incidence space for the countries in both of these continents was taken each year (see green and purple arrows in Fig. 1A for the mean position of Africa and the Americas over time respectively, and Fig. S13 for the median positions). To generate the canonical path, these arrows are joined at the point at which they cross, giving a picture of what progression from endemicity of measles (as seen in Africa at the beginning of the data) to near elimination (seen in the Americas at the end of the data) would ideally look like. This linked path is what is seen in Fig. 1B. In Fig. 2A, each yearly point on the path is shown, with an adjustment in the mean incidence so that the individual incidences lie in the same range as the coefficient of variation.
Scaling of incidence and CV for placing countries on the canonical path Assigning countries to the nearest point on the canonical path first required that the incidence and CV are on the same scale. At non-scaled values, incidence will dominate each country's point on the path due to the larger range in incidence compared to the CV. Additionally, for countries with low incidence it is difficult to differentiate between their points on the path. To remedy this, we took the natural log of the incidence values, which gives us much more resolution for countries with low levels of incidence, and scaled both the logged incidence and the CV so that all of their values are between 0 and 1 (Fig. S15). Figure S16 depicts the same results as Fig. 1, although includes this scaling, with an added dashed line to show the mean position of the Americas from 2015-2017, which regresses with respect to the canonical path, due to an increased number of cases in these years. This scaling was used for the construction of Fig. 2C, Fig. 3, and Fig. 4 in the main text.
We conducted a sensitivity analysis of two additional scaling techniques: i) taking the square root of incidence values and then scaling both square rooted incidence and coefficient of variation so that all of their values are between 0 and 1, and ii) taking the natural log of incidence values and square root of the coefficient of variation and again scaling both so that all of their values are between 0 and 1. The results of the sensitivity analysis, displayed via position and speed of movement by country over time (methods discussed below), are shown in Fig. S29.
Calculating a granular version of the canonical path Along with the canonical path with a point per year (38 points), with distances between points being allowed to vary depending on the data, we also created a version of the path which added four equally spaced points between the original path points (186 granular points). This was done in order to be able to more accurately discuss the speed of movement along the path, and the expected versus observed movement of countries along the path. Without this additional path, a country moving by a certain amount near one part of the path could transition through a different number of points in the path compared with a country moving the exact same amount in another part of the path. This granular path can be seen in Fig. S15 as the black dotted line.

Simulation of country-specific measles incidence
We used an established discrete time age-structured mathematical model, introduced in (27,28) to simulate measles transmission dynamics for each country in the WHO Americas and Africa Regions. The key feature of the model is a matrix that at every time-step defines transition from every possible epidemiological stage (e.g. maternally immune, susceptible, infected, recovered, and vaccinated) and age combination to every other possible epidemiological stage and age combination. The matrix captures transitions between each epidemiological stage within each age class and discrete time-step, where the time-step was set to the approximate generation time of measles, about 2 weeks, assuming 24 infection generations in a year.
Simulations were based on country-specific reported data, including demographic (population age distribution, birth rate, death rate) (29) and vaccination coverage data (25), and shared empirically derived measles epidemiologic parameters, including rate of decline in maternal immunity (30), vaccination efficacy (31), basic reproductive number (R0=12), and age-contact rates (32), but were otherwise completely model driven. Starting conditions were based on the distribution of immunity resulting from simulations of 30 years of uncontrolled measles in a population with a constant birth rate and an age distribution equivalent to that in the country as of 1980.
The model outputs the number of individuals in each age class and epidemiological stage at every time-step, allowing us to directly extract the number of measles cases per age and timestep. Cases were aggregated over each year and divided by the mid-year population to estimate measles incidence per year and country. Each country was stochastically simulated 100 times. We then calculate the weighted mean incidence and weighted coefficient of variation of incidence as described above using a Gaussian distribution. Figure S3-S12 displays the result of this simulation and compares it to the incidence space based on estimated country incidence over time correcting for under-reporting using a state space model (12,13), see below for more information.
Estimating measles attack rates and population susceptibility As we wished to link the position in incidence-space to a distribution of susceptibility, we first needed country-specific susceptibility profiles. The susceptibility in the population is essentially unobservable, but we can use reported cases to infer the susceptibility over time. Simons et al., 2012 fit a dynamic transmission model to correct for under-reporting of measles cases, giving an estimate of true measles incidence rather than reported incidence. An extended Kalman filter was then used to infer what the number of susceptible individuals is over time (13). To generate the profile of susceptibility, it is assumed that the probability of natural infection by age is given by Ρ(infection by age ) = 1 − (−0.149 × ). Combining this with information on birth rates, supplementary immunization activities, and routine vaccination, we generate the number of susceptibles in each age group in each year.
This method also outputs the estimated true incidence each year, meaning that we can both correct case numbers for underreporting and generate susceptibility profiles from the model output.

Using cases corrected for under-reporting to generate a canonical path
In addition to being used to generate estimates of susceptibility by age, the output of the Simons et al., model gives us estimates of true measles incidence rather than reported incidence. These estimates can then be treated in the same way as the reported cases, to generate a trajectory through incidence-space for each country. The equivalent plots to Fig. 1A, 2, 3 and 4 for when cases are estimated are seen in Fig. S14, S26-S28.
Position and speed of movement along the canonical path We calculated the position and speed of movement along the canonical path for each country between 1990 and 2017. We assigned countries to the nearest point on the granular version of the canonical path each year given their location in incidence space. We smoothed countries' position on the path over time by taking the average position for a given year along with two years before and after. Estimates begin in 1993, and 2016 and 2017 estimates are limited to 4 and 3 year averages, respectively. Speed of movement is simply calculated as the change in smoothed position from year to year. Yearly region averages (non-weighted and populationweighted (23)) were also calculated. Figure 3A-B display the results of this analysis. We additionally highlighted 10 countries interest that represent typical and non-typical movements by plotting their position and speed of movement over time in Fig. S17.

Direction of movement in incidence space
We estimated the observed and expected direction of movement for each country by year between 1990 and 2017 (total of 5044 observed movements). The observed direction of movement was estimated as the angle between a country's position in incidence space from year to year. We relied on our assignment of each country to the nearest point on the granular canonical path to estimate the expected direction of movement as the angle between the nearest point on the granular path to the point five positions further (representing a one year change); with the exception of the last 5 position on the granular path in which countries could move in the direction of the very last position. Angles ranging from -180 to +180 were grouped in 10 degree increments. We calculated frequencies of angles for each region in 10 and 8 year groupings (1990-1999, 2000-2009, and 2010-2017). We then smoothed the expected direction of movement frequencies for each region and year grouping by taking the average frequency for a given 10-degree angle group along with frequencies plus and minus 20 degrees. Figure 3C and 3D displays the results of this analysis.
We conducted an extensive supplemental analysis assessing direction of movement in incidence space by country and year compared to the canonical path. Figure S18 displays the difference between observed and expected direction of movement by country and year. We assigned both observed and expected movements to a quadrant (I-IV) and compared (Fig. S18A). We also directly compared direction of movement by subtracting the observed from expected and plotting the absolute differences on a color scale from green to brown (Fig. S18B). In order to assess which dimension (incidence or cv) was contributing more to each movement, we transformed all movement vectors unto unit vectors and subtracted each observed unit vector dimension from the respective expected unit vector dimension (Fig. S18C-D).
The length of observed vectors was estimated and compared to the difference between observed and expected movements in order to assess the potential association between differences in observed and expected movements and the magnitude of the movement. We estimated quartiles and deciles of all observed vectors, and plotted them against absolute and quadrant differences in observed and expected movements (Table S1, Fig S19).

Distance from the canonical path
We estimated individual dimension (incidence and CV) distance and total distance from the canonical path of each country's position in incidence space between 1990 and 2017. The distance measures are divided by the total range of the canonical path for each dimension and multiplied by 100 to get percentage. The observed vector distance is divided by the total length of the canonical path times 100 to get percentage. Distance from the canonical path was grouped by distance per log population size of each respective country and year (Fig. S20), and by position on the canonical path (Fig. S21).

Generalized additive models
To fit the expected location in incidence-space given birth rate, vaccination level or susceptible recruitment (i.e., the number of susceptible individuals recruited into the population each year defined as birth rate multiplied by (1-vaccination coverage)), we fit a generalized additive model (33). Data is of the form � , �, = 1, … , for a response variable and a predictor variable (in general we may have a vector of predictor variables, but we consider only one predictor at a time). Here our response variable, , is either incidence or CV, and predictor variables, , are one of birth rate, vaccination, or susceptible recruitment. The generalized additive model then fits the model, ( | ) = ( ) where is an unknown smooth function, which was estimated from the data by the model. We repeated this for all six combinations of response and predictor variable. For each predictor variable, we plotted the expected incidence and CV for the range of values in our data, generating the estimates shown in Fig. S22 and Fig. S23 for reported cases and estimated cases corrected for under-reporting, respectively.

Supplementary Text: Direction of Movement Analysis
For all countries and years (1990-2016), the median difference between observed (1 year forward) and expected movement is 0.00 degrees (IQR -57. 13  For all countries' movements between 1990 and 2016, 54.98% of observed movements were within the same quadrant of expected movement ( Figure S18) and 54.10% were within +/-45 degrees of the expected direction of movement ( Figure S18-19). Figure S18 displays the difference between observed and expected movements by quadrant for each country and year. Given that the path was built from Americas and Africa regions, we would expect these regions to have more observed movements within expected quadrants. This was true; countries in the Americas and Africa regions moved within the expected (absolute) quadrant of movement 64% and 58% of the time, while 50%, 49%, 48%, and 55% of observed movements were within the expected quadrant in Eastern Mediterranean, Europe, South-East Asia, and Western Pacific regions, respectively.  Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes exceeding 30 million from WHO's Africa Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S4.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes between 25 and 15 million from WHO's Africa Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S5.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes between 14 and 9.5 million from WHO's Africa Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S6.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes between 9.4 and 3 million from WHO's Africa Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S7.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes between 2.2 million and 700,000 from WHO's Africa Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S8.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes less than 700,000 from WHO's Africa Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S9.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes exceeding 17 million from WHO's America Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S10.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes between 15 and 6 million from WHO's America Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S11.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes between 5 million and 350,000 from WHO's America Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S12.
Simulated mean and coefficient of variation of measles incidence compared to estimated mean and CV of measles incidence for countries with population sizes less than 350,000 from WHO's America Region, ordered by population size. The colored dots represent the simulated mean and coefficient of variation, and the plus signs represent the estimated mean and coefficient of variation. The colors from green to orange represent progression of the canonical path between 1990 and 2017, and black lines represent the means across all 100 simulations.

Fig. S13.
Characterizing the canonical path to elimination using median incidence. Similar to Fig. 1A in main text, but here the median is used to calculate the average position, rather than the mean as in Fig. 1. Dashed arrow shows the mean position of the Americas from 2015-2017.

Fig. S14.
Characterizing the canonical path to elimination using incident cases corrected for underreporting. Similar to Fig. 1A in main text, using estimated cases corrected for under-reporting rather than reported cases as in Fig. 1. Dashed arrow shows the mean position of the Americas from 2015-2017.

Fig. S15.
The canonical path along with a more granular version of the path, with equally spaced points. In color we see the original path scaled so that there is more resolution at the lower end of the incidence scale. The small black dots show a more granular version of the path, with the steps between the points all having the same size.  Expected mean and coefficient of variation of incidence given a range of birth rates, vaccination coverage levels, and susceptible recruitment. Output is based on a statistical generalized additive model using reported cases (in comparison to Fig. S23 which uses estimated cases corrected for underreporting). Values ranged for birth rate per 1,000 people (7 to 56; br, green points), vaccination coverage (0 to 99%; vacc, blue triangles), and susceptible recruitment per 1,000 (0 to 48; br(1-vacc), red squares). Generalized additive models were fit to data from: (A) all regions combined, and (B) all regions excluding Europe. Generally, over time we moved toward the lower right of the figures A and B, with birth rate and susceptible recruitment decreasing and vaccination coverage increasing. (C) Association between incidence or coefficient of variation with the birth rate and vaccination coverage in Africa. We recovered the expected positive correlation between birth rate and incidence and the negative correlation between vaccination and incidence. When birth rates decreased, variation increased, as with lower birth rates we recruited fewer susceptibles into the population, reducing the predictability of dynamics. Similarly increasing vaccination coverage diminished susceptible recruitment, and hence drove bigger uncertainty in the dynamics.