A hard look at global health measures

Science  12 Sep 2014:
Vol. 345, Issue 6202, pp. 1260-1265
DOI: 10.1126/science.345.6202.1260

Researchers seek convincing evidence that large-scale projects save lives.

Since 2002, rich countries have poured more than $10 billion into malaria control. The money has helped pay for planeloads of bed nets treated with insecticides, hundreds of millions of doses of a powerful combination therapy, widespread indoor spraying of homes, and prophylactic treatment of pregnant women, an especially vulnerable group. The generous, large-scale programs have saved the lives of hundreds of thousands of people, most of them African children.

Or have they? It may sound strange, but some analysts say we don't really know. Yes, the World Health Organization estimates that between 2000 and 2012, malaria cases dropped by 25% worldwide and deaths were cut by 42%. But in April, researchers at the widely respected Center for Global Development (CGD) in Washington, D.C., triggered a fierce debate among malaria experts when they wrote in a blog post that they couldn't find a single study with convincing data showing how a large-scale intervention directly led to lower numbers of cases or deaths. (CGD and the Disease Control Priorities Network wanted an example for the third edition of Millions Saved, a book that documents proven successes in global health.)

The CGD researchers don't doubt that malaria interventions can work. Controlled clinical trials among several thousand people have shown with statistical significance that each can reduce cases and deaths. But efficacy in a carefully managed, tightly monitored study does not equal effectiveness in the messiness of the real world. Confusing matters further, weather patterns, an economic upswing, or improved housing can also have a big impact on disease.

The CGD researchers are part of a growing movement that seeks harder data about the number of lives actually saved by the billions poured into health in poor and middle-income countries. Such evidence is critical, proponents argue. After a decadelong explosion, funding for global health has leveled off (see p. 1258); governments and charities need to know the impact of their dollars to justify their investments and to change programs that don't work well enough or not at all.

The new field of what is called impact evaluation is rapidly gathering steam. Large global health donors and developing world governments have widely accepted that they need better evidence of what works, and several new institutes are devoted to gathering it. Publications that use impact evaluation methods have skyrocketed. It's becoming increasingly difficult for the development assistance world to take credit for changes that might have occurred without their interventions—and to ignore the possibility that the money might have spared more people from disease if spent elsewhere.

“Agencies have come to realize that impact evaluation is the only way you can meaningfully talk about results,” says Howard White, who heads the International Initiative for Impact Evaluation (3ie), a nonprofit launched in 2008. “They want to be able to go back to their funders or boards and say, ‘We've lifted 18 million out of poverty.’”

But debates are raging about what constitutes convincing evidence of effectiveness. Randomistas, as some derisively call them, will only seriously consider supersized versions of the randomized, controlled studies used to evaluate the efficacy of drugs and vaccines. Others, like the researchers at CGD who select case studies for Millions Saved, consider other evidence as well. (The sidebars about specific successes and disappointments in Vietnam, Zambia, South Africa, and Peru in this special news section are based on draft case studies in the upcoming book.)

A few prominent critics, meanwhile, say the new focus on evidence is going way too far. They worry that it diverts money and attention from the actual battle against disease. And rigorous attempts to measure impact can cause unease among major donors, the groups they fund to roll out programs, and disease advocates, says Ruth Levine, a development economist at the William and Flora Hewlett Foundation in Menlo Park, California, who edited the first edition of Millions Saved. “The support for global health rests on a collective hope that money is turning into lives saved, and anything that punctures that belief is really very threatening,” Levine says.

UNTIL 2000, hardly any impact evaluations were done in global health, or for that matter in development aid in general. “If you asked anyone what their impact was—and I don't care whether it was diabetes, hypertension, HIV—the answer would have been, ‘We're spending X amount of dollars,’” says Mark Dybul, who heads the Global Fund to Fight AIDS, Tuberculosis and Malaria, which was formed in 2002.

The Global Fund and the U.S. President's Emergency Plan for AIDS Relief (PEPFAR), which started in 2003 and was later headed by Dybul, together have spent more than $60 billion on HIV/AIDS, and both have received flak for not taking a close enough look at their own impact. For instance, they have long used the number of patients given antiretroviral drugs as a major yardstick of success. But people don't always take their pills, or they may drop out of treatment. So the precise public health impact, as well as the cost-effectiveness of specific programs, remained unclear.

Even if public health does improve after the rollout of a program, there may be no causal relationship. What's needed for a thorough evaluation, says epidemiologist Nancy Padian, who has appointments at both the Berkeley and San Francisco campuses of the University of California (UC), is a way to assess what would have happened if the intervention had not occurred. This is known in the lingo of impact evaluations as a counterfactual, and it's akin to a placebo control in a drug trial. “It's all about having the most robust counterfactual you can have,” says Padian, who was a lead scientific adviser for PEPFAR.

A landmark demonstration of the value of this approach involved a social welfare program launched in Mexico in 1997, called PROGRESA, in which families received cash for keeping their kids in school and using preventive health services. PROGRESA's main architect, Mexican Deputy Finance Minister Santiago Levy, was worried that the next government might shut the initiative down unless hard evidence showed that it worked to improve kids' health. Levy enlisted an evaluation team led by UC Berkeley health economist Paul Gertler.

Gertler proposed taking advantage of the fact that Mexico could not afford to roll out PROGRESA nationwide all at once. He suggested a lottery to determine which communities could participate in the program first. Other villages would start 2 years later; they became the counterfactual. The comparison showed that the cash transfer led to a significant drop in illness and hospital visits among children, and adults benefited, too. The study was “a phenomenal breakthrough,” Levine says, and PROGRESA survived.
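The logic of the lottery-based phase-in can be illustrated with a minimal simulation (all numbers and the one-third illness reduction here are invented for illustration, not PROGRESA's actual results): villages are randomly assigned to start the program now or later, and the effect estimate is simply the difference in mean outcomes between the two groups.

```python
import random

random.seed(1)

# Hypothetical lottery-based phased rollout: half the villages are
# randomly chosen to receive the program first; the late-start
# villages serve as the counterfactual.
villages = list(range(200))
random.shuffle(villages)
early = set(villages[:100])  # program starts now; the rest wait 2 years

def illness_rate(v):
    """Simulated child illness rate (invented numbers): baseline ~30%,
    with the program cutting illness by 10 percentage points."""
    effect = -0.10 if v in early else 0.0
    return 0.30 + effect + random.gauss(0, 0.02)

rates = {v: illness_rate(v) for v in villages}
mean_early = sum(rates[v] for v in early) / len(early)
mean_late = sum(rates[v] for v in villages if v not in early) / 100

print(f"early-start mean illness rate: {mean_early:.3f}")
print(f"late-start  mean illness rate: {mean_late:.3f}")
print(f"estimated program effect:      {mean_early - mean_late:.3f}")
```

Because the lottery makes the two groups statistically comparable, the difference in means is an unbiased estimate of the program's effect, with no need to model weather, the economy, or other confounders.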

RANDOMIZED, CONTROLLED STUDIES create their own counterfactual by randomly assigning participants to intervention or control groups; a mainstay of clinical research, they have rapidly multiplied in global health (see graphic). One leader is the Abdul Latif Jameel Poverty Action Lab (J-PAL), founded in 2003 at the Massachusetts Institute of Technology to do such studies in health and other development projects. J-PAL has since done follow-up evaluations of PROGRESA (renamed Oportunidades) and studied the impacts of hand-washing promotion on diarrhea in Peru, double-fortified salt on anemia in India, and deworming on school attendance in Kenya. 3ie, founded 5 years later, was established to fund evaluations and serve as an online repository of high-quality studies.

While randomized, controlled trials may be the ideal for measuring impact, they aren't always feasible. It's also widely considered unethical to withhold a proven intervention—some of them life-saving—from one group of people simply to test how well a large-scale rollout works. So scientists have developed several less rigorous methods. Eligibility criteria, for example, have a built-in counterfactual: If an intervention applies only to people under 14, kids who just turned 14 become good comparators. Sophisticated techniques can also “match” the group receiving an intervention to an artificial counterfactual created by statisticians. “There is a suite of methods,” Padian says, some of which aren't used in standard drug or vaccine trials but are considered convincing to many, including the CGD editors of Millions Saved.
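The matching idea can be sketched in a few lines (a toy example with entirely invented data and a made-up 15-point treatment effect, not any specific evaluation's method): each treated unit is paired with the untreated unit most similar to it on a background covariate, and the effect is the mean outcome difference across the matched pairs.

```python
import random

random.seed(7)

# Toy matching example (all data invented): pair each clinic that got
# an intervention with the most similar non-recipient clinic, then
# compare outcomes across the matched pairs.
def make_clinic(treated):
    wealth = random.uniform(0, 1)  # background covariate
    # Outcome depends on wealth plus a treatment effect of -0.15.
    outcome = (0.5 - 0.2 * wealth
               + (-0.15 if treated else 0.0)
               + random.gauss(0, 0.02))
    return {"treated": treated, "wealth": wealth, "outcome": outcome}

clinics = ([make_clinic(True) for _ in range(50)]
           + [make_clinic(False) for _ in range(200)])
treated = [c for c in clinics if c["treated"]]
controls = [c for c in clinics if not c["treated"]]

# Nearest-neighbor match on the covariate (with replacement).
diffs = []
for t in treated:
    match = min(controls, key=lambda c: abs(c["wealth"] - t["wealth"]))
    diffs.append(t["outcome"] - match["outcome"])

effect = sum(diffs) / len(diffs)
print(f"matched estimate of treatment effect: {effect:.3f}")
```

The catch, and the reason such methods are considered less rigorous than randomization, is that matching can only balance the covariates the statistician observes; any unmeasured difference between the groups remains a possible confounder.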

The Global Fund and PEPFAR have embraced the idea of more rigorously measuring impact. “It's becoming a top priority,” says Deborah Birx, who heads PEPFAR. Today, both programs want to track how many people on HIV treatment have fully suppressed the virus for prolonged periods, which means they're actually healthier. To accomplish this, PEPFAR is distributing computer tablets to clinics and asking them to record—and report in real time—HIV levels in patients on treatment. Dybul says he's also “really adamant right now” about finding out how specific clinics are doing, instead of focusing on national data. That's a great step forward, says Stefano Bertozzi, dean of the School of Public Health at UC Berkeley. “If you know you have clinics in one country that go from 25% to 90% of patients being virally suppressed, as a manager, you have incredible information to know what's working and what isn't,” Bertozzi says.

But the rising popularity of impact evaluations has triggered plenty of debates, which some have dubbed “wonk wars.” A single impact study often doesn't mean much because the results may not apply elsewhere, says Harvard University economist Lant Pritchett, a prominent critic of impact evaluations and a nonresident fellow at CGD. The randomistas tend to overlook the “key failing” in developing countries, he says: Organizations don't work. “The policemen don't police, the teachers don't teach, and the doctors don't doctor,” he says. “We know for sure that varies by orders of magnitude across countries of the world.” How then can you assume that a successful intervention in South Africa will translate to Colombia?

The drive to use “easily measured indicators” to claim success and impress donors also worries Michel Kazatchkine, Dybul's predecessor at the Global Fund's helm. Kazatchkine would like to move beyond quantitative indicators to more qualitative ones, like changes in laws or social policies. “Numbers of lives saved is a very American concept,” he says. “The European audience would wish to know about something a little more conceptual than just a number. Have we changed the system and addressed the roots and the causal determinants and ensured that the people, in addition to having their lives saved, live a proper life?”

CGD'S EFFORT to weigh malaria interventions stirred new controversy. Malaria control didn't make it into the first two editions of Millions Saved, in 2004 and 2007, which documented triumphs ranging from global ones like smallpox eradication to little-known efforts to combat diarrheal disease in Egypt or trachoma in Morocco. Although malaria has plummeted in many countries, the CGD researchers said none of the existing evaluations met their criteria: a study of a large-scale intervention of at least 2 years' duration that demonstrated a clear, causal link to a drop in disease or death. They also wanted to see evidence that the intervention had an acceptable cost based on the number of cases averted or lives saved.

“We know from a bunch of small-scale studies that bed nets can protect you from mosquitoes biting you,” says Amanda Glassman, who heads global health policy at CGD. “That's not what we're interested in evaluating.” In the real world, nets aren't always used, for instance because they're uncomfortable on hot nights or people think there are few mosquitoes around.

The blog posting led to fierce rebuttals from both the U.S. President's Malaria Initiative (PMI) and the Roll Back Malaria Partnership. Erin Eckert, an epidemiologist at PMI, says impact evaluations are critical. But when it comes to the type of national-level programs that CGD is evaluating, she says a “rigorous academic definition of impact evaluation is not always necessary or appropriate.” As Eckert and a colleague wrote in a riposte to CGD's blog, “The malaria field is full of examples of solid evaluations of interventions and the impact of scaling up those interventions on malaria burden.”

The CGD researchers eventually met with their critics, including Eckert, to sort through the literature, and they agreed that one large-scale intervention in Zambia had enough evidence that it worked, and thus deserved inclusion in the 2015 edition of Millions Saved. The study, by a team that included researchers from Harvard's School of Public Health and the PATH Malaria Control and Evaluation Partnership in Africa, enrolled 81,600 farmers, half of whom received insecticide-treated bed nets, whereas the other half didn't. There was a nearly 50% drop in self-reported malaria among farmers with the nets.

The debate seems set to continue. “What we've found doing a massive trawl of the literature is that the quality of evidence for well-regarded and well-funded interventions is still pretty poor,” says Miriam Temin, the coordinating editor of the new edition of Millions Saved. It remains difficult for many to accept, she says, that just understanding the effect of a drug, a vaccine, or any other intervention on a human body isn't enough. “We think of the body as something with unknown processes,” Temin says. “Wouldn't it be interesting if we thought of communities that way?”

