In Depth: Reproducibility

Sloppy reporting on animal studies proves hard to change


Science  29 Sep 2017:
Vol. 357, Issue 6358, pp. 1337-1338
DOI: 10.1126/science.357.6358.1337

Papers often don't state whether animals were randomized to control and treatment groups.


Closely read any paper on an animal experiment, and you're likely to have many questions. What strain of mice was used, and what were their sex and age? Were animals randomly assigned to control and treatment groups? Was the researcher who examined outcomes blinded to what group they were in? The absence of such details partly explains why between 51% and 89% of animal studies aren't reproducible. It may also help explain why so many treatments reported to work in animals have flopped in humans (Science, 22 November 2013, p. 922). Yet it's proving surprisingly hard to solve the problem.

In 2010, the U.K. National Centre for the Replacement, Refinement & Reduction of Animals in Research (NC3Rs) in London developed a checklist of items that any paper about in vivo research ought to include. More than 1000 scientific journals and two dozen funding agencies have endorsed the so-called ARRIVE guidelines—short for Animal Research: Reporting of In Vivo Experiments. (Science has not officially endorsed them, but encourages authors to comply.) But 7 years later, studies suggest that many scientists are either unaware of the guidelines or are ignoring them.

“We just don't seem to make much progress,” says Merel Ritskes-Hoitinga of Radboud University Medical Center in Nijmegen, the Netherlands, who co-organized a 25 September roundtable in Edinburgh where scientists met with journal editors and funders such as the United Kingdom's Medical Research Council and the Wellcome Trust to discuss ways of speeding up implementation of the guidelines. One problem may be that ensuring compliance can take a lot of work, both for authors and journals.

The 38 items in the checklist provide a “gold standard,” says Malcolm Macleod, a neurologist at the University of Edinburgh who has studied the problems in animal experimentation. The list covers a wide range of issues, from a paper's title and study design to how the animals were cared for, results, and conflicts of interest. But a 2014 survey showed almost no improvement in reporting in journals of Nature Publishing Group (NPG) and PLOS during the first 2 years after the guidelines were introduced, even though both publishers had endorsed ARRIVE. That study's last author, Sandra Amor of VU University Medical Center in Amsterdam, says that an as-yet-unpublished analysis shows that things weren't much better in the 2012–15 period.

Macleod and colleagues have tested one potential strategy for improving compliance. They devised a randomized controlled trial of almost 1700 scientists who submitted a paper to PLOS ONE. Half were told that they needed to fill out an ARRIVE checklist—and indicate where in their manuscript they had addressed each item—before peer review could begin. For the other half, everything was business as usual. Neither the authors nor PLOS ONE's reviewers were told that the experiment was taking place.

The results, reported on 11 September by Macleod's senior postdoc Emily Sena at the Eighth International Congress on Peer Review in Chicago, Illinois, weren't encouraging. Papers from the “treatment group” improved on only two ARRIVE items: husbandry and housing. For all others, authors had duly filled out the checklist but their papers were not actually more compliant. “Apparently, a checklist alone doesn't help much,” Ritskes-Hoitinga says. One reason may be that authors don't know what is expected of them, she says; they may think they have appropriately described a water maze, for example, but have left out key details such as water temperature.

Another reason may be that PLOS ONE editors didn't enforce compliance, Macleod says. A study he presented at the same meeting (also published as a preprint on bioRxiv on 12 September) showed that the adoption of a reporting checklist at NPG in 2013 led to significant improvements in reporting; Macleod believes that's because editors made sure the authors complied. “They went back to the authors four, five, or six times, if necessary,” he says. (NPG's checklist covers all of the life sciences, not just animal research.)

Focusing enforcement on the most important items could speed up improvements, Macleod argues. (For him, those items would include randomization of animals, blinding of researchers, whether any animals were dropped from the analysis, and whether the researchers calculated the sample size before they began.) But Nathalie Percie du Sert, program manager for experimental design at NC3Rs, disagrees: “I don't think you can prioritize one item over the other.”

Participants at this week's roundtable, held on the sidelines of an international meeting about reproducibility in animal research, agreed that animal researchers can learn from human trials, where guidelines, education, and other measures have slowly helped improve reporting. NC3Rs will organize another workshop in London in November and has surveyed thousands of scientists and interviewed editors to get a handle on the problem. “You have to change the behavior of scientists and educate them about why this matters,” Percie du Sert says. “That doesn't happen overnight.”
