Evaluating replicability of laboratory experiments in economics


Science  25 Mar 2016:
Vol. 351, Issue 6280, pp. 1433-1436
DOI: 10.1126/science.aaf0918

eLetters
  • RE: Comment on Camerer et al. (2016) “Evaluating replicability of laboratory experiments in economics.”
    • Keith Marzilli Ericson, Assistant Professor, Boston University Questrom School of Business
    • Other Contributors:
      • Andreas Fuster, Federal Reserve Bank of New York

    We applaud the effort of Camerer et al. (2016) to replicate studies in experimental economics. We were pleased to see that the results from the replication study strongly support the findings from Ericson and Fuster (2011). However, we believe the classification of our study as “not replicated” is misleading because the authors focus on one particular statistical test from our paper that is both less powerful and more restrictive than regression specifications reported in our paper and in Camerer et al.’s replication report. These regressions strongly replicate the results from our original paper, both in terms of statistical significance and in terms of magnitudes of the effect.

    Camerer et al.’s (2016) replication report* on Ericson and Fuster shows that the effect studied is significant at p<0.01 in the paper’s preferred regression specifications (see Table 2, columns (2) to (5)), and the estimated effect size is almost identical. (Similarly, simple comparisons of willingness to accept (WTA) for a mug, as well as of log(WTA), across treatments are significant at p<0.01.) However, for classification purposes, Camerer et al. focus on a test that is less powerful than the regression and imposes undesirable functional form restrictions: the t-test for whether the difference between log(WTA) for a mug and log(WTA) for a pen is the same in the low- versus high-expectations conditions. This t-test gives a p-value of 0.055, and hence counts as “not replicated” relative to the p=0...

    Competing Interests: None declared.
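The classification test the letter describes can be sketched on synthetic data. This is a stylized illustration only: the sample sizes, effect sizes, and within-subject structure below are invented assumptions, not the Ericson–Fuster design or data, and `scipy.stats.ttest_ind` stands in for whatever exact procedure the replication report used.

```python
import numpy as np
from scipy import stats

# Synthetic illustration only: all numbers here are invented, not the
# original experimental data. Each value is a hypothetical per-subject
# difference log(WTA mug) - log(WTA pen), grouped by expectations condition.
rng = np.random.default_rng(42)
n = 40  # hypothetical subjects per condition

diff_low = rng.normal(loc=0.2, scale=0.6, size=n)   # low-expectations condition
diff_high = rng.normal(loc=0.5, scale=0.6, size=n)  # high-expectations condition

# The classification test discussed in the letter: a two-sample t-test on
# whether this difference-in-logs itself differs across the two conditions.
t_stat, p_val = stats.ttest_ind(diff_high, diff_low)
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")
```

Because this test collapses each subject to a single log-difference, it both discards information that a regression on the underlying WTA observations can use and hard-codes a particular (log-ratio) functional form, which is the letter's point about why it is less powerful than the reported regression specifications.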
