Letters

Stereotype Threat: A Clarification

Science  02 Jun 2006:
Vol. 312, Issue 5778, pp. 1310b-1312b
DOI: 10.1126/science.312.5778.1310b

The Review by D. Lewis (24 June 2005, p. 1871) of the book Gender Differences in Mathematics (1) inadvertently perpetuates misinformation that has appeared elsewhere (2–4) about a key finding in our study of stereotype threat on an Advanced Placement calculus test (5, 6).

The study investigated whether asking women to record their gender immediately before taking the test elicited stereotype threat and thereby adversely affected their test performance. A quote in the review from a chapter in the book (4) cites an erroneous statistic from our initial technical report (5), corrected in our subsequent article (6), that women asked their gender before the test scored significantly lower than those asked afterward. In fact, the mean difference between the two groups of women was not statistically significant. Another quote from the same chapter repeats an estimate (made by a referee in a privileged review of a draft of our article) that “as many as 2837 additional women per year” (about 47,000 took the test at the time of the study) would pass the test, earning advanced credit for calculus, if they were asked about their gender after the test rather than before it. This estimate is unreliable. It is based on rough calculations and extrapolations from the atypical sample of test takers in the study.

In short, this study of actual test taking, in common with our replication that examined community college students being given placement tests for algebra and arithmetic (6), found no evidence of the deleterious effects of stereotype threat on the performance of women on quantitative tests that have been observed in laboratory experiments (4).

*Any opinions expressed in this letter are those of the author and not necessarily of Educational Testing Service.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.

Response

My use of a quotation including “significantly” without qualification or explanation was indeed ambiguous and open to misinterpretation. Neither gender alone nor the interaction of gender and the experimental-control (E-C) conditions was found to have a sufficiently large effect on AP grades to meet the standard of practical significance used by Stricker and Ward (1). However, the difference in the mean grades for the experimental and control groups of girls does meet a widely accepted standard of statistical significance, whether the means (2.62 and 2.41) reported in the original report (2) or those (2.61 and 2.41) reported in (1) are used.

Assuming normal means, the mean difference in boys' and girls' grades within the experimental group has the 95% confidence interval (3) 0.16 ± 0.18, while the mean difference within the control group is 0.51 ± 0.17. The mean difference for the experimental and control groups of girls is 0.20 ± 0.18; that for boys is −0.15 ± 0.16. (If the stereotype threat/lift hypothesis is correct, the experimental conditions should improve girls' performance and degrade boys'.) By way of comparison, note that in the introduction to their paper, Stricker and Ward describe the mean difference of 0.31 for boys and girls taking the exam in 1995 as substantial [(1), p. 669].
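The intervals quoted above can be checked with a short Python sketch. It uses only the four reported values. The labels and the helper name `excludes_zero` are illustrative and do not come from the cited papers. The criterion is the usual one: a 95% confidence interval that excludes zero corresponds to a difference that is statistically significant at the conventional two-sided 0.05 level.

```python
# Reported 95% confidence intervals, written as (center, half_width).
# The labels are illustrative; the numbers are those quoted in the text.
intervals = {
    "boys minus girls, experimental group": (0.16, 0.18),
    "boys minus girls, control group": (0.51, 0.17),
    "experimental minus control, girls": (0.20, 0.18),
    "experimental minus control, boys": (-0.15, 0.16),
}


def excludes_zero(center, half_width):
    """Return True when the interval center +/- half_width does not contain zero."""
    lower = center - half_width
    upper = center + half_width
    return lower > 0 or upper < 0


for label, (center, half_width) in intervals.items():
    lower = center - half_width
    upper = center + half_width
    verdict = "excludes zero" if excludes_zero(center, half_width) else "includes zero"
    print(f"{label}: [{lower:+.2f}, {upper:+.2f}] {verdict}")

# Consistency check: the gender-by-condition interaction should come out the
# same whichever pair of differences it is computed from.
interaction_from_gender_gaps = 0.16 - 0.51
interaction_from_condition_effects = -0.15 - 0.20
assert abs(interaction_from_gender_gaps - interaction_from_condition_effects) < 1e-9
```

Taken at face value, the intervals for the girls' experimental-control difference and for the gender gap within the control group exclude zero, while the intervals for the boys' experimental-control difference and for the gender gap within the experimental group include it.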

As regards the AP exam, practical significance seems to be the appropriate measure—the costs and possible unintended consequences of attempting to neutralize a very small effect don't seem justified. However, given the inherent limitations of laboratory tests of stereotype threat—the stress and motivation associated with the AP, SAT, and GRE exams presumably aren't easily reproduced in the lab—and the obvious ethical and practical restrictions on experimentation with the actual exams, small but statistically significant differences resulting from apparently subtle changes in the exams may merit consideration as experimental results and as reality checks. If something as seemingly innocuous as filling in the appropriate gender bubble can have an effect on mathematical performance comparable in size to the effect of gender itself, we should think carefully about just how big the gender gap in mathematics really is, and what the effects of the popularization of that gap may be.

References and Notes

  1. 1.
  2. 2.
  3. 3.
  4. 4.
