## Abstract

Ginther and Kahn claim that academics’ beliefs about the importance of brilliance do not predict gender gaps in Ph.D. attainment beyond mathematics and verbal test scores. However, Ginther and Kahn’s analyses are problematic: their models exhibit collinearity more than 100 times the recommended threshold. Multiple analyses that avoid this problem suggest that academics’ beliefs are in fact uniquely predictive of gender gaps across academia.

In a nationwide study of academics in 30 disciplines, we found that the distribution of gender gaps in Ph.D. attainment is predicted by the extent to which practitioners of a given discipline believe that success requires raw, unteachable aptitude (*1*). These field-specific ability beliefs combine with cultural stereotypes linking men, but not women, with such inherent brilliance, thereby decreasing women’s participation. Ginther and Kahn claim that these field-specific ability beliefs are irrelevant to female representation at the Ph.D. level (*2*). According to Ginther and Kahn, what instead accounts for the pattern of gender gaps is the extent to which fields emphasize mathematics-intensive content matter. Ginther and Kahn base this conclusion on regression analyses in which Graduate Record Examination (GRE) scores predict female representation but ability beliefs do not.

Ginther and Kahn’s preferred regression models (namely, 3 and 5) are problematic because they include three variables that are highly redundant with one another: the quantitative GRE score, the verbal GRE score, and their ratio. A more appropriate modeling strategy would be, for example, to include just one of the GRE scores and their ratio, as other researchers have done (*3*). Models with multiple variables that are highly correlated encounter a problem known as multicollinearity, which inflates standard errors and leads to unreliable results (*4*–*6*). A typical means of quantifying multicollinearity is to calculate a variance inflation factor (VIF), which is the ratio between the variance of a predictor’s coefficient estimate in a model and the variance that estimate would have if the predictor were uncorrelated with the other predictors in the model. The statistics literature typically places an upper bound of 10 on the recommended VIFs (*5*, *6*); caution is recommended in interpreting models with VIFs above this threshold. By comparison, Ginther and Kahn’s models 3 and 5 have average VIFs of more than 1100, which is more than 110 times the conventional threshold. Moreover, the variables of most interest to Ginther and Kahn (namely, the quantitative GRE scores and the ratio of quantitative to verbal scores) have VIFs ranging from 3258 to 5046. As a sign of the problems caused by this extreme form of collinearity, note the unusual values for the standardized regression coefficients of these variables in Ginther and Kahn’s models 3 and 5. For instance, the quantitative:verbal GRE ratio has a standardized coefficient of –17.13 in their model 5.
Taken at face value, this coefficient suggests that an increase of 1 standard deviation in the quantitative:verbal ratio would be accompanied by a decrease of 17.13 standard deviations in a field’s percentage of female Ph.D.’s (which, to provide an intuitive metric, is equivalent to a decrease that is more than 17 times the difference in female Ph.D.’s between history and mathematics). Such implausible estimates are not uncommon in highly collinear models (*5*, *6*).
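
To make the VIF diagnostic concrete, the following is a minimal sketch rather than the analysis code used here or by Ginther and Kahn. It relies on the standard identity VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on the remaining predictors. The data are synthetic, and the names `vif`, `quant`, `verbal`, and `ratio` are hypothetical; the only point is that entering both scores together with their ratio drives VIFs up, whereas one score plus the ratio does not.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    Uses VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic field-level scores (assumption: NOT data from either paper).
rng = np.random.default_rng(0)
n_fields = 30
quant = rng.normal(600, 60, n_fields)
verbal = rng.normal(520, 40, n_fields)
ratio = quant / verbal

# Both scores plus their ratio: the three predictors are nearly redundant.
vif_all_three = vif(np.column_stack([quant, verbal, ratio]))

# One score plus the ratio: the alternative specification described above.
vif_two = vif(np.column_stack([quant, ratio]))

print("quant, verbal, ratio:", vif_all_three)
print("quant, ratio:        ", vif_two)
```

With only two predictors the two VIFs coincide, because each is computed from the same squared correlation. An average VIF of the kind reported in the text would be the mean of the per-predictor values for a given model.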

Contrary to Ginther and Kahn’s claims, we found evidence that academics’ ability beliefs predict female representation in a wide range of models that include disaggregated GRE scores, as well as measures of a field’s relative emphasis on mathematics versus verbal ability, and that also have acceptable VIFs. We include 22 such models in Table 1. The top half of the table displays unweighted models, and the bottom half displays weighted models (in keeping with Ginther and Kahn’s models 4 to 6). As analytic weights, we used the inverse of the variance of the field-specific ability beliefs, which was calculated from the raw, individual-level data. These inverse-variance analytic weights allow fields whose ability beliefs are estimated more precisely (that is, with lower variance) to carry more weight in the regression models (*7*). In contrast, Ginther and Kahn used as their analytic weights the number of respondents within each field, which here does not track the precision with which ability beliefs were measured at the field level: There was no correlation between the number of respondents within a field and the variance of the ability belief estimate for that field, *r*(28) = –0.01, *P* = 0.929. There is thus no reason to penalize (i.e., downweight) fields that are small but whose ability beliefs were nevertheless measured with as much precision as those of larger fields. Within each half of Table 1, the models are sorted in increasing order of average VIF; thus, as one moves down the table, the models become more collinear. All of the models in Table 1 include as predictors the ability beliefs of each field’s practitioners and the three competing variables tested in (*1*) (namely, on-campus hours worked, selectivity, and systemizing versus empathizing; these coefficients are not displayed in the table).
In addition, the models include various permutations of the following GRE-based variables: the quantitative score, the verbal score, the analytical writing score, the quantitative:verbal ratio score, and the quantitative−verbal difference score. Adjusting for quantitative GRE scores provides a conservative test of our hypothesis, as academics and nonacademics alike believe that success in mathematics depends largely on raw ability (*1*, *8*). Thus, young men and women’s quantitative GRE scores may already reflect the influence of mathematics-specific ability beliefs, so adjusting for these scores in our analyses may underestimate the true impact of ability beliefs on gender gaps in representation.
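
The inverse-variance weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind Table 1: the data are synthetic, the helper names `inverse_variance_weights` and `weighted_least_squares` are hypothetical, and the variance is assumed to be the sample variance of individual responses within each field, a detail the text does not spell out.

```python
import numpy as np

def inverse_variance_weights(responses_by_field):
    """One weight per field: 1 / sample variance of that field's responses.

    Assumption: 'variance' means the within-field sample variance (ddof=1).
    """
    return np.array([1.0 / np.var(r, ddof=1) for r in responses_by_field])

def weighted_least_squares(X, y, weights):
    """Minimize sum_i w_i * (y_i - [1, x_i] @ b)^2; intercept comes first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary LS
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Synthetic individual-level ratings for 30 hypothetical fields.
rng = np.random.default_rng(1)
n_fields = 30
sizes = rng.integers(20, 120, n_fields)      # respondents per field
spreads = rng.uniform(0.5, 1.5, n_fields)    # within-field spread of ratings
responses = [rng.normal(4.0, s, n) for s, n in zip(spreads, sizes)]

belief_mean = np.array([r.mean() for r in responses])
# Synthetic outcome for illustration only.
pct_female = 50 - 5 * (belief_mean - 4.0) + rng.normal(0, 5, n_fields)

weights = inverse_variance_weights(responses)
coef_weighted = weighted_least_squares(belief_mean[:, None], pct_female, weights)
coef_unweighted = weighted_least_squares(
    belief_mean[:, None], pct_female, np.ones(n_fields)
)
print("weighted:  ", coef_weighted)
print("unweighted:", coef_unweighted)
```

Passing equal weights reproduces the unweighted fit, which gives a quick sanity check on the weighting step. Swapping `weights` for `sizes` would correspond to weighting by the number of respondents instead.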

The results displayed in Table 1 make it clear that academics’ ability beliefs are a significant predictor of female representation above and beyond whether a discipline (i) requires mathematical ability (as indicated by the quantitative GRE score) and (ii) privileges this ability relative to verbal ability (as indicated by the quantitative:verbal ratio or the quantitative−verbal difference) (see, e.g., models 7, 8, 9, 20, and 21). More generally, academics’ beliefs are a statistically reliable predictor of gender gaps in all 22 models with acceptable average VIFs. Additional models in which we weighted the observations by the inverse of the standard error of the field-specific ability beliefs, rather than the inverse of their variance, revealed the same results [*P*s ≤ 0.063 for ability beliefs in parallel versions of models 13 to 23 (not shown)]. Academics’ ability beliefs were also a significant predictor of gender gaps when we used the National Science Foundation’s definition of science, technology, engineering, and mathematics (STEM) disciplines, as supplied by Ginther and Kahn [*P*s ≤ 0.029 for parallel versions of models 1 to 11 and 13 to 23 (not shown)]. It is only in models whose average VIFs are greater than 1000 (models 12 and 24 in Table 1, which parallel Ginther and Kahn’s models 3 and 5) that ability beliefs no longer predict female representation.

In light of these analyses, the claims we made in Leslie, Cimpian *et al*. (*1*) remain valid as originally stated: Fields whose practitioners idolize brilliance and genius have fewer women.