Education Forum: Assessment

Standardized Tests Predict Graduate Students' Success


Science  23 Feb 2007:
Vol. 315, Issue 5815, pp. 1080-1081
DOI: 10.1126/science.1136618

Accurately predicting which students are best suited for postbaccalaureate graduate school programs benefits the programs, the students, and society at large, because it allows education to be concentrated on those most likely to profit. Standardized tests are used to forecast which students will be the most successful and obtain the greatest benefit from graduate education in disciplines ranging from medicine to the humanities and from physics to law. However, controversy remains about whether such tests effectively predict performance in graduate school. Studies of standardized test scores and subsequent success in graduate school over the past 80 years have often suffered from limited sample size and present mixed conclusions of variable reliability.

Several meta-analyses have been conducted to extract more reliable conclusions about standardized tests from a variety of disciplines. To date, these review studies have been conducted on several tests commonly used in the United States: the Graduate Record Examination (GRE-T) (1), Graduate Record Examination Subject tests (GRE-S) (1), the Law School Admissions Test (LSAT) (2–4), the Pharmacy College Admissions Test (PCAT) (5), the Miller Analogies Test (MAT) (6), the Graduate Management Admissions Test (GMAT) (7), and the Medical College Admissions Test (MCAT) (8, 9).

We collected and synthesized these studies. Four consistent findings emerged: (i) Standardized tests are effective predictors of performance in graduate school. (ii) Both tests and undergraduate grades predict important academic outcomes beyond grades earned in graduate school. (iii) Standardized admissions tests predict most measures of student success better than prior college academic records do (1–5, 7, 8). (iv) The combination of tests and grades yields the most accurate predictions of success (1–4, 7, 8).

Structure of Admissions Tests

Most standardized tests assess some combination of verbal, quantitative, writing, and analytical reasoning skills or discipline-specific knowledge. This is no accident, as work in all fields requires some combination of the above. The tests aim to measure the most relevant skills and knowledge for mastering a particular discipline. Although the general verbal and quantitative scales are effective predictors of student success, the strongest predictors are tests with content specifically linked to the discipline (1, 5).

Estimating Predictive Validity

The predictive validity of tests is typically evaluated with statistics that estimate the linear relationship between predictors and a measure of academic performance. Meta-analyses synthesizing primary studies of test validity aggregate Pearson correlations. In many primary studies, the correlations are weakened by statistical artifacts, thus contributing to misinterpretation of conclusions. The first attenuating factor is the restriction of range that occurs when a sample is selected on the basis of a predictor variable that has a nonzero correlation with an outcome measure (10). The second attenuating factor is unreliability in the success measure resulting from inconsistency in human judgment (11). Where possible, recognized corrections were used (12) to account for these artifacts.
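The two attenuating artifacts described above can be illustrated with standard textbook corrections. The sketch below is a minimal illustration, not necessarily the exact procedure applied in the meta-analyses: it assumes direct range restriction on the test score (the Thorndike Case II formula) and corrects for unreliability in the outcome measure only. The function names and the input values in the usage lines are hypothetical.

```python
import math


def correct_for_criterion_unreliability(r_observed: float,
                                        criterion_reliability: float) -> float:
    """Disattenuate a correlation for unreliability in the outcome measure only."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Thorndike Case II correction for direct range restriction on the predictor.

    sd_ratio is SD(applicant pool) / SD(admitted sample) on the test score,
    so it exceeds 1 when selection has narrowed the range of scores.
    """
    if sd_ratio <= 0.0:
        raise ValueError("sd_ratio must be positive")
    u = sd_ratio
    return (r_restricted * u) / math.sqrt(1.0 + r_restricted**2 * (u**2 - 1.0))


# Hypothetical inputs, chosen only to show the direction of each correction.
print(correct_for_range_restriction(0.25, sd_ratio=1.5))
print(correct_for_criterion_unreliability(0.25, criterion_reliability=0.81))
```

Both corrections raise the estimate when selection has narrowed the score range or when the outcome is measured with error, which is why uncorrected correlations from admitted students tend to understate how well a test predicts success in the applicant pool.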

Chart: Tests as predictors. Standardized test scores correlate with student success in graduate school. See table S1 for detailed data.

Research has been conducted on the correlation between test scores and various measures of student success: first-year grade point average (GPA), graduate GPA, degree attainment, qualifying or comprehensive examination scores, research productivity, research citation counts, licensing examination performance, and faculty evaluations of students. These results are based on analyses of 3 to 1231 studies across 244 to 259,640 students. The programs represented include humanities, social sciences, biological sciences, physical sciences, mathematics, and professional graduate programs in management, law, pharmacy, and medicine. For all tests across all relevant success measures, standardized test scores are positively related to subsequent measures of student success [see chart, table S1, and supporting online material (SOM) text].

Utility of Standardized Tests

The actual applied utility of predictors is not easily inferred from the correlations shown in the chart. The number of correct and incorrect admissions decisions for a specific situation can be estimated from the correlation (13) (SOM text). In many cases, frequency of correct decisions can be increased from 5% to more than 30% (SOM text). In addition, correlations were converted into odds ratios using standard formulae (14) to facilitate interpretation (table S1). When an institution can be particular about whom it admits, even modest correlations can yield meaningful improvements in the performance of admitted students.
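To make the link between a correlation and admissions decisions concrete, the following sketch estimates the proportion of admitted students who succeed, in the spirit of Taylor-Russell tables. It is an illustration under stated assumptions, not the specific procedure of reference 13: test score and outcome are assumed bivariate normal, applicants are admitted strictly from the top down, and "success" means exceeding a fixed outcome threshold. The second function uses one commonly cited conversion chain (correlation to standardized mean difference to odds ratio); the source does not say which formulae reference 14 provides. All function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def success_rate_among_admitted(r: float, selection_ratio: float,
                                base_rate: float, steps: int = 4000) -> float:
    """P(success | admitted) when test score and outcome are bivariate normal.

    selection_ratio: fraction of applicants admitted (top scorers first).
    base_rate: fraction of all applicants who would succeed if admitted.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    x_cut = _STD.inv_cdf(1.0 - selection_ratio)  # admission cutoff (z-score)
    y_cut = _STD.inv_cdf(1.0 - base_rate)        # success threshold (z-score)
    resid_sd = math.sqrt(1.0 - r * r)

    def joint_density(x: float) -> float:
        # Density of score x times P(outcome > y_cut | score = x).
        return _STD.pdf(x) * (1.0 - _STD.cdf((y_cut - r * x) / resid_sd))

    # Trapezoidal integration over admitted scores; the tail beyond
    # x_cut + 8 standard deviations is negligible.
    width = 8.0 / steps
    total = 0.5 * (joint_density(x_cut) + joint_density(x_cut + 8.0))
    total += sum(joint_density(x_cut + i * width) for i in range(1, steps))
    return total * width / selection_ratio


def correlation_to_odds_ratio(r: float) -> float:
    """Convert r to an odds ratio via d = 2r / sqrt(1 - r^2), ln(OR) = d * pi / sqrt(3)."""
    d = 2.0 * r / math.sqrt(1.0 - r * r)
    return math.exp(d * math.pi / math.sqrt(3.0))


# Hypothetical scenario: half of applicants would succeed, one in five is admitted.
print(success_rate_among_admitted(r=0.0, selection_ratio=0.2, base_rate=0.5))
print(success_rate_among_admitted(r=0.4, selection_ratio=0.2, base_rate=0.5))
print(correlation_to_odds_ratio(0.4))
```

With a correlation of zero the success rate among admitted students equals the base rate; as the correlation rises or the selection ratio falls, the success rate climbs, which is the sense in which a selective institution can benefit from a modest correlation.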

The worry that students' scores might contaminate future evaluations that, in turn, influence outcomes appears to be unfounded. The predictive validity of tests is the same whether or not evaluators know the individual's score (15, 16), and some outcomes, such as publication record, are not directly influenced by test scores.

Bias in Testing

One concern is that admissions tests might be biased against certain groups, including racial, ethnic, and gender groups. To test for bias, regression analyses of an outcome measure are compared for different groups (12, 17). If regression lines do not differ, then there is evidence that any given score on the test is associated with the same level of performance in school regardless of group membership. Overall and across tests, research has found that regression lines frequently do not differ by race or ethnic group. When they do, tests systematically favor minority groups (18–23). Tests do tend to underpredict the performance of women in college settings (24–26) (SOM text) but not in graduate school (18–20, 23).
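The regression comparison described above can be sketched in a few lines. The example below is a simplified illustration with made-up data: it fits one common regression line to the pooled sample and then checks whether each group's actual outcomes sit above or below that line on average. A real analysis would also test slope and intercept differences for statistical significance; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_prediction_gap(common_line, group_x, group_y):
    """Mean of (actual - predicted) for one group under the common line.

    Positive: the common line underpredicts the group's performance.
    Negative: the common line overpredicts it.
    """
    slope, intercept = common_line
    gaps = [yi - (intercept + slope * xi) for xi, yi in zip(group_x, group_y)]
    return sum(gaps) / len(gaps)


# Made-up standardized test scores and graduate GPAs for two groups.
scores_a = [-1.0, 0.0, 1.0, 2.0]
gpa_a = [0.5 * s + 3.0 for s in scores_a]
scores_b = [-1.0, 0.0, 1.0, 2.0]
gpa_b = [0.5 * s + 2.8 for s in scores_b]

common = fit_line(scores_a + scores_b, gpa_a + gpa_b)
print(mean_prediction_gap(common, scores_a, gpa_a))  # group A is underpredicted
print(mean_prediction_gap(common, scores_b, gpa_b))  # group B is overpredicted
```

If both gaps are close to zero and the group-specific slopes match, a given score implies the same expected performance regardless of group membership, which is the no-bias result described in the text.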

Items from most professionally developed tests were screened for content bias and for differential item functioning (DIF). DIF examines whether performance on an item differs across racial or gender groups when overall ability is controlled. Most items do not display DIF but some content patterns have emerged over time (see SOM text). To avoid negative effects, DIF items can be rewritten, eliminated before finalizing the test, or left unscored. Research has found that the DIF effects remaining in tests have effectively zero impact on decision-making (27).
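One widely used DIF statistic is the Mantel-Haenszel common odds ratio, which compares the odds of answering an item correctly for two groups within strata of matched overall ability. The source does not name the DIF method used by test developers, so the sketch below is an assumption chosen for illustration; the function name and the counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one test item.

    strata: iterable of (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    counts, one tuple per ability level (for example, per total-score band).
    A value near 1 suggests no DIF; above 1, the item favors the reference
    group among test takers of equal ability; below 1, the focal group.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_ok, ref_bad, focal_ok, focal_bad in strata:
        n = ref_ok + ref_bad + focal_ok + focal_bad
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_ok * focal_bad / n
        denominator += ref_bad * focal_ok / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


# Hypothetical counts: both groups have the same success rate within each band.
no_dif_item = [(30, 10, 15, 5), (40, 40, 20, 20)]
print(mantel_haenszel_odds_ratio(no_dif_item))
```

Matching on overall ability is what separates DIF from a simple difference in item pass rates: groups may differ in average score without any single item functioning differently.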

Coaching Effects in Testing

If test preparation yields large gains, then concerns about the gain's effects on differential access to higher education and the predictive validity of tests are understandable. Scores on any test can be increased to some degree because standardized tests are, like all ability tests, assessments of developed skills and knowledge. The major concern is that coaching may produce gains that are unrelated to actual skill development.

In controlled studies, gains on standardized tests used for college or graduate admissions are consistently modest. The typical magnitude for coached preparation is about 25% of one standard deviation (28–33). Longer and longer periods of study and practice are needed to attain further equal increments of improvement. Those item types that have been demonstrated to be most susceptible to coaching have been eliminated from tests (34). Test preparation or retaking does not appear to adversely affect the predictive validity of standardized tests (35–37).
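A gain of 25% of one standard deviation is easier to interpret as a change in percentile rank. The short sketch below assumes that scores are normally distributed, which is an approximation made for illustration and not a claim from the cited studies; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_after_gain(start_percentile: float, gain_in_sd: float) -> float:
    """Percentile rank after a score gain expressed in standard deviation units."""
    standard_normal = NormalDist()
    z_start = standard_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * standard_normal.cdf(z_start + gain_in_sd)


# A test taker at the median who gains 0.25 SD moves to roughly the 60th percentile.
print(round(percentile_after_gain(50.0, 0.25), 2))  # 59.87
```

Under the same assumption, the percentile shift from an identical gain is smaller for test takers who start far from the median, because fewer people score near the tails of the distribution.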

Future Directions

Standardized admission tests provide useful information for predicting subsequent student performance across many disciplines. However, student motivation and interest, which are critical for sustained effort through graduate education, must be inferred from various unstandardized measures including letters of recommendation, personal statements, and interviews. Additional research is needed to develop measures that provide more reliable information about these key characteristics.

These efforts will be facilitated with more information about the actual nature of student performance. Researchers have examined a number of important outcomes but have not captured other aspects of student performance including networking, professionalism, leadership, and administrative performance. A fully specified taxonomy of student performance dimensions would be invaluable for developing and testing additional predictors of student performance.

Results from a large body of literature indicate that standardized tests are useful predictors of subsequent performance in graduate school, predict more accurately than college GPA, do not demonstrate bias, and are not damaged by test coaching. Despite differences across disciplines in grading standards, content, and pedagogy, standardized admissions tests have positive and useful relationships with subsequent student accomplishments.

References and Notes

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.
  27. 27.
  28. 28.
  29. 29.
  30. 30.
  31. 31.
  32. 32.
  33. 33.
  34. 34.
  35. 35.
  36. 36.
  37. 37.
