Report

Quantifying Long-Term Scientific Impact

Science  04 Oct 2013:
Vol. 342, Issue 6154, pp. 127-132
DOI: 10.1126/science.1237825
  1. Fig. 1 Characterizing citation dynamics.

    (A) Yearly citation ci(t) for 200 randomly selected papers published between 1960 and 1970 in the PR corpus. The color code corresponds to each paper’s publication year. (B) Average number of citations acquired 2 years after publication (c2) for papers with the same long-term impact (c30), indicating that for high-impact papers (c30 ≥ 400, shaded area) the early citations underestimate future impact. (Inset) Distribution of citations 30 years after publication (c30) for PR papers published between 1950 and 1980. (C) Distribution of papers’ ages when they get cited. To separate the effect of preferential attachment, we measured the aging function for papers with the same number of previous citations (here ct = 20; see also supplementary materials S2.1). The solid line corresponds to a Gaussian fit of the data, indicating that P(ln∆t|ct) follows a normal distribution. (D) Yearly citation c(t) for a research paper from the PR corpus. (E) Cumulative citations ct for the paper in (D) together with the best fit to Eq. 3 (solid line). (F) Data collapse for 7775 papers with more than 30 citations within 30 years in the PR corpus published between 1950 and 1980. (Inset) Data collapse for the 20-year citation histories of all papers published by Science in 1990 (842 papers). (G) Changes in the citation history c(t) according to Eq. 3 after varying the λ, μ, and σ parameters, indicating that Eq. 3 can account for a wide range of citation patterns.

  2. Fig. 2 Evaluating long-term impact.

    (A) Fitness distribution P(λ) for papers published by Cell, PNAS, and PRB in 1990. Shaded area indicates papers in the λ ≈ 1 range, which were selected for further study. (B) Citation distributions for papers with fitness λ ≈ 1, highlighted in (A), for years 2, 4, 10, and 20 after publication. (C) Time-dependent relative variance of citations for papers selected in (A). (D) Citation distribution 2 years after publication [P(c2)] for papers published by Cell, PNAS, and PRB. Shaded area highlights papers with c2∈[5,9] that were selected for further study. (E) Citation distributions for papers with c2∈[5,9], selected in (D), after 2, 4, 10, and 20 years. (F) Time-dependent relative variance of citations for papers selected in (D).

  3. Fig. 3 Quantifying changes in a journal’s long-term impact.

    (A) IF of Cell and NEJM reported by Thomson Reuters from 1998 to 2006. (B) Ultimate impact C (see Eq. 6) of papers published by the two journals from 1996 to 2005. (C) Impact time T (Eq. 7) of papers published by the two journals from 1996 to 2005. (Inset) Fraction of citations that contribute to the IF. (D to F) The measured time-dependent longevity (Σ), fitness (Λ), and immediacy (M) for the two journals. (G) Fitness distribution for individual papers published by Cell (left) and NEJM (right) in 1996 (black) and 2005 (red). (H) Immediacy distributions for individual papers published by Cell (left) and NEJM (right) in 1996 (black) and 2005 (red).

  4. Fig. 4 Predicting future citations.

    (A and B) Prediction envelopes for three papers obtained by using 5 (A) and 10 (B) years of training (shaded vertical area). The middle curve offers an example of a paper for which the prediction envelope misses the future evolution of the citations. Each envelope illustrates the range for which z ≤ 1. Comparing (A) and (B) illustrates how increasing the training period decreases the uncertainty of the prediction, resulting in a narrower envelope. (C) Complementary cumulative distribution of z30 [P>(z30)] (see also supplementary materials S2.6). We selected papers published in the 1960s in the PR corpus that acquired at least 10 citations in 5 years (4492 in total). The red curve captures predictions for 30 years after publication for T_Train = 10, indicating that for our model 93.5% of papers have z30 ≤ 2. The blue curve relies on 5-year training. The gray curves capture the predictions of Gompertz, Bass, and logistic models for 30 years after publication by using 10 years as training. (D) Goodness of fit using a weighted KS test (supplementary materials S3.3), indicating that Eq. 3 offers the best fit to our testing base [same as the papers in (C)]. (E and F) Scatter plots of predicted citations and real citations at year 30 for our test base [same sample as in (C) and (D)], using as training data the citation history for the first 5 (E) or 10 (F) years. The error bars indicate prediction quartiles (25 and 75%) in each bin and are colored green if y = x lies between the two quartiles in that bin and red otherwise. The black circles correspond to the average predicted citations in that bin.
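
The captions for Fig. 1, E and G, refer to Eq. 3 without reproducing it. As a rough illustration of how the three parameters λ, μ, and σ shape a cumulative citation curve, the sketch below assumes the form in which this model is usually quoted, c(t) = m[exp(λ Φ((ln t − μ)/σ)) − 1], where Φ is the standard normal cumulative distribution function. That form, the constant m = 30, and the function names are assumptions for illustration and are not taken from the captions above; the limit m(e^λ − 1) is simply what this assumed form approaches as t grows, and it should not be read as a statement of Eq. 6.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative_citations(t, lam, mu, sigma, m=30.0):
    """Cumulative citations t years after publication under the ASSUMED form

        c(t) = m * (exp(lam * Phi((ln t - mu) / sigma)) - 1)

    lam, mu, sigma play the roles of the caption's fitness, immediacy,
    and longevity parameters; m = 30 is an assumed constant.
    """
    if t <= 0:
        return 0.0  # no citations at or before publication
    phi = normal_cdf((math.log(t) - mu) / sigma)
    return m * (math.exp(lam * phi) - 1.0)


def long_time_limit(lam, m=30.0):
    """Value the assumed curve approaches as t grows: m * (exp(lam) - 1)."""
    return m * (math.exp(lam) - 1.0)


if __name__ == "__main__":
    # Hypothetical parameter sets, chosen only to show how the curve moves.
    examples = {
        "baseline": (2.0, 2.0, 1.0),
        "higher lam": (3.0, 2.0, 1.0),
        "higher mu": (2.0, 3.0, 1.0),
        "lower sigma": (2.0, 2.0, 0.5),
    }
    for label, (lam, mu, sigma) in examples.items():
        curve = [cumulative_citations(t, lam, mu, sigma) for t in (2, 5, 10, 30)]
        print(label, [round(c, 1) for c in curve], round(long_time_limit(lam), 1))
```

Under this assumed form, raising λ lifts the whole curve, raising μ delays the rise, and lowering σ compresses the rise into a shorter window, which is the kind of variation Fig. 1G describes.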