Research Articles | SCI COMMUN

Quantifying the evolution of individual scientific impact


Science  04 Nov 2016:
Vol. 354, Issue 6312, aaf5239
DOI: 10.1126/science.aaf5239
  • Random-impact rule.

    The publication histories of two Nobel laureates, Frank A. Wilczek (Nobel Prize in Physics, 2004) and John B. Fenn (Nobel Prize in Chemistry, 2002), illustrate that the highest-impact work can occur, with the same probability, anywhere in the sequence of papers published by a scientist. Each vertical line corresponds to a research paper. The height of each line corresponds to paper impact, quantified as the number of citations the paper received after 10 years. Wilczek won the Nobel Prize for the very first paper he published, whereas Fenn published his Nobel-awarded work late in his career, after he was forcibly retired by Yale. [Image of Frank A. Wilczek is reprinted with permission of STS/Society for Science & the Public. Image of John B. Fenn is available for public domain use on Wikipedia.org.]

  • Fig. 1 Patterns of productivity during a scientific career.

    (A) Publication history of Kenneth G. Wilson (Nobel Prize in Physics, 1982). Horizontal axis indicates the number of years after the scientist’s first publication, and each vertical line corresponds to a research paper. The height of each line corresponds to c10, that is, the number of citations the paper received after 10 years (sections S1.3 and S1.6). The highest-impact paper of Wilson was published in 1974, 9 years after his first publication, and it is the 17th of his 48 papers; hence, t* = 9, N* = 17, and N = 48. (B) Distribution of the impact of the highest-impact paper, c10*, across all scientists. We highlight in blue the bottom 20% of the area, corresponding to low maximum impact scientists; the red area indicates the high maximum impact scientists (top 5%); yellow corresponds to the remaining 75%, the medium maximum impact scientists. These cutoffs do not change if we exclude review papers from our analysis (see figs. S4 and S36). (C) Number of papers N(t) published up to time t for three scientists with low, medium, and high impact but with comparable final number of papers throughout their career. (D) Distribution of the productivity exponents γ (18). The productivity of high-impact scientists grows faster than does that of low-impact scientists. (E) Dynamics of productivity, as captured by the average number of papers 〈n(t)〉 published each year for high-, medium-, and low-impact scientists. t = 0 corresponds to the year of a scientist’s first publication.

  • Fig. 2 Patterns of impact during a scientific career.

    (A) Dynamics of impact captured by the yearly average impact of papers 〈c10(t)〉 for high, medium, and low maximum impact scientists, where t = 0 corresponds to the year of a scientist’s first publication. The symbols correspond to the data, whereas the shaded area indicates the 95% confidence limit of careers where the impact of the publications is randomly permuted within each career. (B) Average impact 〈c10〉 of papers published before and after the highest-impact paper c10* of high-, medium-, and low-impact scientists. The plot indicates that there are no discernible changes in impact before or after a scientist’s highest-impact work. (C) c10* and 〈c10〉 before and after a scientist’s most-cited paper. For each group, we calculate the average impact of the most-cited paper, c10*, as well as the average impact of all papers before and after the most-cited paper. We also report the same measures obtained in publication sequences for which the impact c10* is fixed, whereas the impact of all other papers is randomly permuted. (D) Distribution of the publication time t* of the highest-impact paper for scientists’ careers (black circles) and for randomized-impact careers (gray circles). The lack of differences between the two curves (P = 0.70 for the Mann-Whitney U test between the two distributions) supports the random-impact rule; that is, impact is random within a scientist’s sequence of publications. Note that the drop after 20 years is partly because we focus on careers that span at least 20 years (see fig. S22). (E) Cumulative distribution P(≥N*/N) for scientists with N ≃ 50, where N* denotes the order of the highest-impact paper in a scientist’s career, so that N*/N varies between 1/N and 1. The cumulative distribution of N*/N is a straight line with slope −1, indicating that N* has the same probability to occur anywhere in the sequence of papers published by a scientist. The flatness of P(N*/N) (all scientists, inset) supports the conclusion that the timing of the highest-impact paper is uniform. The small differences between the three curves are due to the different number of publications N in the three groups of scientists [see fig. S24 for the plot of P(≥N*/N) for other values of N and figs. S25 and S26 for the impact autocorrelation throughout a scientific career].
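
The randomized-impact comparison described in panels (D) and (E) can be sketched as follows. This is a minimal illustration on synthetic careers, not the authors' code: the career length, the log-normal parameters, the number of careers, and all function names are assumptions made for the example. If impact is random within a career, the empirical P(≥N*/N) should fall close to linearly from 1 to 0, and shuffling the paper order should leave it unchanged.

```python
import random

def relative_position_of_best(impacts):
    """Return N*/N: 1-based position of the highest-impact paper over career length."""
    n_star = max(range(len(impacts)), key=lambda k: impacts[k]) + 1
    return n_star / len(impacts)

def shuffled_career(impacts, rng):
    """Randomized-impact null model: same impacts, random order."""
    permuted = list(impacts)
    rng.shuffle(permuted)
    return permuted

def survival(values, x):
    """Empirical P(>= x): fraction of values at or above x."""
    return sum(v >= x for v in values) / len(values)

rng = random.Random(0)
# Hypothetical synthetic careers: 50 papers each, log-normal impacts, no time trend.
careers = [[rng.lognormvariate(1.93, 1.0) for _ in range(50)] for _ in range(2000)]
observed = [relative_position_of_best(c) for c in careers]
null = [relative_position_of_best(shuffled_career(c, rng)) for c in careers]

# Compare the observed and shuffled survival functions at a few thresholds.
for x in (0.25, 0.5, 0.75):
    print(x, round(survival(observed, x), 2), round(survival(null, x), 2))
```

With real data, `careers` would hold each scientist's c10 values in publication order; a two-sample test between `observed` and `null` would then play the role of the Mann-Whitney comparison mentioned in (D).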

  • Fig. 3 The Q-model.

    (A) Distribution of the paper impact c10 across all publications in the data set. The gray line corresponds to a log-normal function with average μ = 1.93 and SD σ² = 1.05 (R² = 0.98). (B) Distribution of the total number of papers published by a scientist (productivity). The gray line is a log-normal with μ = 3.6 and σ² = 0.57 [weighted Kolmogorov-Smirnov (KS) test, P = 0.70]. (C) Citations of the highest-impact paper, c10*, versus the number of publications N during a scientist’s career. Each gray point of the scatterplot corresponds to a scientist. The circles are the logarithmic binning of the scattered data. The cyan curve represents the prediction of the R-model, assuming that the impact of each paper is extracted randomly from the distribution P(c10) of Fig. 3A. The red curve corresponds to the analytical prediction (see eq. S35) of the Q-model (R² = 0.97; see section S4.6 and fig. S29 for goodness of the fit). (D) Relation between the impact of the most-cited paper, c10*, and the average logarithmic impact of a scientist’s other papers. Each gray point in the scatterplot corresponds to a scientist, and the average logarithm of her paper impact is computed excluding the most-cited paper c10*. We report in cyan the R-model prediction and in red the analytical prediction (see eq. S36) of the Q-model (R² = 0.99; see section S4.6 and fig. S29 for goodness of the fit). (E) Cumulative impact distribution of all papers published by three scientists with the same productivity, N ≃ 100, but different Q. (F) Distribution of log p across all publications. For each paper α of scientist i, we have log pα = log c10,iα − log Qi, where log Qi = 〈log c10,i〉 − μp. Therefore, the distribution of log p, except for a common translational factor μp, corresponds to the distribution of log c10,iα − 〈log c10,i〉, which is a normal with μ = 0 and σ² = 0.95 (KS test, p = 0.48). (G) Distribution of parameter Q, P(Q), for all scientists. The gray line corresponds to a log-normal function with μ = 0.93 and σ² = 0.46 (weighted KS test, p = 0.59). (H) Cumulative distribution of the rescaled impact c10,iα/Qi for the three scientists in (E). The black line corresponds to the universal distribution P(p). The collapse is predicted by Eq. 1.
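
The relation in panel (F), log pα = log c10,iα − log Qi, implies that a scientist's Q can be estimated from the average logarithm of her paper impacts once the common shift μp is known. The sketch below assumes strictly positive c10 values and uses made-up values for μp and for the spread of log p; the names `MU_P`, `SIGMA_P`, `estimate_q`, and `synthetic_career` are hypothetical and not taken from the paper.

```python
import math
import random
from statistics import fmean

MU_P = 1.0     # hypothetical value of the common shift mu_p
SIGMA_P = 1.0  # hypothetical spread of log p

def estimate_q(impacts, mu_p=MU_P):
    """Q_i = exp(<log c10_i> - mu_p), for strictly positive impacts."""
    return math.exp(fmean([math.log(c) for c in impacts]) - mu_p)

def synthetic_career(q, n_papers, rng):
    """Draw c10 = Q * p with log p ~ Normal(MU_P, SIGMA_P)."""
    return [q * rng.lognormvariate(MU_P, SIGMA_P) for _ in range(n_papers)]

rng = random.Random(1)
true_qs = [1.2, 3.8, 6.5]  # illustrative Q values
careers = [synthetic_career(q, 2000, rng) for q in true_qs]
estimates = [estimate_q(c) for c in careers]

# After rescaling by the estimated Q, every career has the same mean log impact (mu_p),
# which is the collapse onto a common distribution described in panel (H).
rescaled_means = [
    fmean([math.log(c / q_hat) for c in career])
    for career, q_hat in zip(careers, estimates)
]
for q, q_hat, m in zip(true_qs, estimates, rescaled_means):
    print(q, round(q_hat, 2), round(m, 3))
```

The long synthetic careers (2000 papers) are chosen only to make the estimates tight; with realistic career lengths the estimate of Q fluctuates more.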

  • Fig. 4 Careers and their Q parameter.

    (A) Left: Analytically predicted cumulative impact distributions for different Q. The plot also highlights the impact distribution of the three scientists shown in Fig. 3E. The detailed publication record of each scientist is reported on the right, documenting the notable differences between them, given their different Q. (B) Left: Individual cumulative impact distributions P(c10,i). Because most scientists have a modest number of publications N, which makes it impossible to compute statistically meaningful distributions for many of them, each distribution is computed across all publications of all scientists with the same Qi. The color code captures their Q parameter, as shown in (A). Right: Cumulative distributions of the rescaled impact c10,i/Qi for the scientists, indicating that the individual distributions collapse on the universal distribution P(p).

  • Fig. 5 Stability of the Q parameter.

    (A) Time variation of the Q parameter during individual careers. For scientists with at least 100 papers and Q ≃ 1.2, Q ≃ 3.8, and Q ≃ 6.5, we report Q(N), measured in a moving window of ΔN = 30 papers. For 75% of the scientists, the fluctuations arise from the finite number of papers in the moving window, the magnitude of the changes being comparable to that predicted by the model with a constant Q (section S4.9). (B) Fluctuations of the Q parameter in model and data. We study the distribution of the uncertainty in Q in both data and synthetic careers with constant Q (ΔN = 5). For 74.7% of the scientists, the fluctuations are comparable to those of the model. For the remaining 25.3%, the SD is slightly higher than the one predicted by the model. (C) Comparison between early and late Q parameter. We compare the Q parameter at the early-career (Qearly) and late-career (Qlate) stages of 823 scientists with at least 50 papers. We measured the two values of the parameter using only the first and second half of published papers, respectively. We perform these measurements on the real data (circles) and on randomized careers, where the order of papers is shuffled (gray shaded areas). For most of the careers, 95.1%, the changes between early- and late-career stages fall within the fluctuations predicted by the null model with randomized paper order, indicating that the Q parameter is stable throughout a career. The observed fluctuations are explained by the finite number of papers in a scientist’s career.
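
The early-versus-late comparison in panel (C) can be illustrated for a single career with a shuffle-based null model. The sketch below is one possible reading of that procedure, not the authors' implementation: it works with 〈log c10〉, which equals log Q up to a constant that cancels when the two halves are subtracted, and the career data, the number of shuffles, and all function names are assumptions.

```python
import math
import random
from statistics import fmean

def log_q_shift(impacts):
    """<log c10>; equals log Q up to a common constant that cancels in differences."""
    return fmean([math.log(c) for c in impacts])

def early_late_change(impacts):
    """log Q_late - log Q_early, using the second and first half of the papers."""
    half = len(impacts) // 2
    return log_q_shift(impacts[half:]) - log_q_shift(impacts[:half])

def null_interval(impacts, rng, n_shuffles=1000, level=0.95):
    """Central interval of the early-late change when paper order is shuffled."""
    changes = []
    for _ in range(n_shuffles):
        permuted = list(impacts)
        rng.shuffle(permuted)
        changes.append(early_late_change(permuted))
    changes.sort()
    lo = changes[int((1 - level) / 2 * n_shuffles)]
    hi = changes[int((1 + level) / 2 * n_shuffles) - 1]
    return lo, hi

rng = random.Random(2)
# Hypothetical careers: one with constant Q, one whose late papers are 20x more cited.
stable = [rng.lognormvariate(1.0, 1.0) for _ in range(50)]
drifting = stable[:25] + [20 * c for c in stable[25:]]

for name, career in (("stable", stable), ("drifting", drifting)):
    lo, hi = null_interval(career, rng)
    change = early_late_change(career)
    print(name, round(change, 2), (round(lo, 2), round(hi, 2)), lo <= change <= hi)
```

A career counts as stable in this sketch when its observed change lies inside the shuffled interval; the artificial drifting career is built to fall outside it.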

  • Fig. 6 Relation between Q and other impact indicators.

    (A) ROC plot capturing the ranking of scientists based on Q, Ctot, h-index, c10*, and N. Each curve represents the fraction of Nobel laureates versus the fraction of other scientists for a given rank threshold. The diagonal (no-discrimination line) corresponds to random ranking; the area under each curve measures how accurately the corresponding indicator ranks Nobel laureates near the top. The ranking accuracy is reported in the legend, 1 being the maximum. Precision and recall as a function of rank are discussed in section S7. (B) Expected citations to the highest-impact paper, c10*, for a scientist with parameter Q and N publications. The plot illustrates the very low chance of a low-Q researcher publishing a high-impact paper. (C) Observed versus predicted growth of the h-index for scientists with different Q. The plot documents the agreement between the analytically predicted h-index (eq. S38, continuous line) and the observed value 〈h(N)〉, obtained by averaging the h-index for scientists with the same Q (circles). (D) Top: Growth of the h-index for two scientists with at least 200 papers and different Q as a function of the productivity N (blue circles), compared with the prediction of eq. S38 (red line). Bottom: For the two scientists in the top panels, we measure the cumulative number of citations as a function of N, Ctot(N), and compare with the prediction of eq. S39. The close agreement between observation and prediction in (C) and (D) shows that the time-independent Q captures an intrinsic property of a scientist and that other indicators, like the h-index or cumulative citations, are uniquely determined by Q and productivity. (E) For two scientists, we show the h-index prediction as a function of N using only early-career information, namely, N0 = 20 (top) and N0 = 50 (bottom), to estimate the Q parameter. Although the initial h-index values up to N0 = 20 largely overlap for the two scientists, their long-term impact diverges, a difference accurately predicted by the Q-model. (F) Scatterplots of predicted and real h-index at N = 60 based on Q estimated at N0 = 20. The error bars indicate prediction quartiles (25 and 75%) in each bin and are colored green if y = x lies between the two quartiles in that bin and red otherwise. The circles correspond to the average h-index in that bin. (G) The zN score for each scientist captures the number of SDs the real h-index deviates from the most likely h-index after N publications. zN ≤ 2 indicates that the real data are within the prediction envelope.
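
Two quantities used in this figure, the h-index and the area under the ROC curve in (A), have standard definitions that can be computed directly. The sketch below uses those textbook definitions; it does not reproduce eq. S38 or eq. S39, and the function names and the toy scores are hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def ranking_auc(scores, is_laureate):
    """Area under the ROC curve: the probability that a randomly chosen laureate
    scores higher than a randomly chosen non-laureate (ties count as 1/2)."""
    pos = [s for s, flag in zip(scores, is_laureate) if flag]
    neg = [s for s, flag in zip(scores, is_laureate) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: an indicator value per scientist and a laureate flag (made-up numbers).
indicator = [0.9, 0.8, 0.3, 0.1]
laureate = [True, False, True, False]
print(h_index([10, 8, 5, 4, 3]), ranking_auc(indicator, laureate))
```

An area of 0.5 corresponds to the no-discrimination diagonal and 1 to a ranking that places every laureate above every other scientist.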

Supplementary Materials

  • Quantifying the evolution of individual scientific impact

    Roberta Sinatra, Dashun Wang, Pierre Deville, Chaoming Song, Albert-László Barabási

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    • Materials and Methods
    • Supplementary Text
    • Figs. S1 to S49
    • References

    Additional Data

    Data S1
