## Scientific impact—that is the *Q*

Are there quantifiable patterns behind a successful scientific career? Sinatra *et al.* analyzed the publications of 2887 physicists, as well as data on scientists publishing in a variety of fields. When productivity (which is usually greatest early in the scientist's professional life) is accounted for, the paper with the greatest impact occurs randomly in a scientist's career. However, the process of generating a high-impact paper is not an entirely random one. The authors developed a quantitative model of impact, based on an element of randomness, productivity, and a factor *Q* that is particular to each scientist and remains constant during the scientist's career.

*Science*, this issue p. 596

## Structured Abstract

### INTRODUCTION

In most areas of human performance, from sport to engineering, the path to a major accomplishment requires a steep learning curve and long practice. Science is not that different: Outstanding discoveries are often preceded by publications of less memorable impact. However, despite the increasing desire to identify early promising scientists, the temporal career patterns that characterize the emergence of scientific excellence remain unknown.

### RATIONALE

How do impact and productivity change over a scientific career? Does impact, arguably the most relevant performance measure, follow predictable patterns? Can we predict the timing of a scientist’s outstanding achievement? Can we model, in quantitative and predictive terms, scientific careers? Driven by these questions, here we quantify the evolution of impact and productivity throughout thousands of scientific careers. We do so by reconstructing the publication record of scientists from seven disciplines, associating to each paper its long-term impact on the scientific community, as quantified by citation metrics.

### RESULTS

We find that the highest-impact work in a scientist’s career is randomly distributed within her body of work. That is, the highest-impact work can be, with the same probability, anywhere in the sequence of papers published by a scientist—it could be the first publication, could appear mid-career, or could be a scientist’s last publication. This random-impact rule holds for scientists in different disciplines, with different career lengths, working in different decades, and publishing solo or with teams and whether credit is assigned uniformly or unevenly among collaborators.

The random-impact rule allows us to develop a quantitative model, which systematically untangles the role of productivity and luck in each scientific career. The model assumes that each scientist selects a project with a random potential *p* and improves on it with a factor *Q*_{i}, resulting in a publication of impact *Q*_{i}*p*. The parameter *Q*_{i} captures the ability of scientist *i* to take advantage of the available knowledge in a way that enhances (*Q*_{i} > 1) or diminishes (*Q*_{i} < 1) the potential impact *p* of a paper. The model predicts that truly high-impact discoveries require a combination of high *Q* and luck (*p*) and that increased productivity alone cannot substantially enhance the chance of a very high-impact work. We also show that a scientist’s *Q*, capturing her sustained ability to publish high-impact papers, is independent of her career stage. This is in contrast with all current metrics of excellence, from the total number of citations to the *h*-index, which increase with time. The *Q* model provides an analytical expression of these traditional impact metrics and allows us to predict their future time evolution for each individual scientist; it is also predictive of independent recognitions, such as Nobel Prizes.

### CONCLUSION

The random-impact rule and the *Q* parameter, representing two fundamental characteristics of a scientific career, offer a rigorous quantitative framework to explore the evolution of individual careers and understand the emergence of scientific excellence. Such understanding could help us better gauge scientific performance and offers a path toward nurturing high-impact scientists, potentially informing future policy decisions.

## Abstract

Despite the frequent use of numerous quantitative indicators to gauge the professional impact of a scientist, little is known about how scientific impact emerges and evolves in time. Here, we quantify the changes in impact and productivity throughout a career in science, finding that impact, as measured by influential publications, is distributed randomly within a scientist’s sequence of publications. This random-impact rule allows us to formulate a stochastic model that uncouples the effects of productivity, individual ability, and luck and unveils the existence of universal patterns governing the emergence of scientific success. The model assigns a unique individual parameter *Q* to each scientist, which is stable during a career, and it accurately predicts the evolution of a scientist’s impact, from the *h*-index to cumulative citations, and independent recognitions, such as prizes.

Productivity, representing the number of publications authored by a scientist over time, and impact, often approximated by the number of citations a publication receives (*1*–*4*), are frequently used metrics to gauge a scientist’s performance. Despite their widespread use, we lack a quantitative understanding of the patterns these metrics follow during a scientist’s career (*5*). This is particularly alarming (*6*–*11*), given that they are increasingly adopted for academic assessment (*4*, *11*) and serve as the input for numerous indicators, like the *h*-index and its variants, which are frequently used to compare individual performance (*12*–*14*). Given the increasing interest in predicting the value of these indicators (*5*, *15*), here we ask: How do impact and productivity change over a typical scientific career? Does impact, arguably the most relevant performance measure, follow predictable patterns? Can we predict the timing of a scientist’s outstanding achievement? Can we untangle the role of impact, productivity, and luck within a scientific career?

To address these questions, we reconstruct the publication profile of scientists from multiple disciplines and associate each of their publications with an impact, as captured by *c*_{10}, the number of citations 10 years after publication (Fig. 1A; see Methods and section S1).
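The impact measure can be made concrete with a short sketch; the function name and the input format (a list of citing-paper years) are hypothetical, and treating the 10-year window as inclusive is an assumption:

```python
def c10(pub_year, citing_years):
    # c_10: number of citations received within 10 years of publication.
    # citing_years holds the publication years of the citing papers
    # (hypothetical input format); the inclusive window is an assumption.
    return sum(1 for y in citing_years if 0 <= y - pub_year <= 10)

# A paper from 2000, cited in 2001, 2003, 2009, 2012, and 2015,
# has c_10 = 3: the 2012 and 2015 citations fall outside the window.
print(c10(2000, [2001, 2003, 2009, 2012, 2015]))  # 3
```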

Motivated partly by the exceptional awareness of a scientist’s highest-impact work (*16*, *17*), like radioactivity for Marie Curie and the double helix for Watson and Crick, we identified for each researcher her most-cited paper, with impact *c*_{10}^{*}, that is, the paper with the highest number of citations 10 years after its publication. The distribution of *c*_{10}^{*} across the studied scientists indicates that only 5% of careers reach a very high *c*_{10}^{*}; hence, most scientific careers have limited maximal impact. To systematically distinguish the careers on the basis of their peak impact, we group each scientist into high maximum impact (top 5% of *c*_{10}^{*}), low maximum impact (bottom 20%), and medium maximum impact (middle 75%) categories (Fig. 1B and section S2).

## Productivity and impact patterns in scientific careers

The total number of papers scientist *i* publishes up to time *t* after her first publication, *N*_{i}(*t*), asymptotically follows *N*_{i}(*t*) ∼ *t*^{γ_{i}} (Fig. 1C) (*18*). Hence, yearly productivity, *n*_{i}(*t*), follows the same scaling with exponent (γ_{i} − 1) (fig. S5). Yet, the scaling exponent is different for low-, medium-, and high-impact scientists (Fig. 1C). We find that for low-impact scientists, 〈γ〉 = 1.55, indicating on average a steady increase in their productivity. The increase is much faster for high-impact researchers, for whom 〈γ〉 = 2.05 (Fig. 1D). These trends are also confirmed by changes in the yearly productivity 〈*n*(*t*)〉: For high-impact scientists, productivity increases almost threefold during their career, whereas the increase is modest for low-impact scientists (Fig. 1E). Together, Fig. 1 (D and E) indicates that productivity changes throughout a scientific career. We find, however, that this trend is modulated by impact: Productivity growth is more pronounced for high-impact scientists and is modest for low-impact scientists (Fig. 1, C to E).
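The scaling exponent γ_{i} can be recovered from a career's cumulative paper counts by a least-squares fit in log-log space; the career data below are synthetic, generated as an exact power law for illustration:

```python
import numpy as np

# Hypothetical cumulative output N(t) for one career over 30 years,
# built here as an exact power law with gamma = 2.05 for illustration.
t = np.arange(1, 31)
N = 0.5 * t ** 2.05

# A least-squares fit in log-log space recovers the scaling exponent
# of the asymptotic power-law growth of N(t).
gamma, _ = np.polyfit(np.log(t), np.log(N), 1)
print(round(gamma, 2))  # 2.05
```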

As Fig. 2A indicates, impact appears to follow similar patterns to productivity (Fig. 1E): Although *c*_{10} increases during a high-impact scientist’s career, an increase is hardly noticeable for average and low-impact individuals. Yet, we observe a markedly different pattern if we examine the impact in the vicinity of *t**, the publication time of the most-cited work. Plotting 〈*c*_{10}〉 for the sequence of papers before and after an individual’s most-cited paper (Fig. 2B), we do not see a gradual increase in impact as a scientist approaches *t**, nor do we observe elevated citations after this breakthrough. Instead, the observed pattern exhibits a singular behavior. This singularity could be a simple result of averaging random-impact fluctuations present in careers. We find, however, that the result is robust if we use a moving average or consider only the publication with maximum impact in a rolling window (section S2.1 and fig. S6) and is validated by a parametric fit with scientist-specific exponent α_{i}, which shows no differences in α_{i} before or after *t** (section S2.2 and fig. S7) (*19*). Also, the papers published before and after *t** show no discernible differences in their average number of citations (Fig. 2C). Finally, we randomize each career by leaving all productivity measures [total number of papers, *N*_{i}, and *n*_{i}(*t*)] unchanged but shuffling the impact of each paper within each career (Fig. 2C). The lack of differences between the original and the randomized careers supports our overall conclusion: There are no detectable changes in impact leading up to or following a scientist’s highest-impact work. We tested the robustness of this measure for different samples of scientists (figs. S8 and S9), for different definitions of impact (section S1.6 and fig. S10), and in data sets where we attribute different impact shares to each author of a paper (section S6 and fig. S11), arriving at the same conclusion.
Yet, we cannot exclude that there are other choices of impact variables or data-set selection that can detect patterns before or after the highest-impact paper.

To understand when a scientist publishes her most important work, we measured the probability *P*(*t**) that the highest-impact paper is published at time *t** after a scientist’s first publication (Fig. 2D). The high *P*(*t**) between 0 and 20 years indicates that most scientists publish their highest-impact paper early or midcareer. The drop in *P*(*t**) after 20 years suggests that it is unlikely that a scientist’s most-cited work will come late in her career, a result well documented by the literature about creativity (see section S3.1) (*20*, *21*). To understand the origin of this pattern, we shuffled *c*_{10} among all papers published by the same scientist, preserving the scientist’s time-dependent productivity and paper-by-paper impact and randomizing only the order of her publications. The fact that *P*(*t**) for these synthetic careers is indistinguishable from the original data (Fig. 2D) indicates that variations in *P*(*t**) are not due to specific impact sequences or other features but are entirely explained by year-by-year variations in productivity throughout a career (fig. S12) (*20*, *21*).
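The shuffling null model can be sketched as follows; the career years and impact values are hypothetical, and the point is only that the publication years (the productivity profile) stay fixed while impacts are reordered:

```python
import random

def t_star(years, impacts):
    # Time (years after the first publication) of the highest-impact paper.
    best = max(range(len(impacts)), key=impacts.__getitem__)
    return years[best] - years[0]

def shuffled_t_star(years, impacts, rng):
    # Null model: keep publication years (the productivity profile) fixed
    # and shuffle only the order of the impact values within the career.
    shuffled = impacts[:]
    rng.shuffle(shuffled)
    return t_star(years, shuffled)

rng = random.Random(0)
years = [1980, 1982, 1983, 1985, 1990, 1994]   # hypothetical career
impacts = [3, 1, 50, 7, 2, 4]                  # hypothetical c_10 values
print(t_star(years, impacts))  # 3: the best paper came 3 years in
null_sample = [shuffled_t_star(years, impacts, rng) for _ in range(1000)]
```

Comparing the empirical t* distribution with the `null_sample` histogram is the essence of the test: if they match, timing is fully explained by productivity.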

These results prompted us to explore the position *N** of the highest-impact paper in the sequence of *N* publications of a scientist by measuring *P*(*N**/*N*), that is, the probability that the most-cited work is early (*N**/*N* small) or late (*N**/*N* ≃ 1) within the sequence of papers published by a scientist. We find that *P*(*N**/*N*) is flat (Fig. 2E, inset), a finding supported by the cumulative *P*(≥*N**/*N*) (Fig. 2E), which decreases independently of impact as 1 − *N**/*N*, in line with a uniform *P*(*N**/*N*). Together, we arrive at a rather unexpected conclusion, representing our main empirical finding: Impact is randomly distributed within a scientist’s body of work, regardless of publication time or order in the sequence of publications. We call this the random-impact rule because it indicates that the highest-impact work can be, with the same probability, anywhere in the sequence of *N* papers published by a scientist. We find that the random-impact rule holds for scientists in different disciplines, with different career lengths, working in different decades, and publishing solo or with teams and whether credit is assigned uniformly or unevenly among collaborators (sections S1.4 and S6.1) (*22*).
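A minimal simulation illustrates the random-impact rule: when impacts are i.i.d. draws, the rank of the best paper is uniform over the career (all numbers below are synthetic):

```python
import random

rng = random.Random(42)
trials, n_papers = 20000, 10

def best_rank(n, rng):
    # Rank (1-indexed) of the highest-impact paper when all n impacts
    # are i.i.d. draws, i.e. when impact carries no career-stage signal.
    impacts = [rng.random() for _ in range(n)]
    return impacts.index(max(impacts)) + 1

positions = [best_rank(n_papers, rng) for _ in range(trials)]
freq = [positions.count(k) / trials for k in range(1, n_papers + 1)]
# Every rank is hit with probability ~1/n_papers = 0.1: the random-impact rule.
print(max(abs(f - 1 / n_papers) for f in freq) < 0.02)  # True
```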

The random-impact rule can explain the growing impact during a scientist’s career (Fig. 2A). To see this, we again randomly shuffle the impact of the papers within each career, leaving the individual productivity unchanged. The variations of impact of the randomized careers are indistinguishable from the original data for both high- and low-impact individuals (Fig. 2A). Hence, the growing average impact documented in Fig. 2A is the result of combining the increasing average productivity (Fig. 1E) with the heavy-tailed nature of the citation distribution (*6*, *23*–*25*). 〈*c*_{10}〉 is not stable but increases with the number of publications, resulting in the observed growing impact (Fig. 2A). Hence, growing productivity, rather than increasing ability or excellence, can account for the growth in average impact during a career in science.

The defining role of productivity in the timing of the highest-impact work persists if we extend the analysis to different samples of scientists, not only those with at least 20 years of publication record. We considered different selections of scientists, such as (i) grouping them by different career lengths (figs. S13 and S14), (ii) grouping them by decade of active career (figs. S15 and S16), (iii) removing multiauthored papers (fig. S17), (iv) including only papers published in one subarea of physics (fig. S38), (v) creating no filter and including all scientists (figs. S18 and S19), (vi) using different definitions of impact (figs. S20, S21, and S37), or (vii) considering the six different disciplines in data set (ii) (figs. S22 and S23). In all these cases, the location of the peak of the highest-impact work probability changes, but we never observe a difference with the randomized careers. Hence, the specific shape of *P*(*t**) is only a function of the selection of scientists and of their temporal productivity patterns, whereas impact is always randomly distributed within a scientist’s sequence of publications.

The documented random-impact rule raises an important question: What is the role of a researcher’s own ability, if any, in scientific excellence? We propose two quantitative models to answer this question.

## Random-impact model (R-model)

We can rely on the random-impact rule to build a null model of scientific careers: We assume that each scientist publishes a sequence of papers whose impact is randomly chosen from the same impact distribution *P*(*c*_{10}). Consequently, the only difference between two scientists is their overall productivity *N*. With the observed *P*(*c*_{10}) and *P*(*N*) distributions (Fig. 3, A and B) as input, the obtained *R*-model (section S4.2) accurately reproduces the randomness of the impact sequence *P*(*N**/*N*) (Fig. 2E), but it also makes two predictions that are at odds with the data.
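A sketch of the *R*-model under its stated assumption of a single shared impact distribution; the log-normal mean matches the paper's μ_{p} = 0.92, while the width σ is an assumed value:

```python
import math, random

rng = random.Random(1)

def r_model_career(n_papers, mu=0.92, sigma=0.95):
    # R-model null: every paper's impact is an i.i.d. draw from one shared
    # log-normal P(c10); mu matches the paper's mu_p, sigma is an assumption.
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n_papers)]

# Under the R-model, productivity alone begets success: the expected
# highest-impact paper grows with the number of papers written.
best_20 = sum(max(r_model_career(20)) for _ in range(500)) / 500
best_200 = sum(max(r_model_career(200)) for _ in range(500)) / 500
print(best_20 < best_200)  # True
```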

(a) Productivity alone begets success: If each paper’s impact is randomly drawn from the same *P*(*c*_{10}), a productive scientist (high *N*) will more likely score a high *c*_{10}^{*} (see eqs. S7 and S18) (*26*, *27*). However, the *R*-model does not correctly reproduce the observed increase of *c*_{10}^{*} as a function of *N* (Fig. 3C).

(b) Divergent impact: The higher the average impact 〈*c*_{10}〉 of a scientist’s publications, excluding the most-cited publication (Fig. 1A), the higher the impact *c*_{10}^{*} of the most-cited paper (Fig. 3D). Hence, papers with truly high impact are published by scientists with a consistent record of high impact. The *R*-model cannot account for this behavior, predicting that *c*_{10}^{*} diverges as 〈*c*_{10}〉 grows (Fig. 3D), a consequence of the log-normal nature of *P*(*c*_{10}) (section S4.1 and fig. S27).

Failures (a) and (b) prompt us to abandon our hypothesis that research papers are all drawn from the same impact distribution and hence researchers have no distinguishable individual impact, forcing us to explore more closely the relationship between productivity, impact, and chance.

## *Q*-model

Crucially, in the *R*-model, scientists with similar productivity have indistinguishable impact. In reality, impact varies greatly between scientists (Fig. 3E), suggesting the existence of a hidden parameter *Q*_{i} that modulates impact, which has a unique value for each scientist *i*.

The log-normal nature of *P*(*c*_{10}) (Fig. 3A) (*24*) indicates the presence of multiplicative processes, prompting us to write the impact *c*_{10,iα} of paper α published by scientist *i* as

*c*_{10,iα} = *Q*_{i}*p*_{α} (1)

where *p*_{α} is the potential impact of paper α in the sequence of papers published by scientist *i*. The parameter *Q*_{i} captures the ability of scientist *i* to take advantage of the available knowledge in a way that enhances (*Q*_{i} > 1) or diminishes (*Q*_{i} < 1) the potential impact of paper α. We take the value of this parameter *Q*_{i} to be constant throughout a scientist’s career, a hypothesis we validate later (Fig. 5 and section S4.9). The obtained model assumes that each scientist randomly selects a project with potential *p*_{α} and improves on it with a factor *Q*_{i} that is unique to the scientist, resulting in a paper of impact *Q*_{i}*p*_{α} (Eq. 1). Truly high-impact publications are therefore the result of a high-*Q*_{i} scientist selecting by chance a high-*p*_{α} project; any scientist, independently of her parameter *Q*_{i}, can publish low-impact papers by selecting a low *p*_{α}.

The stochastic process behind the model (Eq. 1) is determined by the joint probability *P*(*p*, *Q*, *N*), with unknown correlations between *p*, *Q*, and *N*. The log-normal nature of *P*(*c*_{10}) (Fig. 3A) allows us to measure *P*(*p*), finding that it can also be fitted with a log-normal function (Fig. 3F). Assuming that *Q* is also log-normal (confirmed later), we denote *x̂* ≡ log *x*, obtaining the trivariate normal distribution *P*(*p̂*, *Q̂*, *N̂*) with parameters (μ, Σ). Using a maximum-likelihood approach (see section S4.4), we estimate from the data the mean μ ≡ (μ_{p}, μ_{Q}, μ_{N}) = (0.92, 0.93, 3.34) and the covariance matrix

Σ = [ σ_{p}^{2}, σ_{p,Q}, σ_{p,N}; σ_{p,Q}, σ_{Q}^{2}, σ_{Q,N}; σ_{p,N}, σ_{Q,N}, σ_{N}^{2} ] (2)

The matrix (Eq. 2) leads to two key predictions:

(i) σ_{p,N} = σ_{p,Q} ≃ 0 indicates that the paper potential impact *p*_{α} is independent of a scientist’s productivity *N*_{i} and her hidden parameter *Q*_{i}. Therefore, scientists select the potential impact of each paper randomly from a *P*(*p*) distribution that is the same for all individuals, being independent of *Q* and *N*, capturing a universal—that is, scientist-independent—luck component behind impact.

(ii) The nonzero σ_{Q,N} indicates that the hidden parameter *Q* and productivity *N* do depend on each other (section S4.4), but its small value also shows that high *Q* is only slightly associated with higher productivity.
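As an illustration of the covariance estimation, the sketch below fits a fully observed trivariate Gaussian, for which maximum likelihood reduces to the sample mean and covariance; the paper's actual fit is more involved because *p* and *Q* are latent, and the covariance entries used here are assumed values chosen only to be consistent with predictions (i) and (ii):

```python
import numpy as np

rng = np.random.default_rng(0)
# Means from the paper's fit; the covariance entries are assumed for
# illustration: p uncorrelated with (Q, N), weak positive Q-N coupling.
mu = np.array([0.92, 0.93, 3.34])
cov = np.array([[0.90, 0.00, 0.00],
                [0.00, 0.21, 0.09],
                [0.00, 0.09, 0.70]])
samples = rng.multivariate_normal(mu, cov, size=50_000)

# For fully observed Gaussian data, the maximum-likelihood estimates
# are simply the sample mean and the sample covariance.
mu_hat = samples.mean(axis=0)
cov_hat = np.cov(samples, rowvar=False)
print(np.allclose(mu_hat, mu, atol=0.05))  # True
```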

The lack of correlations between *p* and (*Q*, *N*) allows us to analytically calculate the dependence of the highest-impact paper *c*_{10}^{*} on productivity *N* (section S4.10) and on the average impact 〈*c*_{10}〉 of the other papers published by the same scientist (see S4.10). The model predictions for *c*_{10}^{*}(*N*) and *c*_{10}^{*}(〈*c*_{10}〉) are in excellent agreement with the data (Fig. 3, C and D, and fig. S30), indicating that the hidden parameter *Q* and variations in the productivity *N* can explain the empirically observed impact differences between scientists, correcting the shortcomings of the *R*-model.

In summary, the *Q*-model allows us to generate synthetic sequences of publications, by assigning to each scientist an individual parameter *Q* and a productivity *N*, extracted from the distribution *P*(*Q*, *N*). Each paper in the sequence is assigned an impact calculated as *p* × *Q*, where *p* is randomly drawn from the distribution *P*(*p*), identical for all scientists.
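The generative step can be sketched directly from this description; μ_{p} = 0.92 is taken from the fit above, while σ_{p} and the example *Q* values are assumptions:

```python
import math, random

rng = random.Random(7)

def q_model_career(Q, n_papers, mu_p=0.92, sigma_p=0.95):
    # Q-model sketch: each paper's impact is Q * p, with p drawn from the
    # universal log-normal P(p); mu_p is from the paper, sigma_p is assumed.
    return [Q * math.exp(rng.gauss(mu_p, sigma_p)) for _ in range(n_papers)]

low = q_model_career(Q=1.5, n_papers=100)
high = q_model_career(Q=9.0, n_papers=100)
# Same luck distribution and same productivity; the high-Q scientist's
# median impact is systematically larger.
print(sorted(low)[50] < sorted(high)[50])  # True
```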

## The measurement and accuracy of the hidden parameter *Q*

The model allows us to calculate the parameter *Q*_{i} from the sequence of publications *c*_{10,iα} of each scientist (section S4.5), obtaining for large *N*_{i} (see eq. S28 for finite *N*_{i} and fig. S28 for the relation between the two estimations of the *Q* parameter)

*Q*_{i} = e^{〈log *c*_{10,iα}〉_{α} − μ_{p}} (3)

Given its dependence on log *c*_{10,iα}, *Q* is not dominated by a single high-impact (or low-impact) discovery but captures instead a scientist’s sustained ability to systematically turn her projects into high-impact (or low-impact) publications. For example, although the three scientists in Fig. 3E have the same productivity *N* ≃ 100, Eq. 3 predicts widely different *Q* values for them, namely, *Q* = 9.99, 3.31, and 1.49. These values accurately reflect persistent differences in their sequence of publications: The *Q* = 9.99 researcher consistently publishes high-impact papers, whereas the publications of the *Q* = 1.49 researcher are consistently of limited impact. Hence, the parameter *Q* captures a scientist’s differentiating ability to take random projects *p* and systematically turn them into high-impact (or low-impact) publications.
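A minimal sketch of the large-*N*_{i} estimator of Eq. 3, illustrating why a single blockbuster barely moves *Q*; the μ_{p} value is from the paper's fit, and the two example careers are hypothetical:

```python
import math

MU_P = 0.92  # mean of log p, from the paper's fit

def q_parameter(c10_values, mu_p=MU_P):
    # Large-N estimator sketched from Eq. 3: Q = exp(<log c10> - mu_p).
    # Because it averages log c10, no single paper can dominate the estimate.
    mean_log = sum(math.log(c) for c in c10_values) / len(c10_values)
    return math.exp(mean_log - mu_p)

consistent = [20, 25, 18, 30, 22]  # hypothetical, steadily solid impact
one_hit = [1, 2, 1, 2, 500]        # hypothetical, one lucky paper
print(q_parameter(consistent) > q_parameter(one_hit))  # True
```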

The *Q*-model makes the unexpected prediction that despite the obvious differences in individual career paths, differences in the impact of individual papers should disappear if we use the reduced variable *p*_{α} = *c*_{10,iα}/*Q*_{i}, a rescaling standard in statistical physics (*28*, *29*). Although the individual *P*(*c*_{10,iα}) distributions differ greatly, the *P*(*c*_{10,iα}/*Q*_{i}) distributions for all scientists collapse into a single universal curve *P*(*p*) (Fig. 4B), confirming the universal nature of impact across all careers (*30*). Finally, the log-normal *P*(*Q*) (Fig. 3G) confirms the model’s mathematical self-consistency.

A fundamental limitation of all metrics used in science is their nonstationarity: Productivity, the cumulative number of citations, and the *h*-index all grow in time, making it difficult to compare individuals at different stages of their career. In contrast, we find that the *Q* parameter is independent of the career stage. To show this, we used a Δ*N* = 30 paper window to measure changes in *Q* during the career of a scientist, observing that the *Q* parameter fluctuates narrowly throughout each career, without systematic changes (Fig. 5A). The magnitude of these fluctuations is explained for 75% of scientists by the stochastic nature of *Q* (section S4.9), because the estimated *Q* parameter lies within the uncertainty envelope provided by the model. In the remaining 25% of the cases, the variation in *Q* is slightly higher than the variation predicted by the stochastic nature of the model (Fig. 5B). However, the magnitude of this surplus variation never exceeds 15%, and the average relative error is always below 10% (section S4.9 and figs. S31 and S32).

Finally, to test the stability of the *Q* parameter throughout the overall career, and not as a function of productivity, *N*, we consider careers with at least 50 papers and calculate their early and late *Q* parameters (*Q*_{early} and *Q*_{late}, respectively) using Eq. 3 on the first and second half of their papers, respectively. In this case, the stochastic uncertainty explains the differences between *Q*_{early} and *Q*_{late} for the large majority of scientists (95.1%, Fig. 5C). Together, these measurements indicate that the *Q* parameter is generally stable throughout a career, allowing us to offer quantitative predictions on the evolution of a scientific career.

## The predictive power of the hidden parameter *Q*

The true value of the *Q* parameter comes in its predictive power:

(i) The *Q* parameter allows us to estimate the number of papers a scientist needs to write so that her highest-impact paper gathers 〈*c*_{10}^{*}〉 citations (Fig. 6B). We find that scientists with low *Q* (≃1.2) must write at least 100 papers so that one of them gathers on average 30 citations. Yet, a scientist with the same productivity but *Q* = 10 is expected to author a *c*_{10}^{*} = 250 paper. Doubling productivity will add only about seven citations to the highest-impact paper of the low-*Q* scientist (*Q* = 1.2), whereas it will boost *c*_{10}^{*} by more than 50 citations for the high-*Q* scientist. Overall, Fig. 6B documents that for low-*Q* scientists, increased productivity cannot substantially boost the chance of publishing a high-impact work; hence, it is very unlikely that they “get lucky.”
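The asymmetric payoff of productivity described in (i) can be reproduced with a small Monte Carlo sketch of the *Q*-model (σ_{p} is an assumed value; the *Q* values match the example):

```python
import math, random

rng = random.Random(3)

def expected_best(Q, n_papers, mu_p=0.92, sigma_p=0.95, runs=2000):
    # Monte Carlo estimate of the expected highest-impact paper for a
    # scientist of ability Q who writes n_papers papers (sigma_p assumed).
    total = 0.0
    for _ in range(runs):
        total += max(Q * math.exp(rng.gauss(mu_p, sigma_p))
                     for _ in range(n_papers))
    return total / runs

# Doubling productivity helps the high-Q scientist far more in absolute terms:
gain_low = expected_best(1.2, 200) - expected_best(1.2, 100)
gain_high = expected_best(10.0, 200) - expected_best(10.0, 100)
print(gain_low < gain_high)  # True
```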

(ii) A scientist’s *h*-index, indicating that her *h* most-cited papers gather at least *h* citations (*12*, *15*), is jointly determined by the *Q* parameter and the productivity *N* (section S4.11). This analytical prediction reproduces not only the observed *h*-index of all scientists (fig. S33B) but also the evolution of the *h*-index during a scientist’s career (Fig. 6, C and D, and fig. S34A). Similar equations describe the cumulative number of citations (Fig. 6D and figs. S33, D to F, and S34B) and the *g*-index (section S4.11), indicating that the traditional performance measures are uniquely determined by *Q*. Given that *Q* is constant in time, we conclude that productivity alone accounts for career-wide changes in these measures (Fig. 6, C and D).
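For reference, the *h*-index itself takes only a few lines, and the second example shows why, unlike *c*_{10}^{*}, it is not driven by one blockbuster paper (the careers below are hypothetical):

```python
def h_index(citations):
    # h of the scientist's papers have at least h citations each.
    ranked = sorted(citations, reverse=True)
    h = 0
    while h < len(ranked) and ranked[h] >= h + 1:
        h += 1
    return h

# Hypothetical career: four papers with at least 4 citations each.
print(h_index([10, 8, 5, 4, 3]))  # 4
# A single blockbuster cannot lift h on its own:
print(h_index([500, 1, 1, 1, 1]))  # 1
```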

(iii) By determining the value of *Q* during the early stages of a scientific career, we can use it to predict future career impact. The estimation error Δ*Q* of *Q* decreases with the number of published papers *N* and drops below 10% already after *N* = 20 publications (section S4.12). We can therefore estimate *Q* based on a scientist’s first *N*_{0} published papers in Eq. 3 and then use the analytical expression of the *h*-index and of the total number of citations to predict the future impact of a scientist (section S4.12 and fig. S35). Given the stochastic nature of the *Q*-model, an uncertainty envelope accompanies the most likely value of each impact metric. In Fig. 6E, for two scientists, we show the *h*-index prediction up to *N* = 150 after we estimated *Q* from the first *N*_{0} = 20 (top) and *N*_{0} = 50 (bottom) papers. Although the initial *h*-index overlaps for the two scientists, their long-term impact diverges, a difference accurately predicted by the *Q*-model. Generalizing for a larger sample of scientists, we find a strong correlation between the predicted and observed *h*-index (Fig. 6F). To quantify the *Q* model’s overall predictive accuracy, we measured the fraction of times that the *h*-index falls within the envelope for scientists with at least 100 papers. The *z*_{N} score for each scientist captures the number of SDs the real *h*-index deviates from the most likely *h*-index after *N* publications. We find that 71% of scientists have *z*_{40} ≤ 2 based on *N*_{0} = 20, which improves to 81% for *N*_{0} = 50 and *z*_{70} (Fig. 6G). Together, these results indicate that estimating the *Q* parameter at early career stages can unveil a scientist’s long-term impact.

(iv) To test whether *Q*_{i} correlates with outstanding impact, we ranked scientists on the basis of *Q*, *N*, *C*_{tot}, *c*_{10}^{*}, and their *h*-index. To validate these rankings, we use a receiver operating characteristic (ROC) plot that measures the fraction of Nobel laureates at the top of the ranked list (Fig. 6A). We find that the *Q*-based ranking predicts Nobel-winning careers most accurately, offering the highest area under the curve of all ranking measures (Fig. 6A) and the highest precision and recall (section S7 and fig. S45). Equally notable is the finding that the predictive powers of *C*_{tot}, *c*_{10}^{*}, and the *h*-index are indistinguishable from each other and that the productivity *N* is the least predictive. Similar results are obtained if we use *Q*_{i} to detect Dirac and Boltzmann medalists (figs. S46 and S47). The early-career *Q* also has the best accuracy in predicting Nobel laureates (section S7.1 and fig. S48).
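The ROC evaluation in (iv) can be sketched with a tie-aware rank comparison; the scores and laureate labels below are toy values, not the paper's data:

```python
def roc_auc(scores, labels):
    # Probability that a randomly chosen positive (e.g. a Nobel laureate)
    # outranks a randomly chosen negative under the score; ties count half.
    pos = [s for s, flag in zip(scores, labels) if flag]
    neg = [s for s, flag in zip(scores, labels) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy ranking in which Q separates the two laureates perfectly:
q_scores = [9.9, 7.2, 3.3, 2.8, 1.5]
laureate = [True, True, False, False, False]
print(roc_auc(q_scores, laureate))  # 1.0
```

An AUC of 1.0 means the score ranks every laureate above every non-laureate; 0.5 is chance level.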

High-impact discoveries often result from collaborative work (*31*–*33*), mixing scientists with different *Q*_{i}. To explore the influence of collaborators (*34*, *35*), we used a credit allocation algorithm (*22*, *36*) to attribute different impact shares to each author. We then repeated our entire analysis, finding that the *Q*-model, with slightly revised parameters, can explain the results (section S6.1 and figs. S40 to S43). Further, we find that *Q*_{i} is robust to the omission of individual collaborators (section S6.2 and fig. S44). Hence, although collaborative and team effects modulate the success of a particular publication, individual collaborators have only limited influence on *Q*_{i}.

## Summary and discussion

In summary, we offer empirical evidence that impact is randomly distributed within the sequence of papers published by a scientist, implying that temporal changes in impact during a scientific career can be explained by temporal changes in productivity, luck, and the heavy-tailed nature of a scientist’s individual impact distribution. This finding allowed us to systematically untangle the role of productivity, luck, and a scientist’s *Q*, predicting that truly high-impact papers require a combination of high *Q* and luck (high *p*) and that high productivity alone has only a limited effect on the likelihood of high-impact work in a scientific career, if it is not associated with high *Q*. The measurable *Q* parameter represents a scientist’s sustained ability to publish high-impact (or low-impact) papers.

Virtually all currently used metrics of performance change during the career of a scientist, capturing progression, not sustained ability. In contrast, *Q* is constant throughout a scientist’s career for most scientists (76%), and it is not dominated by a single paper or collaborator, being a measure intrinsically linked to an individual. The fundamental nature of the *Q* parameter is supported by the fact that the currently used metrics of success, from the *h*-index to cumulative citations, can be calculated from it. *Q* predicts not only the value but also the time evolution of the traditional impact metrics (Fig. 6, C to F).

All findings presented above are based on a subset of 2887 physicists with a career spanning at least 20 years and a persistent publication record. These scientists have reached a mid- or late-career stage and hence can be considered successful, as they survived many selection processes in academia. Although our findings hold in six other disciplines (see section S1.2) and are robust to relaxing the selection criteria (see section S1.4), the studied data sets do not feature young scientists who have left academia early and hence have published only a few papers.

Throughout this work, we have treated long-term impact, as captured by *c*_{10}, as an exogenous variable. It seems reasonable, however, that productivity and impact could influence each other. From a mechanistic perspective, for example, some early promising publications might help attract the resources leading to further productivity growth. Early-career impact, quantified with the average 〈*c*_{10}〉 for the first 10 papers of a scientist, is associated with career longevity, indicating that the probability to stay in academia is slightly influenced by the impact of a scientist’s early publications (fig. S49). The *Q*-model also indicates that the overall number of papers in a career weakly correlates with high *Q* (Eq. 2). Although the *Q*-model and the predictions provided here are immune from a possible coupling between early impact and overall productivity (section S5), these preliminary findings call for more measurements and models that can accurately capture the coevolution of short-term early impact and productivity (*37*).

Although *Q* can accurately predict a career's impact, the dependence of *Q* on exogenous factors, such as the quality of the education and current institution (*38*, *39*), the size of the research community (*24*, *40*), gender (*41*, *42*), the dynamics of subfields (*43*, *44*), or publication habits, remains unknown. Mathematically speaking, the model remains the same if the *Q* parameter reflects other factors that characterize a scientist. The various robustness checks we performed to uncover possible confounding factors, such as career length, decade, team effects, and the analysis of different disciplines and data sets, have failed to offer a simple, straightforward explanation for the origin of the different *Q* values scientists have. Most likely, the *Q* parameter is affected by multiple factors rather than a single one, and more about its nature might be unveiled once other detailed career information, such as grants and awards, becomes available and is included in the analysis. Nevertheless, the key factor differentiating *Q* from luck is that it must be sustained: *Q* is determined not by a single paper, a lucky draw, but by sustained high performance throughout the scientist’s career. This is reflected in the 〈log *c*_{10}〉 term in Eq. 3, indicating that a single very-high-impact paper has only a small effect on *Q*; a scientist needs multiple high-*c*_{10} papers to ensure a high *Q*. Uncovering the origin of the *Q* parameter is a promising future goal, which could not only offer a better understanding of the emergence and evolution of scientific excellence but also improve our ability to train and nurture high-impact scientists.
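The robustness of *Q* to a single lucky paper follows from its dependence on 〈log *c*_{10}〉, a geometric rather than arithmetic mean. A small numerical sketch (with an invented toy career, not data from the article) makes the contrast explicit:

```python
import math

def geometric_mean(values):
    """exp of the mean log: the quantity that enters Q via <log c10>."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

baseline = [10] * 20            # twenty papers, each with c10 = 10
with_hit = [10] * 19 + [1000]   # same career plus one 100x outlier

print(geometric_mean(baseline))        # 10.0
print(geometric_mean(with_hit))        # ~12.6: the outlier barely moves it
print(sum(with_hit) / len(with_hit))   # 59.5: the arithmetic mean jumps
```

A single very-high-impact paper shifts the geometric mean, and hence *Q*, only modestly, whereas it would dominate an arithmetic average such as total citations.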

## Methods

### Data sets

We explore two types of data sets: (i) the publication record of 236,884 physicists publishing in the journal family *Physical Review* from 1893 to 2010 [American Physical Society (APS) data set, see section S1.1 and figs. S1 and S2] and (ii) the combination of 24,630 Google Scholar career profiles with Web of Science (WoS) data, covering 514,896 publications in biology, chemistry, cognitive sciences, ecology, economics, and neuroscience (WoS data set, described in section S1.2 and fig. S3). The results shown in this article refer to the 2887 scientists, derived from the APS data set, whose publication record spans at least 20 years, who have at least 10 publications, and who have authored at least one paper every 5 years (see section S1.3).
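The selection criteria above can be sketched as a simple filter over a scientist's publication years. This is one plausible reading of the criteria ("at least one paper every 5 years" interpreted as no gap longer than 5 years between consecutive papers); the exact operationalization is in section S1.3:

```python
def satisfies_criteria(pub_years, min_span=20, min_papers=10, max_gap=5):
    """One reading of the APS selection criteria: a publication
    record spanning at least min_span years, with at least
    min_papers publications and no gap longer than max_gap years
    between consecutive papers."""
    years = sorted(pub_years)
    if len(years) < min_papers:
        return False
    if years[-1] - years[0] < min_span:
        return False
    return all(b - a <= max_gap for a, b in zip(years, years[1:]))

print(satisfies_criteria([1980, 1982, 1985, 1988, 1991, 1994,
                          1997, 1999, 2001, 2003]))  # True
print(satisfies_criteria([2000, 2001, 2002]))        # False: too few papers
```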

Note that the APS data set contains only citations within the *Physical Review* corpus (see section S1.1); for this reason, the citation counts are systematically smaller than those reported by the WoS database.

Our findings are also supported by the analysis of different samples of scientists in the APS data set, selected using a number of different criteria (see section S1.4), and by the analysis of all other disciplines in the WoS data set, which are reported in the Supplementary Materials and referenced throughout the article.

### Citation measures

Citation-based measures of impact are affected by three major problems: (i) citations follow different dynamics for different papers (*6*, *45*), (ii) the average number of citations changes over time (*24*), and (iii) citation counts are subfield-dependent (*24*). To overcome (i), we use, for each paper, the cumulative number of citations it received 10 years after publication, *c*_{10}, as a measure of its scientific impact (*6*, *45*). We correct for (ii) and (iii) by normalizing *c*_{10} by the average 〈*c*_{10}〉 of papers published in the same year. Because these corrections do not alter our conclusions for the APS data set, we report results without normalization; for the WoS data set, we instead used normalized citation counts.
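The yearly normalization described here can be sketched as follows, with papers represented as hypothetical (year, *c*_{10}) pairs; each paper's count is divided by the mean *c*_{10} of its publication year:

```python
from collections import defaultdict

def normalize_c10(papers):
    """Rescale each paper's c10 by the average c10 of papers
    published in the same year, making impact comparable across
    time. papers: list of (year, c10) tuples."""
    totals = defaultdict(lambda: [0.0, 0])
    for year, c10 in papers:
        totals[year][0] += c10
        totals[year][1] += 1
    yearly_mean = {y: s / n for y, (s, n) in totals.items()}
    return [(y, c10 / yearly_mean[y]) for y, c10 in papers]

papers = [(1990, 10), (1990, 30), (2000, 50), (2000, 150)]
print(normalize_c10(papers))
# [(1990, 0.5), (1990, 1.5), (2000, 0.5), (2000, 1.5)]
```

After normalization, a paper cited at half its year's average gets the same score (0.5) whether it was published in 1990 or 2000, even though the raw counts differ fivefold.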

### Q-model

The stochastic process behind the *Q*-model is determined by the joint probability *P*(*p*,*Q*,*N*). The model assumes that a scientist *i* has a productivity *N*_{i} and a parameter *Q*_{i} sampled from the marginal distribution *P*(*Q*,*N*), and then extracts *N*_{i} values of *p* from the conditional distribution *P*(*p*|*Q*,*N*). By assuming that *P*(*p*,*Q*,*N*) follows a trivariate log-normal distribution with parameters μ and Σ, we can write the likelihood function ℒ_{i} that a scientist *i* with *Q*_{i} and *N*_{i} has a sequence of papers {α} with impact {*Q*_{i}*p*_{α}} (see Eq. 1). Finally, with numerical optimization methods, we identify the maximum of the overall log-likelihood function log ℒ = Σ_{i} log ℒ_{i}, which provides the numerical estimates of μ and Σ reported in Eq. 2 (see also sections S4.3 and S4.4). This approach also estimates *Q*_{i}, obtained by maximizing the likelihood function for each scientist. The maximization provides an analytical expression for *Q*_{i}, which, for large productivity *N*_{i}, converges to Eq. 3 (see section S4.5).
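A toy version of the large-*N* per-scientist estimator can illustrate the logic. Since each paper's impact is *c*_{10} = *Q* · *p*, we have 〈log *c*_{10}〉 = log *Q* + 〈log *p*〉, so *Q* ≈ exp(〈log *c*_{10}〉 − μ_{p}) when the population mean μ_{p} of log-luck is known. This is a sketch of the functional form only; the article's full procedure jointly fits μ and Σ (Eq. 1 and 2), which we take as given here:

```python
import math
import random

def estimate_Q(c10_list, mu_p):
    """Large-N estimator of a scientist's Q: since c10 = Q * p,
    <log c10> = log Q + <log p>, hence Q ~ exp(<log c10> - mu_p),
    with mu_p = <log p> assumed known from the population fit."""
    mean_log = sum(math.log(c) for c in c10_list) / len(c10_list)
    return math.exp(mean_log - mu_p)

# Synthetic check: plant Q = 5 with lognormal luck, then recover it.
random.seed(0)
mu_p, Q_true = 1.0, 5.0
papers = [Q_true * math.exp(random.gauss(mu_p, 0.5)) for _ in range(5000)]
print(estimate_Q(papers, mu_p))  # close to 5.0
```

The recovery is accurate only because *N* is large here; for short careers, the finite-sample correction derived in section S4.5 matters.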

This procedure and the measured parameters allow us to generate synthetic sequences of publications: We first extract an individual parameter *Q* and a productivity *N* from the distribution *P*(*Q,N*). Then, each paper in the synthetic sequence is assigned an impact *pQ*, where *p* is randomly drawn from the distribution *P*(*p*), identical for all scientists.
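A minimal sketch of this generative procedure is given below, with purely illustrative lognormal parameters (the fitted μ and Σ are reported in the article's Eq. 2 and are not reproduced here):

```python
import math
import random

def synthetic_career(mu_Q=0.9, sigma_Q=0.5, mu_N=3.3, sigma_N=0.6,
                     mu_p=0.9, sigma_p=1.0, rho=0.3, rng=random):
    """Generate one synthetic publication sequence: draw (Q, N)
    from a correlated bivariate lognormal, then draw N luck values
    p from a lognormal shared by all scientists. Each paper's
    impact is p * Q. All parameter values here are illustrative."""
    # correlated standard-normal draws for (log Q, log N)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    log_Q = mu_Q + sigma_Q * z1
    log_N = mu_N + sigma_N * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    Q = math.exp(log_Q)
    N = max(1, int(round(math.exp(log_N))))
    return [Q * math.exp(rng.gauss(mu_p, sigma_p)) for _ in range(N)]

random.seed(1)
career = synthetic_career()
print(len(career), max(career))  # productivity and highest impact
```

Repeating this draw for many synthetic scientists reproduces the model's key prediction: within each career, the position of the highest-impact paper is uniformly random, because the luck values *p* are exchangeable.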

## Supplementary Materials

www.sciencemag.org/content/354/6312/aaf5239/suppl/DC1

Materials and Methods

Supplementary Text

Figs. S1 to S49

Supplementary data

References

## REFERENCES AND NOTES

**Acknowledgments:** The *Physical Review* data set can be requested from the APS at http://journals.aps.org/datasets. Data with the result of the disambiguation procedure, described in section S1.1, can be found as supplementary materials. An interactive visualization of the data sets, focusing on the random-impact rule, can be found at www.barabasilab.com/scienceofsuccess/. We thank J. A. Evans, S. Fortunato, S. Lehmann, B. Uzzi, B. Coutinho, S. Gil, E. Guney, J. Huang, J. Menche, F. Simini, M. Szell, and all other colleagues at the Center for Complex Network Research (CCNR) for the valuable discussions and comments. We thank H. Shen for the help with the credit share analysis. R.S. and A.-L.B. were supported by Air Force Office of Scientific Research (AFOSR) grants FA9550-15-1-0077 and FA9550-15-1-0364. A.-L.B. was also supported by the Future and Emerging Technologies Project 317 532 “Multiplex” financed by the European Commission. D.W. was supported by AFOSR grant FA9550-15-1-0162 and a Young Investigator Award. P.D. acknowledges support by the National Foundation for Scientific Research and the Research Department of the Communauté française de Belgique (Large Graph Concerted Research Action). R.S. developed the majority of this work during her stay at the CCNR, supported by the J. S. McDonnell Foundation. All authors designed and did the research. R.S. analyzed the empirical data, developed the models and controls, and performed the calculations. A.-L.B. was the lead writer of the manuscript.