Perspective: Computer Science

Future Science

Science 04 Oct 2013:
Vol. 342, Issue 6154, pp. 44-45
DOI: 10.1126/science.1245218

The massive growth of global research activity in recent years has spurred studies exploring how productive this expansion has been and what the future may hold. Although the creativity and serendipity of individual discoveries will remain difficult to model, quantitative research has revealed regularities in the rates of discovery and in the fate of published findings over time (1). Some of these studies demonstrate that innovation has been decreasing, which may reflect a scientific enterprise whose system of inputs has supported the pursuit of research focused on “low-hanging fruit” (2). This decline has coincided with the recent global recession, which prompted calls for accountability for the economic and social returns on public investment in research (3). There is now great demand for insight into how the system of science works. On page 127 of this issue, Wang et al. (4) offer one approach to assess, and perhaps even augment, scientific productivity.

An emerging area of interest in research on the “science of science” is the prediction of future impact. Impact prediction bears on the evaluation of research grants, the granting of scholarly awards, and the setting of faculty salaries, among other decisions. As predictions improve, they will play a larger role in steering which areas public and private capital will fund for research, development, and production. But how can we predict the future?

A number of recent studies have honed our understanding of the factors that influence future citations to an article or researcher. For example, citations to articles published in scientific journals accrue over time according to a well-behaved log-normal distribution, rising after publication and then gradually decaying (5). There is also a strong “first-mover” advantage in the receipt of citations: an early, mediocre article on a topic will often receive more citations than a later, excellent one (6). Still other factors predict citations to a scientist's complete oeuvre, such as the subject diversity of the journals in which the scientist's early articles were published (7). Wang et al. advance these efforts by integrating three key features into a generative model of an article's long-term impact: preferential attachment, a temporal citation trend, and the underlying “fitness” of the paper. Their model links earlier models of what accounts for the popularity of Web sites [the preferential attachment with which other Web sites hyperlink to a site (8) and the “quality” of the site's content, reflected in its utility for Internet users (9)] with the log-normal curve along which an article's influence rises and then falls over time. The authors combined these elements into a three-parameter model with a number of clear, derivable, and empirically estimable quantities associated with an individual article. These include the article's relative fitness (its importance relative to its peers), immediacy (the time required to reach its citation peak), longevity (its rate of citation decay), impact time (the characteristic time to attract the bulk of its citations), and ultimate impact (the citations it will acquire over its lifetime, which depends only on its relative fitness). The resulting model was used to predict later citations on the basis of early patterns.
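To make the three ingredients concrete, here is a minimal Python sketch of a cumulative-citation curve that combines them. It assumes the closed form we understand Wang et al. (4) to report, `c(t) = m * (exp(lam * Phi((ln t - mu) / sigma)) - 1)`, where `lam` is relative fitness, `mu` immediacy, `sigma` longevity, `Phi` the standard normal CDF, and `m` a global constant. The exact formula, the value `m = 30`, and all function names here are assumptions for illustration and should be checked against (4) before use.

```python
import math

# Assumed global constant (illustrative only): sets the citation scale of the
# preferential-attachment term. Check the value used in (4) before relying on it.
M_DEFAULT = 30.0


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative_citations(t: float, fitness: float, immediacy: float,
                         longevity: float, m: float = M_DEFAULT) -> float:
    """Citations accumulated t years after publication under the assumed form.

    fitness   = lambda: relative fitness (scales the eventual total)
    immediacy = mu:     log-time around which the aging curve is centred
    longevity = sigma:  spread of the log-normal aging curve (slower decay if larger)
    """
    if t <= 0:
        return 0.0
    # Log-normal aging: where ln t sits relative to this paper's mu and sigma.
    z = (math.log(t) - immediacy) / longevity
    # Assumes dc/dt = fitness * lognormal_pdf(t) * (c + m), i.e. preferential
    # attachment modulated by aging; integrating gives this closed form.
    return m * (math.exp(fitness * std_normal_cdf(z)) - 1.0)


def ultimate_impact(fitness: float, m: float = M_DEFAULT) -> float:
    """Limit as t -> infinity: Phi(z) -> 1, leaving only fitness (and m)."""
    return m * (math.exp(fitness) - 1.0)


def impact_time(immediacy: float) -> float:
    """Time at which Phi(z) = 1/2, i.e. the median of the aging curve: exp(mu)."""
    return math.exp(immediacy)


if __name__ == "__main__":
    # Made-up parameters for a single hypothetical paper.
    lam, mu, sigma = 2.0, 1.0, 0.8
    for year in (1, 2, 5, 10, 30):
        print(year, round(cumulative_citations(year, lam, mu, sigma), 1))
    print("ultimate impact:", round(ultimate_impact(lam), 1))
    print("impact time (years):", round(impact_time(mu), 2))
```

Because `Phi` tends to 1 as `t` grows, the curve saturates at `m * (exp(lam) - 1)`; under this assumed form, immediacy and longevity change only how quickly that ceiling is approached, which is why ultimate impact depends on fitness alone. Treating `impact_time` as `exp(mu)` is one natural reading of "characteristic time," not a definition taken from this Perspective.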
Once these features were accounted for, all nonzero citation trajectories followed a similar path. Moreover, the authors generalize the model to the level of journals (as they could for researchers, departments, universities, countries, or the scientific system as a whole) to show that a recent drop in the relative impact of a well-known journal can be accounted for solely by the rising impact time of its articles.
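A short derivation illustrates both claims. It assumes the cumulative-citation form c_i^t = m[exp(λ_i Φ((ln t - μ_i)/σ_i)) - 1], our reading of Wang et al. (4), with λ_i fitness, μ_i immediacy, σ_i longevity, Φ the standard normal CDF, and m a global constant; treat the exact form as an assumption. Rescaling time and citations paper by paper removes all three parameters, and the share of (log-scaled) ultimate impact collected within a fixed window T depends on μ_i and σ_i but not on λ_i:

```latex
% Assumed cumulative-citation form; \Phi is the standard normal CDF
c_i^t = m\left[\exp\!\left(\lambda_i\,\Phi\!\left(\frac{\ln t-\mu_i}{\sigma_i}\right)\right)-1\right]

% Rescale time and citations separately for each paper i
\tilde t \equiv \frac{\ln t-\mu_i}{\sigma_i},
\qquad
\tilde c \equiv \frac{1}{\lambda_i}\ln\!\left(1+\frac{c_i^t}{m}\right)

% Substituting the first line: every trajectory collapses onto one curve
\tilde c = \frac{1}{\lambda_i}\,\lambda_i\,\Phi(\tilde t) = \Phi(\tilde t)

% Share of log-scaled ultimate impact reached within a fixed window T,
% using \ln\left(1+c_i^\infty/m\right) = \lambda_i:
\frac{\ln\left(1+c_i^T/m\right)}{\ln\left(1+c_i^\infty/m\right)}
  = \Phi\!\left(\frac{\ln T-\mu_i}{\sigma_i}\right)
```

Because Φ is increasing, a larger μ_i (a later impact time) lowers the share collected inside a short, fixed window, such as the two-year window used for journal impact factors, even when fitness, and therefore ultimate impact, is unchanged. This is one way to read how a journal's window-based impact can fall while its articles remain just as fit.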

Winners foreseen. Methods to predict future citations of an article could improve the overall productivity of the scientific enterprise.

We should consume these explanations and the concepts underlying them with caution (as we should with any simplifying model). For example, “fitness,” which is well named, is not merely quality (the “inherent” or timeless value of an article to science) but the fit between an article and its audience at the time of publication, as reflected in the article's perceived importance. Fitness is therefore neither “inherent” nor timeless: with relevant developments in other areas of science, an article's ideas may later become fit. Nor is the model of Wang et al. magical; its forecasts improve as more of an article's citation history becomes available. Moreover, citations capture only a single dimension of impact, because nonacademic practitioners may be influenced by an article without citing it. Nonetheless, the model assembles a number of important factors for predicting future citation success.

The success of this model and others that will build on it raises the question of how predicting an article's success could change science. Widespread, consistent prediction of an idea's future impact will necessarily speed the resolution of which ideas win and which lose in the competition for attention and resources. Because we cannot know everything about the future, predictions based only on the momentum of an article's early reception could become self-fulfilling prophecies, leading scientists and funders to prematurely abandon ideas that may yet have a second or third act to play. The classic work on random networks by the mathematicians Erdős and Rényi, which much later became the foundation for popular approaches in network science, is one such example; Wang et al. acknowledge that their model could not have predicted future citations of this seminal work. The ability to better predict an article's success could also translate into a faster life cycle for discoveries, from publication to widespread acceptance. This might then lead to faster convergence on best practices, which would boost the number of scientists with the skills required to build on an impactful discovery. This would happen, however, only through rapid swarming around new ideas, which will increase competition for follow-on research and publication as a larger proportion of scientists pursue the same anointed path.

These cautions do not apply, however, to everything that more successful impact prediction portends. The ability to automatically extract scientific claims from research articles and reason across them should lead to the prediction or computational generation of promising new hypotheses. It will likely also expose the common assumptions and practices of science to scrutiny and explicit evaluation (10). In this way, citation prediction represents one step on the path to creating algorithmic or robot “scientists” (11) that are more creative, risk-taking, persistent, and wide-reading than we are (12). By enabling scientists to consider not only the most fruitful hypothesis but also the most fruitful algorithm for generating hypotheses, future prediction methods would augment scientific ability, increase productivity, and multiply the returns from science to society.

