Technical Comments

Comment on “Quantifying long-term scientific impact”


Science, 11 Jul 2014:
Vol. 345, Issue 6193, pp. 149
DOI: 10.1126/science.1248770

Abstract

Wang et al. (Reports, 4 October 2013, p. 127) claimed high prediction power for their model of citation dynamics. We replicate their analysis but find discouraging results: 14.75% of papers are estimated to have unreasonably large μ (>5) and λ (>10), with correspondingly enormous prediction errors. The prediction power is even worse than that of simply using short-term citations to approximate long-term citations.

Wang et al. (1) proposed an elegant model for citation dynamics that quickly drew attention, including a feature in Nature News (2), because of its reported high prediction power: “With T_Train = 5, only 6.5% of the papers left the prediction envelope 30 years later...” Unfortunately, this is an odd way of evaluating prediction power because it ignores the width of the prediction envelope. Figure 4A in (1) shows that the predicted number of citations in year 30 for the middle paper ranges from 100 to more than 300; such a wide range can hardly be called a “good” prediction, even if no paper left the envelope.

To examine the prediction power of their model (hereafter, WSB), we replicated their analysis using 5-year citation histories to predict future citations. We used 1,973 Physical Review papers published in 1980 that acquired at least 10 citations in their first 5 years. We first tried Newton's method to solve Eqs. S25 and S26 in the supplementary materials for (1), but the results were sensitive to the initial parameter values. We therefore changed the optimization method and maximized their objective function (Eq. S22) directly. Substituting Eq. S25 into Eq. S22 yields an objective function that depends only on μ and σ, which we then maximize by searching a bounded region of (μ, σ). The key task is specifying this search region, and the trickiest part is the upper bound for μ. Because the impact time, approximately exp(μ), “represents the characteristic time it takes for a paper to collect the bulk of its citations,” we consider it safe to assume μ ≤ 5 (an impact time of at most exp(5) ≈ 148 years). Our final search region is therefore μ ∈ [–1.00, 5.00] and σ ∈ [0.10, 10.00], with a grid spacing of 0.01.
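The bounded grid search described above can be sketched as follows. Eqs. S22, S25, and S26 are not reproduced in this Comment, so the sketch swaps in a simpler stand-in objective (an assumption, not Wang et al.'s likelihood): least squares on the transformed scale log(1 + c_t/m) = λΦ((ln t − μ)/σ), which gives λ in closed form for fixed (μ, σ) and so, like substituting Eq. S25, leaves an objective in μ and σ only. The cumulative curve c(t) = m[exp(λΦ((ln t − μ)/σ)) − 1] is WSB's, with m = 30 assumed; all function names are hypothetical. The flag hit_mu_upper marks a paper whose optimum lies on the upper boundary of μ.

```python
import math

M = 30.0  # assumed value of WSB's constant m


def norm_cdf(x):
    """Standard normal CDF; erfc keeps precision in the far left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def wsb_cumulative(t, lam, mu, sigma, m=M):
    """WSB cumulative citations c(t) = m * (exp(lam * Phi((ln t - mu) / sigma)) - 1)."""
    return m * math.expm1(lam * norm_cdf((math.log(t) - mu) / sigma))


def profile_fit(c_obs, mu, sigma, m=M):
    """Stand-in objective: for fixed (mu, sigma), solve for lam in closed form and
    return (lam, sse) on the scale y_t = log(1 + c_t / m) = lam * Phi(x_t)."""
    y = [math.log1p(c / m) for c in c_obs]
    phi = [norm_cdf((math.log(t) - mu) / sigma) for t in range(1, len(c_obs) + 1)]
    denom = sum(p * p for p in phi)
    if denom == 0.0:  # Phi underflows to 0 in every year: lam is unidentified
        return None
    lam = sum(yi * p for yi, p in zip(y, phi)) / denom
    sse = sum((yi - lam * p) ** 2 for yi, p in zip(y, phi))
    return lam, sse


def grid_fit(c_obs, mu_bounds=(-1.0, 5.0), sigma_bounds=(0.1, 10.0), step=0.01):
    """Exhaustive search over the (mu, sigma) grid; returns the best fit and
    whether it landed on the upper boundary of mu."""
    mu_lo, mu_hi = mu_bounds
    s_lo, s_hi = sigma_bounds
    n_mu = int(round((mu_hi - mu_lo) / step)) + 1
    n_s = int(round((s_hi - s_lo) / step)) + 1
    best = None  # (sse, mu, sigma, lam)
    for i in range(n_mu):
        mu = round(mu_lo + i * step, 10)  # integer index avoids float drift
        for j in range(n_s):
            sigma = round(s_lo + j * step, 10)
            fit = profile_fit(c_obs, mu, sigma)
            if fit is not None and (best is None or fit[1] < best[0]):
                best = (fit[1], mu, sigma, fit[0])
    _, mu, sigma, lam = best
    hit_mu_upper = abs(mu - mu_hi) < step / 2
    return mu, sigma, lam, hit_mu_upper


# Usage on noise-free synthetic data (lam=2, mu=1, sigma=1), coarse grid for speed.
c_obs = [wsb_cumulative(t, 2.0, 1.0, 1.0) for t in range(1, 6)]
print(grid_fit(c_obs, step=0.1))
```

The default step of 0.01 matches the grid in the text; in pure Python that means roughly 600,000 objective evaluations per paper, which is why the usage line uses a coarser grid.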

Our results show that no paper hit the boundaries of σ or the lower boundary of μ, but 14.75% of the papers hit the upper boundary of μ, indicating that their best-fitting μ may be even larger than 5. Large μ* is accompanied by large λ* (Fig. 1A), and the ultimate impact, C = m(exp(λ) − 1), “represents the total number of citations a paper acquires during its lifetime.” If λ > 10, then C > 660,764, which is highly unrealistic. When μ* = 5, all papers except one have λ* larger than 10, and their citations are enormously overestimated (Fig. 1B).
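The two plausibility bounds used above reduce to simple arithmetic. The sketch assumes m = 30 for WSB's constant, which reproduces the 660,764 figure; the function names are hypothetical.

```python
import math

M = 30  # assumed value of WSB's constant m; it reproduces the 660,764 figure


def ultimate_impact(lam, m=M):
    """Total lifetime citations C = m * (exp(lam) - 1)."""
    return m * math.expm1(lam)


def impact_time(mu):
    """Impact time in years, approximately exp(mu)."""
    return math.exp(mu)


print(round(impact_time(5.0), 1))     # 148.4, the cap implied by mu <= 5
print(round(ultimate_impact(10.0)))   # 660764, the citation count implied by lambda = 10
```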

Fig. 1 Prediction power evaluation.

T_Train = 5. (A) Scatterplot of the optimal λ* and μ* estimated for WSB. (B) Scatterplot of WSB-predicted C30 versus true C30; citations are enormously overestimated for papers with very large μ*, and many observations fall outside the vertical-axis range. (C) Scatterplot of the optimal λ* and μ* estimated for WSB-with-prior. (D) Scatterplot of WSB-with-prior predicted C30 versus true C30. With T_Train = 5, WSB-with-prior failed to find finite values of α and β for the optimal solution, so we adopt the four sets of α and β values reported by Shen et al. (4); (C) and (D) use the set with the smallest MAPE (α = 4.759, β = 4.440).

One possible explanation for the tendency to pick large μ* and λ* is the limitation of using only a 5-year citation history. WSB produces an S-shaped curve, with acceleration followed by deceleration in the citation accumulation process, and many papers are still in the initial acceleration stage in their fifth year. Model estimation based on this short period may therefore mistakenly pick a very large μ* and predict exponential growth all the way out to 148 years.
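The S-shape, and why five years may not reveal the turning point, can be checked numerically. The sketch assumes WSB's curve c(t) = m[exp(λΦ((ln t − μ)/σ)) − 1] with m = 30 and uses illustrative parameters, not values fitted in this Comment; function names are hypothetical. Because dc/dt equals mλ·exp(λΦ) times a lognormal density in t, and that density rises until its mode exp(μ − σ²), citations keep accelerating at least until that time; with μ at the boundary value 5 and σ = 1.5, this is about 15.6 years, well beyond a 5-year training window.

```python
import math

M = 30.0  # assumed value of WSB's constant m


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def wsb_cumulative(t, lam, mu, sigma, m=M):
    """WSB cumulative citations c(t) = m * (exp(lam * Phi((ln t - mu) / sigma)) - 1)."""
    return m * math.expm1(lam * norm_cdf((math.log(t) - mu) / sigma))


def yearly_increments(lam, mu, sigma, years):
    """Citations gained between consecutive years t and t + 1 under the WSB curve."""
    c = [wsb_cumulative(t, lam, mu, sigma) for t in range(1, years + 1)]
    return [b - a for a, b in zip(c, c[1:])]


def acceleration_horizon(mu, sigma):
    """Mode exp(mu - sigma**2) of the lognormal density d/dt Phi((ln t - mu)/sigma);
    citations keep accelerating at least until this time."""
    return math.exp(mu - sigma ** 2)


# Illustrative parameters with mu on the search boundary (not fitted values).
lam, mu, sigma = 20.0, 5.0, 1.5
inc = yearly_increments(lam, mu, sigma, years=5)
print(all(b > a for a, b in zip(inc, inc[1:])))  # True: still accelerating in year 5
print(round(acceleration_horizon(mu, sigma), 1))  # 15.6 (years)
print(round(wsb_cumulative(5, lam, mu, sigma), 1), round(wsb_cumulative(30, lam, mu, sigma)))
```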

How good is the prediction? As a benchmark we use a naïve approach, C30 = C5, i.e., assuming that papers receive no further citations after the training period. We compare the true and predicted numbers of citations in year 30 (C30) and report three evaluation statistics: (i) mean absolute percentage error (MAPE), (ii) Spearman correlation, and (iii) percentage of correctly identified top 10% highly cited papers (3). We report these statistics for two sets of papers: all papers, and papers with μ* < 5, i.e., excluding the “misbehaving” papers unsuitable for WSB (Table 1). Even after excluding misbehaving papers, the MAPE of WSB is 1.98 × 10⁵⁷, the Spearman correlation is 0.58, and only 31.95% of the top papers are correctly identified. The naïve approach outperforms WSB, with a much lower MAPE (0.56), a higher Spearman correlation (0.74), and a higher percentage of correctly identified top papers (58.29%).
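The three evaluation statistics and the naïve benchmark can be sketched as below. The exact conventions of (3) are not given in this Comment, so two details are assumptions: tied values share their average rank in the Spearman correlation, and the top 10% is taken as the k = round(0.1·n) highest-cited papers (at least one), with ties at the cutoff broken by sort order. The citation counts in the usage lines are hypothetical.

```python
import math


def mape(true, pred):
    """Mean absolute percentage error, as a fraction (0.56 means 56%)."""
    return sum(abs(p - t) / t for t, p in zip(true, pred)) / len(true)


def average_ranks(xs):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def top_share(true, pred, frac=0.10):
    """Share of the true top-`frac` papers that also rank in the predicted top `frac`."""
    n = len(true)
    k = max(1, round(frac * n))
    top_true = set(sorted(range(n), key=lambda i: true[i], reverse=True)[:k])
    top_pred = set(sorted(range(n), key=lambda i: pred[i], reverse=True)[:k])
    return len(top_true & top_pred) / k


# Hypothetical counts for five papers; the naive benchmark predicts C30 = C5.
c5 = [12, 15, 30, 11, 40]
c30 = [50, 40, 200, 35, 180]
naive = list(c5)
print(mape(c30, naive), spearman(c30, naive), top_share(c30, naive))
```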

Table 1 Prediction power evaluation.


If we use a 10-year citation history to predict C30, the percentage of misbehaving papers decreases to 1.57%. After excluding misbehaving papers, MAPE decreases to 0.38, and the Spearman correlation and the percentage of correctly identified top papers increase to 0.90 and 71.28%, respectively. However, the naïve approach (i.e., C30 = C10) still outperforms WSB.

In response to an earlier version of this Comment, the authors attributed the poor performance of WSB that we observed to overfitting and claimed that it can be addressed by a conjugate prior method proposed in their latest technical report (4) (hereafter, WSB-with-prior). Indeed, fitting WSB to N papers involves estimating 3N parameters (three per paper: λ, μ, and σ). WSB-with-prior places a conjugate prior on λ: the N individual λ’s are assumed to follow a gamma distribution, Γ(α, β), which reduces the number of estimated parameters from 3N to 2N + 2 (one α, one β, N μ’s, and N σ’s). Neither this overfitting issue nor the conjugate prior method was mentioned in Wang et al. (1). Nevertheless, we implemented WSB-with-prior; unfortunately, its performance is still inadequate.
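The parameter count and the role of the prior can be sketched as below. The sketch assumes a shape/rate parametrization of Γ(α, β) (the Comment does not specify one) and treats each paper's likelihood as a function of λ alone, lik(λ), with μ and σ held fixed; that placeholder and all function names are hypothetical. Integrating λ out against the shared gamma density is what removes the N individual λ’s from the estimation, leaving α, β, and each paper's μ and σ.

```python
import math


def n_parameters(n_papers, with_prior):
    """Free parameters: 3 per paper for WSB; 2 per paper plus shared (alpha, beta) with the prior."""
    return 2 * n_papers + 2 if with_prior else 3 * n_papers


def log_gamma_pdf(x, shape, rate):
    """Log density of a gamma distribution at x > 0 (shape/rate convention assumed)."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def marginal_over_lambda(lik, shape, rate, lam_max=50.0, n=20000):
    """Integrate lik(lam) * Gamma(lam; shape, rate) over [0, lam_max] with the
    trapezoid rule. Assumes shape > 1, so the integrand is 0 at lam = 0."""
    h = lam_max / n
    total = 0.0
    for k in range(1, n + 1):
        lam = k * h
        weight = 0.5 if k == n else 1.0  # trapezoid end-point weight
        total += weight * lik(lam) * math.exp(log_gamma_pdf(lam, shape, rate))
    return total * h


print(n_parameters(1973, with_prior=False), n_parameters(1973, with_prior=True))  # 5919 3948
print(marginal_over_lambda(lambda lam: 1.0, shape=2.0, rate=1.0))  # close to 1
```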

We first fit WSB-with-prior using the 5-year citation history, but it failed to find finite values of α and β for the optimal solution. This reveals another issue with their methods (with or without the prior): the maximum likelihood estimator does not always exist when the parameter space is not compact. Even if we adopt the four sets of α and β values reported by Shen et al. (4) and report their best evaluation statistics, the naïve approach still performs better (Table 1).
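The nonexistence problem can be seen in a textbook case, swapped in here purely for illustration (it is not WSB): a one-parameter logistic model fit to perfectly separated data. The log-likelihood increases in the coefficient b without ever attaining its supremum, so a search over the compact interval [0, B] always returns B, and enlarging B only moves the “optimum” outward, the same symptom as failing to find finite α and β. All names below are hypothetical.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def loglik(b, xs, ys):
    """Log-likelihood of P(y = 1 | x) = sigmoid(b * x) for 0/1 labels ys."""
    return sum(log_sigmoid(b * x if y == 1 else -b * x) for x, y in zip(xs, ys))


def bounded_argmax(upper, xs, ys, points=101):
    """Grid search for the maximizing b over the compact interval [0, upper]."""
    grid = [upper * k / (points - 1) for k in range(points)]
    return max(grid, key=lambda b: loglik(b, xs, ys))


xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]  # perfectly separated at x = 0

for upper in (10.0, 100.0):
    print(upper, bounded_argmax(upper, xs, ys))  # prints the upper bound each time
```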

With T_Train = 10, we found a finite solution: α* = 4.04 and β* = 3.83. Compared with the naïve approach, WSB-with-prior has a smaller MAPE (0.27 versus 0.34) and a slightly higher Spearman correlation (0.9100 versus 0.9052), but a lower percentage of correctly identified top papers (74.75% versus 76.41%). Given its model and computational complexity, this improvement over the naïve approach is marginal.

Although ever-fancier statistical methods (e.g., regularization) can be incorporated to fit the parameters in WSB, the fits will always perform poorly on extreme values. Indeed, WSB is inadequate for any type of data that has extreme values. Yet scientific outcomes are uncertain and skewed, and there are always high-impact outliers that represent the greatest discoveries and cannot be dismissed (5, 6). Although we find WSB elegant, we do not believe it is very useful, and it would be going too far to market this model as a prediction tool for policy-making.

Acknowledgments

Data were retrieved from a bibliometrics database developed by the Competence Center for Bibliometrics for the German Science System (K.B.) and derived from Thomson Reuters Web of Science. K.B. is funded by the German Federal Ministry of Education and Research (BMBF), project no. 01PQ08004A.