Report

Atypical Combinations and Scientific Impact


Science 25 Oct 2013:
Vol. 342, Issue 6157, pp. 468-472
DOI: 10.1126/science.1240474

Making an Impact

How big a role do unconventional combinations of existing knowledge play in the impact of a scientific paper? To examine this question, Uzzi et al. (p. 468) studied 17.9 million research articles across five decades of the Web of Science, the largest repository of scientific research. Scientific work typically appeared to draw on highly conventional, familiar mixtures of knowledge. The highest-impact papers were not the ones that had the greatest novelty, but had a combination of novelty and otherwise conventional combinations of prior work.

Abstract

Novelty is an essential feature of creative ideas, yet the building blocks of new ideas are often embodied in existing knowledge. From this perspective, balancing atypical knowledge with conventional knowledge may be critical to the link between innovativeness and impact. Our analysis of 17.9 million papers spanning all scientific fields suggests that science follows a nearly universal pattern: The highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations. Papers of this type were twice as likely to be highly cited works. Novel combinations of prior work are rare, yet teams are 37.7% more likely than solo authors to insert novel combinations into familiar knowledge domains.

Scientific enterprises are increasingly concerned that research within narrow boundaries is unlikely to be the source of the most fruitful ideas (1). Models of creativity emphasize that innovation is spurred through original combinations that spark new insights (2–10). Current interest in team science and how scientists search for ideas is premised in part on the idea that teams can span scientific specialties, effectively combining knowledge that prompts scientific breakthroughs (11–15).

Yet the production and consumption of boundary-spanning ideas can also raise well-known challenges (16–21). If, as Einstein believed (21), individual scientists inevitably become narrower in their expertise as the body of scientific knowledge expands, then reaching effectively across boundaries may be increasingly challenging (4), especially given the difficulty of searching unfamiliar domains (17, 18). Moreover, novel ideas can be difficult to absorb (19) and communicate, leading scientists to intentionally display conventionality. In his Principia, Newton presented his laws of gravitation using accepted geometry rather than his newly developed calculus, despite the latter’s importance in developing his insights (22). Similarly, Darwin devoted the first part of the Origin of Species to conventional, well-accepted knowledge about the selective breeding of dogs, cattle, and birds. From this viewpoint, the balance between extending science with atypical combinations of knowledge and maintaining the advantages of conventional domain-level thinking is critical to the link between innovativeness and impact. However, little is known about the composition of this balance or how scientists can achieve it.

In this study, we examined 17.9 million research articles in the Web of Science (WOS) to see how prior work is combined. We present facts that indicate (i) the extent to which scientific papers reference novel versus conventional combinations of prior work, (ii) the relative impact of papers based on the combinations they draw upon, and (iii) how (i) and (ii) are associated with collaboration.

We considered pairwise combinations of references in the bibliography of each paper (23, 24). We counted the frequency of each co-citation pair across all papers published that year in the WOS and compared these observed frequencies to those expected by chance, using randomized citation networks. In the randomized citation networks, all citation links between all papers in the WOS were switched by means of a Monte Carlo algorithm. The switching algorithm preserves the total citation counts to and from each paper and the distribution of these citation counts forward and backward in time to ensure that a paper (or journal) with n citations in the observed network will have n citations in the randomized network. For both the observed and the randomized paper-to-paper citation networks, we aggregated counts of paper pairs into their respective journal pairs to focus on domain-level combinations (24–26). In the data, there were over 122 million potential journal pairs created by the 15,613 journals indexed in the WOS.
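The switching step can be sketched in Python. This is a minimal illustration, not the authors' code: the function name `switch_citations`, the edge-list representation, and the rule of swapping targets only between citations whose cited papers share a publication year are assumptions, chosen so that every paper keeps its reference and citation counts and every citing paper keeps the same mix of reference years. The supplementary materials describe the actual null model.

```python
import random
from collections import Counter


def switch_citations(edges, year_of, n_swaps, seed=0):
    """Hypothetical Monte Carlo switching of a citation network.

    edges: list of (citing, cited) paper ids; year_of: paper id -> year.
    Citations (a -> b) and (c -> d) become (a -> d) and (c -> b) only when
    b and d share a publication year (an assumption of this sketch), so
    in-counts, out-counts, and each paper's reference-year mix are preserved.
    """
    rng = random.Random(seed)
    edges = list(edges)
    existing = set(edges)
    # Group edge indices by the cited paper's year; swaps stay within a group,
    # and a swapped edge still cites a paper of that year, so groups stay valid.
    by_year = {}
    for i, (_, cited) in enumerate(edges):
        by_year.setdefault(year_of[cited], []).append(i)
    pools = [idx for idx in by_year.values() if len(idx) > 1]
    for _ in range(n_swaps):
        if not pools:
            break
        i, j = rng.sample(rng.choice(pools), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_i, new_j = (a, d), (c, b)
        # Reject swaps that would create self-citations or duplicate links.
        if a == d or c == b or new_i in existing or new_j in existing:
            continue
        existing -= {edges[i], edges[j]}
        existing |= {new_i, new_j}
        edges[i], edges[j] = new_i, new_j
    return edges
```

Running the randomization many times with different seeds, and counting journal pairs in each output, yields the null distribution that the observed counts are compared against.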

Comparing the observed frequency with the frequency distribution created with the randomized citation networks, we generated a z score for each journal pair. This normalized measure describes whether any given pair appeared novel or conventional. Z scores above zero indicate pairs that appeared more often in the observed data than expected by chance, indicating relatively common or “conventional” pairings. Z scores below zero indicate pairs that appear less often in the observed WOS than expected by chance, indicating relatively atypical or “novel” pairings. For example, in the year 1980, the pairing Tetrahedron and Experientia had a high z score (21.55) indicating a conventional pairing, whereas Tetrahedron paired with Life Sciences had a negative z score (–17.67), indicating a pairing more unusual than chance. The supplementary materials detail these computations, the null model, and an illustrative example (table S1 and figs. S1 to S3).
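Given journal-pair counts from the observed network and from several randomized networks, a z score of this kind is the observed count minus the mean randomized count, divided by the standard deviation of the randomized counts. The sketch below assumes that standard form, counts every pair of references in each bibliography (so same-journal pairs such as (A, A) are included), and uses hypothetical function names; pairs whose randomized counts never vary are skipped because their z score is undefined.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def journal_pair_counts(reference_lists, journal_of):
    """Aggregate co-cited reference pairs into (alphabetically ordered) journal pairs."""
    counts = Counter()
    for refs in reference_lists:
        journals = sorted(journal_of[r] for r in refs)
        for pair in combinations(journals, 2):
            counts[pair] += 1
    return counts


def z_scores(observed, randomized):
    """z = (observed - mean of randomized counts) / std of randomized counts."""
    pairs = set(observed).union(*randomized)
    z = {}
    for pair in pairs:
        null = [r.get(pair, 0) for r in randomized]
        sd = pstdev(null)
        if sd > 0:  # undefined when the null count never varies
            z[pair] = (observed.get(pair, 0) - mean(null)) / sd
    return z
```

Positive values then mark conventional pairings and negative values mark novel ones, matching the sign convention described above.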

As a simple validation of the z score measure, we found that journal pairs from the same WOS disciplinary designation had significantly higher z scores than did interdisciplinary journal pairs (table S3 and fig. S11). At the same time, only a minority (40.1%) of interdisciplinary journal pairs were novel, having z scores below zero in the 1990s. This pattern indicates that observed journal pairings from the same WOS disciplines tend to be conventional, whereas interdisciplinary WOS journal pairings are less strongly conventional but still not consistently novel.

The above method assigns each paper a distribution of journal pair z scores based on the paper’s reference list (Fig. 1A). To characterize a paper’s tendency to draw together conventional and novel combinations of prior work, we examined two summary statistics. First, to characterize the central tendency of a paper’s combinations, we considered the paper’s median z score. The median allows us to characterize conventionality in the paper’s main mass of combinations. Second, we considered the paper’s 10th-percentile z score. The left tail allows us to characterize the paper’s more unusual combinations, where novelty may reside.
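The two per-paper summaries can be sketched as follows, assuming a dictionary `z` of journal-pair z scores keyed by alphabetically ordered journal pairs (a hypothetical layout). The `percentile` helper uses linear interpolation between order statistics, one of several common conventions; the paper does not state which convention it uses.

```python
from itertools import combinations
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def paper_profile(ref_journals, z):
    """Return (median z, 10th-percentile z) over a paper's journal pairs.

    Returns None when the paper has no scored journal pair.
    """
    pair_z = [z[p] for p in combinations(sorted(ref_journals), 2) if p in z]
    if not pair_z:
        return None
    return median(pair_z), percentile(pair_z, 10)
```

The median summarizes the main mass of a paper's combinations, while the 10th percentile looks into its left tail, where any atypical pairings would show up as negative values.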

Fig. 1 Novelty and conventionality in science.

For a sample paper, (A) shows the distribution of z scores for that paper’s journal pairings. The z score shows how common a journal pairing is as compared to chance. For each paper, we take two summary measures: its median z score, capturing the paper’s central tendency in combining prior work, and the 10th-percentile z score, capturing the paper’s journal pairings that are relatively unusual. For the population of papers, we then consider these values across all papers in the WOS published in the 1980s or 1990s. (B) considers the median z scores and shows that the vast majority of papers display a high propensity for conventionality; in the 1980s and 1990s, fewer than 4% of papers have median z scores below 0 and more than 50% of papers have median z scores above 64. (C) considers the 10th-percentile z scores, which further suggest a propensity for conventionality; only 41% of papers in the 1980s and 1990s have a 10th-percentile z score below 0. Overall, by these measures, science rarely draws on atypical pairings of prior work.

We found that papers typically relied on very high degrees of conventionality. Figure 1B presents the distribution of papers’ median z scores for the WOS in the indicated decades. Considering that a z score below zero represents a journal pair that occurs less often than expected by chance, the analysis of median z scores suggests very high degrees of conventionality. Half of the papers have median z scores exceeding 69.0 in the 1980s and 99.5 in the 1990s. Moreover, papers with a median z score below zero are rare. In the 1980s, only 3.54% of papers had this feature, whereas in the 1990s the percentage fell to 2.67%, indicating a persistent and prominent tendency for high conventionality.

Focusing on each paper’s left-tail combinations, we found that even among the paper’s relatively unusual journal combinations, the majority of papers did not feature atypical journal pairs. Figure 1C shows that 40.8% of the papers in the 1980s and 40.7% in the 1990s have a 10th-percentile z score below zero. Overall, by these measures, science typically relies on highly conventional combinations and rarely incorporates journal pairs that are uncommon compared to chance. Analyses in the supplementary materials (fig. S6) show that these empirical regularities for the WOS taken as a whole are largely replicated on a field-by-field basis and across time.

Our next finding indicates a powerful relationship between combinations of prior work and ensuing impact. Figure 2 presents the probability of a “hit” paper, conditional on the combination of its referenced journal pairs. Hit papers are operationalized as those in the top 5% of citations received across the whole data set, as measured by total citations through 8 years after publication (the supplementary materials consider alternative definitions of hit papers). The vertical axis shows the probability of a hit paper conditional on a 2 × 2 categorization indicating the paper’s (i) “median conventionality” (an indicator for whether the paper’s median z score is in the upper or lower half of all median z scores) and (ii) “tail novelty” (an indicator for whether the paper’s 10th-percentile z score is above or below zero).
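The 2 × 2 categorization can be expressed compactly. In this sketch each paper is a hypothetical `(median_z, p10_z, citations_8yr)` tuple; hits are the top 5% of papers by 8-year citations (ties at the cutoff are broken arbitrarily, an assumption), high median conventionality means a median z score above the median of all papers' median z scores, and high tail novelty means a 10th-percentile z score below zero.

```python
from statistics import median


def hit_rates(papers, hit_share=0.05):
    """Hit probability per (high median conventionality, high tail novelty) cell.

    papers: list of (median_z, p10_z, citations_8yr) tuples (hypothetical layout).
    """
    n = len(papers)
    # Hits: the top `hit_share` of papers by citations; ties broken by sort order.
    ranked = sorted(range(n), key=lambda i: papers[i][2], reverse=True)
    hits = set(ranked[:max(1, round(hit_share * n))])
    cut = median(p[0] for p in papers)  # splits median z scores into halves
    cells = {}
    for i, (med_z, p10_z, _) in enumerate(papers):
        key = (med_z > cut, p10_z < 0)  # (high conventionality, high tail novelty)
        total, h = cells.get(key, (0, 0))
        cells[key] = (total + 1, h + (i in hits))
    return {key: h / total for key, (total, h) in cells.items()}
```

Because hits are defined as the top 5%, the rates across all four cells average, weighted by cell size, to roughly 0.05; departures from that background rate are what Fig. 2 displays.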

Fig. 2 The probability of a “hit” paper, conditional on novelty and conventionality.

This figure presents the probability of a paper being in the top 5% of the citation distribution conditional on two dimensions: whether a paper exhibits (i) high or low median conventionality and (ii) high or low tail novelty, as defined in the text. Papers that combine high median conventionality and high tail novelty are hits in 9.11 out of 100 papers, a rate nearly double the background rate of 5%. Papers that are high on one dimension only (high median conventionality or high tail novelty but not both) have markedly lower hit rates, 5.82 and 5.33 out of 100 papers, respectively. Papers with low median conventionality and low tail novelty have hit rates of only 2.05 out of 100 papers. The sample includes all papers published in the WOS from 1990 to 2000. The supplementary materials show similar findings when considering (i) all other decades from 1950 to 2000; (ii) hit papers defined as the top 1 or 10% by citations; and (iii) analyses controlling for field and other observable differences across papers, hinting at a universality of these relationships for scientific work. The difference in the hit probabilities for each category is statistically significant (P < 0.00001). The percentages of WOS papers in each category are: 6.7% (green bar), 23% (gold bar), 26% (red bar), and 44% (blue bar).

Papers with “high median conventionality” and “high tail novelty” display a hit rate of 9.11 out of 100 papers, or nearly twice the background rate of 5 out of 100 papers. All other categories show significantly lower hit rates. Papers featuring high median conventionality but low tail novelty display hit rates of 5.82 out of 100 papers, whereas those featuring low median conventionality but high tail novelty display hit rates of 5.33 out of 100 papers. Finally, papers low on both dimensions have hit rates of just 2.05 out of 100.
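Expressed as multiples of the 5-in-100 background rate, the four hit rates above can be checked with a few lines of arithmetic. The sketch uses only the numbers reported in the text; the dictionary labels are illustrative.

```python
BACKGROUND = 5.0  # hits per 100 papers, fixed by the top-5% definition

hit_rates_per_100 = {
    "high conventionality, high tail novelty": 9.11,
    "high conventionality, low tail novelty": 5.82,
    "low conventionality, high tail novelty": 5.33,
    "low conventionality, low tail novelty": 2.05,
}


def relative_to_background(rates, background=BACKGROUND):
    """Ratio of each category's hit rate to the background hit rate."""
    return {label: rate / background for label, rate in rates.items()}


for label, ratio in relative_to_background(hit_rates_per_100).items():
    print(f"{label}: {ratio:.2f}x background")
```

The printed ratios are 1.82, 1.16, 1.07, and 0.41, which is what "nearly twice the background rate" refers to for the first category.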

Further analyses suggest a universality of these relationships for scientific work across time and fields. We considered the same relationships for different time periods (fig. S4), for different definitions of high-impact papers (fig. S5), and for each of 243 fields of science (fig. S6 and table S2). These analyses confirmed the findings above. Thus, novelty and conventionality are not opposing factors in the production of science; rather, papers with an injection of novelty into an otherwise exceptionally familiar mass of prior work are unusually likely to have high impact.

Collaboration is often claimed to produce more novel combinations of ideas (10–14), but the extent to which teams incorporate novel combinations across the universe of fields is unknown. Team-authored papers were more likely to show atypical combinations than were single- or pair-authored papers. Figure 3A shows that the distribution of 10th-percentile z scores shifted significantly leftward as the number of authors increased [Kolmogorov-Smirnov (K-S) tests indicate solo versus pair P = 0.016, pair versus team P = 0.001, team versus solo P < 0.001]. Papers written by one, two, or three or more authors showed high tail novelty in 36.1, 39.8, and 49.7% of cases, respectively, indicating that papers with three or more authors showed an increased frequency of high tail novelty over the solo-author rate by 37.7%.
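The comparisons in this paragraph rest on two simple computations: the two-sample K-S statistic D, which is the largest vertical gap between two empirical cumulative distribution functions, and the relative increase of the team rate over the solo rate. The sketch below computes only D; obtaining P values requires the statistic's null distribution, which SciPy's `scipy.stats.ks_2samp` provides. Function names are illustrative.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample K-S statistic: max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at sample points,
    # so checking every observed value finds the maximum gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def relative_increase(rate, baseline):
    """Fractional increase of `rate` over `baseline`."""
    return (rate - baseline) / baseline


print(f"{relative_increase(49.7, 36.1):.1%}")  # 37.7%
```

Applied to the reported shares, (49.7 - 36.1) / 36.1 reproduces the 37.7% increase in high tail novelty for teams of three or more relative to solo authors.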

Fig. 3 Authorship structure, novelty, and conventionality.

Team-authored papers are more likely to incorporate tail novelty but without sacrificing a central tendency for high conventionality. Papers introduce tail novelty (a 10th-percentile z score less than 0) in 36.2, 39.9, and 49.7% of cases for solo authors, dual authors, and three or more authors, respectively (A). K-S tests confirm that the distributions of tail novelty are distinct (solo versus pair P = 0.016, pair versus team P = 0.001, team versus solo P < 0.001). In contrast, each team size shows similar distributions for median conventionality [(B), K-S tests indicate no statistically significant differences]. These findings suggest that a distinguishing feature of teamwork, and teams’ exceptional impact, reflects a tendency to incorporate novelty.

Teams were neither more nor less likely than single authors or pairs of authors to display high median conventionality. Figure 3B indicates no significant statistical difference in the median z-score distributions (K-S tests indicate solo versus pair P = 0.768, pair versus team P = 0.417, team versus solo P = 0.164). Teams thus achieve high tail novelty more often than solo authors. Yet, teams were not simply more novel but rather displayed a propensity to incorporate high tail novelty without giving up a central tendency for high conventionality.

In our final analysis, we examined the interplay between citation, combination, and collaboration using regression methods (Fig. 4). Papers were binned into 11 equally sized categories of median conventionality. A separate regression was run for each category of median conventionality and each team size, with field fixed effects. The supplementary materials detail the regression methodology and present additional confirmatory tests (figs. S7 to S10).
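A simplified version of this design is sketched below under explicit assumptions: papers are ranked into 11 equal-size bins of median z score, and within one bin and team size, a linear probability model regresses a 0/1 hit indicator on a 0/1 tail-novelty indicator with field fixed effects, estimated by demeaning both variables within each field. The function names and the single-regressor specification are illustrative; the paper's regressions are described in the supplementary materials.

```python
from collections import defaultdict


def equal_size_bins(values, k=11):
    """Assign each value a bin 0..k-1 by rank, giving (nearly) equal-size bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def fe_slope(y, x, field):
    """OLS slope of y on x with field fixed effects (within-field demeaning)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # field -> [sum y, sum x, count]
    for yi, xi, f in zip(y, x, field):
        t = totals[f]
        t[0] += yi
        t[1] += xi
        t[2] += 1
    num = den = 0.0
    for yi, xi, f in zip(y, x, field):
        sy, sx, n = totals[f]
        dx = xi - sx / n  # x deviation from its field mean
        num += dx * (yi - sy / n)
        den += dx * dx
    return num / den
```

Demeaning within fields removes any field-level difference in baseline hit rates, so the slope reflects only within-field variation; in practice one would fit such a model separately for each bin and team size, as the text describes.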

Fig. 4 Novel and conventional combinations in the production of science.

(A to C) The interplay between tail novelty, median conventionality, and hit paper probabilities shows remarkable empirical regularities. First, high tail novelty papers have higher impact than low tail novelty papers at (i) any level of conventionality and (ii) regardless of authorship structure. Second, increasing median conventionality is associated with higher impact up to the 85th to 95th percentile of median conventionality, after which the relationship reverses. Third, larger teams obtain higher impact given the right mix of tail novelty and median conventionality. Nonetheless, at low levels of median conventionality and tail novelty, even teams have low impact, further emphasizing the fundamental relationship between novelty, conventionality, and impact in science.

There were three primary findings. First, high tail novelty papers had higher impact than low tail novelty papers, an impact advantage that occurred at any level of conventionality and regardless of authorship structure. Second, peak impact occurred in the 85th to 95th percentile of median conventionality, an exceptionally high level. This peak and its position appeared whether or not papers had tail novelty and regardless of authorship structure. These generic features suggest fundamental underlying rules relating combinations of prior work to the highest-impact science.

Finally, Fig. 4 indicates that at virtually all mixes of tail novelty and median conventionality, larger teams were associated with higher impact. Thus, whereas teams incorporated the highest impact mixes more frequently (Fig. 3), teams also tended to obtain higher impact for any particular mix (Fig. 4). Nonetheless, despite teams’ advantage in citations across virtually all fields of science (12), even teams had low impact at low levels of median conventionality and tail novelty.

Our analysis of 17.9 million papers across all scientific fields suggests that the highest-impact science draws primarily on highly conventional combinations of prior work, with an intrusion of combinations unlikely to have been joined together before. These patterns suggest that novelty and conventionality are not factors in opposition; rather, papers that mix high tail novelty with high median conventionality have nearly twice the propensity to be unusually highly cited.

These findings have implications for theories about creativity and scientific progress. Combinations of existing material are centerpieces in theories of creativity, whether in the arts, the sciences, or commercial innovation (2–4, 6–10, 16). Across the sciences, the propensity for high-impact work is sharply elevated when combinations of prior work are anchored in substantial conventionality, not novelty, while mixing in a left tail of combinations that are rarely seen together. In part, this pattern may reflect advantages to being within the mainstream of a research trajectory, where scientists are currently focused, while being distinctive in one’s creativity. Combinations of prior work also relate to “burden of knowledge” theory, which emphasizes the growing knowledge demands on scientists (4, 17, 21). New articles indexed by the WOS now exceed 1.4 million per year across 251 fields, encouraging specialization and challenging scientists’ capacity to comprehend new thinking across domains. The finding that teams preserve high conventionality yet introduce tail novelty suggests that teams help meet the challenge of the burden of knowledge by balancing domain-level depth with a capacity for atypical combinations.

This methodology considered paper and journal pairings but can be applied at the level of disciplines, papers, or topics within papers, allowing the examination of combinations of prior work at different resolutions in future studies of creativity and scientific impact. Beyond science, links between novelty and conventionality in successful innovation also appear. E-books retain page-flipping graphics to remind the reader of physical books, and blue jeans were designed with a familiar watch pocket to look like conventional trousers. From this viewpoint, the balance between extending technology with atypical combinations of prior ideas and embedding them in conventional knowledge frames may be critical to human progress in many domains. Future research questions also arise from our findings. Science is dynamic, with research areas shifting and new fields arising. Although we find that the regularities relating novelty, conventionality, and impact persist across time and fields, understanding how research trajectories shift and how new fields are born are questions that measures of novelty and convention may valuably inform. At root, our work suggests that creativity in science appears to be a nearly universal phenomenon of two extremes. At one extreme is conventionality and at the other is novelty. Curiously, notable advances in science appear most closely linked not with efforts along one boundary or the other but with efforts that reach toward both frontiers.

Supplementary Materials

www.sciencemag.org/content/342/6157/468/suppl/DC1

Data and Methods

Supplementary Text

Figs. S1 to S11

Tables S1 to S3

References (27, 28)

References and Notes

Acknowledgments: Sponsored by the Northwestern University Institute on Complex Systems and by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 and Defense Advanced Research Projects Agency grant BAA-11-64, Social Media in Strategic Communication. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. government. All our summary statistics and programs are freely available on request. Our access to the WOS comes through a contract with Thomson Reuters that forbids redistribution of their database; researchers who desire the raw data on which to run our analytics can obtain it via a paid subscription to Thomson Reuters.