Review

# The Impact of Structural Genomics: Expectations and Outcomes


Science  20 Jan 2006:
Vol. 311, Issue 5759, pp. 347-351
DOI: 10.1126/science.1121018

## Abstract

Structural genomics (SG) projects aim to expand our structural knowledge of biological macromolecules while lowering the average costs of structure determination. We quantitatively analyzed the novelty, cost, and impact of structures solved by SG centers, and we contrast these results with traditional structural biology. The first structure identified in a protein family enables inference of the fold and of ancient relationships to other proteins; in the year ending 31 January 2005, about half of such structures were solved at a SG center rather than in a traditional laboratory. Furthermore, the cost of solving a structure at the most efficient SG center in the United States has dropped to one-quarter of the estimated cost of solving a structure by traditional methods. However, the efficiency of the top structural biology laboratories—even though they work on very challenging structures—is comparable to that of SG centers; moreover, traditional structural biology papers are cited significantly more often, suggesting greater current impact.

## Comparison of Citations

Several structural biologists have suggested that one measure of the level of interest in a scientific field is the number of published papers in the field, and the impact of a scientific report may on average be roughly estimated by the number of subsequent citations. We examined the number of citations to the primary reference in each PDB entry for the 104 SG structures deposited between 1 September 2001 and 31 August 2002 (table S12). As of November 2005, 34 of the 104 structures remain unpublished and thus have no citations. The mean number of citations for the 104 structures was 11.0 and the median number was 4. Several factors bias this analysis: the two most cited references (with 107 and 61 citations, respectively) describe the overall work of a center rather than individual structures, and each was the primary reference for two PDB entries. Also, there were several additional cases in which multiple structures shared the same primary reference, often a functional study, and these were cited more on average than other references. For comparison, we randomly selected 104 non-SG structures solved in the same time period, of which all but six had been published (table S13). Like the SG structures, several shared primary references. The 104 structures had a mean of 21.0 citations and a median of 11.5 citations. Thus, publications of SG structures have significantly fewer citations than publications of structures from non-SG laboratories [P < 0.0001 in a two-tailed Mann-Whitney test (25)]. For SG structures, novelty did not appear to correlate with the citation rate (8). Among non-SG structures, novel structures were cited more often than non-novel structures, as traditional structural biologists solved structures likely to have immediate impact on established biochemical research communities.
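The comparison above rests on a rank-based test, which suits citation counts because they are heavily skewed (many zeros, a few very large values). As a rough illustration only, the following sketch computes a two-tailed Mann-Whitney test in plain Python using the normal approximation with tie and continuity corrections. The function names and the short citation lists are hypothetical and are not the data from tables S12 and S13; the article does not state which implementation or approximation was used.

```python
import math
from statistics import mean, median


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_two_tailed(x, y):
    """Return (U statistic for x, two-tailed p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())

    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference

    # Continuity-corrected z score, clamped at zero.
    z = max((abs(u1 - n1 * n2 / 2) - 0.5) / math.sqrt(variance), 0.0)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, p


if __name__ == "__main__":
    # Hypothetical citation counts; zeros stand for unpublished structures.
    sg_citations = [0, 0, 0, 1, 2, 4, 4, 5, 9, 30]
    non_sg_citations = [0, 3, 8, 10, 11, 12, 15, 20, 25, 60]

    for label, data in (("SG", sg_citations), ("non-SG", non_sg_citations)):
        print(label, "mean:", mean(data), "median:", median(data))

    u, p = mann_whitney_two_tailed(sg_citations, non_sg_citations)
    print("U:", u, "two-tailed p:", p)
```

The normal approximation is reasonable for samples of about 100 per group, as in the comparison described here; for very small samples an exact test would be the safer choice.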

## Discussion

Structural genomics has been extremely successful at increasing the scope of our structural knowledge of protein families. SG efforts worldwide account for nearly half of the protein families for which the first representative was reported solved during the most recent year of our study (February 2004 to January 2005). Despite the pace of SG, the quality of SG structures has been found to be similar to that of non-SG structures (26). The difference in output between the most efficient center and the average is striking.

The fraction of structures solved that are novel could be improved at all SG centers. The specific focus of a center may not be entirely compatible with the goal of producing novel structures; for example, a center focusing on medically relevant proteins may need to target multiple members of a family of therapeutic importance. Also, work on a target is not always abandoned when a detectably homologous structure is solved elsewhere, because finishing a near-complete structure may be a worthwhile use of resources. Finally, a structure may not be considered novel because the preceding structure was solved elsewhere but not reported immediately. Rapid reporting of the sequences of newly solved structures could reduce wasted effort at SG centers by at least 4 to 8% (the minimal level of redundancy observed across all SG centers), saving millions of dollars per year in the United States alone.

Compared with other structural biology laboratories, SG centers have published few papers describing their structures, and these papers have a lower average number of citations. This finding suggests that publication is a bottleneck not easily adapted to high-throughput environments. Currently, our estimated costs per citation are similar between SG and non-SG structural biology laboratories, in contrast to other areas in which SG has shown greatly improved efficiency. Although SG centers are reporting results through channels other than traditional publications (27), such as public websites and centralized databases (9), it is unclear whether structures reported in this manner will individually have the same scientific impact as those reported in traditional publications. Highly cited publications often describe detailed studies of protein function, and such studies were not funded at the PSI centers in the pilot phase; however, PSI structures may be used as a starting point for such studies. Ultimately, the cumulative impact of SG, by providing comprehensive structural information covering the majority of proteins, is likely to be greater than the sum of the impact of the individual structures (as was the case for genome sequencing projects).

Finally, the cost estimates suggest a strategy for direction of future structural biology resources. New families predicted to be tractable with high-throughput methods could have basic structural characterization attempted by SG centers because of the substantial cost savings. These families should be prioritized according to significance, for example, family size or biological role (28, 29). Non-SG structural biology could focus on hypothesis-driven research into the function or mechanism of individual proteins, the characterization of particularly challenging proteins and complexes, and other research that is currently impractical to conduct using high-throughput methods. Leading-edge structural biology studies often rely on integration of data from multiple length and time scales, for which most steps are not currently amenable to high-throughput experiments (30). During PSI phase 2, considerable resources will be spent on specialized centers aimed at developing technology for high-throughput solution of more challenging structures, such as membrane proteins, eukaryotic proteins, and small protein complexes, which we hope will lead to further gains in efficiency. We view SG and traditional structural biology as playing complementary roles. Structural genomics offers an efficient means to comprehensively survey protein families; by structurally characterizing proteins whose importance is not yet understood, it provides a foundation for the next generation of biomedical research. On the other hand, non-SG structural biology focuses on proteins whose significance is already appreciated, delving deep into particularly rewarding areas to provide immediate scientific impact.

Supporting Online Material

Materials and Methods

References

Figs. S1 to S4

Tables S1 to S14
