Molecular Evolution of Protein Atomic Composition


Science, 13 Jul 2001
Vol. 293, Issue 5528, pp. 297-300
DOI: 10.1126/science.1061052


Living organisms encounter various growth conditions in their habitats, raising the question of whether ecological fluctuations could alter biological macromolecules. The advent of complete genome sequences and the characterization of whole metabolic pathways allowed us to search for such ecological imprints. Significant correlations between atomic composition and metabolic function were found in sulfur- and carbon-assimilatory enzymes, which appear depleted in sulfur and carbon, respectively, in both the bacterium Escherichia coli and the eukaryote Saccharomyces cerevisiae. In addition to genetic instructions, genomic data thus also provide paleontological records of environmental nutrient availability and of metabolic costs.

A widely accepted principle is that protein evolution is mainly determined by constraints on activity, specificity, folding, and stability (1–4). But other constraints may come into play, in particular nutritional constraints, which have thus far received little scrutiny. Indeed, the elements used in the construction of proteins are not only funneled through metabolic pathways but are also subject to geochemical cycles at the surface of Earth. Thus, we previously proposed that metabolic flows and geochemical budgets might be constraints that were imprinted on protein evolution (5).

To assess the hypothesis that nutritional constraints might have influenced the evolution of protein structure, we computed the atomic composition of enzymes involved in elemental assimilation processes in two model microorganisms, E. coli and S. cerevisiae. For both species, nearly complete sets of biochemically characterized enzymes and their cognate sequences are available, allowing us to analyze species representing the prokaryotic and eukaryotic kingdoms. We began by compiling complete protein sequence data sets from the nonredundant Saccharomyces Genome Database (SGD; 6305 protein sequences) and the Colibri E. coli database (4116 protein sequences).
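The compilation step can be illustrated with a minimal FASTA reader. This is a sketch that assumes both sequence collections are available as FASTA text; the text does not describe the file formats of SGD or Colibri, and the function name `read_fasta` and the choice of the first header token as the identifier are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {identifier: sequence}.

    Assumption: the identifier is the first whitespace-delimited token after '>'.
    Real database headers may need custom parsing.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header: store the previous record before starting this one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())  # sequence lines may wrap; join them later
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

Usage might look like `with open("proteins.fasta") as fh: proteome = read_fasta(fh)`, where the file name is a placeholder; `len(proteome)` then gives the number of sequences in the set.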

We first investigated sulfur usage in sequences of proteins involved in the assimilation of sulfur in S. cerevisiae and E. coli. In both organisms, reductive assimilation of sulfate is achieved through roughly equivalent sets of biochemical reactions (6, 7). However, some differences were apparent. For instance, the reactions leading to the incorporation of reduced sulfur into a carbon chain differ between the two organisms; homocysteine is an intermediate of cysteine synthesis in yeast but not in E. coli (8). In addition, in some particular steps such as sulfate activation and sulfite reduction, the reactions proceed with different mechanisms and therefore the structures of the cognate catalysts are highly divergent (9, 10). The protein sets used for the statistical analyses include, for both organisms, the inorganic sulfur transporters, the enzymes required for the de novo synthesis of methionine and cysteine from sulfate, and the transcriptional activators specifically required for the expression of the corresponding genes (6, 7, 11). These sets comprise 20 and 23 proteins for S. cerevisiae and E. coli, respectively.

To compare elemental composition in proteins, we first determined the quantile distribution of atoms in proteins. This approach is based on the one developed by Karlin and collaborators for analyzing amino acid distributions in large protein data sets (12, 13). In this study, the quantile point Q(x) of a given element for a given set of proteins is the fraction of proteins in which the average number of atoms of that element per residue side chain is at most x (14).
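The definition of Q(x) can be made concrete with a short sketch. It assumes that each protein's value is the mean number of sulfur atoms over its residue side chains, with only Cys and Met contributing one sulfur each, and that nonstandard residue letters are simply skipped. The skipping rule and the function names are illustrative assumptions, not details taken from (14).

```python
from typing import Dict, List

# Sulfur atoms per standard amino acid side chain: only Cys and Met contain sulfur.
SULFUR_SIDE_CHAIN: Dict[str, int] = {"C": 1, "M": 1}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def atoms_per_side_chain(seq: str, table: Dict[str, int]) -> float:
    """Average number of atoms of one element per residue side chain.

    Assumption: letters outside the 20 standard residues (e.g. X) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return sum(table.get(aa, 0) for aa in residues) / len(residues)


def quantile_point(values: List[float], x: float) -> float:
    """Q(x): fraction of proteins whose per-side-chain average is at most x."""
    return sum(1 for v in values if v <= x) / len(values)
```

For three toy sequences `"MACK"`, `"GAVL"`, and `"MKLA"`, the per-protein sulfur values are 0.5, 0.0, and 0.25, so Q(0.25) = 2/3.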

For both S. cerevisiae and E. coli, we compared the quantile distribution of the sulfur metabolism protein set to the quantile distribution of the total protein set (Fig. 1). In both cases, despite differences in sulfur metabolism between the two organisms, the proteins involved in sulfur amino acid biosynthesis contain fewer sulfur atoms than do the total protein sets. In both the sulfur-assimilatory protein sets and the total protein sets, sulfur content follows a bell-shaped, approximately Gaussian distribution (Fig. 1). We therefore assessed statistical significance with Student's t tests. Two-tailed P values (0.0038 for S. cerevisiae, 0.0089 for E. coli) confirm that, in both cases, the differences observed between the distributions are significant.
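A two-sample Student's t test of this kind can be sketched in plain Python. The text does not say whether a pooled-variance or unequal-variance (Welch) form was used; this sketch assumes the classic pooled-variance test and obtains the two-tailed P value from the regularized incomplete beta function, evaluated with the standard continued-fraction (modified Lentz) method. In practice, `scipy.stats.ttest_ind` with its default `equal_var=True` computes the same pooled test.

```python
import math
import statistics
from typing import Sequence, Tuple

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_cf(a: float, b: float, x: float,
             max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    delta = 1.0
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def student_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Pooled-variance two-sample t test; returns (t, degrees of freedom, two-tailed P)."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # Two-tailed P for Student's t: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p
```

Applied to per-protein values such as those returned by `atoms_per_side_chain`, the first argument would be the assimilatory set and the second the total protein set.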

Figure 1

Quantile representations of sulfur usage in S. cerevisiae and E. coli. For both species, the average number of sulfur atoms per residue side chain was calculated for each protein, and the resulting values were summarized in a histogram. The quantile distributions are the cumulative representations of these histograms. For each total protein set, the quantiles were calculated so that the distribution is displayed as a 13-point graph. For the sulfur-assimilatory protein sets, a quantile is shown for each new value in the distribution.
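Plotting a quantile at each new value of a small set amounts to listing the steps of its empirical cumulative distribution. The sketch below computes those (x, Q(x)) points; it does not attempt to reproduce how the 13 points for the total protein sets were chosen, since the caption does not specify that rule. The function name `ecdf_points` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def ecdf_points(values: List[float]) -> List[Tuple[float, float]]:
    """Return (x, Q(x)) at each distinct observed value, i.e. the steps of the cumulative curve."""
    n = len(values)
    counts = Counter(values)
    points: List[Tuple[float, float]] = []
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]  # proteins with a value <= x so far
        points.append((x, cumulative / n))
    return points
```

Tied values collapse into a single step, so a set of 20 proteins can yield fewer than 20 plotted points.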

Biases in the chemical composition of proteins could also result from constraints unrelated to element usage or metabolism, such as structural and functional constraints or molecular phylogeny (15). For instance, amino acids differ in their propensity to form secondary structures such as α helices, β sheets, or turns, and substrate binding is often mediated by hydrophobic or charged amino acids. Amino acid usage in bulk proteins is also known to correlate with DNA base composition (16, 17). These alternative explanations can be ruled out for the biases we observed in the sulfur-assimilatory pathways: the proteins involved belong to diverse catalytic classes, encompass a wide variety of tertiary structures, and process widely divergent substrates (7). In addition, the base composition of the corresponding genes does not deviate significantly from that of the organism as a whole (18), arguing against the possibility that the observed biases result from coding constraints. Moreover, the S. cerevisiae and E. coli genomes have different G+C contents (40% and 52% GC in the coding sequences, respectively), yet sulfur depletion was observed in the sulfur-assimilatory pathways of both organisms. It therefore seems reasonable to postulate that a bias against sulfur-containing amino acids arose in response to nutritional constraints such as environmental sulfur scarcity.
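The G+C figures cited above are simple base counts over coding sequences. A minimal sketch, assuming ambiguous bases (N and similar) are skipped and a gene set is compared to the genome by pooling its bases; the counting conventions behind (18) are not given in the text, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def gc_counts(cds: str) -> Tuple[int, int]:
    """Return (G+C count, A/C/G/T count) for one coding sequence, ignoring other letters."""
    bases = [b for b in cds.upper() if b in "ACGT"]
    return sum(1 for b in bases if b in "GC"), len(bases)


def gc_content(cds: str) -> float:
    """Fraction of G and C among unambiguous bases of a single coding sequence."""
    gc, total = gc_counts(cds)
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return gc / total


def pooled_gc(cds_list: Iterable[str]) -> float:
    """G+C fraction over all bases of a gene set (longer genes weigh more)."""
    gc_total = base_total = 0
    for cds in cds_list:
        gc, total = gc_counts(cds)
        gc_total += gc
        base_total += total
    return gc_total / base_total
```

Pooling weights each gene by its length; averaging per-gene `gc_content` values instead would weight every gene equally, and either convention can be compared with the genome-wide value.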

To further test this hypothesis, we next examined proteins involved in sulfur metabolism in mammals. Mammals are unable to assimilate inorganic sulfur compounds, and therefore all the sulfur atoms in their proteins derive from methionine and cysteine (19). Thus, mammals cannot be deprived of sulfur without also lacking other essential nutrients, and sulfur avoidance would be expected to be less marked in mammalian sulfur-metabolizing enzymes. Mammals express only a few enzymes that are functional equivalents of microbial sulfur-assimilatory enzymes (19). Sequence counts reveal that most of these mammalian enzymes indeed contain three to five times as many sulfur atoms as their microbial counterparts (18). Sequence alignment of the two enzymes that convert homocysteine into cysteine illustrates this effect. Mammalian cystathionine-β-synthase (CBS) and cystathionine-γ-lyase share more than 40% identical residues with their yeast homologs. However, the two yeast enzymes contain a total of 8 sulfur atoms, whereas 50 and 42 sulfur atoms are used to construct the rat and human enzymes, respectively (Fig. 2) (20). Despite these atomic differences, the human CBS enzyme can substitute in vivo for the S. cerevisiae CBS protein (21). Thus, in mammals, where no sulfur-specific nutritional constraint occurs, no sulfur depletion is observed in sulfur-metabolizing enzymes.
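Sulfur counts like those compared above follow directly from sequence, because backbone atoms contain no sulfur and each Cys or Met residue contributes exactly one sulfur atom. A small sketch with hypothetical function names; the example inputs are toy sequences, not the enzymes in Fig. 2.

```python
def sulfur_atoms(seq: str) -> int:
    """Total sulfur atoms in a polypeptide: one per Cys (C) and one per Met (M)."""
    s = seq.upper()
    return s.count("C") + s.count("M")


def sulfur_per_residue(seq: str) -> float:
    """Sulfur atoms relative to protein length, as annotated alongside an alignment."""
    if not seq:
        raise ValueError("empty sequence")
    return sulfur_atoms(seq) / len(seq)


def sulfur_fold(seq_a: str, seq_b: str) -> float:
    """How many times more sulfur atoms seq_a contains than seq_b."""
    return sulfur_atoms(seq_a) / sulfur_atoms(seq_b)
```

For homologs of similar length, `sulfur_fold` and the ratio of `sulfur_per_residue` values tell nearly the same story; the per-residue form is safer when lengths differ.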

Figure 2

Sulfur usage in cysteine biosynthetic enzymes. The human, rat, and yeast cystathionine-γ-lyase were aligned using the Clustal V program. Identical residues are indicated in gray boxes; sulfur-containing residues are in black boxes. The number of sulfur atoms relative to protein length is indicated. Single-letter abbreviations for amino acid residues: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.

Sulfur is unique among the elemental constituents of proteins because it is used in only two amino acids: methionine and cysteine. To determine whether metabolic imprinting in atomic composition of proteins is a more general mechanism, we analyzed carbon usage in carbon-assimilatory proteins of both S. cerevisiae and E. coli (22–24). For each organism, we computed the averaged number of carbon atoms present within the side chains of residues of each protein, calculated quantile distributions of carbon, and made comparisons between the set of proteins involved in carbon assimilation and the total protein set. In both cases, carbon-assimilating proteins show a depleted amount of carbon atoms in their side chains relative to the side chains of the total protein sets (Fig. 3). The distribution of carbon atoms, like that of sulfur atoms, was approximately Gaussian, and Student's t tests gave two-tailed P values (0.0031 for S. cerevisiae, 0.0068 for E. coli), indicating significant differences between the distributions. By comparison, quantile carbon representations (Fig. 3) and statistical tests (P = 0.364 and 0.438, respectively) demonstrate that no carbon usage deviation occurs in the sulfur-assimilatory protein sets.
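The carbon analysis needs a per-residue table of side-chain carbon atoms. The sketch below uses standard side-chain formulas; counting proline's three ring carbons as side-chain atoms (even though the ring closes onto the backbone nitrogen) and skipping nonstandard letters are assumptions about conventions the text does not spell out.

```python
from typing import Dict

# Carbon atoms in each standard side chain (glycine's side chain is a single hydrogen).
CARBON_SIDE_CHAIN: Dict[str, int] = {
    "G": 0, "A": 1, "S": 1, "C": 1, "D": 2, "N": 2, "T": 2,
    "E": 3, "Q": 3, "M": 3, "V": 3, "P": 3,
    "I": 4, "L": 4, "K": 4, "R": 4, "H": 4,
    "F": 7, "Y": 7, "W": 9,
}


def carbon_per_side_chain(seq: str) -> float:
    """Average number of side-chain carbon atoms per residue (nonstandard letters skipped)."""
    residues = [aa for aa in seq.upper() if aa in CARBON_SIDE_CHAIN]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return sum(CARBON_SIDE_CHAIN[aa] for aa in residues) / len(residues)
```

Feeding these per-protein values for the carbon-assimilatory and total sets into the same quantile and t-test steps used for sulfur gives the carbon comparison.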

Figure 3

Quantile representations of carbon usage in S. cerevisiae and E. coli. The quantiles were calculated and are displayed as in Fig. 1.

Finally, we compared the average number of nitrogen atoms in the side chains of nitrogen-assimilatory S. cerevisiae proteins with that in the total S. cerevisiae protein set. The yeast nitrogen-assimilatory protein set consists of the proteins involved in the conversion of ammonia, urea, allantoate, and proline (25, 26). (No similar analysis was done for E. coli because not all of the enzymes involved in ammonia production from various amino acids have been unambiguously determined.) As with the sulfur- and carbon-assimilating proteins, the quantile representation shows that yeast nitrogen-assimilating proteins contain fewer nitrogen atoms in their residue side chains, whereas no significant deviation in nitrogen usage was observed for the yeast sulfur-assimilatory proteins, used as a control (Fig. 4). Again, the two-tailed P value (0.0331) from a Student's t test suggests that the observed difference is unlikely to have arisen by chance. However, both the quantile representation and the t test show that the nitrogen bias, although significant, is less pronounced than the sulfur and carbon biases observed in their cognate assimilatory enzymes.
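For nitrogen, only six standard residues carry side-chain nitrogen; proline's nitrogen belongs to the backbone and is not counted. This sketch also averages the per-protein values over a set, as a quick way to set an assimilatory set and a control set side by side before any formal test. Function names are hypothetical, and skipping nonstandard letters is an assumption.

```python
from typing import Dict, Iterable

# Nitrogen atoms in each standard side chain; residues not listed contribute zero.
NITROGEN_SIDE_CHAIN: Dict[str, int] = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def nitrogen_per_side_chain(seq: str) -> float:
    """Average number of side-chain nitrogen atoms per residue."""
    residues = [aa for aa in seq.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return sum(NITROGEN_SIDE_CHAIN.get(aa, 0) for aa in residues) / len(residues)


def mean_nitrogen(seqs: Iterable[str]) -> float:
    """Mean of the per-protein nitrogen values over a protein set."""
    values = [nitrogen_per_side_chain(s) for s in seqs]
    return sum(values) / len(values)
```

A smaller gap between the assimilatory-set mean and the total-set mean than the gaps seen for sulfur and carbon would be consistent with the weaker nitrogen bias described above.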

Figure 4

Quantile representations of nitrogen usage in S. cerevisiae. The quantiles were calculated and are displayed as in Fig. 1.

Taken together, our results conclusively demonstrate the systematic occurrence of atomic biases in assimilatory proteins of two highly divergent microorganisms. This suggests that the elemental composition of biological polymers has been more generally subjected to ecological constraints than was previously thought, and that metabolic costs are among the variables optimized by natural selection. A simple explanation for the biases we report here is that the chemical structure of proteins performing the assimilation of a given element should evolve so as to respond to a sudden and transitory shortage by incorporating the smallest amount of that element. Thus, the impoverishment of sulfur- and carbon-assimilatory proteins in their respective elements can be interpreted as an imprint of variations in the nutritional availability of these elements during the natural history of S. cerevisiae and E. coli; by contrast, the enrichment of mammalian cystathionine-converting enzymes in sulfur can be interpreted as an imprint of a steady abundance of sulfur amino acids in the diet. It is likely that oligotrophic organisms would adapt to the permanent scarcity of an element by the diminution of the content of that element in all proteins, and not only in their assimilatory proteins for that element. Given that the proliferation of most organisms in their natural habitats is limited by a nutritional resource and that organisms have adapted to starvation over many generations, we anticipate that it will be possible to retrieve geochemical, ecological, and metabolic data from genome sequences using straightforward statistical methods.

* To whom correspondence should be addressed. E-mail: thomas{at}

