Whole-Genome Analysis of Photosynthetic Prokaryotes


Science  22 Nov 2002:
Vol. 298, Issue 5598, pp. 1616-1620
DOI: 10.1126/science.1075558


The process of photosynthesis has had profound global-scale effects on Earth; however, its origin and evolution remain enigmatic. Here we report a whole-genome comparison of representatives from all five groups of photosynthetic prokaryotes and show that horizontal gene transfer has been pivotal in their evolution. Excluding a small number of orthologs that show congruent phylogenies, the genomes of these organisms represent mosaics of genes with very different evolutionary histories. We have also analyzed a subset of “photosynthesis-specific” genes that were elucidated through a differential genome comparison. Our results explain incoherencies in previous data-limited phylogenetic analyses of phototrophic bacteria and indicate that the core components of photosynthesis have been subject to lateral transfer.

Photosynthesis is an essential biological process in which solar energy is transduced into other forms of energy that are available to all life. Primary production by photosynthetic organisms supports all ecosystems, with the noted exceptions of deep-sea hydrothermal vents and subsurface communities. Oxygen, one of the by-products of photosynthesis by cyanobacteria and their descendants (including algae and higher plants), transformed the Precambrian Earth and made possible the development of more complex organisms that use aerobic metabolism (1, 2). Understanding the origin and evolution of the process of photosynthesis is, therefore, of considerable interest.

All available evidence suggests that (bacterio)chlorophyll-based photosynthesis arose within the bacterial domain of the tree of life and was followed by subsequent endosymbiotic transfer into eukaryotes. Accurate dates for the appearance of the first photosynthetic organisms are not known. Substantial information, including biomarkers, stromatolites, and paleosols, as well as data from molecular evolution studies, indicates that oxygenic (oxygen-evolving) photosynthesis arose by 2500 million years ago (2–5). On the basis of phylogenetic analyses and the well-detailed complexity of the photosynthetic machinery, mechanistically simpler anoxygenic (non–oxygen-evolving) photosynthesis almost certainly preceded and was ancestral to oxygenic photosynthesis (1, 6). Therefore, the cyanobacteria, as ancient as they appear to be, were probably preceded by a diverse group of more primitive phototrophs. The supposed progeny of those early phototrophs are still found throughout diverse ecosystems and may provide key evidence toward unraveling the early origins of photosynthesis.

There are five known bacterial phyla with photosynthetic members. These phyla are widely distributed within the bacterial domain and include the cyanobacteria (the only oxygenic group), proteobacteria (purple bacteria), green sulfur bacteria, green filamentous bacteria, and the Gram-positive heliobacteria. With respect to traditional ribosomal-based phylogenies, the distribution of photosynthesis is markedly paraphyletic (7, 8). There have been a number of different hypotheses proposed to resolve the disparate phylogenetic distribution of these organisms (6, 9–11). However, in the absence of conclusive data, none of these proposals has won unanimous acceptance. On the basis of genomic comparisons presented here, we propose that horizontal gene flow has played a major role in the evolution of bacterial phototrophs and that many of the essential components of photosynthesis have been among these horizontally transferred genes.

A crucial early step of any sequence-based analysis is the selection of genes for phylogenetic comparison, which should minimize the inclusion of potentially error-causing paralogs or nonhomologous genes. Here this was done by carrying out whole-genome BLAST comparisons of all proteins for every possible pairing of organisms that make up the sample. Putative orthologs were required to have BLAST scores with expectation values for chance similarity below a preset threshold. Sets of orthologous sequences were then compiled from genes that are reciprocal best BLAST hits across all of the genomes compared; that is, given a set of orthologs from each of the five genomes, each individual ortholog returns each of the other four as the top-scoring BLAST hit when searching the corresponding genome (12). These computationally intensive procedures aim to avoid the erroneous results that can arise from comparing paralogous or nonhomologous genes [for methodology, application, and further discussion, see (13–15)]. Even with these rigorous ortholog selection requirements, we were able to perform phylogenetic analyses on nearly 200 sets of orthologous genes, providing a previously unattainable look into the early evolution of photosynthetic organisms.
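The all-against-all reciprocal best hit criterion described above can be illustrated with a minimal sketch. It assumes the BLAST results have already been parsed and filtered by expectation value into a lookup table; the function name, the `best_hit` data layout, and the toy gene labels are hypothetical and are not taken from the original methods.

```python
from itertools import permutations

def reciprocal_best_hit_sets(best_hit, genomes):
    """Return ortholog sets in which every member is the best hit of every other.

    best_hit[(query_genome, target_genome)] maps a query gene to its
    top-scoring hit in the target genome (assumed already E-value filtered).
    """
    seed = genomes[0]
    ortholog_sets = []
    for gene in best_hit[(seed, genomes[1])]:
        # Build a candidate set from the seed gene's best hit in each genome.
        candidate = {seed: gene}
        for other in genomes[1:]:
            hit = best_hit.get((seed, other), {}).get(gene)
            if hit is None:
                break
            candidate[other] = hit
        if len(candidate) != len(genomes):
            continue
        # Keep the set only if every ordered pair of genomes agrees.
        if all(best_hit.get((a, b), {}).get(candidate[a]) == candidate[b]
               for a, b in permutations(genomes, 2)):
            ortholog_sets.append(candidate)
    return ortholog_sets
```

With five genomes, each candidate set is checked against all 20 ordered genome pairs, which is why the requirement is far stricter than a pairwise reciprocal best hit test.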

With the use of the above methods, we found a total of 188 orthologs common to the genomes of Synechocystis sp. PCC6803 (cyanobacteria), Chloroflexus aurantiacus (green filamentous bacteria), Chlorobium tepidum (green sulfur bacteria), Rhodobacter capsulatus (proteobacteria), and Heliobacillus mobilis (heliobacteria). These genes encompass a broad range of functions, including housekeeping genes involved in protein synthesis, DNA replication and transcription, and manufacture of structural components of the cell, as well as the genetic components of various metabolic or biosynthetic pathways common to all the organisms. We individually evaluated each set of orthologs using maximum likelihood to determine which of the 15 possible five-taxa tree topologies provided the best fit to the observed sequence data. Posterior probabilities were calculated from log likelihood values with the use of an approach developed by Strimmer and von Haeseler (16). Figure 1 shows all 15 possible topologies as well as the percentage of the 188 sets of protein-coding genes for which the given topology was the most probable. Also shown in Fig. 1 are example functional annotations, some of which are frequent choices for phylogenetic inference, listed by the topology that those genes supported. The most unexpected result from this analysis is the distinct lack of unanimous support for a single topology. Plurality support is seen for the three trees (5, 10, and 15) that group together Synechocystis sp., C. aurantiacus, and H. mobilis separate from a distinct R. capsulatus and C. tepidum cluster. The data suggest that even strongly supported phylogenies and highly conserved genes from these organisms often show very different evolutionary histories.
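The count of 15 topologies, and the step from log likelihoods to posterior probabilities, can be made concrete with a short sketch. An unrooted binary tree on five taxa has one central taxon flanked by two pairs, giving 5 × 3 = 15 trees. The posterior calculation below assumes an equal prior weight on each topology, so that each posterior is a tree's likelihood divided by the summed likelihoods; that is one reading of the cited approach, not a statement of the exact procedure used here. Function names are hypothetical.

```python
import math

def five_taxon_topologies(taxa):
    """Enumerate unrooted binary topologies as (pair, central taxon, pair)."""
    topologies = []
    for middle in taxa:
        rest = [t for t in taxa if t != middle]
        first = rest[0]
        # Fixing the first remaining taxon avoids counting each split twice.
        for partner in rest[1:]:
            pair1 = frozenset((first, partner))
            pair2 = frozenset(t for t in rest if t not in pair1)
            topologies.append((pair1, middle, pair2))
    return topologies

def posterior_probabilities(log_likelihoods):
    """Normalize likelihoods to posteriors, assuming a uniform prior over trees."""
    best = max(log_likelihoods)
    # Subtracting the maximum keeps exp() from underflowing to zero.
    weights = [math.exp(ll - best) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]
```

For each ortholog set, the topology with the largest posterior would be the one tallied in a histogram such as Fig. 1.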

Figure 1

Distribution of orthologs among the 15 possible unrooted trees. The tree at top gives branching order for the photosynthetic organisms listed in the center grid for each of the 15 possible five-taxa trees. Bars show the percentage of 188 sets of orthologs that chose a particular tree topology as most likely. Examples of genes supporting each topology, based on Synechocystis annotations, are shown at right and include 16S and 23S trees constructed from ribosomal DNA sequences from these genomes.

Orthologs from each data set were further stratified by their putative functional assignments on the basis of cluster of orthologous groups (COG) categories (12, 14, 17) (fig. S1, table S4). It might have been expected that, for example, genes functioning in information processing would as a subset show preference for a single topology (18). However, the results indicate that even at this level of grouping-by-function no unanimous support for a particular topology is seen. Additionally, because branch length information is necessarily disregarded when segregating orthologs by most likely topology, we reexamined branch lengths for every tree constructed and tabulated distances determined by maximum likelihood analysis of the individual sets of orthologous genes. This step incorporated another level of stringency into the overall analysis, because potentially error-causing cases in which one or more orthologs displayed anomalously long branch lengths could be recognized and eliminated. We observed a positive correlation between overall number of substitutions per site and posterior probability score for the most likely tree, indicating that genes that are less diverged are more likely to map to an explicit topology (19). The shortest between-taxa distances were recovered from each 5 × 5 pairwise distance matrix generated during phylogenetic reconstruction. In 117 cases, the shortest between-taxa distance favored clustering one of the three possible pairings of H. mobilis, Synechocystis, and Chloroflexus, whereas the C. tepidum–R. capsulatus cluster was favored in only 8 cases. Overall averaged estimates of substitutions per site corroborate these findings, with the lowest number of substitutions per site between Chloroflexus and H. mobilis, followed by Synechocystis and H. mobilis. Averaged substitutions per site for the C. tepidum–R. capsulatus grouping were second highest overall. These results imply an overall close relationship between H. mobilis, Synechocystis, and Chloroflexus (though the relationship between the latter two is not as strong, on average) and reveal that the C. tepidum–R. capsulatus grouping that is frequently observed when unrooted topologies are considered becomes less relevant when estimated distances between these two organisms are taken into account.
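Recovering the shortest between-taxa distance from each 5 × 5 matrix, and tallying which pair it favors, amounts to a minimum over the off-diagonal entries. The sketch below assumes each matrix is a symmetric list of lists of substitutions per site in a fixed taxon order; the function names and the abbreviated taxon labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def closest_pair(taxa, dist):
    """Return the pair of taxa with the smallest off-diagonal distance."""
    i, j = min(combinations(range(len(taxa)), 2),
               key=lambda ij: dist[ij[0]][ij[1]])
    return frozenset((taxa[i], taxa[j]))

def tally_closest_pairs(taxa, matrices):
    """Count, over all ortholog sets, how often each pair is the closest."""
    return Counter(closest_pair(taxa, m) for m in matrices)
```

Applied to one matrix per ortholog set, the resulting counts correspond to the kind of tabulation reported above (for example, how often a given pairing was the closest).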

Subsequently, we set out to identify genes that play an essential role in phototrophy and whose evolution might be tightly linked to the advent and development of photosynthesis. The biochemical machinery comprising the cogwheels of photosynthesis has been continually refined over billions of years since the emergence of the first bacterial phototrophs. In some notable cases, genes within this process have originated from nonphotosynthetic genes that were incorporated by various genetic processes, including gene recruitment, gene duplication and fusion, and possibly motif shuffling (6, 9). In other cases, gene origins have been masked by eons of evolution at the primary sequence level, so some homologs are detected only in other photosynthetic organisms. These so-called “photosynthesis-specific” (PS-specific) genes emerge as an obvious focus of interest in attempting to understand the evolution of photosynthesis; however, it remains unclear how extensive the set of PS-specific genes is. Therefore, we have constructed a simple method for finding members of this group.

Finding PS-specific genes can be approximated by finding all genes shared within the subset of photosynthetic organisms and then subtracting from this set those genes found in nonphotosynthetic organisms (12). In principle, this method for identification of pathway-specific genes can be applied to other groups of organisms whose genomes have been sequenced, giving a differential comparison between organisms that share a pathway and those that are missing it. Although there are obvious cases where this method will result in false negatives due to organism-specific photosynthetic proteins, even this first-order approach gives some interesting insights.
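The differential comparison reduces to two set operations: intersect the gene families of the photosynthetic genomes, then subtract anything seen in a nonphotosynthetic genome. The sketch below assumes each genome has already been reduced to a set of homolog family identifiers; the function name and the family labels in the usage note are hypothetical.

```python
def ps_specific(photosynthetic, nonphotosynthetic):
    """Families present in every photosynthetic genome and in no other genome.

    Both arguments map a genome name to an iterable of gene family identifiers.
    """
    shared = set.intersection(*map(set, photosynthetic.values()))
    seen_elsewhere = set().union(*nonphotosynthetic.values())
    return shared - seen_elsewhere
```

For instance, if a family "fam_rc" occurs in all photosynthetic genomes and in none of the others, it is retained, whereas a universally present family such as a ribosomal protein is subtracted.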

In performing this analysis on the above set of five photosynthetic genomes and a group of six taxonomically diverse, nonphotosynthetic bacteria and archaea, we found only a small set of PS-specific proteins (Fig. 2) (tables S1 to S4). Relaxing our constraints to include putative “photosynthesis-related proteins” (PS-related) — defined as missing in no more than one of the photosynthetic genomes or present in no more than one of the nonphotosynthetic genomes — notably increases the size of this set with the caveat of potentially increasing the number of false positives. Genes found in all 11 bacterial and archaeal genomes are predominantly housekeeping genes that function in nucleic and amino acid transport and metabolism as well as in translation and ribosomal structure (but not in transcription or DNA replication). PS-specific and PS-related genes function primarily in energy production (12). However, no single majority topology was observed in the phylogenetic trees from either of these functional subsets.
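The strict and relaxed categories can be expressed as a rule on two counts: the number of photosynthetic genomes and the number of nonphotosynthetic genomes in which a gene has a homolog. The sketch below reads the relaxed definition as allowing one exception at a time (one missing photosynthetic genome, or one nonphotosynthetic occurrence, but not both); this reading is an assumption, and the function name is hypothetical.

```python
def classify(n_ps_present, n_nonps_present, n_ps_total=5):
    """Label a gene by its occurrence counts across the two genome groups."""
    missing_ps = n_ps_total - n_ps_present
    if missing_ps == 0 and n_nonps_present == 0:
        return "PS-specific"
    # Assumed reading: a single exception on either side, not both at once.
    if (missing_ps <= 1 and n_nonps_present == 0) or \
       (missing_ps == 0 and n_nonps_present <= 1):
        return "PS-related"
    return "other"
```

Under this rule, a gene found in four of the five photosynthetic genomes and in no other genome is PS-related, as is a gene found in all five plus a single nonphotosynthetic genome.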

Figure 2

Distribution of 3169 genes from Synechocystis by occurrence in five photosynthetic and six nonphotosynthetic bacterial and archaeal genomes, ranging from genes present in all 11 genomes to those only found in Synechocystis. Proposed categories are circled in red, and number of genes in each proposed category is shown in parentheses.

A second, more exhaustive method was then undertaken in which we compared the five photosynthetic organisms to an additional six photosynthetic and 50 nonphotosynthetic organisms from publicly available genome projects (Table 1). Unlike the above analysis, this comparison did not require a single key organism (such as Synechocystis); rather, it found homologous genes and gene families from the overlap and differences of a large set of photosynthetic and nonphotosynthetic genomes (12). Homologs found in this extensive analysis corroborate most of the findings from the restricted data set, add several significant hits to the overall list, and remove some false positives. The function and topology supported by several genes at the top of these lists are congruent with recent phylogenetic analysis of pigment biosynthesis genes (6), though they differ from the ribosomal-based organismal phylogenies and plurality topologies in Fig. 1. These results bolster the idea that the evolution of photosynthetic genes has been disconnected from divergence and speciation in these organisms, confirming the extensive role that horizontal gene flow has played in prokaryote evolution. An additional caveat is that many genes from the PS-related set are either hypothetical or completely unknown, complicating attempts to understand the context under which many of these genes have evolved and making them candidates for further analysis. One possibility is that some elements of the photosynthetic apparatus, or factors involved in its assembly or stability, remain unknown.

Table 1

Putative function and pathway or functional category of PS-specific and PS-related genes, and number of genomes each gene is found in (tables S1 to S4 and fig. S1). Main PS includes the five photosynthetic lineages compared in the text, other PS includes six additional phototrophic bacteria, and non-PS includes 50 nonphotosynthetic organisms. Question marks indicate unidentified functional categories.


Previous phylogenetic analyses of photosynthetic bacteria have necessarily used a limited subset of genes to infer relationships among these organisms, often resulting in incongruent results (6, 7, 10, 11). New whole-genome data have allowed us to make an extensive comparison of representatives of each of the five known groups of photosynthetic bacteria and may help to reconcile multiple lines of disparate phylogenetic evidence centered on them. In line with other recent whole-genome analyses, horizontal gene transfer (HGT) appears to be an integral aspect of prokaryote evolution (20–23), and genetic components of the photosynthetic apparatus have crossed species lines nonvertically. Rather than confounding the overall picture, as is often the case in data-limited studies where HGT is apparent, in the context of whole-genome comparisons HGT can further refine and resolve the history of an organism. For example, multiple lines of phylogenetic evidence, supported in part by our analysis, have placed the Gram-positive firmicutes, which include H. mobilis, as a sister phylum to the cyanobacteria (8, 15, 24). However, the close relationship of either of these groups with Chloroflexus has not previously been noted. The placement of Chloroflexus at the base of the bacterial radiation using 16S ribosomal RNA has been the basis for its designation as the earliest phototroph (7, 25). Our results, which indicate extensive lateral gene transfer, raise the possibility that Chloroflexus acquired phototrophy, perhaps largely through lateral gene transfer. This idea is bolstered by the close phylogenetic and, to a lesser degree, phenotypic relatedness of Chloroflexus and Chlorobium, evident in their highly similar pigment biosynthesis genes and light-harvesting chlorosome structures. In contrast, other components of these two bacteria, including the photosynthetic reaction centers, are markedly different; thus, other components might have been inherited vertically or through HGT from other phototrophs. These ideas suggest further tests of estimating times of divergence and lateral gene transfer for these and the other photosynthetic bacteria compared here. For all the demonstrated evolutionary complexity and antiquity of these bacteria, mapping the early events in the evolution and distribution of photosynthesis stands as a formidable but exciting challenge.

Supporting Online Material

Materials and Methods

Fig. S1

Tables S1 to S4

  • * These authors contributed equally to this paper.

  • To whom correspondence should be addressed. E-mail: blankenship{at}

