Research Article

Molecular Evidence for the Early Evolution of Photosynthesis


Science  08 Sep 2000:
Vol. 289, Issue 5485, pp. 1724-1730
DOI: 10.1126/science.289.5485.1724


The origin and evolution of photosynthesis have long remained enigmatic owing to a lack of sequence information on photosynthesis genes across the entire photosynthetic domain. To probe the early evolutionary history of photosynthesis, we obtained new sequence information for a number of photosynthesis genes from the green sulfur bacterium Chlorobium tepidum and the green nonsulfur bacterium Chloroflexus aurantiacus. A total of 31 open reading frames that encode enzymes involved in bacteriochlorophyll/porphyrin biosynthesis, carotenoid biosynthesis, and photosynthetic electron transfer were identified in about 100 kilobase pairs of genomic sequence. Phylogenetic analyses of multiple magnesium-tetrapyrrole biosynthesis genes using a combination of distance, maximum parsimony, and maximum likelihood methods indicate that heliobacteria are closest to the last common ancestor of all oxygenic photosynthetic lineages and that green sulfur bacteria and green nonsulfur bacteria are each other's closest relatives. Parsimony and distance analyses further identify purple bacteria as the earliest emerging photosynthetic lineage. These results challenge previous conclusions based on 16S ribosomal RNA and Hsp60/Hsp70 analyses that green nonsulfur bacteria or heliobacteria are the earliest phototrophs. The overall consensus of our phylogenetic analysis, that bacteriochlorophyll biosynthesis evolved before chlorophyll biosynthesis, also argues against the long-held Granick hypothesis.

The advent of photosynthesis is one of the central events in the early development of life on Earth. The origin and evolution of photosynthesis, however, have long remained unresolved. Studies have demonstrated that photosynthetic eukaryotes acquired photosynthesis through endosymbiosis with cyanobacteria (1). This observation, together with the fact that no Mg-tetrapyrrole–based photosynthesis has been found in Archaea, supports the notion that photosynthesis is a bacterially derived process (2). To gain insight into the early evolution of photosynthesis, it is essential to conduct a detailed phylogenetic analysis of many photosynthesis genes from each of the five known photosynthetic bacterial lineages. However, because photosynthesis gene sequences from across the entire spectrum of photosynthetic bacteria have been scarce, previous analyses have had to rely on nonphotosynthesis genes, which have given conflicting results for the evolution of photosynthesis and of photosynthetic organisms. For example, phylogenetic analysis of small-subunit rRNA suggests that green nonsulfur bacteria are the earliest evolving photosynthetic lineage (3). In contrast, using portions of the Hsp60 and Hsp70 heat shock proteins as markers, Gupta et al. (4) concluded that heliobacteria are the earliest evolving photosynthetic lineage and that this lineage subsequently diverged into green nonsulfur bacteria, cyanobacteria, green sulfur bacteria, and purple bacteria, in that order. The conflicting trees derived from such studies indicate that extrapolating the evolution of photosynthesis from nonphotosynthesis gene trees may be invalid.

Another problem arises when only a single set of photosynthesis genes is used for phylogeny. Previous attempts to analyze the evolution of photosynthesis using photosynthetic reaction center apoproteins failed to construct a phylogeny that includes all five photosynthetic bacterial lineages, because anoxygenic photosynthetic bacteria contain only one type of photosynthetic reaction center (type I or type II), whereas cyanobacteria contain both types. Although the two types of reaction centers share significant structural similarities (5), their sequences have diverged to such an extent that it is virtually impossible to perform a statistically meaningful phylogenetic analysis using reaction center apoproteins as markers. Tracking the evolution of photosynthesis must therefore rely on analysis of an alternative set of photosynthesis genes present in all known photosynthetic lineages. Here, we have attempted to resolve the early evolutionary path of photosynthesis by obtaining a large number of photosynthesis gene sequences from the green sulfur bacterium Chlorobium tepidum and the green nonsulfur bacterium Chloroflexus aurantiacus, representatives of the two main photosynthetic lineages for which only a few photosynthesis genes had previously been sequenced. The additional sequence information on Mg-tetrapyrrole biosynthesis genes has allowed the first detailed phylogenetic reconstruction of photosynthesis genes for all photosynthetic lineages.

New photosynthesis gene sequences. Using a combination of functional complementation, degenerate polymerase chain reaction (PCR), inverse PCR, and automated DNA sequencing, we obtained eight DNA segments totaling 57,534 base pairs (bp) of genomic DNA containing 50 open reading frames (ORFs) from C. tepidum, and six DNA segments totaling 40,766 bp containing 38 ORFs from C. aurantiacus (Fig. 1). Functions were assigned to 53 ORFs from the two species with a relatively high degree of confidence on the basis of sequence homology (Table 1). Of these, 22 ORFs from C. tepidum and nine ORFs from C. aurantiacus were identified as encoding proteins involved in various steps of photosynthesis, 23 of which are dedicated to bacteriochlorophyll biosynthesis (Table 1). The remaining photosynthesis genes encode proteins involved in carotenoid biosynthesis (one gene), protoporphyrin IX biosynthesis (two genes), carbon fixation (one gene), light harvesting (one gene), and photosynthetic electron transfer (three genes). Another 22 ORFs were functionally assigned to products not directly related to photosynthesis (Table 1). In addition, 10 ORFs from C. tepidum and four ORFs from C. aurantiacus match conserved hypothetical proteins of unknown function in the database. Finally, three ORFs from C. tepidum and 10 ORFs from C. aurantiacus have no database matches and appear to be unique to the two green bacteria.
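The ORF bookkeeping in the preceding paragraph can be cross-checked with simple arithmetic. The sketch below only re-adds the counts stated in the text; the variable names are illustrative and not part of the original analysis.

```python
# ORF counts as stated in the text (illustrative bookkeeping only).
photosynthesis_by_category = {
    "bacteriochlorophyll biosynthesis": 23,
    "carotenoid biosynthesis": 1,
    "protoporphyrin IX biosynthesis": 2,
    "carbon fixation": 1,
    "light harvesting": 1,
    "photosynthetic electron transfer": 3,
}
photosynthesis_orfs = 22 + 9   # C. tepidum + C. aurantiacus
nonphotosynthesis_orfs = 22    # assigned, but not directly photosynthetic

# The per-category counts should account for every photosynthesis ORF.
assert sum(photosynthesis_by_category.values()) == photosynthesis_orfs

assigned_orfs = photosynthesis_orfs + nonphotosynthesis_orfs
print(f"photosynthesis ORFs: {photosynthesis_orfs}, ORFs with assigned function: {assigned_orfs}")
```

The 31 photosynthesis ORFs are the same 31 reported in the abstract; adding the 22 nonphotosynthesis assignments gives the 53 ORFs with assigned functions.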

Figure 1

Linear representation of newly identified photosynthesis genes and their flanking sequences from C. tepidum and C. aurantiacus. The predicted protein coding regions are colored by biological role: green boxes for bacteriochlorophyll and carotenoid biosynthesis genes, red for porphyrin biosynthesis genes, magenta for genes encoding electron transfer proteins, cyan for genes involved in carbon fixation, yellow for nonphotosynthesis genes, and gray for genes of conserved hypothetical proteins with no known functions; white boxes indicate unknown ORFs. Arrows represent the orientations of transcription. Cosmid insert sequences identified from functional complementation of bacteriochlorophyll-deficient strains of Rhodobacter capsulatus (25) are delineated by brackets.

Table 1

The predicted functions and database similarity matches of the ORFs of Chlorobium tepidum and Chloroflexus aurantiacus identified in this study (26). Abbreviations of organisms of best database matches are shown in brackets (the first being the best match for C. tepidum, the second the best match for C. aurantiacus): Ar, Acidiphilium rubrum; Bs, Bacillus subtilis; Cf, Chloroflexus aurantiacus; Cl, Chlorobium limicola; Crv, Chlorella vulgaris; Ct, Chlorobium tepidum; Cv, Chlorobium vibrioforme; Ec, Escherichia coli; Ef, Enterococcus faecalis; Hi, Haemophilus influenzae; Hm, Heliobacillus mobilis; Hs, Homo sapiens; Mt, Methanobacterium thermoautotrophicum; Pb, Plectonema boryanum; Ps, Pseudomonas sp. KWI-56; Rg, Rubrivivax gelatinosus; Ro, Rhodothermus obamensis; Rs, Rhodobacter sphaeroides; Sc, Saccharomyces cerevisiae; Sy, Synechocystis sp. PCC 6803; Th, Thermus thermophilus; Tm, Thermotoga maritima; Tt, Thermoanaerobacterium thermosulfurigenes; and Xc, Xanthomonas campestris.


Unlike purple bacteria and heliobacteria, which both have large, tightly linked photosynthesis gene clusters (6), C. tepidum and C. aurantiacus have no major photosynthesis gene clusters. However, there are small clusters of two to three photosynthesis genes whose linkage is fully conserved among the four anoxygenic photosynthetic bacterial lineages (Fig. 2). A notable feature is that these gene groups tend to encode products that physically interact with each other as subunits of a single enzyme or protein complex. Interactions between some of the protein pairs shown in Fig. 2 have already been confirmed experimentally. These include BchI/ChlI and BchD/ChlD (7) and BchD/ChlD and BchH/ChlH (8), which are subunits of Mg-chelatase; BchN/ChlN and BchB/ChlB (9), subunits of light-independent protochlorophyllide reductase; and PetB and PetC, subunits of the cytochrome bc1 complex (10). The coupling of gene order with functional and physical interactions of the encoded proteins has been observed previously in the genomes of a number of divergent nonphotosynthetic bacterial and archaeal species (11). In addition, we have observed partial conservation of gene linkage between bchM and bchE, bchE and bchJ, and bchJ and bchG. These gene pairs tend to encode enzymes catalyzing two consecutive steps in the biosynthetic pathway.
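The notion of a gene pair whose linkage is conserved across genomes can be made concrete with a small check. The sketch below is a simplified illustration, not the comparison method used for Fig. 2: it treats each genome as a plain ordered list of gene names, ignores transcription direction, intergenic distance, and circular wrap-around, and uses hypothetical toy gene orders rather than the real orders from Fig. 2.

```python
from typing import Dict, List, Tuple

def adjacent(order: List[str], a: str, b: str) -> bool:
    """True if genes a and b are immediate neighbours in a gene order (either orientation)."""
    return any({left, right} == {a, b} for left, right in zip(order, order[1:]))

def conserved_pairs(orders: Dict[str, List[str]],
                    pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the gene pairs that are adjacent in every genome."""
    return [p for p in pairs if all(adjacent(o, p[0], p[1]) for o in orders.values())]

# Hypothetical toy gene orders (NOT the actual orders shown in Fig. 2).
orders = {
    "genome_A": ["bchI", "bchD", "orf1", "bchN", "bchB"],
    "genome_B": ["bchN", "bchB", "orf2", "bchD", "bchI"],
    "genome_C": ["bchD", "bchI", "bchB", "bchN", "orf3"],
}
conserved = conserved_pairs(orders, [("bchI", "bchD"), ("bchN", "bchB"), ("bchB", "orf1")])
print(conserved)  # [('bchI', 'bchD'), ('bchN', 'bchB')]
```

A pair such as ("bchB", "orf1") is rejected because it is not adjacent in genome_A, even though both genes are present.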

Figure 2

Comparison of linkage of photosynthesis genes among anoxygenic photosynthetic bacteria. Conserved small clusters of two to three genes shown here tend to encode proteins that physically interact with each other. Directions of transcription for each gene are indicated by arrowed boxes. Intergenic regions of unspecified lengths are indicated by “//”.

Phylogenetic analysis. The newly obtained green bacterial photosynthesis genes allowed us to perform the first detailed evolutionary analysis of (bacterio)chlorophyll biosynthesis genes from all photosynthetic lineages. For this analysis, phylogenetic trees were constructed for both protein and DNA sequences using neighbor-joining (NJ), maximum parsimony (MP), and maximum likelihood (ML) methods (12). In all cases, we sampled the largest number of taxa available to reduce systematic errors such as long-branch attraction. Our ML analysis of gene sequences used a method that accounts for site-to-site variation in evolutionary rates, because failure to address this variation can produce long-branch attraction artifacts (13). In addition, in our rooted analyses we used only conserved and reliably aligned regions of the outgroup sequences, to minimize reconstruction artifacts arising from the use of distant outgroups. Because adding outgroup lineages could itself alter the ingroup tree topology through long-branch attraction, we analyzed the trees both with and without the chosen outgroups; no change in ingroup topology was found. Our nucleotide sequence analyses included only the first and second codon positions, to avoid potential substitutional saturation and compositional bias at the third position.
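Restricting a nucleotide analysis to the first and second codon positions amounts to dropping every third column of an in-frame coding alignment. The minimal sketch below assumes the aligned sequence starts at the first base of a codon and that gaps were inserted in whole codons; the function name is hypothetical.

```python
def first_second_positions(seq: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame, codon-aligned coding sequence.

    Assumes the sequence starts at a codon boundary and that alignment gaps
    occur in multiples of three, so every triplet is one codon (or gap).
    """
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # For each codon starting at i, keep seq[i] and seq[i + 1]; drop seq[i + 2].
    return "".join(seq[i:i + 2] for i in range(0, len(seq), 3))

print(first_second_positions("ATGGCC---TAA"))  # ATGC--TA
```

Applied to every sequence in an alignment, this keeps two thirds of the columns, which is why the character counts reported for the DNA data sets refer to "first and second codon positions."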

A phylogenetic tree of bchB/chlB, a gene encoding one of the three subunits of light-independent protochlorophyllide reductase, was rooted with its close homologs nifK and nifD, which encode the corresponding subunits of nitrogenase, and nifE and nifN, which are responsible for biosynthesis of the nitrogenase FeMo cofactor (14). Strong structural and functional similarities between the bchB/chlB and nifD/nifK gene products have been demonstrated previously (9). All three tree-building methods, NJ, MP, and ML, yielded the same topology, with bchB/chlB and the nif genes forming two distinct monophyletic groups, which suggests that they evolved from an ancient gene duplication (Fig. 3A). Thus, the nif genes can indeed be used effectively as outgroups to root the photosynthetic bchB/chlB subtree. In the bchB/chlB subtree, the purple bacterial taxa form a monophyletic group that is placed as the most basal lineage of the photosynthetic ingroup with strong bootstrap support (89%, 91%, and 98% for the NJ, MP, and ML methods, respectively). The next lineage to diverge is a green bacterial clade in which C. aurantiacus and C. tepidum are sister taxa. It is followed by the heliobacterium Heliobacillus mobilis, which branches before the divergence of the cyanobacteria/plant lineages. The bchB/chlB DNA tree is completely consistent with the BchB/ChlB protein tree (15).
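Bootstrap percentages such as the 89%, 91%, and 98% quoted above come from rebuilding the tree on many pseudo-replicate alignments and counting how often a clade reappears. The sketch below shows only the generic resampling step and the percentage calculation for a nonparametric bootstrap; the per-replicate tree building (NJ, MP, or ML) is omitted, and the toy alignment and function names are hypothetical.

```python
import random
from typing import Dict

def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap pseudo-replicate: resample alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    n_sites = lengths.pop()
    # Draw n_sites column indices with replacement; all taxa share the same columns.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in columns) for taxon, seq in alignment.items()}

def support_percent(replicates_with_clade: int, n_replicates: int) -> float:
    """Bootstrap support: percentage of replicate trees that recover a given clade."""
    return 100.0 * replicates_with_clade / n_replicates

rng = random.Random(1)
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "TCGATC"}  # hypothetical data
replicate = bootstrap_replicate(toy, rng)
print(replicate)
print(support_percent(89, 100))  # 89.0
```

Resampling whole columns, rather than individual characters, keeps the homology of each site intact across taxa.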

Figure 3

Photosynthesis phylogeny based on multiple photosynthesis gene markers. (A) Phylogenetic tree for the bchB/chlB gene (first and second codon positions) using the close homologs nifD, nifK, nifE, and nifN from Bacteria and Archaea as outgroups. The phylogeny was derived using NJ, MP, and ML analyses; all three methods resulted in an identical tree topology. Only the branch lengths for the MP tree are shown (length, 6601 steps; CI, 0.42; RI, 0.47). Bootstrap values >40% for NJ (first), MP (second), and ML (third) are indicated near the base of each branch. The major photosynthetic groups to which the ingroup species belong are indicated to the right of the taxon names. (B) Phylogenetic trees of bchH (first and second codon positions) using cobN from Bacteria and Archaea and a putative Ni-chelatase in Archaea (indicated by a star) as outgroups. The three phylogenetic methods resolved the basal node of the photosynthetic ingroup slightly differently. The MP tree (length, 10674 steps; CI, 0.52; RI, 0.54) is shown on the left, and the alternative topology from the NJ and ML analyses is shown to the right of the taxon names. (C) (Left panel) Phylogenetic tree for concatenated bchI/chlI, bchD/chlD, bchH/chlH, bchL/chlL, bchN/chlN, bchB/chlB, and bchG/chlG sequences common to all photosynthetic lineages (8504 characters for the first and second codon positions). The taxon named “Plants” is the concatenated sequence of chlI, chlD, and chlH from Nicotiana tabacum; chlL, chlN, and chlB from Pinus thunbergii; and chlG from Arabidopsis thaliana. A concatenated homologous sequence for bchI, bchD, cobN, nifH, nifD, nifK, and bchG from M. thermoautotrophicum is used as the outgroup to all the photosynthetic lineages. The MP and NJ trees (left) differed slightly from the ML tree (right) at the basal node.
(Right panel) The parsimony/distance gene phylogeny is thought to correlate with the structural divergence of (bacterio)chlorophyll pigments, for which key positions of difference are highlighted.
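The CI (consistency index) and RI (retention index) values given in the figure caption are standard parsimony fit statistics. In their ensemble form, with M the summed minimum conceivable number of steps over all characters, S the summed steps on the tree (the tree length, e.g., 6601 steps in Fig. 3A), and G the summed steps each character would require on an unresolved star tree, CI = M/S and RI = (G − S)/(G − M). The sketch below computes both from hypothetical per-character step counts; it does not reproduce the values in the caption.

```python
from typing import Sequence

def ensemble_ci(min_steps: Sequence[int], tree_steps: Sequence[int]) -> float:
    """Ensemble consistency index CI = M / S."""
    return sum(min_steps) / sum(tree_steps)

def ensemble_ri(min_steps: Sequence[int], tree_steps: Sequence[int],
                max_steps: Sequence[int]) -> float:
    """Ensemble retention index RI = (G - S) / (G - M)."""
    m, s, g = sum(min_steps), sum(tree_steps), sum(max_steps)
    return (g - s) / (g - m)

# Hypothetical per-character step counts for three characters.
m = [1, 1, 2]   # minimum conceivable steps per character
s = [1, 2, 3]   # steps per character on the tree being evaluated
g = [2, 3, 4]   # steps per character on a star (unresolved) tree
print(round(ensemble_ci(m, s), 3), round(ensemble_ri(m, s, g), 3))  # 0.667 0.6
```

Both indices equal 1 when every character fits the tree without homoplasy; lower values indicate more homoplasy.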

A similar tree topology was obtained with the MP method for the bchH gene, which encodes the H subunit of Mg-chelatase (Fig. 3B, left). The tree is rooted with a close homolog, cobN, which encodes a subunit of Co-chelatase. Another closely related gene, encoding a putative Ni-chelatase (indicated by a star) in methanobacteria, was also included in the outgroup. Within the photosynthetic ingroup, purple bacteria again form a moderately supported basal group (59% bootstrap value). However, the ML and NJ analyses show a different topology at this basal node of the photosynthetic ingroup: purple bacteria and green bacteria cluster together as a sister group to the heliobacterial/oxygenic clade (Fig. 3B, right). The BchH/ChlH protein tree topology derived from all three methods was similar to that of the DNA tree inferred by the ML and NJ methods (15). In all cases, green sulfur and green nonsulfur bacteria are consistently sister groups.

The trees for seven other photosynthesis genes (bchL/chlL, bchN/chlN, bchI/chlI, bchD/chlD, bchG/chlG, bchM/chlM, and bchJ/chlJ) and their proteins (BchL/ChlL, BchN/ChlN, BchI/ChlI, BchD/ChlD, BchG/ChlG, BchM/ChlM, and BchJ/ChlJ) typically resolve either into the topology observed for the bchB/chlB tree, which places purple bacteria as the most basal group, or into the alternative topology of the bchH/chlH tree, which places purple and green bacteria as sister groups (Table 2).
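The PB/PG/OT categories used in Table 2 can be expressed as tests on the clades of a rooted tree. The sketch below is one possible encoding, not the authors' procedure: it assumes the ingroup tree has already been rooted and pruned of outgroups, and that taxa are collapsed into four group labels (P, purple bacteria; G, the green sulfur plus green nonsulfur clade; H, heliobacteria; O, cyanobacteria/plants). The trees are hypothetical nested tuples.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples of group labels, e.g. ("P", ("G", ("H", "O"))).
Tree = Union[str, Tuple["Tree", ...]]

def leaves(t: Tree) -> FrozenSet[str]:
    """All leaf labels below a node."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(child) for child in t))

def clades(t: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of every internal node (rooted clades), including the root."""
    if isinstance(t, str):
        return set()
    found = {leaves(t)}
    for child in t:
        found |= clades(child)
    return found

def classify(t: Tree) -> str:
    cs = clades(t)
    if frozenset({"G", "H", "O"}) in cs:  # purple bacteria are sister to all other lineages
        return "PB"
    if frozenset({"P", "G"}) in cs:       # purple and green bacteria form a clade
        return "PG"
    return "OT"

print(classify(("P", ("G", ("H", "O")))))    # PB, as in bchB/chlB
print(classify((("P", "G"), ("H", "O"))))    # PG, as in the NJ/ML bchH/chlH tree
print(classify(("H", ("P", ("G", "O")))))    # OT
```

Working with clade sets rather than tuple shapes makes the test insensitive to the left/right order in which a tree happens to be written.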

Table 2

Results of phylogenetic analysis of Mg-tetrapyrrole biosynthesis genes and enzymes using three analysis methods. NJ, distance-based neighbor joining; MP, maximum parsimony; ML, maximum likelihood. PB, phylogeny with purple bacteria most basal, as in bchB/chlB; PG, phylogeny with purple bacteria and green bacteria as a sister group, or a bifurcated basal phylogeny, as in bchH/chlH; OT, other phylogenies without any common pattern.


In light of the incongruence observed among some of the individual gene/protein trees, we carried out an additional analysis combining multiple photosynthesis genes or proteins into a single large data set. This approach is thought to improve the resolution of phylogenetic reconstruction by avoiding biases that can result from analysis of individual genes or proteins (16). For this purpose, homologs of all (bacterio)chlorophyll biosynthesis genes (and gene products) shared by all photosynthetic bacterial and plant lineages were individually aligned and concatenated into a single large data set, which was used to infer a phylogenetic tree (17). The combined DNA data set (bchI/chlI, bchD/chlD, bchH/chlH, bchL/chlL, bchN/chlN, bchB/chlB, and bchG/chlG) contains 8504 characters, and the combined protein data set (BchI/ChlI, BchD/ChlD, BchH/ChlH, BchL/ChlL, BchN/ChlN, BchB/ChlB, and BchG/ChlG) contains 3967 characters. The corresponding homologs of these genes or gene products from Methanobacterium thermoautotrophicum, which are thought to be involved in cofactor F430 biosynthesis, Co chelation, and nitrogen reduction, were concatenated and used as an outgroup. Although the limited number of taxa in the combined data set makes it more sensitive to long-branch attraction, the simultaneous use of several genes with different functions makes it unlikely that a single taxon would have very high evolutionary rates for all of the genes used. As shown in Fig. 3C (left side of left panel), the purple-bacteria-basal topology seen in most of the individual data sets is strongly supported by the MP and NJ analyses (100% bootstrap value). The tree also has 100% bootstrap support for all other nodes. The result obtained from the combined DNA data set is also consistent with that for the protein data set in the MP and NJ analyses.
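Concatenating individually aligned genes into one data set (often called a supermatrix) means joining, for each taxon, its aligned sequences gene by gene, so that the combined length is the sum of the per-gene alignment lengths. The minimal sketch below assumes, as in the combined data set described above, that every gene alignment covers exactly the same taxa; a composite taxon such as “Plants” is simply one more key. The function name and toy sequences are hypothetical.

```python
from typing import Dict, List

def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Assumes every gene alignment covers exactly the same taxa and that
    sequences within each gene alignment have equal length.
    """
    if not alignments:
        raise ValueError("need at least one gene alignment")
    taxa = set(alignments[0])
    for i, aln in enumerate(alignments):
        if set(aln) != taxa:
            raise ValueError(f"gene {i} does not cover the same taxa")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"gene {i} has sequences of unequal length")
    # Gene order is preserved, so each gene occupies the same column block for every taxon.
    return {t: "".join(aln[t] for aln in alignments) for t in sorted(taxa)}

# Toy alignments for two hypothetical genes (not real sequences).
gene1 = {"Purple": "ATGA", "Green": "ATGG", "Plants": "ACGG"}
gene2 = {"Purple": "CC", "Green": "CT", "Plants": "TT"}
matrix = concatenate([gene1, gene2])
print(matrix["Plants"])  # ACGGTT
```

Requiring identical taxon sets mirrors the choice of genes shared by all lineages; an alternative design pads missing genes with gaps, at the cost of more missing data.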
This confirms the conclusions (i) that green sulfur and green nonsulfur bacteria are each other's closest relatives and branch at an intermediate position between heliobacteria and purple bacteria, and (ii) that heliobacteria are closest to the last common ancestor of all oxygenic lineages. Although generally congruent with the MP and NJ trees, the phylogeny derived from the ML analysis (Fig. 3C, right side of left panel) differs by a nearest-neighbor interchange at the node between the purple bacterial lineage and the green bacterial lineage, resulting in a topology similar to that of the ML tree for bchH/chlH (Fig. 3B, right). In view of the overall consensus of the individual and combined gene/protein analyses, purple bacteria receive the most support as the basal lineage of the photosynthetic bacteria. However, an unambiguous determination of this branching order may have to await broader sampling of photosynthetic bacterial taxa and a larger collection of photosynthesis gene sequences.

The consensus phylogenetic relationship from the MP and NJ analyses is also supported by biochemical evidence such as the structural divergence of photosynthetic pigments. For example, bacteriochlorophyll g, synthesized by heliobacteria, closely resembles chlorophyll a of cyanobacteria. This is consistent with the close relationship between heliobacteria and cyanobacteria determined by the phylogenetic analysis of Mg-tetrapyrrole biosynthesis genes (Fig. 3C, right panel). Indeed, the only difference between the ring structures of chlorophyll a and bacteriochlorophyll g is the isomerization of a double bond at ring II, which occurs spontaneously in bacteriochlorophyll g upon exposure to air (19). Bacteriochlorophyll c, a light-harvesting pigment unique to green sulfur and green nonsulfur bacteria, is structurally more distantly related to chlorophyll a, because several functionally important side groups are altered (Fig. 3C, right panel). Correspondingly, in our analyses green sulfur and green nonsulfur bacteria, which are grouped together, are phylogenetically more distant from cyanobacteria than are heliobacteria. The structure of bacteriochlorophyll a, which is synthesized by both purple and green bacteria but is most characteristic of purple bacteria, is the most divergent relative to chlorophyll a, with alterations in both the macrocyclic ring and several side groups (Fig. 3C). These differences in bacteriochlorophyll structure are consistent with our MP- and NJ-based placement of purple bacteria as the most divergent lineage relative to cyanobacteria.

Implications for early evolution of photosynthesis. Our conclusion that chlorophyll a biosynthesis evolved from the more complex bacteriochlorophyll biosynthesis pathway argues against the oft-cited Granick hypothesis for the evolution of chlorophylls (20). Granick proposed that chlorophyll a, whose production requires fewer biosynthetic steps, evolved before bacteriochlorophylls. The molecular phylogeny of the bch/chl genes in this and other studies (21) strongly argues that cyanobacteria evolved late. If so, chlorophyll a biosynthesis in cyanobacteria is a recent development in the evolution of photosynthetic pigments, which may have occurred through a shortening of the bacteriochlorophyll biosynthetic pathway. Because chlorophyll a absorbs shorter, more energetic wavelengths of light than do bacteriochlorophylls, there may have been selection for the use of shorter wavelengths, which would provide the energy needed to drive the oxidation of water.
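The energetic argument rests on the photon energy relation E = hc/λ, with hc ≈ 1239.84 eV·nm. As an illustration only, the sketch below compares two reaction-center absorption maxima that are not given in this article: about 680 nm for the chlorophyll a–based P680 of photosystem II and about 870 nm for the bacteriochlorophyll a–based P870 of purple bacteria. Both wavelengths are assumptions chosen for the example.

```python
# h * c expressed in eV * nm, so E[eV] = HC_EV_NM / wavelength[nm].
HC_EV_NM = 1239.84198

def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a single photon (eV) at the given wavelength (nm)."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm

# Illustrative absorption maxima (assumed values, not from this article).
examples = [
    ("P680, chlorophyll a (photosystem II)", 680.0),
    ("P870, bacteriochlorophyll a (purple bacteria)", 870.0),
]
for name, wavelength in examples:
    print(f"{name}: {photon_energy_ev(wavelength):.2f} eV")
```

Under these assumptions, a 680-nm photon carries about 1.82 eV versus about 1.43 eV at 870 nm, roughly 28% more energy per photon.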

It is of particular interest that our phylogenetic analysis of the Mg-tetrapyrrole biosynthesis genes and enzymes consistently placed green nonsulfur bacteria and green sulfur bacteria as closest relatives. The close relationship between these two different phyla is also supported by the fact that both synthesize bacteriochlorophyll c–containing chlorosomes for light harvesting. This contrasts strikingly with the different photosystem types in each lineage: green nonsulfur bacteria have the type II photosynthetic reaction center, whereas green sulfur bacteria have the type I reaction center. Phylogenetic analysis of the reaction center core polypeptides (22) shows that the reaction center apoprotein trees are incongruent with the pigment biosynthesis protein trees. As mentioned above, it is practically infeasible to build a single tree comparing type I and type II apoproteins. Our further analysis (23) suggests that pigment biosynthesis is a limiting, or rather a determining, factor for studying the evolution of photosynthesis, because it allows a linear phylogenetic representation of all photosynthetic lineages to be established.

It should be emphasized that there is a conceptual difference between the evolution of photosynthesis and the evolution of photosynthetic organisms. The former involves only the limited number of genes for this bioenergetic process, whereas the latter involves the whole genome, whose evolution is often represented by strictly vertically inherited genes such as the small-subunit rRNA gene. The phylogeny of the small-subunit rRNA gene does not necessarily reflect the phylogeny of the genes for specific metabolic pathways. Indeed, detailed comparison of the phylogenetic trees of the 16S rRNA gene and the Mg-tetrapyrrole biosynthesis genes shows incongruence at the deep branches (3), suggesting that horizontal transfer of photosynthesis genes may have taken place during the evolution of Bacteria. Because the Mg-tetrapyrrole biosynthesis genes are shown to have largely co-evolved, these genes may have been transferred as a unit during an early radiation from ancient purple bacteria, which are known to have all of their photosynthesis genes in a tightly linked “photosynthesis gene cluster” (6). Because the prevalence of lateral gene transfer among prokaryotic microorganisms has been overwhelmingly demonstrated [see reviews in (24)], gene phylogenies in prokaryotes may indicate only the evolution of specific metabolic processes rather than the evolution of the whole genome.

  • * To whom correspondence should be addressed. E-mail: cbauer{at}

