Genomic Footprints of a Cryptic Plastid Endosymbiosis in Diatoms


Science  26 Jun 2009:
Vol. 324, Issue 5935, pp. 1724-1726
DOI: 10.1126/science.1172983

Green for Diatoms

Diatoms account for 20% of global carbon fixation and, together with other chromalveolates (e.g., dinoflagellates and coccolithophorids), represent many thousands of eukaryote taxa in the world's oceans and on the tree of life. Moustafa et al. (p. 1724; see the Perspective by Dagan and Martin) have discovered that the genomes of diatoms are highly chimeric, with about 10% of their nuclear genes being of foreign algal origin. Of this set of 1272 algal genes, 253 were, as expected, from a distant red algal secondary endosymbiont, but more than 1000 of the genes were derived from green algae and predated the red algal relationship. These protist taxa are important not only for genetic and genomic investigations but also for their potential in biofuel and nanotechnology applications and in global primary productivity in relation to climate change.


Diatoms and other chromalveolates are among the dominant phytoplankters in the world’s oceans. Endosymbiosis was essential to the success of chromalveolates, and it appears that the ancestral plastid in this group had a red algal origin via an ancient secondary endosymbiosis. However, recent analyses have turned up a handful of nuclear genes in chromalveolates that are of green algal derivation. Using a genome-wide approach to estimate the “green” contribution to diatoms, we identified >1700 green gene transfers, constituting 16% of the diatom nuclear coding potential. These genes were probably introduced into diatoms and other chromalveolates from a cryptic endosymbiont related to prasinophyte-like green algae. Chromalveolates appear to have recruited genes from the two major existing algal groups to forge a highly successful, species-rich protist lineage.

Diatoms are well-studied members of the putative supergroup Chromalveolata [fig. S1 and supporting online material (SOM) text] and comprise unicellular, photosynthetic taxa that are dominant in the marine phytoplankton. Diatoms are central to understanding oceanic primary production and biogeochemistry (1). Much effort is currently being expended to develop some taxa as models for genetic and genomic research as well as sources for biofuel (2) and nanotechnology (3). We conducted a phylogenomic analysis of the diatom proteome using complete genome data from Phaeodactylum and Thalassiosira. This procedure identified 2423 Phaeodactylum and 2533 Thalassiosira genes (written 2423/2533; this order of results is used throughout the paper and SOM) that are derived from red or green algal sources. Contrary to the expectation of the chromalveolate hypothesis (4), however, >70% of these genes are of green (not red) lineage provenance (Fig. 1, table S1, and fig. S7). This green gene contribution constitutes ≈16% of the diatom proteome. Two of the major topological classes that we uncovered are shown in fig. S2. The first class (fig. S2A) contains 442/442 trees in which both red and green algae are present but there is robust bootstrap support for a clade uniting green algae with diatoms (and other chromalveolates). In 144/133 of these trees, the green algal and diatom sequences are nested within the Plantae, the kingdom composed of green algae and plants, glaucophytes, and red algae (5, 6). An example is phytoene desaturase (fig. S2A), an early enzyme in plastid carotenoid biosynthesis. It was previously reported that 5 of the 16 genes in this photoprotective pathway are of green algal origin (7), and their occurrence in chromalveolates probably ensures a high photosynthetic efficiency under fluctuating light (8). The remaining 298/309 trees indicate an independent origin of the gene in the donor green algae, with respect to other Plantae.
A second major class of trees shows an independent gene origin in prasinophytes relative to other green algae and plants, before transfer to diatoms and other chromalveolates (Fig. 2). The absence of red algal homologs in some trees in this class may be explained by gene loss in the reduced nuclear genome of the red algal representative in our database, Cyanidioschyzon merolae (SOM text). An example tree from this class is a member of the isoprenylcysteine carboxyl methyltransferase superfamily (fig. S2B). We also found four genes, encoding naphthoate synthase (GenBank GI number 219114006), heme oxygenase (GenBank GI number 219117865), pyruvate dehydrogenase (GenBank GI number 219119135), and a GUN4-like protein (GenBank GI number 219127880), that are retained in red algal plastid genomes but absent from the red-derived plastid genome of diatoms. These sequences are present in the diatom nucleus but are of green algal derivation. This suggests that red plastid–encoded genes were lost when green homologs were already present in the host nucleus.
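Sorting trees into these classes amounts to asking which algal group is sister to the diatom (chromalveolate) sequences in each gene tree. The following is a minimal sketch of that bookkeeping only, assuming a rooted tree written as nested Python tuples and a hypothetical taxon-to-group table (`GROUP`); it ignores bootstrap support and tree inference, so it is not the authors' pipeline. The "unresolved" label follows the Fig. 1 definition (red and green algae sister to each other and jointly sister to diatoms).

```python
from typing import Dict, FrozenSet, Optional, Tuple, Union

# A rooted tree is a leaf name (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def sister_of(tree: Tree, target: FrozenSet[str]) -> Optional[Tree]:
    """Return the sister of the node whose leaf set equals `target`.

    Returns None when `target` is not a clade of the tree (or is the root).
    """
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if leaves(child) == target:
            others = tuple(c for j, c in enumerate(tree) if j != i)
            # In a bifurcating tree there is exactly one sister subtree.
            return others[0] if len(others) == 1 else others
        found = sister_of(child, target)
        if found is not None:
            return found
    return None


def classify(tree: Tree, group_of: Dict[str, str]) -> str:
    """Label a gene tree by the algal group sister to the chromalveolates."""
    chromalveolates = frozenset(
        leaf for leaf in leaves(tree) if group_of[leaf] == "chromalveolate"
    )
    sister = sister_of(tree, chromalveolates)
    if sister is None:
        return "no clear signal"  # chromalveolate sequences are not a clade
    sister_groups = {group_of[leaf] for leaf in leaves(sister)}
    if sister_groups == {"green"}:
        return "green"
    if sister_groups == {"red"}:
        return "red"
    if sister_groups == {"red", "green"}:
        return "unresolved"
    return "other"


# Hypothetical taxon-to-group table for the toy trees below.
GROUP = {
    "Phaeodactylum": "chromalveolate",
    "Thalassiosira": "chromalveolate",
    "Ostreococcus": "green",
    "Micromonas": "green",
    "Cyanidioschyzon": "red",
}

if __name__ == "__main__":
    toy = ((("Phaeodactylum", "Thalassiosira"), ("Ostreococcus", "Micromonas")), "Cyanidioschyzon")
    print(classify(toy, GROUP))  # green
```

Returning "no clear signal" for non-monophyletic chromalveolate sequences is a design choice of the sketch; a real analysis would also require a bootstrap cutoff on the diatom-plus-sister node.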

Fig. 1

Diatom genes of a red or green algal origin that were identified using phylogenomic analysis of complete genome data. Each bar represents the total number of algal genes in the corresponding diatom species. The “gene families” bar indicates the total number of transferred genes in both diatoms after clustering the data into gene families through single-linkage hierarchical clustering. The “unresolved” category indicates that red and green algae are sisters of each other in the tree and monophyletic with diatoms (and other chromalveolates).
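Single-linkage clustering at a fixed cutoff is equivalent to taking connected components of a similarity graph: two genes fall into the same family whenever a chain of sufficiently similar pairs links them. A minimal union-find sketch follows; the gene identifiers, the similarity scores, and the `min_score` cutoff are hypothetical, because the caption does not state which similarity measure or threshold was used.

```python
from typing import Dict, Iterable, List, Set, Tuple


def single_linkage_families(
    genes: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    min_score: float,
) -> List[Set[str]]:
    """Group genes into families by single-linkage clustering.

    Two genes share a family if a chain of pairwise hits, each scoring at
    least `min_score`, connects them. Every gene named in `hits` must also
    appear in `genes`.
    """
    parent: Dict[str, str] = {g: g for g in genes}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two families

    families: Dict[str, Set[str]] = {}
    for g in parent:
        families.setdefault(find(g), set()).add(g)
    # Sort for a deterministic output order.
    return sorted(families.values(), key=lambda fam: sorted(fam))


if __name__ == "__main__":
    # Hypothetical Phaeodactylum (Pt_) and Thalassiosira (Tp_) genes.
    genes = ["Pt_1", "Pt_2", "Tp_1", "Tp_2", "Tp_3"]
    hits = [("Pt_1", "Tp_1", 0.9), ("Tp_1", "Tp_2", 0.6), ("Pt_2", "Tp_3", 0.3)]
    print(single_linkage_families(genes, hits, min_score=0.5))
```

Here `Pt_1`, `Tp_1`, and `Tp_2` form one family through the chain of hits, while the weak `Pt_2`/`Tp_3` hit falls below the cutoff and leaves both as singletons. That chaining is what makes single linkage permissive: one strong link is enough to merge two families.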

Fig. 2

Phylogenetic distribution of diatom genes of green algal origin among Viridiplantae. (A) Schematic tree that illustrates well-accepted phylogenetic relationships within the green lineage. (B) Venn diagram depicting the distribution of diatom green genes of prasinophyte origin. These genes support a specific sister-group relationship between prasinophytes and diatoms (and other chromalveolates). The two broad categories of gene sharing are as follows: (i) the gene is exclusive to prasinophytes (470/541), and (ii) the gene is shared with other Viridiplantae. (C) Venn diagram depicting the distribution of all diatom green genes among Viridiplantae. Here, 192/177 genes are of chlorophyte origin, whereas 145/170 genes are apparently derived from streptophytes. It should be noted that these are provisional values and will be affected by the strength of the phylogenetic signal in any given protein or the absence of data from particular groups; that is, some apparently streptophyte-specific diatom green genes may simply be explained by the loss of the genes in other Viridiplantae (such as prasinophytes).

To identify the putative sources of the diatom green genes, we examined their distribution among the green lineages (Viridiplantae). The Viridiplantae comprise two well-supported phyla: the Chlorophyta (most green algae, comprising the core chlorophytes, such as Chlamydomonas, and the prasinophytes) and the Streptophyta (charophyte green algae and all land plants; Fig. 2A). The prasinophytes include the world’s smallest eukaryotes (the picoeukaryote Ostreococcus; cell diameter ≈1 μm) and form a morphologically diverse group of paraphyletic lineages diverging at the base of the Chlorophyta (9). We found that 637/716 diatom green genes (36/41%) trace their origin to the prasinophytes in our data set (Micromonas and Ostreococcus; Mamiellales clade). Of these, 167/175 are also shared with other Chlorophyta (71/67 genes; Chlamydomonas and Volvox), with Streptophyta (23/40 genes; Arabidopsis, Oryza, Physcomitrella, and Zea), or with both phyla (73/68 genes; Fig. 2B). These 167/175 genes have a putative ancient origin in Viridiplantae. Streptophyte- and core chlorophyte–specific donors account for 192/177 and 145/170 genes, respectively (Fig. 2C). Many of these genes may have been present ancestrally in the Viridiplantae and later lost in prasinophytes and/or other green lineage members, whereas the remainder represent independent horizontal gene transfers (HGTs) into streptophytes and core chlorophytes. Despite the reduced nuclear genome of the prasinophytes in our study (≈9000 protein-encoding genes) compared with the larger genomes of core chlorophytes and streptophytes (≈15,000 and ≈30,000 protein-encoding genes, respectively), 470/541 genes are shared exclusively between prasinophytes and diatoms (Fig. 2, B and C), of which 462/502 (98 and 93%) are present in expressed sequence tag (EST) libraries from Phaeodactylum and Thalassiosira. Because of their specific affiliation with picoprasinophytes, these genes are unlikely to represent sequences missing from Cyanidioschyzon.
This exclusively shared gene set may therefore represent genes that picoprasinophytes recruited via HGT and that were later transferred to the diatom (chromalveolate) nucleus. These sequences could hold clues to the evolution of prasinophyte green algae and their great success in different aquatic environments (10).
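The prasinophyte-origin counts reported in the preceding paragraph decompose as a simple partition: genes found only in prasinophytes, plus those also found in core chlorophytes, in streptophytes, or in both phyla. The sketch below shows that partition with Python set operations on hypothetical gene identifiers, then recomputes the exclusive counts and EST percentages from the figures stated in the text (Phaeodactylum first, Thalassiosira second). The category names are labels chosen for this sketch.

```python
from typing import Dict, Set, Tuple


def venn_categories(
    prasino: Set[str], core_chloro: Set[str], strepto: Set[str]
) -> Dict[str, Set[str]]:
    """Partition prasinophyte-origin genes by which other green lineages share them."""
    return {
        "prasinophytes only": prasino - core_chloro - strepto,
        "also core chlorophytes": (prasino & core_chloro) - strepto,
        "also streptophytes": (prasino & strepto) - core_chloro,
        "also both phyla": prasino & core_chloro & strepto,
    }


# Counts reported in the text, as (Phaeodactylum, Thalassiosira).
REPORTED: Dict[str, Tuple[int, int]] = {
    "prasinophyte origin": (637, 716),
    "also core chlorophytes": (71, 67),
    "also streptophytes": (23, 40),
    "also both phyla": (73, 68),
    "exclusive with EST support": (462, 502),
}


def exclusive_summary(species_index: int) -> Tuple[int, int, float]:
    """Return (shared, exclusive, % of exclusive genes with EST support)."""
    shared = sum(
        REPORTED[key][species_index]
        for key in ("also core chlorophytes", "also streptophytes", "also both phyla")
    )
    exclusive = REPORTED["prasinophyte origin"][species_index] - shared
    est_pct = 100 * REPORTED["exclusive with EST support"][species_index] / exclusive
    return shared, exclusive, est_pct


if __name__ == "__main__":
    for i, species in enumerate(("Phaeodactylum", "Thalassiosira")):
        shared, exclusive, est_pct = exclusive_summary(i)
        print(f"{species}: shared={shared}, exclusive={exclusive}, ESTs={est_pct:.0f}%")
```

Running it recovers the 167/175 shared genes, the 470/541 exclusively shared genes, and the 98 and 93% EST support stated in the text, which confirms that the reported figures are internally consistent.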

The fourfold higher abundance of green versus red genes in diatoms raises questions about when the green genes were transferred and whether they were introduced via a single endosymbiosis, multiple endosymbioses, or unprecedented levels of HGT in chromalveolates. To address this issue, we determined the distribution of diatom green genes among chromalveolates using complete genome data. Here, the distinction between gene origin via endosymbiotic gene transfer (EGT) (11) and via HGT rests on whether the genes can be traced back to a point source (prasinophyte-like algae) and are found in most if not all chromalveolates (EGT), or instead arose sporadically in particular lineages and from multiple different sources (HGT). Neither outcome is proof; each argues for or against one hypothesis. Using this approach, we find that 85% of the green genes can be traced back to the ancestor of diatoms and other Stramenopiles (Fig. 3). Diatoms share 46/55 green genes with the obligately parasitic apicomplexans and 54/63 genes with the plastid-lacking ciliates. Analysis of genome data from the distantly related photosynthetic coccolithophorid Emiliania huxleyi, a haptophyte (haptophytes are sister to cryptophytes; fig. S1), identified >400 green genes shared with diatoms. When ESTs from dinoflagellates and cryptophyte algae are included, even these partial data show that 10 and 3% of the diatom green genes, respectively, are shared with these groups (fig. S3). Given these results, we suggest that despite extensive gene losses among nonphotosynthetic lineages such as ciliates and apicomplexans, the most likely explanation is that a large proportion of the diatom green genes are of ancient provenance and predate the split of cryptophytes and haptophytes from other chromalveolates.
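The EGT-versus-HGT distinction described here is a rule of thumb over two properties of each green gene: the donor group its tree points to, and how widely the gene is distributed across chromalveolate lineages. The sketch below encodes that rule for illustration. The lineage list mirrors the groups named in this paragraph, but the donor labels and the 0.75 breadth cutoff standing in for "most if not all" are hypothetical choices. As the text stresses, the label argues for one hypothesis rather than proving it.

```python
from typing import FrozenSet, Set

# Chromalveolate lineages discussed in the text (grouping is illustrative).
LINEAGES: FrozenSet[str] = frozenset({
    "diatoms", "other stramenopiles", "haptophytes", "cryptophytes",
    "dinoflagellates", "apicomplexans", "ciliates",
})


def provisional_origin(
    donor: str,
    present_in: Set[str],
    point_source: str = "prasinophyte",
    min_fraction: float = 0.75,  # hypothetical cutoff for "most if not all"
) -> str:
    """Label a green gene as EGT- or HGT-consistent from donor and distribution.

    Gene loss in some lineages (e.g., nonphotosynthetic ciliates and
    apicomplexans) lowers the breadth score, so the cutoff is a judgment call.
    """
    breadth = len(present_in & LINEAGES) / len(LINEAGES)
    if donor == point_source and breadth >= min_fraction:
        return "EGT-consistent"
    return "HGT-consistent"


if __name__ == "__main__":
    widespread = {"diatoms", "other stramenopiles", "haptophytes",
                  "cryptophytes", "dinoflagellates", "ciliates"}
    print(provisional_origin("prasinophyte", widespread))  # 6 of 7 lineages: EGT-consistent
    print(provisional_origin("streptophyte", {"diatoms"}))  # HGT-consistent
```

Because missing EST data and genuine losses both depress the breadth score, a strict cutoff would misclassify many anciently acquired genes as HGT; this is one reason the text treats the outcome as suggestive only.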

Fig. 3

The distribution of diatom green genes among different chromalveolates. The value for each major chromalveolate lineage represents the number of proteins that satisfy two phylogenetic criteria: (i) monophyly of diatoms and the chromalveolate lineage in question, and (ii) monophyly of this clade with Viridiplantae. The category “other Stramenopiles” includes the pelagophyte Aureococcus anophagefferens and the oomycetes Phytophthora capsici, P. ramorum, and P. sojae, which have complete nuclear genome data available. Data from the remaining taxa in this category are organelle- or EST-derived.
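Both Fig. 3 criteria are monophyly tests on a rooted gene tree: a set of taxa is monophyletic if some node has exactly that set of leaves below it. A minimal sketch follows, assuming trees written as nested tuples and a hypothetical leaf-to-group table (`GROUP`). It does not weigh bootstrap support, and reading criterion (ii) as "diatoms, the lineage in question, and the Viridiplantae sequences together form one clade" is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# A rooted tree is a leaf name (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set of every node in a rooted tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    result: Set[FrozenSet[str]] = set()
    own: FrozenSet[str] = frozenset()
    for child in tree:
        child_clades = clades(child)
        result |= child_clades
        # A child's own leaf set is the largest clade below it.
        own |= max(child_clades, key=len)
    result.add(own)
    return result


def is_monophyletic(tree: Tree, taxa: Set[str]) -> bool:
    """True if some node has exactly `taxa` as its leaves."""
    return frozenset(taxa) in clades(tree)


def passes_fig3_criteria(tree: Tree, group_of: Dict[str, str], lineage: str) -> bool:
    all_leaves = max(clades(tree), key=len)  # the root clade

    def members(group: str) -> Set[str]:
        return {leaf for leaf in all_leaves if group_of[leaf] == group}

    diatoms, other, green = members("diatom"), members(lineage), members("green")
    if not (diatoms and other and green):
        return False
    # (i) diatoms + lineage form a clade; (ii) that clade + Viridiplantae form a clade.
    return is_monophyletic(tree, diatoms | other) and is_monophyletic(
        tree, diatoms | other | green
    )


# Hypothetical leaf-to-group table for the toy trees.
GROUP = {
    "Phaeodactylum": "diatom", "Thalassiosira": "diatom",
    "Emiliania": "haptophyte", "Ostreococcus": "green",
    "Micromonas": "green", "Cyanidioschyzon": "red",
}

if __name__ == "__main__":
    toy = (((("Phaeodactylum", "Thalassiosira"), "Emiliania"),
            ("Ostreococcus", "Micromonas")), "Cyanidioschyzon")
    print(passes_fig3_criteria(toy, GROUP, "haptophyte"))  # True
```

A tree in which the green sequences split diatoms from Emiliania fails criterion (i) and would not be counted toward the haptophyte value.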

Taken together, our results provide evidence of a prasinophyte-like endosymbiont in the common ancestor of chromalveolates. As discussed above, prasinophytes are an anciently diverged paraphyletic group of green algae (12) that was already present early in chromalveolate evolution. In the fossil record, prasinophytes are widely distributed by the Early Cambrian (13). These cells may well have been an abundant prey source for the chromalveolate ancestor. The alternative explanation, chromalveolate polyphyly, would imply an unprecedented number (≈400) of independent gains of the same green genes by diatoms and haptophytes. Our results therefore provide strong support for a shared evolutionary history of these disparate chromalveolate lineages. In substantiated cases of serial endosymbiosis, the most recent endosymbiont provides the plastid, whereas the nuclear genome bears the footprints of past events. The dinoflagellates provide several independent examples of this phenomenon, in which the broadly distributed red algal (peridinin-containing) plastid has been replaced in different taxa by one of green, cryptophyte, or diatom origin (14, 15). Therefore, the presence of a red algal–derived plastid in most photosynthetic chromalveolates is most easily explained by the green algal endosymbiosis having predated the red algal capture (7, 16, 17).

A different interpretation of our green gene data is that these sequences did not derive from EGT and HGT but instead reflect a bona fide sister-group relationship between chromalveolates and green algae. Under this scenario, the chromalveolate ancestor contained a plastid of primary endosymbiotic origin [cyanobacterial (18)] that was shared with the green lineage and subsequently replaced by one of secondary (red algal) derivation. Although possible, this scenario is highly implausible: it not only argues against Plantae monophyly, which has been supported by recent phylogenomic studies (6, 19), but, more importantly, demands that the vast majority of chromalveolate nuclear genes with nonplastid functions (e.g., actin and tubulins) be directly related to Viridiplantae. Most single- and multigene trees clearly demonstrate Viridiplantae monophyly (20) but do not support a specific affiliation between greens and chromalveolates. There is no reason to expect that this phylogenetic signal would have been lost from chromalveolate genomes while being retained by Viridiplantae. Therefore, given the known proclivity of endosymbiosis to drive intracellular gene transfer (21, 22) and the absence of evidence for a specific phylogenetic relationship between Viridiplantae and chromalveolates outside of the 16% reported here, we suggest that the green “footprint” in chromalveolates (although substantial) probably reflects a combination of EGT and HGT rather than a host affiliation.

The rise to prominence in the oceans of diatoms and other chromalveolates such as dinoflagellates and haptophytes after the end-Permian mass extinction (250 million years ago) has been interpreted as the victory of red plastid lineages over the predominant green plastid taxa such as prasinophytes. Changing nearshore ocean chemistry is thought to underlie this globally important phenomenon (13, 23). In contrast to current thinking, our findings show that chromalveolates were already green before they acquired the red plastid. These two ancient endosymbioses, supplemented by subsequent HGTs, supplied chromalveolates such as diatoms [≈100,000 extant species (24)] with the genetic potential to become some of the most ecologically successful and dominant marine primary producers on our planet.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S7

Tables S1 and S2


* These authors contributed equally to this work.

References and Notes

D.B. was supported by grants from NSF and NIH (EF 04-31117 and R01ES013679, respectively). A.M. was supported by an Institutional National Research Service Award (T 32 GM98629) from NIH. U.G.M. thanks the Deutsche Forschungsgemeinschaft (grant SFB-TR1) for support. We thank T. Mock for providing tiling array–generated diatom transcripts and J. E. DeReus at the High Performance Computing Facility at the University of Iowa for technical support.