The Selaginella Genome Identifies Genetic Changes Associated with the Evolution of Vascular Plants


Science  20 May 2011:
Vol. 332, Issue 6032, pp. 960-963
DOI: 10.1126/science.1203810


Vascular plants appeared ~410 million years ago, then diverged into several lineages of which only two survive: the euphyllophytes (ferns and seed plants) and the lycophytes. We report here the genome sequence of the lycophyte Selaginella moellendorffii (Selaginella), the first nonseed vascular plant genome reported. By comparing gene content in evolutionarily diverse taxa, we found that the transition from a gametophyte- to a sporophyte-dominated life cycle required far fewer new genes than the transition from a nonseed vascular to a flowering plant, whereas secondary metabolic genes expanded extensively and in parallel in the lycophyte and angiosperm lineages. Selaginella differs in posttranscriptional gene regulation, including small RNA regulation of repetitive elements, an absence of the trans-acting small interfering RNA pathway, and extensive RNA editing of organellar genes.

Selaginella moellendorffii, like all lycophytes, has features typical of vascular plants, including a dominant and complex sporophyte generation (Fig. 1, A and B) having vascular tissues with lignified cell types. Lycophytes also share traits with nonseed plants, most notably the release of haploid spores (Fig. 1C) from the sporophyte and a gametophyte generation that develops independently of the sporophyte. Because the lycophytes are an ancient lineage that diverged shortly after land plants evolved vascular tissues (Fig. 2A) (1), we sequenced the Selaginella genome to provide a resource for identifying genes that may have been important in the early evolution of developmental and metabolic processes specific to vascular plants.

Fig. 1

Selaginella morphology. (A) The diploid sporophyte body. Bar, 10 mm. (B) A shoot with two ranks of microphylls (“leaves”) and strobili. Each microphyll of a strobilus has either a mega- or a microsporangium where mega- or microspores are produced. Bar, 2 mm. (C) An orange microspore on top of a dark megaspore. These single-celled haploid spores represent the beginning of the independent haploid gametophyte generation. The microgametophyte produces motile sperm and the megagametophyte produces eggs. Bar, 0.1 mm.

Fig. 2

(A) Phylogeny of plants. Taxa in red have sequenced genomes. (B) Gene family gains (+) and losses (−) mapped onto the plant phylogenetic tree. The minimum numbers of gene families present in the ancestors of different plant lineages are circled.

The Selaginella genome was sequenced by whole-genome shotgun sequencing (2). The assembled genome size (212.6 Mbp) is twice that determined by flow cytometry (3), indicating that the assembled genome includes two haplotypes of ~106 Mbp that are 98.5% identical at the nucleotide level. A deduced haplotype has 22,285 predicted protein-coding genes, of which 37% are supported by expressed sequence tag sequences, and 58 microRNA (miRNA) loci (2, 4). The Selaginella genome lacks evidence of an ancient whole-genome duplication or polyploidy (2), unlike all other sequenced land-plant genomes (5–7). Gene density in Selaginella and Arabidopsis, which has a slightly larger genome size, is very similar (2), and both genomes have gene-poor regions rich in transposable elements (TEs) and other repetitive sequences (2). Although fewer genes and smaller introns (2) contribute to a genome size smaller than that of Arabidopsis, this is offset by a greater proportion of TEs in Selaginella (37.5% versus 15% in Arabidopsis) (2). Long terminal repeat retrotransposons are the most abundant TEs, occupying one-third of the Selaginella genome (2).
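The size relationships in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the EST-supported gene count and the TE content in Mbp are derived estimates, not values reported in the text.

```python
# Figures quoted in the paragraph above.
assembled_mbp = 212.6      # assembled genome size (both haplotypes)
predicted_genes = 22_285   # protein-coding genes in one deduced haplotype
est_supported = 0.37       # fraction of genes with EST support
te_fraction = 0.375        # proportion of the genome occupied by TEs

# The assembly holds two haplotypes, so one haplotype is half the assembly.
haplotype_mbp = assembled_mbp / 2

# Derived estimates (assumptions of this sketch, not reported in the text).
est_supported_genes = round(predicted_genes * est_supported)
te_mbp = haplotype_mbp * te_fraction

print(f"haplotype size: {haplotype_mbp:.1f} Mbp")
print(f"EST-supported genes: about {est_supported_genes}")
print(f"TE content per haplotype: about {te_mbp:.1f} Mbp")
```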

Plant TEs and MIRNA loci are important sources of small RNAs (sRNAs) that function to epigenetically regulate TE and gene activity (8). Several observations suggest that some aspects of epigenetic or posttranscriptional gene regulation in Selaginella are unique among plants. For one, the proportion of sRNAs 23 to 24 nucleotides (nt) in length is extraordinarily small in the Selaginella sRNA population (2) compared with that in angiosperms (9). Nearly three-quarters of the Selaginella sRNAs (4) map to MIRNA loci and are predominantly 21 nt in length (2). In angiosperms, 24-nt siRNAs, which are generated primarily from TEs, function to silence TE activity through the RNA-dependent DNA methylation pathway (10–12) and accumulate massively in specific cells of the female gametophyte (13). Because the Selaginella sRNA population was generated from sporophytic tissues, the 24-nt siRNA pathway may only be deployed during gametophyte development in Selaginella. A second distinction is the absence of DCL4, RDR6, and MIR390 loci in Selaginella, which are required for the biogenesis of trans-acting siRNAs (tasiRNAs) in angiosperms (2). Their absence suggests that tasiRNA-regulated processes in angiosperms, including leaf polarity (14) and developmental phase changes in the sporophyte (15, 16), are regulated differently in Selaginella, and possibly reflects the independent origins of foliar organs in the lycophyte and angiosperm lineages (17, 18). Finally, the Selaginella plastome sequence reveals an extraordinarily large number of RNA-edited sites (2), as do other lycophyte organellar genomes (19, 20). This coincides with an exceptionally large number of PPR genes in Selaginella (>800) (2), some of which guide RNA editing events in angiosperms (21).

Because Selaginella is a member of a vascular plant lineage that is sister to the euphyllophytes, we used comparative and phylogenetic approaches to identify gene origins and expansions coinciding with evolutionary innovations and losses in land plants. To identify such genes without regard to function, we compared the proteomes of the green alga Chlamydomonas, the moss Physcomitrella, Selaginella, and 15 angiosperm species; identified gene families that are related by homology by hierarchical clustering (2); and then mapped them onto a phylogenetic tree (Fig. 2B). The 3814 families with gene members present in all plant lineages define the minimum set of genes that were likely to be present in the common ancestor of all green plants and their descendants and include genes essential for plant function. The transition from single-celled green algae to multicellular land plant approximately doubled the gene number with the acquisition of 3006 new genes. The transition from nonvascular to vascular plant is associated with a gain of far fewer new genes (516) than the transition from a basal vascular plant to a basal euphyllophyte whose descendants include the angiosperms (1350). These numbers show that the evolution of traits specific to euphyllophytes or angiosperms required the evolution of about three times more new genes than the transition from a plant having a dominant gametophyte and simple, leafless, and nonvascularized sporophyte (typified by modern bryophytes) to a plant with a dominant, vascularized, and branched sporophyte with leaves.
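To make the mapping step concrete, here is a minimal sketch of assigning each gene family to the ancestor in which it is inferred to arise. It is not the procedure used for Fig. 2B, which is described in the supporting material. It assumes a ladder-shaped tree with the 15 angiosperms collapsed to a single tip, a single gain per family at the last common ancestor of the taxa that contain it (later absences count as losses), and invented family memberships (`fam_A` to `fam_E`).

```python
from collections import Counter

# Taxa ordered by divergence on a ladder-shaped tree, earliest-diverging first.
TAXA = ["Chlamydomonas", "Physcomitrella", "Selaginella", "angiosperms"]
# ANCESTORS[i] is the last common ancestor of TAXA[i] and every later taxon.
ANCESTORS = ["green plants", "land plants", "vascular plants"]


def origin(present_in):
    """Return the node or lineage where a family is inferred to arise.

    Assumes one gain at the last common ancestor of all taxa that
    contain the family; a family found in one taxon is lineage-specific.
    """
    taxa = set(present_in)
    if not taxa or not taxa <= set(TAXA):
        raise ValueError(f"unexpected taxa: {sorted(taxa)}")
    if len(taxa) == 1:
        return f"{next(iter(taxa))} lineage"
    earliest = min(TAXA.index(t) for t in taxa)
    return ANCESTORS[earliest]


# Hypothetical presence data, invented for illustration only.
families = {
    "fam_A": {"Chlamydomonas", "Physcomitrella", "Selaginella", "angiosperms"},
    "fam_B": {"Physcomitrella", "Selaginella", "angiosperms"},
    "fam_C": {"Physcomitrella", "angiosperms"},  # absent from Selaginella: a loss
    "fam_D": {"Selaginella", "angiosperms"},
    "fam_E": {"angiosperms"},
}

gains = Counter(origin(members) for members in families.values())
for node, count in gains.items():
    print(f"{node}: +{count}")
```

Counting gains per node in this way is what produces totals of the kind circled in Fig. 2B; a real analysis would also have to score losses and handle the full angiosperm subtree.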

In a second approach, we analyzed the phylogenies of genes known to function in Arabidopsis development (2). We identified 424 monophyletic groups of developmental genes, each group containing putatively all genes descended from a common land-plant ancestral gene (table S6). Selaginella and Physcomitrella genes are present in 377 (89%) and 356 (84%) of the 424 land-plant orthologous gene groups, respectively, indicating that the common ancestor of land plants had most of the gene families known to direct angiosperm development. Conspicuous expansions of families within different lineages resulted in different numbers of land-plant orthologs in each genome (table S6). The 27 vascular plant-specific orthologous groups likely represent genes associated with developmental innovations of vascular plants. Among them are genes regulating the meristem (CLV1 and CLV2), hormone signaling (GID1 and CTR1), and flowering (TFL2 and UFO). Homologs of genes involved in the specification of xylem (NST and VND) (22) and phloem (APL) (23) in Arabidopsis are present in Physcomitrella and Selaginella, suggesting that the developmental programs for patterning and differentiation of vascular tissues were either present in, or co-opted from, preexisting genetic programs in the ancestral land plant. The 43 groups lacking genes from Physcomitrella and Selaginella (table S6) likely identify genes that were necessary for euphyllophyte or angiosperm developmental innovations. Among this group are genes that regulate light signaling (FAR1, MIF1, OBP3, and PKS1), shoot meristem development (AS2 and ULT1), hormone signaling and biosynthesis (BRI1, BSU1, ARF16, ACS, and ACO), and flowering (HUA1, EMF1, FT, TFL1, and FD). Altogether, these results suggest that the evolutionary transitions from a nonvascular plant to a vascular angiosperm included the stepwise addition of components of some developmental pathways, especially those regulating meristem and hormone biology, as previously noted for the gibberellin signaling pathway (24, 25).
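The presence/absence reasoning in this paragraph can be illustrated with a short sketch. The gene names and their presence in the two genomes are read from the text above; the helper names `inferred_origin` and `percent_of_groups` are hypothetical, and the labels are a simplification that ignores gene loss.

```python
# Presence of each gene's group in the two non-angiosperm genomes, as
# described in the text: (in Physcomitrella, in Selaginella).
presence = {
    "CLV1": (False, True),
    "GID1": (False, True),
    "NST": (True, True),
    "APL": (True, True),
    "FT": (False, False),
    "BRI1": (False, False),
}


def inferred_origin(in_moss, in_lycophyte):
    """Hypothetical helper: name the clade in which a group is inferred to arise."""
    if in_moss:
        return "land plants"
    if in_lycophyte:
        return "vascular plants"
    return "euphyllophytes or angiosperms"


origins = {gene: inferred_origin(*flags) for gene, flags in presence.items()}

TOTAL_GROUPS = 424  # land-plant orthologous groups identified in the text


def percent_of_groups(n_groups):
    """Share of the 424 groups, rounded to the nearest whole percent."""
    return round(100 * n_groups / TOTAL_GROUPS)


for gene, clade in origins.items():
    print(f"{gene}: {clade}")
print(f"Selaginella: {percent_of_groups(377)}% of groups")
print(f"Physcomitrella: {percent_of_groups(356)}% of groups")
```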

Genes involved in secondary metabolism were also investigated because plants synthesize numerous secondary metabolites that they use to interact with their environment. We analyzed three gene families involved in their biosynthesis: those encoding cytochrome P450-dependent monooxygenases (P450s), BAHD acyltransferases (BAHDs), and terpene synthases (TSs). The largest of these in Selaginella is the P450 family, accounting for 1% of its predicted proteome (table S7) (2). All three families show similar evolutionary trends, with the inferred ancestral vascular plant having a small number of genes that radiated extensively but independently within the lycophyte and angiosperm lineages (figs. S6 to S13). The BAHD and TS genes known to be involved in the biosynthesis of volatile odorants are apparent only in seed plants (figs. S12 to S13), likely reflecting the coevolution of seed plants with animals that pollinate flowers or disperse seeds. The independent diversification of these gene families, together with the large number of Selaginella genes, suggests not only that Selaginella has the potential to synthesize a repertoire of secondary metabolites that rivals that of the angiosperms in complexity, but also that many of these metabolites are likely to be unique. Some have been shown to be of pharmaceutical value [e.g., (26)].

We have used the compact Selaginella genome sequence to uncover genes associated with major evolutionary transitions in land plants. Understanding their functions in Selaginella and other taxa, as well as acquiring the genome sequences of other informative taxa, especially charophytes, ferns, and gymnosperms, will be key to understanding the evolution of plant form and function.

Supporting Online Material

SOM Text

Figs. S1 to S14

Tables S1 to S8


References and Notes

  1. Details are given in the supporting materials on Science Online.
  2. Acknowledgments: Selaginella sequences were deposited at GenBank with the accession numbers GL377566 to GL378322.1 and HM173080. Genome sequencing and analysis were performed by the U.S. Department of Energy (DOE), Joint Genome Institute, supported by the Office of Science of the U.S. DOE, Contract DE‐AC02‐05CH11231 (I.V.G., U.H., D.L., E.L., S.L., T.M., R.O., D.R., A.S., J.S., H.V.S.). Support was provided by NSF 0844413 (J.A.B.); Japan Society for the Promotion of Science (M.H., T.N., T.F., K.M., T.M., M.S.); Ministry of Education, Culture, Sports, Science, and Technology, Japan (M.H., T.N., T.F.); NSF 0515435 and Australian Research Council FF0561326 (J.L.B.); NSF 0519970 (M.G.); NSF 0638595 (C.D.); NSF 0922742 (V.A.A.); The Lewis B and Dorothy Cullman Program (B.A.A., A.L.); NSF 1020443 (B.A.A.); Natural Sciences and Engineering Research Council of Canada (NSERC) 2982 (N.W.A.); NIH GM84051 (M.J.A.); NSERC (E.I.B.); NIH T32 GM007757 and NSERC PGS-D (M.S.B.); NSF 0607123 (J.L.B.); Life Sciences Research Foundation (N.D.B.); National Human Genome Research Institute (NHGRI) HG004164 (M.D.); German Science Foundation (Deutsche Forschungsgemeinschaft, DFG) DR 430/4-2 (I.D.); Czech Ministry of Education 21620828 (M.E.); Jeffress Memorial Trust J-938 (E.M.E.); NSF 0744800 (M.E. and M.P.); DOE DE-FG02-04ER15542 (W.B.F. and S.L.); The Danish Council for Independent Research, Technology and Production Sciences 009-066624/274-09-0314 (J.H.); The Villum Kann Rasmussen Foundation (J.H., I.S., B.P., P.U., W.W.); NIH T32-HG00035, NSF 1020660, and NSF 1036466 (K.G.K.); NSF 0735191 (E.L.); NHGRI HG004164 (G.M.); U.S. Department of Agriculture (USDA) DE-FG02-08ER64630 (T.P.M.); NSF 0228660 (R.G.O.); The Danish Council for Strategic Research 09-063090 (B.L.P. and P.U.); Marie Curie FP6 RTN ZOONET (B.P.); DFG RE 837/10-2, BMBF FRISYS 0313921 (S.A.R.); Bundesministerium fuer Bildung und Forschung, Germany, GABI-FUTURE grant 0315046 (D.M.R.); USDA NRI 2007-35318-18389 (A.W.R.); NSF 0421604 (C.S.); NIH GM065383 (D.E.S.); Burgundy Regional Council 20100112095254682-1 (D.W.); and DFG RE 837/10-2 (A.D.Z.). K. Wall, D. Hurley, and S. Hentel provided computational assistance.