Special Perspectives

Synteny and Collinearity in Plant Genomes


Science 25 Apr 2008:
Vol. 320, Issue 5875, pp. 486-488
DOI: 10.1126/science.1153917

Abstract

Correlated gene arrangements among taxa provide a valuable framework for inference of shared ancestry of genes and for the utilization of findings from model organisms to study less-well-understood systems. In angiosperms, comparisons of gene arrangements are complicated by recurring polyploidy and extensive genome rearrangement. New genome sequences and improved analytical approaches are clarifying angiosperm evolution and revealing patterns of differential gene loss after genome duplication and differential gene retention associated with evolution of some morphological complexity. Because of variability in DNA substitution rates among taxa and genes, deviation from collinearity might be a more reliable phylogenetic character.

Eukaryotic genomes differ in the degree to which genes remain on corresponding chromosomes (synteny) and in corresponding orders (collinearity) over time (1). For example, most eutherian (placental mammal) orders have incurred only moderate reshuffling of chromosomal segments since descending from common ancestors ∼130 million years ago (2). Indeed, karyotype evolution along major vertebrate lineages appears to have been slow since an inferred whole-genome duplication ∼500 million years ago (3). Accordingly, accurate identification of orthologs across eutherian taxa is relatively routine, and synteny and collinearity can often be deduced straightforwardly with best-in-genome criteria (4), which identify one-to-one best-matching chromosomal regions in pairwise genome comparisons.
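The one-to-one matching idea behind best-in-genome criteria can be illustrated with reciprocal best hits, a common and simpler proxy for ortholog assignment. The sketch below is not the procedure used in (4); the score table, the gene names (a1, b1, ...), and the function name are hypothetical, and real pipelines derive scores from sequence alignment before looking for runs of such pairs along chromosomes.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (gene in genome A, gene in genome B)


def reciprocal_best_hits(scores: Dict[Pair, float]) -> Set[Pair]:
    """Return gene pairs that are each other's best-scoring match.

    `scores` maps (gene_a, gene_b) to a similarity score (higher = more similar).
    Ties are broken by keeping the first pair encountered.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    # Keep a -> b only if b's best match points back to a.
    return {(a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a}


# Hypothetical scores: a3 also resembles b2, but b2 matches a2 better.
scores = {
    ("a1", "b1"): 95.0,
    ("a1", "b2"): 60.0,
    ("a2", "b2"): 88.0,
    ("a3", "b2"): 40.0,
}
print(sorted(reciprocal_best_hits(scores)))  # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 gets no partner because its best match, b2, prefers a2. In polyploid plant genomes this strict one-to-one rule discards many true paralogous correspondences, which is why the text turns to other strategies for angiosperms.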

Angiosperm (flowering plant) genomes fluctuate remarkably in size and arrangement, even among close relatives. Recurring whole-genome duplications over the past ∼200 million years have been accompanied by wholesale gene loss that has fractionated ancestral gene linkages across multiple chromosomes (5). Angiosperm genome sizes span more than 1000-fold (6), and much of the difference between some well-studied genomes lies in heterochromatin (7). Additionally, the reshuffling of short DNA segments by mobile elements nearly eliminates large-scale collinearity in heterochromatic regions (7).

Despite recurring whole-genome duplications, angiosperm chromosome numbers are more static than genome size, mostly varying within a less than 50-fold range (6). Condensation of two chromosomes into one is known in many lineages; a particularly striking case was the demonstration that n = 10 (haploid chromosome number) members of the genus Sorghum are ancestral to n = 5 members of the genus (8). Indeed, Sorghum bicolor (sorghum) and Zea mays (maize) have the same chromosome number (n = 10), even though maize has undergone a whole-genome duplication since their divergence (9), whereas the most recent duplication in sorghum is shared with all other cereals (10). Several condensations in maize after its duplication may explain why single arms of some maize chromosomes (10 and 5) correspond to entire sorghum chromosomes (6 and 4) (11).

Fully sequenced genomes promise to improve inferences of correspondence, moving toward a unified framework for comparative evolutionary analysis. In angiosperms, analyses of synteny and paleo-polyploidy are inextricably intertwined, because comparative genomics of angiosperm sequences requires strategies to mitigate the effects of genome duplication and fractionation. For example, Arabidopsis thaliana (thale cress) has undergone three paleo-polyploidies, two doublings (5) and one tripling (12), resulting in ∼12 (2 × 2 × 3) copies of its ancestral chromosome set in a ∼160-Mb genome. Further complicating the comparison of A. thaliana with other angiosperms are an additional 9 to 10 chromosomal rearrangements in the few million years since its divergence from A. lyrata (rock cress) and Capsella rubella (pink shepherd's purse), including the condensation of six chromosomes into three, which reduced the chromosome number from n = 8 to n = 5 (13).

Other eudicot genomes have less complicated architectures than that of Arabidopsis. Although this remains controversial, the two most recent paleo-polyploidies affecting Arabidopsis [α and β, following the usage in (5)] now appear to have occurred within the crucifer lineage (12, 14). Populus trichocarpa (poplar) underwent a duplication specific to its own salicoid lineage (15) and shares only one of the three paleo-polyploidies (γ) affecting Arabidopsis. Vitis vinifera (grape) (12) and Carica papaya (papaya) (14), the latter in the same taxonomic order (Brassicales) as Arabidopsis, each show only γ and no subsequent polyploidy (Fig. 1). Indeed, the absence of the β event in Carica (14) argues against an alternative interpretation, based on analysis of a second Vitis genome sequence (16), that β occurred in a common ancestor of Arabidopsis and Populus.

Fig. 1.

Idealized gene tree containing multiple orthologs and paralogs in Populus, Arabidopsis, Carica, and Vitis. For illustration, the tree assumes equal evolutionary rates along all branches and no gene loss following polyploidy. Polyploidy events are represented as black circles and labeled α and β within the Arabidopsis lineage (5), p for the salicoid duplication in Populus (15), and γ, which is shared by all four species (12, 14).
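Under the idealized assumptions of Fig. 1 (no gene loss), the expected number of copies of an ancestral gene in each species is the product of the multiplicities of the polyploidy events on its lineage. A minimal sketch, assuming γ is a tripling and α, β, and p are doublings, as described in the text; the dictionary and function names are illustrative only. Real genomes retain far fewer copies because of fractionation.

```python
from math import prod
from typing import Dict, List

# Fold increase contributed by each event in Fig. 1 (tripling = 3, doubling = 2).
EVENT_MULTIPLICITY: Dict[str, int] = {"gamma": 3, "beta": 2, "alpha": 2, "p": 2}

# Events on the path from the pre-gamma ancestor to each species, per Fig. 1.
LINEAGE_EVENTS: Dict[str, List[str]] = {
    "Arabidopsis": ["gamma", "beta", "alpha"],
    "Populus": ["gamma", "p"],
    "Carica": ["gamma"],
    "Vitis": ["gamma"],
}


def expected_copies(events: List[str]) -> int:
    """Copies of one ancestral gene if no duplicate is ever lost."""
    return prod(EVENT_MULTIPLICITY[e] for e in events)


for species, events in LINEAGE_EVENTS.items():
    print(species, expected_copies(events))
# Arabidopsis 12 / Populus 6 / Carica 3 / Vitis 3
```

The Arabidopsis value reproduces the ∼12 copies quoted in the text.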

Synteny can be identified through the clustering of neighboring matching gene pairs; however, differences in gene density and tandem gene arrays among species may cause statistical artifacts. Collinearity, a more specific form of synteny, requires common gene order. Collinearity and synteny have traditionally been identified by looking for one-to-one (pairwise) conservation between species. To take better advantage of new genomic resources as they become available, multiway collinearity analyses are needed, with progressive alignments accompanied by statistical evaluation and iterative refinement (4). In angiosperms, such multiple alignments offer the further advantage of helping to unravel the consequences of genome duplications.
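Collinearity detection is often framed as chaining matching gene pairs (anchors) so that their positions increase along both chromosomes. The following is a simplified dynamic-programming sketch, not any particular published tool; the `max_gap` limit of five intervening genes, the anchor coordinates, and all names are assumptions. Inverted segments could be handled by rerunning with positions on the second chromosome negated.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (gene index on chromosome A, gene index on chromosome B)


def best_collinear_chain(anchors: List[Anchor], max_gap: int = 5) -> List[Anchor]:
    """Longest chain of anchors whose positions strictly increase in both
    genomes, with consecutive anchors at most `max_gap` genes apart."""
    pts = sorted(set(anchors))
    n = len(pts)
    if n == 0:
        return []
    length = [1] * n  # length of the best chain ending at pts[i]
    prev = [-1] * n   # back-pointer used to rebuild that chain
    for i in range(n):
        xi, yi = pts[i]
        for j in range(i):
            xj, yj = pts[j]
            in_order = xj < xi and yj < yi
            close = xi - xj <= max_gap and yi - yj <= max_gap
            if in_order and close and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    # Walk back from the end of the longest chain.
    i = max(range(n), key=length.__getitem__)
    chain: List[Anchor] = []
    while i != -1:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]


anchors = [(1, 10), (2, 11), (3, 12), (4, 30), (5, 13), (6, 14)]
print(best_collinear_chain(anchors))
# [(1, 10), (2, 11), (3, 12), (5, 13), (6, 14)]
```

The anchor at (4, 30) links the same chromosome pair but out of order, so it supports synteny without joining the collinear chain. Because positions must strictly increase in both genomes, a tandem array that yields several anchors at one position contributes at most one of them.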

One partial solution for inferring ancestral gene orders in angiosperms has been a bottom-up approach, in which the most recently duplicated segments are interleaved to generate hypothetical intermediates that are then recursively merged (5). However, this approach requires an additional cycle of inference for each duplication event, so errors compound from one cycle to the next. An alternative top-down approach requires only one cycle of inference, simultaneously searching for and aligning all structurally similar segments across multiple genomes and subgenomes. The top-down approach should be more sensitive because it can incorporate transitive homology (17), in which segments A and B have undergone reciprocal gene loss and no longer show correspondence to each other, but both correspond to a third segment, C. Relationships among such degenerate duplicated regions, easily missed by a bottom-up approach, can often be resolved by comparison with another genome that lacks the duplication or that underwent independent gene loss. Such comparisons have clarified synteny among yeast species (18).
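Transitive homology can be captured by treating pairwise segment matches as edges and grouping all connected segments. The union-find sketch below is a hypothetical illustration: segments A and B (duplicates that underwent reciprocal gene loss) share no genes, but each matches segment C from a genome without that duplication, so all three fall into one group. Segment labels and the function name are invented for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_segments(matches: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group segments linked directly or indirectly (transitively) by matches,
    using a union-find structure."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, Set[str]] = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


# A-B have no direct match; both match C. D-E is an unrelated pair.
matches = [("A", "C"), ("B", "C"), ("D", "E")]
print(sorted(sorted(g) for g in group_segments(matches)))
# [['A', 'B', 'C'], ['D', 'E']]
```

A purely pairwise A-versus-B comparison would miss the A-B relationship entirely; the third segment supplies the link.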

Top-down analyses show a high degree of collinearity among Arabidopsis, Carica, and Populus (14). For example, we identified three branches, each containing orthologous segments from up to four Arabidopsis, one Carica, and two Populus genomic regions, suggesting paleo-hexaploidy in a common ancestor of these species (Fig. 2A). Applying these methods to the Vitis (grape) genome validated the reconstructed gene order and the inferred triplicated structure of a common Arabidopsis-Carica-Populus ancestor. Vitis is a eudicot outside the two eurosid clades that contain Arabidopsis-Carica (eurosids II) and Populus (eurosids I) (19) and therefore provides an independent lineage suitable for testing the gene order alignments. Paleo-hexaploidy (triplication) has also been suggested for 94.5% of the Vitis genome (12). When the Arabidopsis-Carica-Populus consensus is aligned to Vitis, the two independently inferred triplication patterns correspond closely (Fig. 2B). Thus, top-down gene order alignment revealed a genome triplication that had eluded earlier analyses of Arabidopsis (5) and Populus (15), and it also supported the conclusion that the triplication occurred in a common ancestor of Vitis, Arabidopsis, Carica, and Populus (12).
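The Vitis check can be phrased as an assignment test: each consensus γ subgenome should share most of its collinear genes with exactly one Vitis region, and different subgenomes should map to different regions. This is a hypothetical sketch with invented counts, region labels (V1 to V3), and an arbitrary twofold dominance threshold (`min_ratio`); it is not the statistical test used in the study.

```python
from typing import Dict, Tuple


def assign_subgenomes(
    shared: Dict[Tuple[str, str], int], min_ratio: float = 2.0
) -> Tuple[Dict[str, str], bool]:
    """Map each consensus subgenome to the outgroup region with which it
    shares the most collinear genes.

    The returned flag is True only if the mapping is one-to-one and every
    best match has at least `min_ratio` times the runner-up's count.
    """
    by_sub: Dict[str, Dict[str, int]] = {}
    for (sub, region), n in shared.items():
        by_sub.setdefault(sub, {})[region] = n

    assignment: Dict[str, str] = {}
    clear_winners = True
    for sub, counts in by_sub.items():
        ranked = sorted(counts.values(), reverse=True)
        assignment[sub] = max(counts, key=counts.get)
        runner_up = ranked[1] if len(ranked) > 1 else 0
        if ranked[0] < min_ratio * runner_up:
            clear_winners = False

    one_to_one = len(set(assignment.values())) == len(assignment)
    return assignment, one_to_one and clear_winners


# Invented counts of collinear genes shared with three Vitis regions.
shared = {
    ("ConA", "V1"): 40, ("ConA", "V2"): 6, ("ConA", "V3"): 3,
    ("ConB", "V1"): 5, ("ConB", "V2"): 35, ("ConB", "V3"): 4,
    ("ConC", "V1"): 2, ("ConC", "V2"): 7, ("ConC", "V3"): 30,
}
print(assign_subgenomes(shared))
# ({'ConA': 'V1', 'ConB': 'V2', 'ConC': 'V3'}, True)
```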

Fig. 2.

Typical view of multiple collinear regions among several eudicot genomes. Triangles represent individual genes and their transcriptional orientations. Genes with no syntenic matches to the selected regions are not plotted. (A) Alignment among Arabidopsis (green), Carica (magenta), and Populus (blue) chromosomal regions. The whole alignment reveals four distinct duplications, illustrated in Fig. 1. The regions are grouped into three consensus γ-subgenomes (Con γA, γB, γC) on the basis of parsimony. Aligned genes within each γ subgenome are merged into an inferred order by consensus. (B) The inferred γ partitions are validated with the Vitis genome (red) because each γ subgenome clustered in (A) has only one closely matching Vitis chromosomal region.
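Merging aligned genes into a consensus order, as in panel (A), can be sketched as a topological sort of the precedence constraints contributed by each fractionated segment. The sketch assumes the segments agree with one another (conflicting orders raise `graphlib.CycleError`); resolving real conflicts would need an extra rule, such as majority voting, that is not shown. Gene labels and the function name are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def consensus_order(segments: List[List[str]]) -> List[str]:
    """Merge gene orders from aligned segments into one order that respects
    the relative order within every segment.

    Raises graphlib.CycleError if two segments disagree on gene order.
    """
    predecessors: Dict[str, Set[str]] = {}
    for seg in segments:
        for gene in seg:
            predecessors.setdefault(gene, set())
        # Each neighboring pair within a segment says "before" comes first.
        for before, after in zip(seg, seg[1:]):
            predecessors[after].add(before)
    return list(TopologicalSorter(predecessors).static_order())


# Three hypothetical fractionated copies, each missing different genes.
segments = [
    ["g1", "g3", "g4", "g6"],
    ["g1", "g2", "g4", "g5"],
    ["g2", "g3", "g5", "g6"],
]
print(consensus_order(segments))  # ['g1', 'g2', 'g3', 'g4', 'g5', 'g6']
```

No single segment contains all six genes, but together their constraints admit only one order, which mirrors how interleaving fractionated subgenomes recovers ancestral gene content.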

The emerging unified framework for comparative evolutionary analysis of angiosperm genes and genomes will improve in power and precision as more genomes are sequenced. However, the current framework remains bipolar: we can identify extensive synteny and collinearity within the core eudicots and within the grasses, but much less between the two groups because of their longer evolutionary distance and more extensive genome rearrangement. Collinear orthologs between rice (Oryza sativa) and the four core eudicots account for only ∼15% of Oryza genes, distributed over about half of the genome. The longest Oryza-Arabidopsis collinear segment contains 23 orthologous gene pairs, but it is extended roughly twofold, to 47 pairs, when Vitis is incorporated. Additional monocot sequences from noncereal genomes such as Musa acuminata (banana) or Ananas comosus (pineapple), along with sequences of basal eudicots such as Eschscholzia californica or Papaver somniferum (California or opium poppy) and Aquilegia formosa (columbine), and of basal angiosperms such as Amborella trichopoda (no common name), may further improve detection of collinearity and synteny across major angiosperm clades.

Pan-angiosperm genome comparisons show correlated patterns of gene retention and loss in paleo-polyploid lineages. Alignments of multiple descendant chromosomes after polyploidy events reveal cases in which ancestral genes were deletion-resistant, being consistently preserved in syntenic subgenomes (20). Such preferential retention of genes from particular families, such as MADS-box genes (21) and other transcription factors, may contribute to increasing morphological complexity (22). The opposite case involves functional groups of genes whose members have been consistently returned to a single copy after multiple polyploidy cycles, suggesting that there are advantages to having only one copy of these genes (20).
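The contrast between deletion-resistant genes and genes restored to single copy can be expressed as a simple classification over syntenic copy counts per species. The sketch below uses invented counts and gene names, and its thresholds (two or more copies in every species versus exactly one copy in every species) are simplifying assumptions rather than criteria from (20).

```python
from typing import Dict


def classify_retention(retained: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Label each ancestral gene by how many syntenic copies survive per species."""
    classes: Dict[str, str] = {}
    for gene, per_species in retained.items():
        counts = list(per_species.values())
        if all(c >= 2 for c in counts):
            classes[gene] = "duplicates retained"
        elif all(c == 1 for c in counts):
            classes[gene] = "returned to single copy"
        else:
            classes[gene] = "mixed"
    return classes


# Invented copy counts for three hypothetical ancestral genes.
retained = {
    "gene_x": {"Arabidopsis": 4, "Populus": 3, "Vitis": 2},
    "gene_y": {"Arabidopsis": 1, "Populus": 1, "Vitis": 1},
    "gene_z": {"Arabidopsis": 2, "Populus": 1, "Vitis": 1},
}
print(classify_retention(retained))
# {'gene_x': 'duplicates retained', 'gene_y': 'returned to single copy', 'gene_z': 'mixed'}
```

Tallying such labels across functional categories is one way the correlated patterns described above could be surfaced.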

Because DNA substitution rates vary among plants, deviation from collinearity might be a more reliable phylogenetic character. DNA substitution rates can be highly variable among seed plant lineages; in an extreme case, a study of mitochondrial genes found 100-fold variation within the same genus (23). Analysis of structural genomic changes that are rare compared with DNA substitutions, such as specific rearrangements of gene order, insertions, or deletions, provides an informative and robust way to resolve relationships among many lineages (24). In retrospect, early inferences of polyploidy in angiosperms and vertebrates were confounded by gene phylogenies but later resolved with synteny (12, 25).
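One way to treat gene-order changes as characters is to count breakpoints: adjacencies present in one genome but absent from another. A minimal sketch for a single linear chromosome with signed gene labels (the sign gives orientation); it ignores chromosome ends and multichromosomal genomes, and all names and orders are illustrative.

```python
from typing import List, Set, Tuple


def adjacencies(order: List[int]) -> Set[Tuple[int, int]]:
    """Adjacent gene pairs in a linear order of signed genes.

    (a, b) read on one strand is (-b, -a) on the other, so each adjacency is
    stored in a canonical (smaller) form to make strand irrelevant.
    """
    return {min((a, b), (-b, -a)) for a, b in zip(order, order[1:])}


def breakpoint_distance(order_a: List[int], order_b: List[int]) -> int:
    """Count adjacencies in order_a that are not conserved in order_b."""
    return len(adjacencies(order_a) - adjacencies(order_b))


ancestral = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]  # genes 2 and 3 inverted
print(breakpoint_distance(ancestral, inverted))          # 2
print(breakpoint_distance(inverted, [1, -3, -2, 4, 5]))  # 0: shared inversion
```

A single inversion creates two breakpoints relative to the ancestral order, whereas two lineages that share the inversion show none between them. Unlike a substitution count, this signal does not scale with lineage-specific mutation rates.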

Improved synteny and collinearity alignments emerging from top-down approaches applied to multiple genomes and subgenomes are a potential foundation for reconstructing the ancestral state(s) of angiosperm genomes. Consensus gene orders within syntenic blocks can be approximated from top-down alignments. Ordering the syntenic blocks themselves at the macrolevel is more difficult; however, several combinatorial algorithms exist to reconstruct ancestral genomes under a most-parsimonious rearrangement scenario (26). The resulting orders would reveal not only shared genes but also divergent genes inserted into novel locations, highlighting lineage-specific changes. Additional genome sequences will improve the power to resolve gene orders at the microlevel and will also help identify functionally important DNA, such as the evolutionarily constrained elements identified among 28 vertebrate genomes (4).
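The combinatorial methods cited in (26) are beyond a short example, but a much simpler adjacency-based heuristic conveys the idea of recovering an ancestral order from several genomes: keep gene adjacencies supported by multiple genomes, while never using the same gene end twice. This is a hypothetical sketch on toy orders, not a most-parsimonious rearrangement solver; each gene is represented by its head ("h") and tail ("t") ends, and the `min_support` threshold is an assumption.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Extremity = Tuple[int, str]  # (gene, "h" = head end or "t" = tail end)
Adjacency = FrozenSet[Extremity]


def extremity_adjacencies(order: List[int]) -> Set[Adjacency]:
    """Adjacencies of a linear order of signed genes, as pairs of gene ends.

    A positive gene reads tail -> head; a negative gene is reversed.
    """
    adjs: Set[Adjacency] = set()
    for a, b in zip(order, order[1:]):
        right_end_of_a = (abs(a), "h" if a > 0 else "t")
        left_end_of_b = (abs(b), "t" if b > 0 else "h")
        adjs.add(frozenset((right_end_of_a, left_end_of_b)))
    return adjs


def ancestral_adjacencies(genomes: List[List[int]], min_support: int = 2) -> Set[Adjacency]:
    """Greedy majority heuristic: accept adjacencies in order of decreasing
    support, skipping any that reuse a gene end already accepted."""
    support: Counter = Counter()
    for order in genomes:
        support.update(extremity_adjacencies(order))
    used: Set[Extremity] = set()
    kept: Set[Adjacency] = set()
    for adj, count in support.most_common():
        if count < min_support:
            break
        if not (adj & used):
            kept.add(adj)
            used |= adj
    return kept


# Toy orders: the third genome carries an inversion of genes 2 and 3.
genomes = [[1, 2, 3, 4], [1, 2, 3, 4], [1, -3, -2, 4]]
kept = ancestral_adjacencies(genomes)
print(sorted(tuple(sorted(adj)) for adj in kept))
# [((1, 'h'), (2, 't')), ((2, 'h'), (3, 't')), ((3, 'h'), (4, 't'))]
```

Two of the three toy genomes share the order 1-2-3-4, so the adjacencies created by the inversion in the third genome fall below the support threshold, while the internal 2-3 adjacency, preserved by the inversion, is supported by all three.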

References and Notes
