Research Article

The Evolutionary Landscape of Alternative Splicing in Vertebrate Species

See allHide authors and affiliations

Science  21 Dec 2012:
Vol. 338, Issue 6114, pp. 1587-1593
DOI: 10.1126/science.1230612


How species with similar repertoires of protein-coding genes differ so markedly at the phenotypic level is poorly understood. By comparing organ transcriptomes from vertebrate species spanning ~350 million years of evolution, we observed significant differences in alternative splicing complexity between vertebrate lineages, with the highest complexity in primates. Within 6 million years, the splicing profiles of physiologically equivalent organs diverged such that they are more strongly related to the identity of a species than they are to organ type. Most vertebrate species-specific splicing patterns are cis-directed. However, a subset of pronounced splicing changes are predicted to remodel protein interactions involving trans-acting regulators. These events likely further contributed to the diversification of splicing and other transcriptomic changes that underlie phenotypic differences among vertebrate species.

Vertebrate species possess diverse phenotypic characteristics, yet they share similar repertoires of coding genes (1). Evolutionary changes in transcriptomes underlie structural and regulatory differences associated with species-specific characteristics. For example, species-dependent mRNA and noncoding RNA (ncRNA) expression patterns have been linked to mutational changes in cis- and trans-acting regulatory factors, as well as to phenotypic differences (25). However, because organ-dependent mRNA expression levels within individual species have been largely conserved during vertebrate evolution (6, 7), it seems unlikely that changes in gene expression (GE) account for the majority of phenotypic diversity among vertebrates.

Through the variable use of cis-acting RNA elements in exons and flanking introns that are recognized by trans-acting factors, different pairs of splice sites in primary transcripts can be selected in a cell type–, condition-, or species-specific manner (815). Changes in alternative splicing (AS) may therefore represent a major source of species-specific differences (1625). Here, we describe a genome-wide investigation of AS differences among physiologically equivalent organs from vertebrate species spanning the major tetrapod lineages.

Evolution of alternative splicing complexity. High-throughput RNA sequencing (RNA-Seq) data were collected from whole brain, forebrain cortex, cerebellum, heart, skeletal muscle, liver, kidney, and testis from human, chimpanzee, orangutan, macaque, mouse, opossum, platypus, chicken, lizard, and frog (26). For each species, we considered all internal exons as potential cassette AS events and created nonredundant databases of splice junction sequences formed by inclusion or skipping of each exon. RNA-Seq reads were mapped to the junction databases to determine “percent spliced-in” (PSI) values, and also to representative transcript sequences from each gene to estimate GE levels, represented as “corrected reads per kilobase transcript model per million mapped reads” (cRPKM) values (26). Orthology relationships between genes and exons were established to enable direct cross-species comparisons.

The relative proportions of orthologous exons detected to undergo AS in each sample were determined. Equal numbers of reads were randomly sampled from each RNA-Seq data set to control for coverage differences (26). AS detection is approximately twice as frequent in all analyzed primate organs as in the equivalent organs from mouse and other species (Fig. 1A and fig. S1). Moreover, there is an overall decline in AS frequency as the evolutionary distance from primates increases. These differences are significant (P < 10−10, Mann-Whitney U tests), are robust to different methods of AS frequency detection, and are independent of the variability in AS detection rates between individuals within the same species (Fig. 1A and fig. S1). Genes with the highest AS complexity in human are significantly enriched in cytoskeleton-associated functions (P < 0.03) (table S1), which suggests that AS-directed diversification of the cytoskeleton may have been a driving force in the evolution of increased cellular complexity in vertebrate species.

Fig. 1

Profiling of alternative splicing (AS) in vertebrates. (A) Relative proportions of exons undergoing AS in each sample, as measured by detection of middle exon skipping in random exon triplets, where the three exons are represented by orthologs in the analyzed species (y-axis units relative to the sample with lowest AS frequency). See fig. S1, A and B, for a more detailed version; see table S5 for details on samples, including replicates, and RNA-Seq data sets. (B) Percentage of common AS events between human and other species. (C) Symmetrical heat map of Spearman correlations from PSI profiles. For each sample, PSI values for the 1550 orthologous exons in the 11 analyzed species were estimated. See fig. S4A for a more detailed version. (D) Symmetrical heat map of Pearson correlations from gene expression (GE) profiles. For each sample, mRNA expression [log cRPKM values (26)] of 1809 analyzed orthologous genes in the 11 analyzed species were estimated. Key as in (C). See fig. S4B for a more detailed version. (E) Heat map of PSI values for 41 conserved cassette alternative exons. Rows, exons; columns, samples. Key as in (C). See fig. S11B for a more detailed version. Data are hierarchically clustered (complete method, Euclidean distance) for heat maps in (C) to (E).

Rapid evolution of organ-specific alternative splicing. We next compared AS profiles across organs and species. Approximately half of alternatively spliced exons among species separated by ~6 million years of evolution are different (Fig. 1B and figs. S2A and S3A). When clustering organ AS profiles on the basis of how overall PSI values correlate in pairwise comparisons, the samples segregate by individual species (Fig. 1C and fig. S4A). This is in contrast to clustering samples on the basis of how their overall GE levels correlate, where the samples segregate according to tissue type (Fig. 1D and fig. S4B) (6, 7). Principal components analysis confirms that species type and tissue type are the primary sources of variability underlying the overall AS and GE patterns, respectively (figs. S5 and S6). The species-dependent clustering of AS profiles is also observed when analyzing subsets of alternative exons associated with a wide range of splice site strengths, exon length, increased magnitudes of PSI change between tissues, increased read coverage, and also when using independently validated (16) PSI differences (figs. S7 to S10). These results indicate that overall organ-specific AS patterns have evolved at a much more rapid rate than organ-specific GE patterns.

However, when restricting the clustering analysis to all (n = 41) orthologous exons that are alternatively spliced in four species (human, mouse, chicken, and frog) representing the main tetrapod lineages, the samples segregate by tissue type, with similar results obtained with the larger data set of tissues and species (Fig. 1E and fig. S11, A to H). The 41 exons, on average, display a wider range of inclusion levels across the samples, indicating that they have a higher degree of regulatory potential (fig. S11I). Consistent with this observation, they are also associated with elevated exonic and flanking intronic sequence conservation, implying that they are under increased selection pressure to maintain binding sites for regulatory trans-acting factors (26). Therefore, although overall AS patterns of multiple organs distinguish vertebrate species, a small subset of exons that undergo AS in multiple species spanning ~350 million years of evolution display conserved patterns of regulation that reflect organ type.

We next investigated the relative rates at which AS and GE have evolved. From pairwise comparisons of PSI values in homologous tissues, we observe an overall increase of PSI divergence from human with evolutionary time (Fig. 2A). In contrast to results from analyzing GE divergence (Fig. 2, C and D) (6, 7), AS levels in testis have not diverged more rapidly than in other tissues (Fig. 2B), and AS events detected in neural tissues display the slowest rate of divergence (P < 10−6, Mann-Whitney U tests). A significantly higher proportion (27% more on average) of neural AS events are conserved between vertebrate species than are AS events specific to other organs (P < 0.002, Mann-Whitney U test; fig. S2B). These AS events are enriched in genes associated with synaptic transmission, axon guidance, neural development, and actin cytoskeleton reorganization (table S2), indicating that AS regulation of these processes is a highly conserved feature of vertebrate nervous system development.

Fig. 2

Different rates of AS and GE divergence. (A) Pearson correlations between human and other species when comparing PSI values pairwise for conserved tissue-specific alternative exons in each tissue. For each pair of samples, correlation is analyzed for PSI values for all exons undergoing AS in both samples. (B) Comparisons of total tree lengths (bootstrapping, 100 replicates) of PSI trees for conserved tissue-specific alternative exons from six tissues in seven species (human, chimp, macaque, mouse, opossum, platypus, chicken). Statistically significant differences between the neural and each of the other tissues are indicated (Mann-Whitney U tests). Full PSI trees for each tissue are shown in fig. S3B. (C) Pearson correlations between human and other species when comparing GE pairwise (cRPKM, log scale; 1809 orthologous genes in the 11 analyzed species) in each organ. (D) Comparisons of total tree lengths (bootstrapping, 100 replicates) of expression trees from analyzing six organs from seven species as in (B). Statistically significant differences between testis and each of the other tissues are indicated (Mann-Whitney U tests). Full expression trees for each tissue are shown in fig. S3C.

Evolution of vertebrate splicing codes. To investigate mechanisms underlying the divergence in organ AS profiles, we used a splicing code derived from mouse data (27, 28) to compare cis-regulatory elements that are predictive of tissue-dependent splicing patterns of five organs (brain, heart, skeletal muscle, liver, and kidney) from the four representative species analyzed above. The splicing code achieved high classification rates when predicting organ-specific AS patterns from sequence alone. The average true positive rate [AUC, area under the receiver operating characteristic (ROC) curve] ranges from 68 to 70% for heart exon skipping to 80 to 88% for brain exon inclusion (Fig. 3, A and B, and fig. S12). An exception was a subset of approximately 200 exon-skipping AS events in human brain, which were predicted less well (AUC = 62%) (fig. S12) (26).

Fig. 3

Inference and comparative analysis of vertebrate splicing codes. (A) ROC curves of splicing code predictions for brain-dependent exon inclusion in human, mouse, chicken, and frog. ROC curves for other tissues are shown in fig. S12. (B) AUC in percent for each of the ROC curves in (A). (C) Left: Heat map of splicing code features with significant prediction scores associated with brain-specific exon inclusion in each species (red for P < 0.05, Mann-Whitney U test); statistically significant features in common with all significant mouse features associated with brain-specific exon inclusion are shown. Right: Proportions of features significantly associated with mouse brain-specific exon inclusion also significant for brain-specific exon inclusion in other species. Error bars: 95% confidence intervals, Pearson’s χ2 proportion tests. Analyses of predictive features for other tissues are shown in fig. S12. (D) Region-specific distribution of code features significantly associated with tissue-specific exon inclusion or exclusion in all four species, as determined by their region-specific enrichment or by their predictive power as inferred by the splicing code. Splicing factors associated with code features are in square brackets. Significantly enriched features are indicated by hollow arrows; features both significantly enriched and predictive are indicated by bold arrows. Arrows are colored according to the organ with which the features are associated (refer to key).

A comparison of the most strongly predictive cis-elements accounting for each species’ organ-dependent splicing patterns revealed that these significantly overlap in most tissues (Fig. 3C and fig. S12). Such cis-elements may represent features comprising an ancestral vertebrate splicing code (Fig. 3D). However, the overlap between sets of splicing code features used in a given pair of species decreases with increased evolutionary distance (Fig. 3C and fig. S12; see also below). Thus, organ-dependent AS patterns appear to be generally controlled by significantly overlapping cis-regulatory codes, although progressive divergence in these codes likely also contributes to AS differences.

Species-specific alternative splicing is primarily cis-directed. The extent to which evolutionary change in cis-regulatory codes (versus trans-acting factors) accounts for species-dependent AS differences is not known. To address this, we used a mouse strain, Tc1, carrying the majority of human chromosome 21 (HsChr21) (29). We compared PSI values of exons from HsChr21 transcripts expressed in multiple organs (brain, liver, heart, testis) from the Tc1 strain, with PSI values of the identical exons in the corresponding human organs. We also compared PSI values for the orthologous mouse exons between wild-type and Tc1 mouse strains. For all comparisons, we analyzed a comprehensive set of 13 HsChr21 and orthologous mouse exons that were detected, using RNA-Seq data, to have undergone AS in at least one of the two species.

All analyzed HsChr21 exons that are alternatively spliced in human for which the orthologous exons in mouse are constitutively spliced are also alternatively spliced in the Tc1 mouse; likewise, all HsChr21 constitutively spliced exons for which the orthologous exons in mouse are alternatively spliced are also constitutively spliced in the Tc1 mouse (P < 0.02, both comparisons; one-sided Fisher’s exact tests; Fig. 4, A and B, and fig. S13A). For the orthologous exons that are alternatively spliced in human and mouse, we observe a significantly higher correlation between their inclusion levels in Tc1 mouse and normal human organs relative to the correlation observed when comparing inclusion levels of all orthologous human and mouse exons (r = 0.89 versus r = 0.52, P < 0.0008, one-sided Z-test; fig. S13B) (26).

Fig. 4

Human-specific AS is preserved in a mouse trans-acting environment. (A) Scatterplot comparing RT-PCR–estimated PSI values for HsChr21 exons in human tissues (squares) or for the orthologous exons in the corresponding wild-type mouse tissues (triangles) (x axis), and the same HsChr21 exons in Tc1 mouse tissues (y axis) [13 different pairs of orthologous exons, 31 total pairs of orthologous AS events (26)]. Data points are colored according to whether the represented splicing events are alternative in human and constitutive in mouse (red), constitutive in human and alternative in mouse (blue), or alternative in both species (black) (26). Dark blue: exons with PSI > 95% and < 100% in human and PSI < 50% in mouse. Identity line is in dashed gray. P values for one-sided Fisher’s exact tests are indicated. (B) RT-PCR experiments measuring PSI levels for pairs of orthologous human and mouse exons using species-specific primer pairs (26), for exons alternative in one species and constitutive or near-constitutive (PSI > 95%) in the other. Human Chr21 exons were analyzed in Tc1 mouse tissues and normal human tissues; the orthologous mouse exons were analyzed in corresponding tissues from both the Tc1 and wild-type mouse strains. Quantification of PSI levels, human gene names (followed by the exon number, when more than one exon for the same gene was studied), and tissues are indicated. Red and yellow dots indicate exon-included and exon-skipped isoforms, respectively. Note that each set of species-specific primers amplifying the orthologous splice isoforms generates size-distinct RT-PCR products.

Collectively, the results indicate that changes among predominantly conserved cis-regulatory elements are sufficient to direct the majority of the species-specific AS patterns, at least in human and mouse. However, because our results also indicate that vertebrate splicing codes diverged with increasing evolutionary distance, and because specific subsets of AS events could not be reliably predicted using the splicing code, changes in trans-acting factors likely also contributed to evolutionary differences in AS.

Species-classifying alternative splicing events in trans-acting regulatory factors. To investigate which splicing changes during evolution likely had the greatest functional impact, we identified AS events that best discriminate or “classify” species (26). These AS events have relatively large and widely expressed PSI differences between species or lineages, and were validated at a high rate by reverse transcription polymerase chain reaction (RT-PCR) assays (r = 0.90, n = 180; fig. S14). Gene Ontology enrichment analysis reveals that the corresponding genes are functionally diverse, with “nucleic acid binding” among the most frequently represented categories (Fig. 5A, fig. S14, and tables S3 and S4). Consistent with a major role for conserved cis-regulatory elements governing species-dependent AS profiles (Figs. 3 and 4), the species-classifying AS events are significantly underrepresented in exons that overlap nucleic acid binding domains, relative to other classes of alternative exons in the same genes (P < 0.04, Pearson’s χ2 proportion test; Fig. 5C).

Fig. 5

Characterization of species-classifying AS events. (A) Domain diagrams and RT-PCR experiments measuring PSI levels for representative species-classifying AS events. Predicted highly disordered protein regions are depicted by red bars above the domain diagrams. Human gene names are indicated, as are the locations of the classifying alternative exon, species, and organs (B, brain; L, liver; K, kidney; H, heart; M, muscle) in which the validations were performed, as well as quantification of PSI levels. Red and yellow dots indicate exon-included and exon-skipped isoforms, respectively. RT-PCR assays were performed as in Fig. 4. (B) Comparisons of the average number of PTBP1-related cis-features associated with mammalian-classifying and nonclassifying species-dependent AS events, and the corresponding classes of species-dependent AS events in chicken and frog. Error bars: 95% confidence intervals, Pearson’s χ2 proportion tests. Statistically significant differences are indicated. (C) Comparison of the proportion of residues in different types of coding exons in proteins with nucleic acid–binding domains that overlap folded protein domains. Classes of alternative exons (red bars) analyzed are (i) “species-classifiers,” AS events that discriminate species; (ii) “conserved AS,” exons detected as AS in at least two of the analyzed species; and (iii) “species-specific AS,” exons that are alternatively spliced in only one of the analyzed species. For each class of alternative exon, distal constitutive exons (separated from the alternative exon by at least two exons) in the same genes are analyzed (yellow bars). “Background constitutive” refers to all exons that are constitutively spliced in all tissues (white bars). Error bars: 95% confidence intervals, Pearson’s χ2 proportion tests. Statistically significant differences between species-classifying AS events and each of the other classes of AS events are indicated. (D) Proportion of exons in proteins with nucleic acid binding domains that preserve the reading frame when included/skipped in transcripts (3n exons, exons with nucleotide length multiple of three nucleotides). Types of exons analyzed are as in (C). Error bars: 95% confidence intervals, Pearson’s χ2 proportion tests. Statistically significant differences between species-classifying AS events and the other species-specific AS events are indicated. (E) Proportion of predicted intrinsically disordered amino acids in different classes of exons in proteins with nucleic acid binding domains, as described in (C). Error bars denote SE of the associated distributions. Statistically significant differences between species-classifying AS events and the other species-specific AS events are indicated (Student’s t test).

An interesting example of a species-classifying AS event is the activity-modulating exon 9 of the splicing regulator PTBP1, located between RNA recognition motifs 2 and 3 in this protein (30, 31). This exon is skipped in mammalian organs but is fully included in chicken and frog organs (Fig. 5A). Consistent with the possibility of a more variable and extensive regulatory role for PTBP1 in mammalian-specific AS, we observe significant enrichment of putative PTBP1 binding sites in sequences surrounding mammalian-specific AS events relative to sequences surrounding chicken- and frog-specific AS events (P < 0.002, one-sided Fisher’s exact test; Fig. 5B).

The species-classifying AS events in genes associated with other functions are also significantly underrepresented in exons that overlap folded protein domains (P < 0.0002, Pearson’s χ2 proportion test; fig. S15A). However, the species-classifying AS events are significantly enriched in frame-preserving exons relative to nonclassifying species-specific AS events (P ≤ 0.05, Pearson’s χ2 proportion test; Fig. 5D and fig. S15B), and they are also enriched in protein regions predicted to be disordered. Disordered regions generally reside on protein surfaces and are known to play critical roles in ligand interactions and cell signaling (32, 33), and recent work has shown that tissue-regulated exons enriched in predicted disordered sequences often function in remodeling protein-protein interactions (PPIs) (34, 35). The species-classifying AS events thus possess multiple features of functionally important exons.

Discussion. Our results show that organs of primate species have significantly higher cassette exon AS frequencies than do organs of other vertebrate species. Moreover, overall organ AS profiles more strongly reflect the identity of a species than they do organ type. This contrasts with organ-dependent differences in mRNA expression, which are largely conserved throughout vertebrate evolution [this study and (6, 7)]. We propose that the rapid divergence in AS patterns in vertebrate organs may have played a more widespread role in shaping species-specific differences than did changes in mRNA expression.

Our work offers conclusive evidence that the reassortment of splicing code features can account for the majority of AS differences between vertebrate species. Consistent with this observation, species-classifying exons identified in this study are often found in genes encoding trans-acting regulators but are underrepresented in the nucleic acid binding domains of these proteins. Instead, they are highly enriched in disordered regions, which are known to function in signaling and in mediating PPIs. Because these AS changes affect trans-acting factors involved in gene regulation, they represent an additional mechanistic basis for the remarkable diversification in AS and other transcriptomic changes associated with phenotypic change among vertebrate species.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S15

Tables S1 to S9

References (3678)

References and Notes

  1. See supplementary materials on Science Online.
  2. Acknowledgments: We thank H. Han, D. O’Hanlon, M. Lukk, and the Cambridge Research Institute Biorepository, Genomics, and Bioinformatics Units for expert assistance; V. Tybulewicz, E. Fisher, R. Cohen, J. Wade, D. Simpson, and J. Gurdon for tissues and mouse strains; and T. Hughes, C. Ouzounis, F. Roth, and past and present members of the Blencowe laboratory for helpful discussions and comments on the manuscript. Supported by grants from the Canadian Institutes of Health Research (B.J.B. and B.J.F.); Canadian Cancer Society (B.J.B.); Genome Canada (through the Ontario Genomics Institute) and the Ontario Research Fund (B.J.B., B.J.F. and others); Natural Sciences and Engineering Research Council of Canada (B.J.F.); EMBO YIP and ERC starting Grants (D.T.O.); CIHR postdoctoral and Marie Curie IOF fellowships (N.L.B.-M.); an SNF postdoctoral fellowship (C.K.); an NSERC studentship (S.G.); and a Human Frontiers Science Program Organization long-term fellowship (M.I.). RNA-Seq data sets analyzed in this study can be accessed through GEO (GSE41338, GSE30352) and ArrayExpress (E-MTAB-513).
View Abstract

Navigate This Article