Darwinian Selection on a Selfing Locus


Science  17 Dec 2004:
Vol. 306, Issue 5704, pp. 2081-2084
DOI: 10.1126/science.1103776

This article has been retracted. Please see:


The shift to self-pollination is one of the most prevalent evolutionary transitions in flowering plants. In the selfing plant Arabidopsis thaliana, pseudogenes at the SCR and SRK self-incompatibility loci are believed to underlie the evolution of self-fertilization. Positive directional selection has driven the evolutionary fixation of pseudogene alleles of SCR, leading to substantially reduced nucleotide variation. Coalescent simulations indicate that this adaptive event may have occurred very recently and is possibly associated with the post-Pleistocene expansion of A. thaliana from glacial refugia. This suggests that ancillary morphological innovations associated with self-pollination can evolve rapidly after the inactivation of the self-incompatibility response.

The shift from outcrossing to self-fertilization (selfing) is classically regarded as one of the most prevalent evolutionary transitions in flowering plants (1). The extent of selfing and outcrossing can have profound effects on the levels and partitioning of genetic diversity in plant populations, the persistence of deleterious genetic polymorphisms, the allocation of resources within plants, and the diversification of floral morphology (2–4). Charles Darwin proposed the earliest model for the evolution of self-fertilization, the reproductive assurance model, which suggests selfing in plants can be evolutionarily advantageous when pollinators or mates are scarce in spite of inbreeding depression (5, 6). Darwin's model also underlies Baker's Rule, which notes that colonizing species that disperse over long distances are generally self-compatible (7).

Arabidopsis thaliana is a predominantly self-pollinating plant with an outcrossing rate estimated at ∼1% (8). In the Brassicaceae, the sporophytic self-incompatibility system enforces outcrossing by preventing pollen from germinating and developing on the stigma of a pistil from the same plant. Inactivation of at least one of the components of this system was a necessary step in the evolution of selfing in A. thaliana, because an active self-incompatibility response would prevent efficient self-pollination. The self-incompatible recognition system in the Brassicaceae is controlled by the S (Sterility) locus, which comprises a gene complex containing at least two functional genes. The SRK/Aly13 gene encodes a transmembrane Ser/Thr receptor kinase expressed in the stigma, whereas the SCR/SP11 gene encodes a small Cys-rich protein found in pollen coats that acts as a ligand to the SRK receptor protein (9, 10). Studies have shown that both SRK and SCR are highly polymorphic, with allelic lineages maintained trans-specifically in several species in the Brassicaceae (11–13). Specific interactions between SCR and SRK alleles result in frequency-dependent selection that maintains a large number of alleles, as well as suppressed recombination to ensure the integrity of specific allelic interactions.

Both SRK and SCR have been shown to be pseudogenes in A. thaliana, located in an ∼10-kb region of chromosome IV (14). The SRK pseudogene (ΨSRK) in A. thaliana is expressed in stigmas but has been reported to contain a premature stop codon (14). Three distinct SCR-like pseudogenes linked to ΨSRK have also been identified in A. thaliana. ΨSCR1 is located ∼700 base pairs (bp) upstream of ΨSRK and encodes a truncated open reading frame without three of eight conserved Cys residues believed to be required for the structural integrity of the SCR protein (14, 15). ΨSCR2 and ΨSCR3 are located ∼22 bp apart and ∼8.5 kb upstream of ΨSRK (16). These two alleles are highly truncated, do not encode long open reading frames, and share only patches of sequence similarity with the SCR signal sequence and 5′ untranslated region; given the very close proximity of these two pseudogenes, they are referred to together as ΨSCR2/3 (14). A recent transgenic study using the A. lyrata SRK and SCR genes demonstrates that both are necessary and sufficient for reestablishing the self-incompatible response in selfing A. thaliana and that the rest of the genes required to express the pollen rejection response remain largely intact in this species (16). This key result indicates that the ΨSCR and/or ΨSRK pseudogenes represent a selfing locus in A. thaliana, permitting self-fertilization to evolve sometime in the ∼5 to 6 million years since the divergence of this species from A. lyrata (17).

We sequenced alleles of ΨSCR1 in 21 A. thaliana ecotypes across the Eurasian range of this species (18). Only four nucleotide polymorphisms were observed across 881 silent sites. The level of silent-site nucleotide diversity (π) for this pseudogene was 0.0012 (Fig. 1 and Table 1), which is one-sixth the mean π of 0.007 for A. thaliana nuclear genes (19). In contrast, SP11/SCR in the self-incompatible species Brassica oleracea is highly polymorphic, with silent-site nucleotide diversity π equal to 0.321 (11). In A. lyrata, the sister species to A. thaliana, two alleles of SCR are known and also display high levels of nucleotide divergence (14). The elevated levels of nucleotide polymorphism in related self-incompatible species are consistent with the action of frequency-dependent selection acting on this gene in these outcrossing taxa, whereas the low level of nucleotide variation in ΨSCR1 suggests that the genomic region may have been the target of positive directional selection associated with the transition to selfing in A. thaliana.
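For readers unfamiliar with the statistic, nucleotide diversity π is conventionally defined as the average number of pairwise differences per site among sampled sequences. The sketch below illustrates that standard definition on a toy alignment. The sequences are hypothetical and are not the ΨSCR1 data, the function name is an assumption made for illustration, and the exact estimator and site-filtering rules used in the study (18) are not given in the text.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site, ignoring any column with a gap."""
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only alignment columns in which no sequence carries a gap.
    sites = [i for i in range(length) if all(s[i] != "-" for s in seqs)]
    if not sites:
        raise ValueError("no ungapped sites to compare")

    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        total_diffs += sum(a[i] != b[i] for i in sites)
        n_pairs += 1

    return total_diffs / (n_pairs * len(sites))


# Hypothetical toy alignment (not data from the study).
toy_alignment = [
    "ACGTACGT",
    "ACGTACGA",
    "ACGTACGT",
    "ACG-ACGT",
]
toy_pi = nucleotide_diversity(toy_alignment)
print(f"toy pi = {toy_pi:.4f}")
```

Excluding gapped columns mirrors the table footnote stating that gaps are excluded from the silent-site counts for the pseudogenes; other treatments of missing data are possible.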

Fig. 1.

Gene genealogy of A. thaliana (A) ΨSCR1 and (B) ΨSRK alleles. The three different ΨSRK haplotype groups are indicated as Hap A, Hap B, and Hap C. The scale for the branch lengths of the ΨSRK genealogy is an order of magnitude greater than that for ΨSCR1.

Table 1.

Variation at the pseudo–self-incompatibility genomic region. Position of the genes along the chromosome IV sequence (GenBank accession no. NC_003075) is given in Mb; alignment length is the length of the sequenced region; π represents silent-site estimates of nucleotide diversity.

Gene         Position (Mb)     Alignment length (bp)   Number of silent sites   π
U-box gene   11.3562-11.3573   617                     161*                     0.0606
ΨSCR2/3      11.3753-11.3754   854                     823†                     0.0024
ΨSCR1        11.3822-11.3831   883                     881†                     0.0012
ΨSRK         11.3839-11.3871   2003                    444*                     0.1382
ARK3         11.3889-11.3932   834                     384                      0.0316

* Intron sites are either unalignable (ΨSRK) or not present (the U-box gene), and only exon regions were analyzed. Moreover, ΨSRK is expressed, and we excluded putative nonsynonymous sites.

† Because these are pseudogenes, all sites are considered silent, excluding gaps.
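The contrast across the region is easier to see when each silent-site π in Table 1 is expressed relative to the mean π of 0.007 reported for A. thaliana nuclear genes (19). The following is a minimal sketch of that arithmetic only. The variable and function names are assumptions, the ASCII labels psiSCR1, psiSCR2/3, and psiSRK stand in for ΨSCR1, ΨSCR2/3, and ΨSRK, and the comparison is illustrative rather than a statistical test.

```python
# Mean nucleotide diversity for A. thaliana nuclear genes, as quoted in the text.
GENOME_MEAN_PI = 0.007

# Silent-site pi values copied from Table 1, in chromosomal order.
TABLE_1_PI = [
    ("U-box gene", 0.0606),
    ("psiSCR2/3", 0.0024),
    ("psiSCR1", 0.0012),
    ("psiSRK", 0.1382),
    ("ARK3", 0.0316),
]


def relative_diversity(table, baseline):
    """Return each locus's pi divided by the baseline pi."""
    return {gene: pi / baseline for gene, pi in table}


ratios = relative_diversity(TABLE_1_PI, GENOME_MEAN_PI)
for gene, ratio in ratios.items():
    print(f"{gene:<11} {ratio:6.2f} x genome mean")

lowest = min(ratios, key=ratios.get)
print("lowest diversity:", lowest)
```

On these values the two ΨSCR loci fall below the genome mean while the three flanking genes sit above it, which is the valley of diversity the text attributes to the sweep.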

We calculated the joint likelihood of the time since the completion of a putative selective sweep (T in 2Ne generations, where Ne is the effective population size) and the selection coefficient (β = 4Nes, where s is a selection parameter) for the ΨSCR1 locus by simulating coalescent genealogies consistent with the mutational pattern found at the pseudogene (18, 20). The joint likelihood surface (Fig. 2) demonstrates that the level of variation at ΨSCR1 is most consistent with a very recent selection event (T ∼ 0) and a selection coefficient β greater than 20. We can use a likelihood ratio test to determine whether variation at ΨSCR1 is consistent with neutrality by comparing the likelihood of β = 0 maximized over T to that of the likelihood maximized over β and T. The first quantity corresponds to the null hypothesis that the pseudogene mutation is neutral, whereas the second corresponds to the alternative hypothesis that the mutation is nonneutral. The analysis indicates that we can reject a model in which a neutral mutation reaching fixation explains the low levels of variation at ΨSCR1 (likelihood ratio test statistic = 5.42, P < 0.01 using χ² approximation, P < 0.02 using simulations).
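As an illustration of how a likelihood ratio statistic maps onto a χ² tail probability, the sketch below evaluates the reported statistic of 5.42 against a χ² distribution with one degree of freedom. Both the one-degree-of-freedom choice and the halved value are assumptions made here: the text does not state the degrees of freedom, and halving is only one common convention when the null value of a parameter (here β = 0) lies on the boundary of its range. The function name is likewise hypothetical, and the result should not be read as a reproduction of the authors' calculation.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If Z is standard normal, P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


LRT_STATISTIC = 5.42  # value reported in the text

# Assumption: one degree of freedom (beta is the only parameter fixed under the null).
p_one_df = chi2_sf_1df(LRT_STATISTIC)

# Assumption: boundary convention, a 50:50 mixture of a point mass at 0 and chi-square(1).
p_boundary = 0.5 * p_one_df

print(f"chi-square(1) tail:     {p_one_df:.4f}")
print(f"halved (boundary case): {p_boundary:.4f}")
```

Under these assumptions the unhalved tail probability is close to 0.02 and the halved value is close to 0.01, so the reported thresholds are sensitive to which convention is applied.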

    Fig. 2.

    The joint likelihood surface of the time since the end of the selective sweep (T, in units of 2Ne generations) and the strength of selection (β = 4Nes).

    These results are consistent with recent directional selection acting on the ΨSCR1 pseudogene in A. thaliana. It is possible, however, that positive selection may be acting on one of the other closely linked self-incompatibility pseudogenes and that the observed effects on ΨSCR1 arise from genetic hitchhiking (21). In order to examine this possibility, we determined allelic variation at both ΨSCR2/3 and ΨSRK.

    The ΨSCR2/3 locus is located ∼8.5 kb upstream of ΨSCR1 (14). Alleles of this pseudogene have nine single nucleotide polymorphisms across 823 silent sites in our sample of ecotypes, with silent-site π for this pseudogene equal to 0.0024 (Table 1). Although this region also has low levels of nucleotide variation, a likelihood ratio test cannot reject neutrality for ΨSCR2/3 (P < 0.1).

    The SRK self-incompatibility pseudogene is located immediately downstream of ΨSCR1 in the A. thaliana Col-0 ecotype (Table 1 and Fig. 3) (14). Three distinct allele lineages or haplogroups of ΨSRK were identified (Fig. 1), and the total nucleotide diversity estimate π was 0.078, whereas the synonymous-site π was 0.138. Members from all haplogroup classes remained transcriptionally active (18) despite the presence of disruptive mutations in most, but not all, alleles (Fig. 4). Although the three ΨSRK haplogroups are highly divergent, several lines of evidence demonstrate that they are all located in the same physical position in the genome and are thus allelic to each other (supporting online text).

    Fig. 3.

    Genealogies for genes across the pseudo–S locus of A. thaliana. Reduced nucleotide variation at the ΨSCR1 and ΨSCR2/3 allele trees results in short branch lengths, whereas the other three genes in the region have long internal branches. Positions of the genes are depicted according to the chromosome IV sequence (GenBank accession no. NC_003075). Retro denotes sequence with homology to a copia-like retrotransposon (left retrotransposon, At4g21360; right retrotransposon, At4g21363).

    Fig. 4.

    Expression and disruptive mutations of the ΨSRK gene. (A) Reverse transcription polymerase chain reaction analysis of ΨSRK for ecotypes containing haplogroup A (Col-0), B (Cvi-0), and C (Kr-0, Kas-1, and Ita-0) alleles. The ACT8 gene was amplified as a control. Both cDNA (c) and genomic DNA (g) were used for the amplification. (B) The haplogroup A and B mutations are shown above and below the ΨSRK gene diagram, respectively. Examination of the ΨSRK gene sequence from 21 ecotypes revealed multiple, independent gene-disruptive mutations in most alleles. The exon 4 premature stop mutation previously identified in Col-0 was found in 13 of the 17 haplogroup A alleles but not in the remaining four haplogroup A alleles or in haplogroups B or C. This indicates that the previously identified stop codon mutation (14) is not solely responsible for the evolutionary transition to selfing in A. thaliana. Three of the 13 haplogroup A alleles that have the exon 4 stop codon mutation also have an additional 1-bp frameshift deletion in exon 1. Of the four haplogroup A alleles that do not contain the stop mutation in exon 4, two possess a 5-bp frameshift deletion in exon 5 that alters the sequence of the encoded kinase region. The Cvi-0 ecotype (haplogroup B) has a splice site mutation at the end of intron 2, resulting in a frameshift. Analysis of the cDNA revealed no obvious inactivating mutation for ecotypes Ita-0, Kas-0, or Kr-0, all members of haplogroup C. The lengths of the exons and introns are depicted based on the Cvi-0 gene structure.

    Variation at ΨSRK has been affected by directional selection at ΨSCR1. The nucleotide diversity level at ΨSRK was higher than at the neighboring ΨSCR1 but was reduced relative to the ancestral diversity still observed in the sister species A. lyrata. Synonymous-site nucleotide diversity at ΨSRK (π = 0.138) has been reduced to ∼38% of that observed within A. lyrata (π = 0.36) (13). As many as 10 different haplotypes were observed in a global sampling of A. lyrata (13), whereas we found only three A. thaliana ΨSRK haplogroups. Furthermore, one haplogroup (haplogroup A) consisting of nearly identical haplotypes predominated, with a frequency of 81% among the sequenced alleles (Fig. 1). This level and pattern of polymorphism is consistent with incomplete hitchhiking of ΨSRK to the ΨSCR1 sweep (22, 23) and also suggests that recombination must be present in the A. thaliana pseudo–S region.
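The two percentages quoted above follow directly from figures given elsewhere in the text, and a short check makes the arithmetic explicit. The allele counts (17 haplogroup A alleles among 21 sequenced ecotypes) are taken from the Fig. 4 legend; the constant names are assumptions made for illustration.

```python
# Synonymous-site diversity at SRK in each species, as quoted in the text.
PI_SRK_THALIANA = 0.138
PI_SRK_LYRATA = 0.36

# Allele counts from the Fig. 4 legend.
HAPLOGROUP_A_ALLELES = 17
TOTAL_ALLELES = 21

retained_fraction = PI_SRK_THALIANA / PI_SRK_LYRATA
haplogroup_a_frequency = HAPLOGROUP_A_ALLELES / TOTAL_ALLELES

print(f"diversity retained relative to A. lyrata: {retained_fraction:.0%}")
print(f"haplogroup A frequency: {haplogroup_a_frequency:.0%}")
```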

    Although recombination suppression has been observed in the S alleles of A. lyrata SCR1 and SRK, parametric and nonparametric methods indicate that significant levels of recombination are present in the A. thaliana pseudo–S region (24) (supporting online text and table S2). We estimated the population recombination rate ρ = 2Ner, where r is the recombination rate, for the genomic region encompassing the A. thaliana self-incompatibility pseudogenes and the flanking ARK3 and U-box protein-encoding genes. The estimate of ρ for the entire surveyed region is 16 and is significantly different from zero (P < 0.001). This is a conservative estimate, because the ancestral linkage disequilibrium among ancient polymorphisms still segregates, in part, within ΨSRK (but not between the ΨSCR1 and ΨSRK alleles). Estimates of recombination with ΨSCR1 and ΨSRK combined were also calculated to evaluate whether recombination could decouple variation between these two pseudogenes (table S2); the estimate is still significantly different from zero (P < 0.02). Linkage disequilibrium is no longer required to maintain allelic interactions between ΨSCR1 and ΨSRK, and it appears that recombination has evolved after the origin but before the global fixation of the pseudogene allele at SCR, resulting in differences in the evolutionary histories we observe among genes in this region.

    These results indicate that the transition to selfing in A. thaliana arose as a consequence of positive selection on a pseudogene allele of SCR and not at SRK. The levels of nucleotide variation at this and other genes in and around the A. thaliana pseudo–self-incompatibility region allow us to define the physical limits of selection in the genome. The higher level of nucleotide variation at ΨSRK indicates that the selective sweep at ΨSCR1 is bounded at the 3′ end by the intergenic region between these two pseudogenes. This is confirmed by analysis of the ARK3 kinase gene located ∼2 kb downstream of the ΨSRK sequence (Fig. 3) (14) and of the U-box protein-encoding gene located ∼34 kb upstream of ΨSCR1 (Fig. 3) (14); both have elevated levels of nucleotide variation (Table 1). If we assume the distances between genes as determined in the Col-0 A. thaliana sequence, the adaptive sweep at ΨSCR1 affects a large genomic region between ∼10 and ∼35 kb in length.

    The selective sweep associated with the fixation of the ΨSCR1 selfing allele appears to have occurred very recently. The 95% confidence interval of time since the adaptive sweep at the ΨSCR1 selfing locus spans 0 to ∼0.32 million years ago. In this period, glacial-interglacial climate changes occurred in 100,000-year cycles (25), and plant and animal species experienced expansions and contractions of their distributions. A. thaliana is thought to have experienced this typical pattern of geographic range distribution in Eurasia, through population expansion by colonization from refugia after glacial retreats (26, 27). The statistical estimate of the timing of the transition to selfing (T maximal at 0) is compatible with a model of post-glacial expansion of A. thaliana ∼17,000 years ago, when this species is thought to have expanded from Mediterranean and Central Asian refugia after the Pleistocene (26).

    The recent origin of selfing in a time scale coincident with recent post-Pleistocene expansion suggests that self-pollination in A. thaliana evolved in line with Darwin's reproductive assurance model (5), because such an expansion in species range would presumably be accompanied by scarcities in outcrossing mates. These findings also provide the molecular underpinnings for the adherence of this species to Baker's Rule (7), because the evolution of selfing facilitates the ability of this species to colonize habitats over long distances. This is the first demonstration that the molecular evolution of selfing alleles is driven by positive directional selection.

    The evolution of self-fertilization in plants is associated with several physiological and morphological changes, including the relative position and timing of maturation of the stamens and stigma (3). For example, flowers of A. thaliana are the smallest in the genus (petal length, 3 to 4 mm), a feature which reduces the costs associated with outcrossing, compared to all other sister species in the genus (4 to 10 mm) (28). It is likely, however, that the inactivation of the self-incompatibility genes represents the first step in the evolution of selfing in A. thaliana, because any changes in floral morphology that promote self-pollination will be deleterious if plants remain self-incompatible. Our results indicate that subsequent adaptations in floral morphology correlated with the evolution of self-pollination can quickly evolve after the loss of self-incompatibility, allowing for the rapid establishment of the selfing syndrome in colonizing plant species. Indeed, at least one inflorescence developmental gene, TFL1, shows evidence of a recent selective sweep (29), and analysis of other genes underlying morphological and physiological correlates of selfing should also reveal signatures of recent directional selection. These findings support the contention that adaptations such as those associated with key mating system innovations can occur very rapidly and allow species to exploit new habitats.

    Supporting Online Material

    Materials and Methods

    SOM Text

    Figs. S1 to S3

    Tables S1 and S2

    References and Notes
