Special Reviews

Evolutionary Dynamics of Plant R-Genes


Science  22 Jun 2001:
Vol. 292, Issue 5525, pp. 2281-2285
DOI: 10.1126/science.1061337


Plant R-genes involved in gene-for-gene interactions with pathogens are expected to undergo coevolutionary arms races in which plant specificity and pathogen virulence continually adapt in response to each other. Lending support to this idea, the solvent-exposed amino acid residues of leucine-rich repeats, a region of R-genes involved in recognizing pathogens, often evolve at unusually fast rates. But within-species polymorphism is also common in R-genes, implying that the adaptive substitution process is not simply one of successive selective sweeps. Here we document these features in available data and discuss them in light of the evolutionary dynamics they likely reflect.

Genetic variation for disease resistance is characteristic of almost all species. In both mammals and plants, gene families involved in pathogen recognition pathways—the major histocompatibility complex (MHC) and plant Resistance (R) genes—contain loci that segregate for large numbers of alleles, some of which are highly divergent one from another. The antiquity of alleles in both humans (1, 2) and plants (3) (but not Drosophila) (4) provides evidence for their evolutionary maintenance by some form of balancing selection.

In plant R-genes, polymorphism is often associated with loci that are present as tandem arrays of multiple copies. Because of promiscuous genetic exchange, paralogs within these clusters exhibit complex evolutionary relationships (5). Polymorphism in gene copy number is common in clusters and is also present in single-copy loci as the presence or absence of a locus. Copy number dynamics, intergenic exchange, and allelic diversity are all likely to be evolutionary responses to the same selective pressures for disease resistance. R-gene evolution, therefore, has both a vertical component across generations and a horizontal component throughout the genome, and each is likely to be shaped by natural selection for resistance.

The presence of numerous and ancient segregating alleles at R-gene loci is perplexing because disease resistance is thought to involve an evolutionary arms race between host and pathogen (6, 7). A classic arms race is one that entails a series of selective sweeps as novel R-gene alleles, capable of recognizing pathogenicity determinants [called avirulence (Avr) factors] that previously avoided detection in a plant population, spread to high frequency. Support for these evolutionary dynamics centers on the common observation that amino acids evolve at a faster rate in functionally important regions of R-gene proteins than the corresponding rate of synonymous change (8–12). But according to the population genetics theory of selective sweeps, the rapid turnover of new R-gene specificity should cause a reduction in the age and number of alleles at a locus.

The longevity and high allelic diversity of some R-loci are inconsistent with a classic arms race and instead suggest a microevolutionary mechanism that promotes the maintenance of stable polymorphism (3). This review focuses on R-genes involved in “gene-for-gene” interactions (13) and, in particular, the class of R-genes containing leucine-rich repeat (LRR) regions. Our goal is to highlight features of R-gene variation and evolution that will be important in modeling the evolutionary dynamics of disease resistance. The striking parallels between mammalian MHC and plant R-gene variation suggest that understanding the evolutionary dynamics of disease resistance in plants will have applicability in other organisms, including humans.

Adaptive Divergence

Evidence for adaptive evolution is now commonplace in comparative studies of R-gene sequences. With few exceptions, studies draw results from the comparison of paralogs on a single chromosome. Although members of R-gene clusters often display evidence for intergenic exchange, paralogs can also be considerably diverged one from another, suggesting that exchange is no longer an active process contributing to the generation of new allelic variation.

Detecting adaptive evolution involves comparing amino acid substitution rates (Ka) to synonymous substitution rates (Ks) in the same gene. Under the assumption that synonymous changes approximate the neutral rate of molecular evolution, values of the ratio Ka:Ks greater than 1 provide evidence for positive selection for amino acid substitution. Detection of positive selection often depends on considering only those sites predicted to be important in recognition. Perhaps the most influential studies of this kind focused on class I and II MHC genes (14, 15) and revealed positive selection as a force driving antigen recognition site evolution.
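As a rough illustration of the comparison just described, the following Python sketch estimates Ka and Ks for two aligned coding sequences. It is a simplified variant of the Nei-Gojobori counting approach and is not the procedure used for the tables in this review. The function names (`syn_sites`, `jukes_cantor`, `ka_ks`) are hypothetical, codons that differ at more than one position are skipped, changes to stop codons are counted as nonsynonymous, and a Jukes-Cantor formula is assumed as the multiple-hits correction.

```python
import math
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites in one codon: for each position, the fraction of the
    three possible base changes that leave the amino acid unchanged."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1.0 / 3.0
    return total


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        return math.inf  # saturated; the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame sequences (T, C, A, G only)."""
    s_sites = n_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        n_changes = sum(a != b for a, b in zip(c1, c2))
        if n_changes > 1:
            continue  # simplification: ignore codons with multiple changes
        s = (syn_sites(c1) + syn_sites(c2)) / 2.0
        s_sites += s
        n_sites += 3.0 - s
        if n_changes == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                s_diff += 1
            else:
                n_diff += 1
    ka = jukes_cantor(n_diff / n_sites) if n_sites else math.nan
    ks = jukes_cantor(s_diff / s_sites) if s_sites else math.nan
    ratio = ka / ks if ks and not math.isnan(ks) else math.nan
    return ka, ks, ratio


# Toy example: one synonymous (TTT->TTC) and one replacement (TTT->TTA) change.
ka, ks, ratio = ka_ks("TTTTTTGGTGGT", "TTCTTAGGTGGT")
print(f"Ka={ka:.3f} Ks={ks:.3f} Ka:Ks={ratio:.3f}")
```

In this toy pair the ratio falls below 1 because the single replacement change is spread over many more nonsynonymous sites than the single silent change is over synonymous sites. Restricting the input to the codons of interest (for example, only the solvent-exposed LRR positions) corresponds to the site-specific analyses discussed in the text.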

In plant R-gene products, several studies pinpoint LRR domains as the major determinants of recognition specificity for Avr factors (16). LRR regions are receptor domains for specific recognition of pathogen elicitors (17) and may be involved in direct protein-protein interactions with Avr gene products of the pathogen (18). On the basis of a model of LRR protein structure, solvent-exposed residues framed by aliphatic residues are predicted to be the amino acids involved in making these direct contacts (17). The framed, solvent-exposed residues often exhibit strikingly fast rates of evolution, but other regions within the LRRs can also be seen to evolve by positive selection (11, 19), and specificity can reside outside of the LRR (20, 21). Amino acids in the LRR may also influence the interaction with host factors, thereby modulating resistance through a mechanism other than recognition (22). Experimental investigation of functional differences among R-gene alleles has yet to be exploited fully as a tool for deciphering R-gene structure and function.

Adaptive divergence among R-gene paralogs has been investigated in tomato (8), rice (23), lettuce (11), and Arabidopsis (9, 10, 12). Without exception, complex loci reveal high rates of amino acid replacement changes in the exposed residues of domain 2 (the framed region) of the LRR (Table 1), almost always being more than twice the rate of synonymous substitution. Adaptive evolution in the LRR is consistent with an evolutionary arms race in that, under this model, pathogens should impose selection to continually alter recognition specificity.

Table 1

Adaptive evolution among paralogs and between orthologs. Nonallelic comparisons of R-gene loci. Ka:Ks is indicated for domain 2 (the framed region) of the LRR, and the complexity indicates the number of paralogs in each accession. Comparisons of Rpp8 and Rps5 involve unpublished data of the authors (39). Rps2 combines published A. thaliana sequence (31) and unpublished A. lyrata sequence of Mauricio (40).


Rates of evolution of single-copy R-loci can be determined from orthologous comparisons between species. Estimates have been obtained for three genes, Rpm1, Rps2, and Rps5, and all involve comparisons between Arabidopsis thaliana and its congener, A. lyrata. For each of these R-loci, amino acid replacement changes have accumulated considerably more slowly than synonymous changes (Table 1). In interpreting these results, it is important to realize that although large values of Ka:Ks provide strong evidence for adaptive evolution, small values do not strongly indicate its absence. Ka:Ks ratios represent the confluence of constraint, genetic drift, and adaptive evolution; the lower Ka:Ks ratios found in Rpm1, Rps2, and Rps5 are compatible with adaptive evolution, albeit at a slower rate than that seen for complex loci.

Origin of New Alleles

Does intergenic exchange in multicopy clusters facilitate adaptive host response to pathogen pressure? In principle, the clustering of related R-genes can increase the opportunity for genetic exchange among paralogs that could act as reservoirs of mutational variation (16). Evidence supporting this idea comes from experimental investigations of both spontaneous mutants (24, 25) and chimeric alleles (20) that confer novel specificity and also from differences in rates of evolution in single- versus multicopy R-genes (described above). But whether R-genes in complex loci are subject to different selection pressures than single R-genes is debated (16, 26). Furthermore, little is known about baseline rates of cluster origination, expansion and contraction, and genetic exchange among R-genes within clusters.

To begin exploring these issues, we analyzed the A. thaliana genome sequence (27). R-loci are physically arranged in the Arabidopsis genome as 49 single R-loci and 32 clusters of 2 to 12 R-genes. Evolutionary analysis of 182 known and putative R-genes generated 20 clusters of related genes, 15 of which contain LRR regions (28). We additionally identified two sets of physically dispersed single R-gene loci (28). These data allowed us to ask whether clustered R-loci exhibit faster evolution than single R-genes (Fig. 1).

Figure 1

Evolutionary analysis of Arabidopsis known and putative R-genes. The ratio Ka:Ks* is plotted against cluster size for 15 LRR-containing complex R-loci and two sets of related single R-genes (28). Ka:Ks*, calculated as in Table 2, is shown for all pairs of R-genes at a locus with 0.01 < Ks < 1. Complex loci are named by chromosome position in Mb (known R-locus if any). Cluster sizes for complex loci reflect numbers of R-genes that may be available for genetic exchange. The line indicates Ka:Ks = 1.

Positive selection appears to be common in R-genes belonging to evolutionary clusters; we find at least one pairwise comparison with Ka:Ks greater than 1 for the exposed residues of domain 2 in 11 of 17 sets. Furthermore, evidence for positive selection is seen at all cluster sizes, including isolated R-genes, suggesting that selection acts similarly regardless of locus complexity. Nevertheless, cluster size and average rates of adaptive evolution are weakly and positively correlated (P = 0.051; one-tailed test), although the correlation explains only a small proportion of the overall variance in rates (R² = 0.17).
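The reported significance and variance explained can be cross-checked with a short calculation. The Python sketch below assumes (the text does not state this explicitly) that the statistic is a Pearson correlation computed over all 17 sets, so that the t statistic has 15 degrees of freedom. The function name is illustrative only.

```python
import math


def t_from_r_squared(r_squared, n):
    """t statistic for testing a Pearson correlation against zero,
    given the squared correlation and the number of data points."""
    r = math.sqrt(r_squared)
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r_squared)


# Assumed inputs: R^2 = 0.17 from the text, n = 17 sets (15 complex loci + 2 single-gene sets).
t_stat = t_from_r_squared(0.17, 17)

# Tabulated one-tailed 5% critical value of Student's t with 15 degrees of freedom.
T_CRIT_ONE_TAILED_05_DF15 = 1.753

print(f"t = {t_stat:.3f}, critical value = {T_CRIT_ONE_TAILED_05_DF15}")
```

Under these assumptions t is about 1.75, which sits essentially at the tabulated critical value, in line with a one-tailed P close to 0.05 and with the description of the correlation as weak.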

More striking is the observation that closely related R-genes are uncommon in the Arabidopsis genome: Divergence averages Ks = 0.46 across sets, and only three sets (I-22.3, V-16.4, and IV-8.7 = Rpp5) exhibit any pairwise divergence with Ks < 0.1. As already noted, R-gene paralogs may be largely independently evolving entities, and, if so, they would be compatible with a birth/death model for their evolution (29). The rarity of evolutionary clusters with young R-genes indicates low rates of R-gene duplication, loss, and genetic exchange. This suggests that, where observed (e.g., Rpp5), natural selection may be the driving force favoring (otherwise) rare recombinants. Our observation that clusters with the most closely related paralogs show the highest Ka:Ks ratios suggests selection for novel alleles generated through duplication or recombination.

Allelic Polymorphism

R-gene polymorphism is an important component of variation for resistance to pathogens. Does this allelic variability represent transient polymorphism arising during the adaptive spread of novel resistance alleles, or does it represent evolutionarily stable polymorphism? This question can be answered by investigating the genealogies of resistance and susceptibility alleles. Under a selective sweep scenario, the alleles segregating at a locus must be descendants of the allele that most recently swept to high frequency. If the last selective sweep occurred recently, as is likely for rapidly evolving R-genes, then alleles segregating at a locus should be very closely related to one another and should be nearly identical in sequence. In contrast, long-lived polymorphism for resistance and susceptibility produces alleles with more ancient common ancestries, and regions tightly linked to the site(s) under selection will show relatively greater nucleotide divergence (30).

Levels of nucleotide polymorphism at synonymous sites, or in noncoding regions surrounding an R-gene, can be used to test these alternatives because accumulated changes here reflect the ages of alleles. Studies designed to determine ages of alleles have been published for only two loci, Rps2 and Rpm1, both of which confer resistance in A. thaliana to Pseudomonas syringae pathovars. A polymorphism study of Rps2 (31) found that resistance alleles were genetically similar whereas susceptibility alleles could be widely divergent from each other and from the resistance alleles. The deepest node in the gene tree separates resistance and susceptibility alleles, and these alleles differ by 29 synonymous changes and 11 amino acid replacement changes. Overall, these alleles differ by about 3% at synonymous sites, a value only slightly greater than that seen at other loci in this species. The shape of the gene tree, however, led the authors to posit a balanced polymorphism at the locus.

A stronger case for balancing selection is seen at Rpm1 (3), a polymorphism for the presence or absence of the entire locus. Analysis of the junction region flanking this deletion reveals about 10% divergence between resistant and susceptible lineages, suggesting that the origin of the polymorphism dates to around the speciation event separating A. thaliana and A. lyrata. If a resistance allele is selectively deleterious in the absence of the pathogen it recognizes (32), then it is possible for a deletion (i.e., loss of function mutation) to be a balanced polymorphism (3). A similar pattern of divergence in the region flanking the insertion/deletion of Rps5 in Arabidopsis has also been observed (33).

Polymorphism data for five additional R-gene loci are summarized in Table 2. Considerable differences are seen in the average pairwise synonymous divergence between alleles. Rpp13, a single-copy locus in A. thaliana conferring resistance to Peronospora parasitica, shows over 9% divergence at synonymous sites in the LRR region (but not the rest of the gene). Similar divergence is seen at each of the three loci comprising Rpp1. Alleles at all four of these loci, therefore, are candidates for balanced polymorphism.

Table 2

Polymorphism among alleles of disease resistance genes. Numbers of silent and replacement (Rep.) changes and Ka:Ks ratios. LRR domain 2 corresponds to the framed region of the protein (17). Only exposed residues are included in the analyses of domains 1 to 3 (19), and domain 4 includes structural residues throughout the LRR. The rest of gene excludes the LRR. Ks*, which is Ks calculated for the entire LRR region, provides better estimates than that for each domain. Synonymous mutation rates do not vary significantly among domains. Ka:Ks calculations excluded codons mutated at all three positions and included multiple hits correction. Supplemental information is available at Science Online (38).


Moderate divergence is found among alleles of the flax L locus and among alleles of the Rpp8 locus in Arabidopsis. Selection for variation is suggested by the fact that both loci are segregating for many (functionally distinct in the case of L) alleles. Rps4 is distinguished by its near absence of polymorphism between two resistance alleles and a susceptible allele. Only a single amino acid polymorphism (and no synonymous differences) is present in the LRR region, whereas six synonymous differences separate resistance and susceptibility alleles in the 5′ TIR and NBS domains. Ten additional amino acid replacement mutations are spread throughout the rest of the gene, but only two of these distinguish resistance and susceptibility alleles. The relative lack of divergence between functionally distinct alleles suggests that they have descended recently from a common ancestor, and this may be an indication of a recent selective sweep.

Overall, R-gene alleles show a wide range of ages, with some loci harboring old alleles that may be the product of balancing selection (Rpm1, Rps2, Rpp1, and Rpp13) and others showing more modest levels of divergence. Of these latter alleles, Rpp8 and L segregate for a large number of alleles, a pattern inconsistent with an arms race. Allelic diversity at Rps4 has not been surveyed. The polymorphism data, therefore, indicate that a simple arms race model involving repeated selective sweeps may apply to, at most, a small complement of R-loci.

Unusual Relation Between Polymorphism and Adaptive Divergence

As is evident in Table 2, allelic divergence (within species) in amino acid sequence can be considerable, especially in the LRR region, and this divergence can be associated with high Ka:Ks ratios between pairs of alleles. Thus, the general finding of adaptively driven divergence among paralogs is also applicable to variants segregating within a locus.

What is most revealing about the divergence of alleles from the perspective of evolutionary dynamics is that these adaptive variants coexist with other alleles. Overall, there is a strong tendency for loci whose alleles have the largest Ka:Ks ratio (i.e., the most rapid adaptive evolution) to have the youngest alleles, as indicated by the smaller synonymous divergence in the non-LRR regions of the alleles. One dramatic example is provided by the flax L locus, where 13 alleles representing 12 different functional specificities differ one from another by an average of 40 amino acid replacements in domain 2 of the LRR. However, they differ little at synonymous sites (full LRR Ks* = 0.029), which indicates that these alleles arose from common ancestral alleles in the relatively recent past. At the other extreme, Rpp13 and Rpp1 display less marked Ka:Ks ratios but appear to have considerably older alleles, with synonymous divergence (Ks for the entire LRR) among alleles of 9 to 18%, more reminiscent of the extraordinary age of MHC alleles in humans.

Mutational Mechanisms

The possibility of elevated rates of mutation in domain 2 of the LRR has been suggested (8, 19), and if this was occurring, it would bias estimates of the age of alleles. We do not believe this to be the case. First, Ka and Ks calculations overestimate the rate of synonymous substitutions and underestimate the rate of nonsynonymous substitutions whenever Ka is greater than Ks (34). Second, the number of synonymous mutations in domain 2 is not significantly greater than that seen in the other domains (Table 2). Third, we find no evidence for elevated rates of divergence between species in the LRR domains. It is therefore likely that excess synonymous mutations in the LRR indicate a relatively greater genealogical antiquity of this region. This point is reinforced by the data for Rpm1 (3) and Rps5, for which the high mutational divergence is centered on the sequences flanking the insertion/deletion site rather than on the LRR.

Evolutionary Mechanisms

The most striking feature to emerge in the available data is the similarity in the patterns of evolved differences seen among alleles at individual loci and between genes belonging to evolutionary clusters. In particular, we are struck by the presence of R-gene alleles and paralogs, representing a very wide range of evolutionary ages, undergoing rapid adaptive evolution. Furthermore, rates of adaptive evolution appear greatest between closely related R-genes, suggesting that genetic exchange has contributed to the production of new adaptive alleles. Clearly, selection plays a profound role in R-gene dynamics. However, a classic arms race involving a succession of adaptive variants may be a poor metaphor for R-gene dynamics because alleles are not young and loci are not monomorphic, as predicted by this model. This raises the question of how polymorphisms are maintained in the face of adaptive evolution, an issue that has received attention with respect to MHC in a series of reviews (35) that explore the potential roles of frequency-dependent selection and overdominance. Arabidopsis's high selfing rate (36) suggests that, at least for this species, overdominance cannot be a potent evolutionary force. Instead, it is likely that frequency-dependent selection favoring novel alleles when rare is responsible for the pattern we see at R-gene loci.

The spread of a novel resistance allele in the host plant should open the door for pathogens carrying an ancestral Avr gene to outbreak, because this novel R-allele would have reduced the frequency of the ancestral R-allele in populations. Such frequency-dependent selection requires persistence of pathogens carrying the ancestral Avr gene in a refuge or isolated population as a novel resistance allele spreads. This situation may lead to cycling (3, 37). Alternatively, fixation may occur if the pathogen is extremely virulent, the pathogen refuge is not sufficiently effective, or environmental stochasticity is strong. Demographic and ecological details are thus likely to influence the outcome of these interactions and may explain the observed variation in the ages of alleles among R-genes. The current picture of ubiquitous polymorphism may not be general, however, because the existing data are biased by the use of polymorphism to identify and clone these genes. These issues are likely to be quickly resolved as additional molecular population genetic and evolutionary data become available. What will then be needed most is ecological work to better understand short-term disease dynamics and theoretical work, as there is an almost complete absence of models exploring the age of alleles under different scenarios of adaptive evolution.
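The frequency-dependent feedback described above can be made concrete with a deliberately minimal toy model. The Python sketch below is not taken from this review or from the cited models. It assumes a haploid host with one resistance allele R (frequency `p`), a pathogen with one avirulent Avr type (frequency `q`), a cost of resistance `c`, a disease cost `s`, and a cost of virulence `k`. All parameter values and function names are hypothetical.

```python
def step(p, q, c=0.05, s=0.3, k=0.1):
    """Advance host R frequency p and pathogen Avr frequency q by one generation."""
    # Host fitness: R plants pay cost c and are diseased only by virulent pathogens;
    # susceptible plants are diseased by every pathogen.
    w_R = 1.0 - c - s * (1.0 - q)
    w_r = 1.0 - s
    # Pathogen fitness: Avr pathogens succeed only on susceptible hosts;
    # virulent pathogens infect all hosts but pay cost k.
    w_avr = 1.0 - p
    w_vir = 1.0 - k
    p_next = p * w_R / (p * w_R + (1.0 - p) * w_r)
    q_next = q * w_avr / (q * w_avr + (1.0 - q) * w_vir)
    return p_next, q_next


def simulate(p0, q0, generations=200, **params):
    """Return the list of (p, q) pairs visited, starting from (p0, q0)."""
    trajectory = [(p0, q0)]
    for _ in range(generations):
        trajectory.append(step(*trajectory[-1], **params))
    return trajectory


# Start with a rare resistance allele in a mostly avirulent pathogen population.
for generation, (p, q) in enumerate(simulate(0.01, 0.9, generations=200)):
    if generation % 40 == 0:
        print(f"gen {generation:3d}  R freq = {p:.3f}  Avr freq = {q:.3f}")
```

In this sketch the interior equilibrium lies at p = k and q = c/s. R is favored when Avr pathogens are common and Avr is favored when R is rare, so each allele gains an advantage as its counterpart's target becomes scarce. Whether such a system cycles, fixes, or loses an allele in practice depends on the demographic and ecological details the text emphasizes, which this toy model omits.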

* To whom correspondence should be addressed. E-mail: jbergels@midway.uchicago.edu

