Supplemental Data

Evolutionary Dynamics of Plant R-Genes
Joy Bergelson, Martin Kreitman, Eli A. Stahl, and Dacheng Tian

Supplementary Material

Additional Information for Analysis in Figure 1 and Footnote (32)

We obtained accession numbers and genome sequence positions for the 182 Arabidopsis thaliana R-gene homologs in the NIBSLRRS database. R-gene peptide and cds DNA sequences were extracted from the MIPS Arabidopsis thaliana Database (14 January 2001 release, obtained from a mirror). Local similarity searches with fasta33 (Pearson & Lipman 1988) were conducted by querying each sequence against a database of all 182 sequences, separately for peptide sequences and coding sequences (cds).

We identified R-gene clusters with less than 1 Mb between first and last R-gene start codons (chosen so that all known R-loci could be identified), and with cds fasta expectations less than 0.1. These criteria include 4953 of the 16471 pairwise sequence comparisons, and local nucleotide identities were as low as 52% (note that fasta expectations do not correspond to P-values here, since many of the sequences are related). Our criteria thus include evolutionary relatedness at the DNA sequence level in a definition of complex R-loci. Based on them, the genome of Arabidopsis ecotype Columbia contains 49 single R-genes and 32 complex R-loci. Cluster sizes (frequencies) are 2(13), 3(6), 4(4), 5(1), 6(1), 7(2), 8(2), 9(1), 11(1), 12(1). This distribution of R-gene cluster sizes is similar to that reported in the Arabidopsis genome sequence release (Arabidopsis Genome Initiative 2000).
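The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names are ours and are not part of the original analysis.

```python
from math import comb

# Numbers taken directly from the text above.
n_homologs = 182
n_included_comparisons = 4953
n_single_genes = 49
# Cluster size -> number of clusters of that size.
cluster_sizes = {2: 13, 3: 6, 4: 4, 5: 1, 6: 1, 7: 2, 8: 2, 9: 1, 11: 1, 12: 1}

# All unordered pairs among the 182 homologs.
total_pairs = comb(n_homologs, 2)

n_clusters = sum(cluster_sizes.values())
genes_in_clusters = sum(size * count for size, count in cluster_sizes.items())

print(total_pairs)                         # 16471
print(n_clusters)                          # 32
print(genes_in_clusters)                   # 133
print(genes_in_clusters + n_single_genes)  # 182
print(n_included_comparisons / total_pairs)
```

The totals agree with the text: 16471 pairwise comparisons, 32 complex R-loci, and 133 clustered plus 49 single R-genes accounting for all 182 homologs. The included comparisons are roughly 30% of all pairs.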

cds alignments of clustered R-genes were generated from peptide sequence clustalw alignments (Thompson et al. 1994), and were checked by eye for frameshifts by comparison with cds clustal alignments and for regions of non-homology. We identified 20 clusters representing all of the complex R-loci in the genome containing R-genes sufficiently related at the DNA level for meaningful Ka and Ks calculation, having < 50% nucleotide differences for the entire coding sequence. In addition, we identified 4 sets of single R-genes with fasta expectations less than 0.1 that were more closely related to each other than to R-genes in any of the 32 clusters.
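As an illustration only, the two steps described above (deriving a cds alignment from a peptide alignment, then screening for < 50% nucleotide differences) might be sketched as follows. This is not the authors' code: the function names and the toy sequences are hypothetical, and skipping gapped sites when counting differences is our assumption, since the text does not say how gaps were treated.

```python
def thread_cds(aligned_peptide: str, cds: str) -> str:
    """Build a codon alignment row from a gapped peptide row and its ungapped cds.

    Each peptide gap ('-') becomes a three-base gap; each residue consumes
    the next codon. A trailing stop codon in the cds, if present, is ignored.
    """
    codons = []
    pos = 0
    for residue in aligned_peptide:
        if residue == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


def proportion_different(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites that differ, skipping sites gapped in either row."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped sites are not counted
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


# Toy example (hypothetical sequences, not R-gene data).
row_a = thread_cds("MK-L", "ATGAAACTG")
row_b = thread_cds("MKRL", "ATGAAGCGTCTA")
p = proportion_different(row_a, row_b)
print(row_a)    # ATGAAA---CTG
print(row_b)    # ATGAAGCGTCTA
print(p < 0.5)  # True
```

Threading the cds through the peptide alignment keeps gaps in multiples of three, which is why the frameshift check against direct cds alignments is a useful sanity test.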

We used results of HMMer Pfam analysis to identify LRRs in R-gene peptide sequence alignments. The Pfam LRR consensus sequence and those of Jones & Jones (1997) were used to examine HMMer-identified LRRs by eye; LRRs identified in any sequence in an alignment were accepted, and recognizable LRRs not identified in any sequence were ignored. The LRR region was defined as extending from the start of the first LRR to the end of the last LRR, and residues of identified LRRs were categorized into domains as in Table 2. LRRs were identified in 15 of the 20 evolutionary R-gene clusters, and in 2 sets of single R-genes: Set A (accession numbers At1g10920, At1g53350, AT5g35450, AT5g48620) and Set B (At1g33560, At1g50180, AT3g26470, AT4g33300, AT5g04720, AT5g47280). Set A as defined included Rpp8 (AT5g43470), which exists as a single R-gene in Columbia but as a two R-gene cluster in many ecotypes (McDowell et al. 1998; authors' unpublished data); Rpp8 was excluded from Set A for all analyses.
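The rule for delimiting the LRR region can be illustrated with a small sketch. It assumes that LRR hits have already been mapped to alignment columns as inclusive (start, end) pairs; that representation, the merging of overlapping hits, the function names, and the example coordinates are all hypothetical and are not taken from the original analysis.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end) in alignment columns, inclusive


def accepted_lrrs(per_sequence_hits: Iterable[Iterable[Interval]]) -> List[Interval]:
    """Pool LRR hits found in any sequence of an alignment and merge overlaps."""
    pooled = sorted(hit for hits in per_sequence_hits for hit in hits)
    merged: List[Interval] = []
    for start, end in pooled:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def lrr_region(lrrs: List[Interval]) -> Interval:
    """Span from the start of the first LRR to the end of the last LRR."""
    if not lrrs:
        raise ValueError("no LRRs identified in this alignment")
    return (min(start for start, _ in lrrs), max(end for _, end in lrrs))


# Hypothetical hits for two sequences in one alignment.
hits = [
    [(100, 123), (130, 152)],
    [(98, 121), (200, 223)],
]
lrrs = accepted_lrrs(hits)
print(lrrs)              # [(98, 123), (130, 152), (200, 223)]
print(lrr_region(lrrs))  # (98, 223)
```

Because a hit in any one sequence is accepted for the whole alignment, the region boundaries are set by the most extreme hits across all sequences.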

Ka and Ks were calculated as described for Table 2 (see Table 2 and the methods below).

Additional Information on Methods in Table 2

GenBank nucleotide sequence accession numbers are AF209730-32 for Rpp13; AF098962-64, AL138641 and AL138652 for Rpp1; X87851 for Rpm1; AF093638-49 and U27081 for L. For Rps2, we used U14158 and reconstructed the other sequences from Caicedo et al. (1999). Analyses of Rpp8 and Rps5 included AF089710-11 (Rpp8) and AF074916 (Rps5), plus authors' unpublished data. Sequences were aligned manually in Sequencher 3.0. Ka:Ks was calculated for pairwise comparisons using DnaSP 3.0 (Rozas & Rozas 1999); comparisons with Ks < 0.01 were excluded to avoid inflating Ka:Ks ratios.
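The Ks filter can be sketched as below. The Ka and Ks values would come from DnaSP; here they are supplied as a plain dictionary of made-up numbers, and the function and constant names are hypothetical. Only the 0.01 threshold is taken from the text.

```python
from typing import Dict, Tuple

KS_MIN = 0.01  # threshold stated in the text

Pair = Tuple[str, str]


def ka_ks_ratios(estimates: Dict[Pair, Tuple[float, float]]) -> Dict[Pair, float]:
    """Return Ka:Ks for each pairwise comparison, dropping those with Ks < KS_MIN."""
    ratios: Dict[Pair, float] = {}
    for pair, (ka, ks) in estimates.items():
        if ks < KS_MIN:
            continue  # a very small Ks in the denominator would inflate the ratio
        ratios[pair] = ka / ks
    return ratios


# Made-up (Ka, Ks) values for three hypothetical alleles.
estimates = {
    ("allele1", "allele2"): (0.02, 0.04),
    ("allele1", "allele3"): (0.003, 0.005),  # excluded: Ks < 0.01
    ("allele2", "allele3"): (0.03, 0.02),
}
ratios = ka_ks_ratios(estimates)
print(sorted(ratios))
```

In this toy input the second comparison would give Ka:Ks = 0.6 from only a handful of substitutions, which is the kind of noisy estimate the threshold is meant to remove.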

Additional References not in Paper

J. D. Thompson, D. G. Higgins, T. J. Gibson, Nucleic Acids Res. 22, 4673-4680 (1994).

W. R. Pearson, D. J. Lipman, Proc. Natl. Acad. Sci. U.S.A. 85, 2444-2448 (1988).

J. Rozas, R. Rozas, Bioinformatics 15, 174-175 (1999).