Gene Families from the Arabidopsis thaliana Pollen Coat Proteome


Science 29 Jun 2001:
Vol. 292, Issue 5526, pp. 2482-2485
DOI: 10.1126/science.1060972


The pollen extracellular matrix contains proteins mediating species specificity and components needed for efficient pollination. We identified all proteins >10 kilodaltons in the Arabidopsis pollen coating and showed that most of the corresponding genes reside in two genomic clusters. One cluster encodes six lipases, whereas the other contains six lipid-binding oleosin genes, including GRP17, a gene that promotes efficient pollination. Individual oleosins exhibit extensive divergence between ecotypes, but the entire cluster remains intact. Analysis of the syntenic region in Brassica oleracea revealed even greater divergence, but a similar clustering of the genes. Such allelic flexibility may promote speciation in plants.

Because self-recognition systems must adapt to the evolution of target molecules, they include some of the most rapidly changing proteins known. Unusual levels of genetic divergence are seen in mate recognition in plants, algae, abalone, and primates (1–4); immune responses in animals (5); and pathogen defense in plants (6). Rapid divergence of molecules controlling mate recognition in flowering plants is essential, considering that most angiosperms diversified only 90 to 130 million years ago. Here, we describe the protein components of the A. thaliana pollen coat and show that they display remarkable variability.

The complex extracellular pollen coating of many flowering plants uses proteins and lipids to interact selectively with receptive female stigma cells (7–10). This coating facilitates communication in plants with dry stigmas, serving a function similar to that of the lipid-rich exudate on the surface of wet stigmas (8). Identification and characterization of the most abundant Arabidopsis pollen coat protein, the lipid-binding oleosin GRP17 (glycine-rich protein), demonstrated its role in initiating pollination (9). To discern the role of other Arabidopsis pollen coat proteins, we purified and sequenced peptides from each protein >10 kD in size and used the completed genome sequence to identify the corresponding genes (11).

Peptide sequencing revealed the identity of all the detectable coat proteins with mobilities >10 kD; in some cases, protein identity was verified by mass spectrometry (Fig. 1) (12). The depletion of these proteins in pollen coat mutants indicated that they are extracellular (13). Comparisons of the peptides to the Arabidopsis genomic sequence showed that two proteins corresponded to putative receptor kinases with extracellular domains, one matched a potential EF-hand Ca²⁺-binding protein, two fit genomic sequence for lipase proteins (extracellular lipases EXL4 and EXL6), and five corresponded to oleosins (GRP19, GRP16, GRP17, GRP18, and GRP14). The kinases are similar to each other (BLASTP value E < 10⁻⁵⁵) and to many other Arabidopsis kinases. The EXL proteins share characteristics with the proline/hydroxyproline–rich glycoproteins (HRGPs), a loosely defined family found in plant reproductive tissues (14), including similarities to the HRGPs APG and CEX (BLASTP values E < 10⁻⁷⁰), high proline content, and linkage to carbohydrate moieties (12). Each EXL also contains a predicted family II lipase domain; such domains can act in extracellular environments to perform acyl transfer reactions (15). Pollen coat peptides <10 kD have been characterized in other species (7), but Arabidopsis proteins of this size were not detected by our methods (12). Together, the lipases and oleosins make up >90% of the detectable pollen coat protein. The potential for lipid interactions, with the oleosins affecting the size and character of lipid droplets and the lipases altering lipid composition, implicates these proteins as mediators of pollen coat behavior.

Figure 1

Purified pollen coat proteins and their identities. Coomassie-stained SDS-PAGE gel with corresponding GenBank IDs. Asterisk, EXL6 protein; d, dilute protein sample; c, concentrated protein sample.

Genes organized in clusters can promote the generation of adaptive allelic diversity through gene duplication and rearrangement events (16–18). The five GRP genes lie in a chromosome 5 cluster together with GRP20, a sixth oleosin (Fig. 2A). Likewise, both EXL genes are contained within a tandem array of six putative lipases on chromosome 1 (EXL1–EXL6) (Fig. 2B). Gene prediction algorithms incorrectly identified the genes in the EXL region; we verified the EXL gene structure by comparison to the expressed sequence tag (EST) database and cDNAs, and by reverse transcriptase–polymerase chain reaction (RT-PCR) (19). Each EXL gene contains five exons that encode a single lipase domain (Fig. 2, B and C). The EXL proteins share 35 to 69% amino acid identity, with conserved residues throughout (Fig. 2C). Transcription of EXL1 and EXL3–EXL6, but not EXL2, was detected in flower buds by RT-PCR (data not shown).

Figure 2

Gene structure of pollen coat oleosins and lipases. (A) Oleosin cluster. (B) Lipase cluster. (C) Alignment of the EXL proteins by CLUSTALW. Catalytic GDS residues are in bold. (A) and (B) are drawn to scale; genes are transcribed left to right.

Previously, several GRP cDNAs were identified on the basis of their glycine-rich content, and a single genomic clone corresponding to three cDNAs was found (20). Our data define a larger array. RT-PCR experiments confirmed expression of all six oleosins in flower buds (data not shown). Despite being transcribed, GRP20 was not detected in the pollen coating, perhaps because of its low abundance (Fig. 1). All six GRP proteins contain a consensus oleosin domain, encoded by exon 1, and share similar 5′ upstream elements (Table 1). Exon 2 varies substantially in both length and character, with each protein containing a unique repetitive motif (Table 1). This repetition parallels that of certain other mating genes, in which repeat units within a gene are thought to allow adaptive changes in interacting molecules. Variation in individual repeat sequences may initiate the evolution of species barriers, whereas unaltered repeat units maintain current interactions (3).

Table 1

Oleosin protein characteristics. 5′ elements are as defined in (20). Dash indicates no repetitive motif.


Gene duplications occur at approximately 1% per gene per million years, but most duplicated genes subsequently degenerate rapidly into pseudogenes (21). To explore the evolution of Arabidopsis oleosins, we compared the sequence of the AtGRP clusters from five Arabidopsis ecotypes, cross-fertile strains of Arabidopsis collected from different geographical locations (22). Some gene clusters, such as plant defense gene arrays, can contain a mixture of functional and nonfunctional genes (17). However, all genes of the GRP cluster were functional in the five ecotypes we surveyed. Although insertions or deletions (indels) occurred in both coding and noncoding sequence, coding-region indels were constrained to multiples of three nucleotides, maintaining the open reading frames (Table 2). Most amino acid substitutions and indels within the coding sequences occurred in exon 2, as opposed to the exon 1 oleosin domain (Table 2). Indels in exon 2 frequently altered the number or order of repeat units. Nucleotide polymorphisms in exon 1 were primarily silent (16 silent/10 substitution), whereas exon 2 changes usually caused amino acid substitutions (20 silent/42 substitution). The average number of polymorphisms between the Ler and Col-0 ecotypes, counting indels as single events, was 0.70%, higher than the genomic average of 0.57% (22). However, pair-wise sequence comparisons revealed regions of still higher polymorphism in both coding and noncoding sequences (Fig. 3). The pattern of change differed between ecotypes and could generate mating haplotypes consisting of unique combinations of alleles.
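The silent-versus-substitution tallies above can be illustrated with a short sketch. The paper does not describe its counting procedure, so the Python below is a simplified, hypothetical version. It assumes two gap-free, in-frame coding sequences of equal length and uses the standard genetic code. Each differing codon is counted once: as silent if both codons encode the same amino acid, and as a substitution otherwise. The names `CODON_TABLE` and `classify_codon_changes` are illustrative only.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (silent, substitution) counts of differing codons.

    Hypothetical helper: assumes both sequences are aligned, gap-free,
    in the same reading frame, and of a length divisible by three.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    silent = substitution = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # no change in this codon
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1  # same amino acid: synonymous (silent) change
        else:
            substitution += 1  # amino acid (or stop) changed
    return silent, substitution


print(classify_codon_changes("ATGCTTGGA", "ATGCTCGAA"))  # (1, 1)
```

In the toy call, CTT→CTC keeps leucine, whereas GGA→GAA changes glycine to glutamate. A per-site method would score codons carrying two or three differences more finely; this sketch deliberately keeps the codon as the unit.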

Figure 3

Sliding window analysis of percent pair-wise difference in the AtGRP region. Window, 500 bp. Coding regions are shaded. Dashed line, average difference between ecotypes (22).
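A sliding-window comparison of the kind plotted in Fig. 3 can be sketched as follows. The caption gives only the 500-bp window. The step size (25 bp here), the treatment of alignment gaps (a gap opposite a base counts as a difference), and the function name `sliding_window_difference` are assumptions for illustration, not the authors' procedure.

```python
def sliding_window_difference(
    seq_a: str, seq_b: str, window: int = 500, step: int = 25
) -> list[tuple[int, float]]:
    """Percent pair-wise difference in windows along an alignment.

    Returns (window midpoint, percent difference) pairs. A column counts as
    different whenever the two characters differ, so a gap opposite a base
    is scored as a difference (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    # Only full-length windows are scored; a trailing partial window is skipped.
    for start in range(0, len(seq_a) - window + 1, step):
        seg_a = seq_a[start:start + window]
        seg_b = seq_b[start:start + window]
        diffs = sum(x != y for x, y in zip(seg_a, seg_b))
        results.append((start + window // 2, 100.0 * diffs / window))
    return results


# Toy example with a 5-column window; a real comparison would use window=500.
print(sliding_window_difference("AAAAAAAAAA", "AAAAATTTTT", window=5, step=5))
# [(2, 0.0), (7, 100.0)]
```

Plotting the midpoints against the percentages, with coding regions shaded, would give a profile like the one in Fig. 3. The dashed average line would come from the whole-alignment difference.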

Table 2

Pair-wise analysis of nucleotide variation between Arabidopsis ecotypes. Indels gives the number of insertions or deletions, with the number that occur in frame given in parentheses. Πij is the total number of single-nucleotide differences; π = Πij/length. N/A, not applicable. Blank cells indicate no change in the compared region.
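The quantities in Table 2 can be computed from a pair of aligned sequences roughly as below. This is a hypothetical sketch with the following assumptions:

- '-' marks alignment gaps.
- Each contiguous run of gapped columns counts as one indel event, scored as in frame when its length is a multiple of three.
- "length" in π = Πij/length is the number of ungapped aligned sites, which the caption does not specify.

The extra `rate_with_indels` field mirrors the text's practice of counting indels as single events alongside nucleotide differences. Its denominator is likewise an assumption.

```python
GAP = "-"  # assumed gap character in the alignment


def pairwise_variation(seq_a: str, seq_b: str) -> dict:
    """Tally Table 2-style quantities for two aligned sequences (hypothetical helper)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    gap_runs = []  # lengths of contiguous runs of gapped columns
    current_run = 0
    pi_ij = length = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == GAP or y == GAP:
            current_run += 1
            continue
        if current_run:
            # A gap run just ended; adjacent gaps are merged into one event.
            gap_runs.append(current_run)
            current_run = 0
        length += 1       # ungapped aligned site
        pi_ij += x != y   # single-nucleotide difference
    if current_run:
        gap_runs.append(current_run)  # alignment ended inside a gap run
    indels = len(gap_runs)
    return {
        "indels": indels,
        "in_frame_indels": sum(1 for run in gap_runs if run % 3 == 0),
        "pi_ij": pi_ij,
        "length": length,
        "pi": pi_ij / length if length else 0.0,
        "rate_with_indels": (pi_ij + indels) / length if length else 0.0,
    }


print(pairwise_variation("ATG---CTTGGA", "ATGAAACTCGAA"))
```

For this toy alignment, the sketch reports one indel of three nucleotides (in frame), Πij = 2 over 9 ungapped sites, and therefore π = 2/9.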


If the oleosin gene cluster constitutes a recognition haplotype, related species should maintain a cluster but allow divergent alleles. We identified a bacterial artificial chromosome (BAC) clone containing five putative pollen oleosin genes (BoGRP1-5) (Fig. 4A) from Brassica oleracea, a species that diverged from Arabidopsis between 12 and 19 million years ago (19, 22, 23); some of these genes corresponded to previously described cDNAs (24). Both the Arabidopsis and B. oleracea oleosin clusters are flanked by genes sharing ≥75% identity, suggesting synteny (Fig. 4A). The B. oleracea oleosins are also transcribed in the same direction, share conserved 5′ upstream elements, have a consensus oleosin domain in exon 1, and have a repetitive exon 2 (Table 1 and Fig. 4A). Oleosin gene homologies reflected position in the cluster, with genes in similar positions showing greater similarity (Fig. 4, A, B, and C). Although the lipid-binding function of the oleosins appears conserved, B. oleracea and Arabidopsis oleosins share only 40 to 63% amino acid identity, detectable only in exon 1, whereas syntenic regions in these species generally show 85% amino acid identity (23). This increased rate of change parallels that observed for alleles of the highly polymorphic self-incompatibility (SI) protein SRK, which shows 50 to 60% amino acid similarity between Brassica and Arabidopsis lyrata, a closely related, self-incompatible relative of A. thaliana (25).

Figure 4

Synteny between the A. thaliana and B. oleracea oleosin clusters. (A) Comparisons of the oleosin clusters and flanking DNA. Black boxes, exons; arrows, direction of transcription; black connecting bars, regions with a BLASTN value of E < 10⁻³⁰; grey connecting bars, regions with a value of E < 10⁻²⁰. (B) Alignment of oleosin domains by CLUSTALW. (C) Pair-wise amino acid identity comparisons for the oleosin domain.
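Panel C reports pair-wise amino acid identity for the oleosin domain. A minimal sketch of that calculation over a precomputed alignment (for example, CLUSTALW output) is given below. It assumes '-' gap characters and scores identity over columns where neither sequence has a gap. This is one of several common conventions and not necessarily the one used for Fig. 4C. The function names are hypothetical, and the sequences in the usage lines are invented toy fragments, not real oleosin domains.

```python
from itertools import combinations

GAP = "-"  # assumed gap character in the alignment


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != GAP and y != GAP
    ]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of named, aligned sequences."""
    return {
        (name_a, name_b): percent_identity(seq_a, seq_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2)
    }


# Invented toy fragments, not real oleosin domain sequences.
toy = {"seq1": "MASKL", "seq2": "MAS-L", "seq3": "MTSKL"}
for pair, pid in identity_matrix(toy).items():
    print(pair, f"{pid:.1f}%")
```

Excluding gapped columns keeps length differences from lowering the score. A different denominator, such as the full alignment length or the shorter sequence, would give lower values for gapped pairs.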

The organization of pollen coat genes highlights four trends observed for recognition molecules: higher-than-average polymorphism, repetition to allow cycles of drift and adaptation in interacting molecules, organization in a cluster to facilitate allelic diversity, and generation of unique haplotypes to assist inheritance of an entire cassette of genes. The oleosin clusters defined here suggest selective pressure to maintain multiple oleosins across species boundaries and functional copies of all six oleosins within Arabidopsis. Although wholesale elimination of oleosins was not seen and the observed single amino acid substitutions may barely affect pollination, over extended time the accumulation of small variations across the entire cluster might lead to speciation. Future experiments involving transfer of mating haplotypes between species will ultimately test species specificity.

* To whom correspondence should be addressed. E-mail: dpreuss{at}

