Evolution of a Novel Phenolic Pathway for Pollen Development


Science  25 Sep 2009:
Vol. 325, Issue 5948, pp. 1688-1692
DOI: 10.1126/science.1174095

From Retrogene to Phenolic Metabolism

Metabolic plasticity, which involves the creation of new genes, is an essential feature of plant adaptation and speciation. Studying plants from the mustard family, Matsuno et al. (p. 1688) show that variants of the cytochrome P450 enzyme family were derived through retroposition, duplication, and subsequent mutation. Evolutionary changes increased the volume of the substrate pocket, altering the range of substrates with which the enzymes could interact. The enzymes formed the basis for a new metabolic pathway, the products of which include constituents of pollen and of phenylpropanoid metabolism.


Metabolic plasticity, which largely relies on the creation of new genes, is an essential feature of plant adaptation and speciation and has led to the evolution of large gene families. A typical example is provided by the diversification of the cytochrome P450 enzymes in plants. We describe here a retroposition, neofunctionalization, and duplication sequence that, via selective and local amino acid replacement, led to the evolution of a novel phenolic pathway in Brassicaceae. This pathway involves a cascade of six successive hydroxylations by two partially redundant cytochromes P450, leading to the formation of N1,N5-di(hydroxyferuloyl)-N10-sinapoylspermidine, a major pollen constituent and so-far-overlooked player in phenylpropanoid metabolism. This example shows how positive Darwinian selection can favor structured clusters of nonsynonymous substitutions that are needed for the transition of enzymes to new functions.

Plant adaptation relies on a remarkable metabolic plasticity that is reflected by the evolution of large gene families. New family members may emerge by segment duplication (sometimes followed by exon shuffling), gene fusion, or retroposition; that is, reverse transcription of mRNAs followed by insertion of the intronless cDNA into the parent genome (1). A striking example of duplication driving gene cluster evolution in triterpenoid metabolism was recently described (2). A survey of the Arabidopsis and rice genomes identified several retroposons (3, 4), but retrogene evolution leading to the acquisition of novel functions had been well described only in mammals and insects (1).

We recently performed a global coexpression analysis to predict the function of orphan cytochrome P450 genes in Arabidopsis thaliana (5). A candidate emerged with a predicted function in a new branch of phenolic metabolism. CYP98A8 (AT1G74540) and its paralog CYP98A9 (AT1G74550) are two chromosome 1–clustered duplications of an ancestor of CYP98A3, the latter previously shown to meta-hydroxylate the p-coumaric esters of shikimic/quinic acids to form lignin monomers (6, 7). CYP98A8 and CYP98A9 share only about 50% protein identity with CYP98A3. They do belong to the monophyletic CYP98A clade that includes all confirmed 4-coumaroyl shikimate/quinate meta-hydroxylases, but in protein phylogenies they appear separated from vascular plant sequences by CYP98s from conifers and moss (fig. S1B). This means either that CYP98A8 and CYP98A9 diverged before angiosperm evolution or that they appeared recently and evolved fast, and were thus placed at the base of the clade because of long-branch attraction. In favor of the latter hypothesis, cDNA phylogenies show that CYP98A8 and CYP98A9 form a sister group with CYP98A3 (Fig. 1A and fig. S1D), and no potential orthologs of CYP98A8 exist in the TIGR Plant Transcript Assemblies and PlantGDB databases, except for a single transcript assembly (CYP98A53) from Brassica napus. CYP98A8 and CYP98A9 are intronless, whereas all other known higher plant CYP98As contain two introns at conserved positions, including one that is considered a signature of the expanded CYP71 clan to which the CYP98 family belongs (8). This indicates a gene birth event via mRNA-mediated transposition.
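The text does not state how the figure of about 50% protein identity was computed. As an illustration only, one common convention counts identical residues over aligned columns in which neither sequence has a gap. The minimal sketch below assumes that convention; the function name `percent_identity` and the toy fragments are hypothetical and are not CYP98A sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumed convention for illustration; other definitions use different denominators.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded under this convention
        compared += 1
        identical += x == y  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments (not real P450 sequences).
print(percent_identity("MKT-LLV", "MKSALLI"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values are only comparable when computed the same way.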

Fig. 1

(A) DNA phylogeny and codon substitution analysis based on near–full-length open reading frames (alignment shown in fig. S1C). Maximum parsimony and maximum likelihood analyses were performed using the Phylip package. Shown is the most parsimonious tree topology (input tree used for further analyses) with bootstrap values (100 replicates). After selection of background and foreground sequences (CYP98A8, CYP98A9, and CYP98A53), branch-site model A, implemented in the codeml program of the PAML (v.4) package, was used to estimate substitution rates (ω or dN/dS) per codon in the background (ω0) and the foreground (ω1) lineages. Branch lengths are drawn to scale (substitutions per codon) as indicated. For details, see (11). Replicate analysis using a maximum likelihood phylogeny is shown in fig. S1D. (B) Overall view of CYP98A3/p-coumaroylshikimate complex homology model (32). Residues under positive selection in CYP98A8 and CYP98A9 are labeled in dark blue and represented as spheres (carbon, yellow; oxygen, red; nitrogen, blue) at the equivalent locations in CYP98A3. Also shown as spheres are heme (carbon, pink) and p-coumaroylshikimate (carbon, cyan). Red solid circle, active site residues; blue dotted ellipses, surface residues. For a close-up view of the active site, see fig. S4.

Ratios of nonsynonymous to synonymous substitution rates (dN/dS or ω) were first estimated for each branch across the complete sequences with the free-ratio model (9). This already indicated high ω values in the branches leading to CYP98A8, CYP98A9, and CYP98A53, but it was primarily used to identify appropriate background sequences that are clearly under purifying selection (ω < 0.1) across the complete coding region. We expected only a fraction of the coding sequence (the substrate-binding site) to be under positive selection. Thus, the selected background sequences (under purifying selection) were compared with the foreground composed of CYP98A8, CYP98A9, and CYP98A53 by using the branch-site model of positive selection, which tests for positive selection at only a few sites along a few lineages (10, 11). Whereas the ω estimate for the background was close to zero (ω0 = 0.05), a ratio far above 1 was found for the foreground (ω1 = 4.81, Fig. 1A). To test for positive selection, this model was compared with a null hypothesis assuming neutral evolution in the foreground (that is, with ω1 fixed to 1). Based on a likelihood ratio test (2Δl = 8.94, above the critical value of 5.99 at the 1% level) (12), the hypothesis of positive selection in the foreground CYP98A8, CYP98A9, and CYP98A53 clade (Fig. 1A) was accepted (P < 0.01). The most likely scenario is thus that CYP98A8 and CYP98A9 evolved recently at an accelerated rate and under positive Darwinian selection.
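The likelihood ratio test described above can be checked numerically. The sketch below assumes a χ² reference distribution with one degree of freedom (the two models differ by one free parameter, ω1) and uses the identity P(χ²₁ > x) = erfc(√(x/2)). It also reports the halved P value obtained under a 50:50 mixture of a point mass at zero and χ²₁, a null distribution often discussed for this branch-site test. The default critical value of 5.99 is simply the one quoted in the text; the function names are hypothetical.

```python
import math


def chi2_df1_sf(x: float) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with 1 degree of freedom."""
    # With one degree of freedom X = Z**2 for standard normal Z, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_lrt(two_delta_l: float, critical: float = 5.99) -> dict:
    """Summarize a likelihood ratio statistic 2*delta(lnL) for the branch-site test."""
    p_chi2 = chi2_df1_sf(two_delta_l)
    return {
        "p_chi2_df1": p_chi2,
        # Under a 50:50 mixture of a point mass at 0 and chi-square(df = 1), the P value is halved.
        "p_mixture": 0.5 * p_chi2,
        # 5.99 is the critical value quoted in the text.
        "exceeds_critical": two_delta_l > critical,
    }


result = branch_site_lrt(8.94)
print(f"P (chi-square, df = 1): {result['p_chi2_df1']:.4f}")
print(f"P (50:50 mixture):      {result['p_mixture']:.4f}")
print("2Δl exceeds critical value:", result["exceeds_critical"])
```

With 2Δl = 8.94 both P values fall below 0.01, consistent with the conclusion drawn in the text.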

Seventeen amino acid residues show a high posterior probability of positive selection (based on the Bayes Empirical Bayes method) (fig. S2). Homology modeling based on the CYP2C8 crystal structure (13) was used to predict the resulting differences between CYP98A3, CYP98A8, and CYP98A9 (Fig. 1B and figs. S3 and S4). Residues under positive selection are spatially clustered into three groups (Fig. 1B). One of these groups includes four residues that are located on the roof of the active site, at the end of helix F and in helix F′, forming the cavity distal from the heme that interacts with the conjugate moiety of the substrate. Moreover, three residues are missing from helix B′ in CYP98A8 and CYP98A9 as compared with CYP98A3. This region was described in CYP2C8 as a potential substrate-access channel (13, 14). In the CYP98A8 and CYP98A9 models, this cavity and the substrate-access channel are larger. In contrast, active-site residues interacting with the coumaroyl moiety of the CYP98A3 substrate in the vicinity of the heme show strong conservation and a low probability of positive selection (fig. S4). The other two groups under positive selection are located on the protein surface. One is spatially near the N terminus and could participate in membrane interaction. The second forms a large cluster in a region that is responsible for active-site opening via translation of helices F and G in other P450s [supporting online material (SOM) text]. Modifications in CYP98A8 and CYP98A9 structures are thus highly clustered. They are predicted to enlarge the distal region of the active-site cavity and substrate-access channel as compared with CYP98A3. They may also suggest a change in enzyme plasticity. Rapid evolution of the CYP98A8 and CYP98A9 genes thus appears to result in functional adaptation.
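Posterior probabilities of positive selection at individual codon sites follow from Bayes' theorem. The paper used the Bayes Empirical Bayes method, which additionally accounts for uncertainty in the estimated model parameters; the sketch below shows only the simpler naive empirical Bayes calculation, treating class proportions and per-site likelihoods as known. The four classes mirror those of branch-site model A (0, 1, 2a, 2b, with 2a and 2b carrying ω > 1 on foreground branches), but all numbers, the 0.95 cutoff, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def class_posteriors(priors: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Bayes' theorem for one site: P(k | data) = p_k * L_k / sum_j (p_j * L_j)."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def flag_sites(
    priors: Sequence[float],
    per_site_likelihoods: Sequence[Sequence[float]],
    selected_classes: Sequence[int],
    cutoff: float = 0.95,
) -> List[Tuple[int, float]]:
    """Return (site index, posterior) for sites whose summed posterior over the selected classes exceeds cutoff."""
    flagged = []
    for site, likelihoods in enumerate(per_site_likelihoods):
        posterior = class_posteriors(priors, likelihoods)
        p_selected = sum(posterior[k] for k in selected_classes)
        if p_selected > cutoff:
            flagged.append((site, p_selected))
    return flagged


# Hypothetical values for four site classes (0, 1, 2a, 2b); 2a and 2b have omega > 1 on the foreground.
PRIORS = [0.60, 0.30, 0.07, 0.03]
SITE_LIKELIHOODS = [
    [1e-5, 1e-5, 1e-5, 1e-5],  # uninformative site: posterior equals the prior
    [1e-8, 1e-7, 1e-4, 1e-4],  # site fitted far better by the omega > 1 classes
    [1e-6, 1e-6, 5e-6, 5e-6],  # weak signal
]
hits = flag_sites(PRIORS, SITE_LIKELIHOODS, selected_classes=(2, 3))
print(hits)
```

Only the second hypothetical site passes the cutoff; the first keeps its prior weight of 0.10 on the selected classes because its data do not discriminate between classes.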

Analysis of the in silico transcriptome (Fig. 2A) (5) and of promoter::β-glucuronidase (GUS) transformed plants (Fig. 2, B to O) (11) indicates a shift from predominant expression in the vasculature of roots, stems, and flowers for CYP98A3 to very strong expression in inflorescence tips, young flower buds, and stamens (especially tapetum and pollen) for both CYP98A8 and CYP98A9, and in the root tip for CYP98A9. Expression in aging vasculature was, however, also detected for CYP98A9. The evolution of CYP98A8 and CYP98A9 thus seems related to the recruitment, upon retroposition, of a resident promoter (SOM text and fig. S5A) that favored the acquisition of a new function in anther and pollen development.

Fig. 2

(A) Selected expression data were retrieved from the Genevestigator database and processed as described in (7). Experiments highlighting the range of organ and tissue expression are shown as a log2-transformed heat map, with bright yellow indicating greater than 32-fold expression above background (see table S2 for details). susp., suspension. (B to O) Histochemical localization of GUS gene expression under the control of CYP98A3 [(B), (E), (H), (K), and (N)], CYP98A8 [(C), (F), (I), and (M)], or CYP98A9 promoters [(D), (G), (J), (L), and (O)]. (B) to (D) In rosette leaves, expression was observed in the vascular system for CYP98A3::GUS and in main veins only for CYP98A9::GUS (magnification ×4). (E) to (G) Both CYP98A8::GUS and CYP98A9::GUS were expressed in developing anthers, whereas CYP98A3::GUS expression was detected in the petal vascular system (×15). (H) to (J) In roots, GUS expression was observed in vascular tissue (stronger for CYP98A3), except for CYP98A8::GUS lines (×82). (K) and (L) Details of expression in pistils. Expression is strong in the transmitting tissue of CYP98A3::GUS only (×22). (M) Anther thin sections revealed strong expression in the tapetum in CYP98A8::GUS plants (×800) and CYP98A9::GUS plants (not shown). (N) and (O) In the root tip, a strong signal was detected in CYP98A9::GUS transgenic lines only (×1000).

CYP98A8 and CYP98A9 catalytic function was investigated by ultraperformance liquid chromatography–mass spectrometry (UPLC-MS/MS). A major metabolite that is present in wild-type flower buds, N1,N5-di(hydroxyferuloyl)-N10-sinapoylspermidine (735 daltons) (15–17), was replaced by lower amounts of a more hydrophilic compound, N1,N5,N10-triferuloylspermidine (673 daltons), in homozygous cyp98A8 insertion mutants (SALK_131366; Fig. 3; figs. S5B, S6, and S7; and table S1). Conversely, a significant increase of the 735-dalton compound was observed in the buds of wild-type plants transformed with a CaMV35S::CYP98A8 construct (Fig. 3). We thus concluded that CYP98A8 catalyzes the meta-hydroxylation of the three triferuloylspermidine phenolic rings, which is analogous to CYP84 function in the phenylpropanoid metabolism (18).
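The masses quoted above can be reproduced from elemental compositions. The sketch below assumes that the integer "daltons" are nominal masses (C = 12, H = 1, N = 14, O = 16) and treats each N-acylation of spermidine as a condensation with a free hydroxycinnamic acid that releases one water molecule. The dictionary and function names are illustrative.

```python
from typing import Dict, Iterable

# Nominal (integer) atomic masses, assumed to correspond to the integer daltons in the text.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}

SPERMIDINE = {"C": 7, "H": 19, "N": 3}
WATER = {"H": 2, "O": 1}
# Free hydroxycinnamic acids, keyed by the acyl group each contributes.
ACIDS = {
    "coumaroyl": {"C": 9, "H": 8, "O": 3},          # p-coumaric acid
    "caffeoyl": {"C": 9, "H": 8, "O": 4},           # caffeic acid
    "feruloyl": {"C": 10, "H": 10, "O": 4},         # ferulic acid
    "hydroxyferuloyl": {"C": 10, "H": 10, "O": 5},  # 5-hydroxyferulic acid
    "sinapoyl": {"C": 11, "H": 12, "O": 5},         # sinapic acid
}


def nominal_mass(formula: Dict[str, int]) -> int:
    """Sum of nominal atomic masses for an elemental composition."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def acylspermidine_mass(acyls: Iterable[str]) -> int:
    """Spermidine plus one amide bond per acyl group; each condensation releases one H2O."""
    mass = nominal_mass(SPERMIDINE)
    for acyl in acyls:
        mass += nominal_mass(ACIDS[acyl]) - nominal_mass(WATER)
    return mass


triferuloyl = acylspermidine_mass(["feruloyl"] * 3)
wild_type = acylspermidine_mass(["hydroxyferuloyl", "hydroxyferuloyl", "sinapoyl"])
print(triferuloyl, wild_type, wild_type - triferuloyl)  # 673 735 62
```

The 62-dalton difference corresponds to two 5-hydroxylations (+16 each) plus one 5-hydroxylation followed by O-methylation (+16, then +14). The same helper returns 705 for the hydroxyferuloyl/caffeoyl/sinapoyl conjugate reported for the CYP98A9-silenced line.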

Fig. 3

UPLC-MS/MS analysis of methanol extracts of young inflorescences of wild-type and mutant plants. Shown is UV absorbance at 320 nm. For compound fragmentation and UV spectra, see fig. S6. The peaks eluting between 10 and 15 min are flavonoids.

N1,N5,N10-triferuloylspermidine [obtained by chemical synthesis (11)] was then incubated with the reduced form of nicotinamide adenine dinucleotide phosphate (NADPH) and CYP98A8 in recombinant yeast microsomes (11). This gave rise to mono-, di-, and a trace of tri-hydroxylated products (fig. S7), thus confirming triferuloylspermidine meta-hydroxylase activity.

CYP98A9-silenced mutants were generated by RNA interference (RNAi), because no insertion mutant was available (11). A line with 70% suppression of CYP98A9, but no modification in CYP98A8 expression (table S1), was selected for UPLC-MS/MS profiling. Significant reduction in N1,N5-di(hydroxyferuloyl)-N10-sinapoylspermidine in flower buds was concomitant with the appearance of N1-hydroxyferuloyl-N5-caffeoyl-N10-sinapoylspermidine (705 daltons; Fig. 3 and fig. S6). The contribution of CYP98A9 to phenolamide synthesis was confirmed by a strong increase of the 735-dalton compound in the wild-type plants transformed with a CaMV35S::CYP98A9 construct (Fig. 3). No hydroxylation of triferuloylspermidine was, however, observed with recombinant CYP98A9 (fig. S7).

A homozygous cyp98A8 mutant was crossed with homozygous cyp98A9 RNAi-silenced plants. Profiling of the buds from F2 homozygous cyp98A8 mutants in which CYP98A9 was confirmed to be silenced (table S1) showed a minor residual peak of N1,N5,N10-triferuloylspermidine and the accumulation of significant amounts of N′-feruloyl-N′′,N′′′-dicoumaroylspermidine and N′-coumaroyl-N′′,N′′′-diferuloylspermidine (Fig. 3). The latter were also detected in low quantities in cyp98A8 plants. Reduced levels of triferuloylspermidine were, on the other hand, detected in cyp98A8 as compared to the N1,N5-di(hydroxyferuloyl)-N10-sinapoylspermidine found in the wild type. This suggested that both CYP98A8 and CYP98A9 contribute to triferuloylspermidine formation. Recombinant CYP98A8 and CYP98A9 were therefore incubated with N1,N5,N10-tri(4-coumaroyl)spermidine (11). Both enzymes led to NADPH-dependent and sequential hydroxylation into mono-, di-, and tri-caffeoylspermidine (fig. S7). CYP98A8 and CYP98A9 can thus act redundantly as tricoumaroylspermidine meta-hydroxylases. Accumulation of mono- and di-feruloyl intermediates in the null cyp98A8 and null cyp98A8/CYP98A9 RNAi double mutants (Fig. 3 and fig. S6) indicates that the substrate can be released from CYP98A9 between each hydroxylation step, resulting in product methylation. It also suggests that the methylated intermediates can be processed further only by CYP98A8.
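The interpretation in the last two sentences can be explored with a toy reachability model. Each spermidine conjugate is reduced to its counts of coumaroyl, caffeoyl, and feruloyl rings; meta-hydroxylation converts coumaroyl to caffeoyl (+16) and O-methylation of a released intermediate converts caffeoyl to feruloyl (+14). As a simplifying assumption standing in for CYP98A9 acting without CYP98A8, hydroxylation is blocked once any ring is methylated. The model, its names, and the nominal residue masses (acid minus water) are illustrative, not taken from the paper.

```python
from collections import deque
from typing import Dict, List, Tuple

# Nominal masses: spermidine and acyl residues (free acid minus H2O).
SPERMIDINE = 145
RESIDUE = {"coumaroyl": 146, "caffeoyl": 162, "feruloyl": 176}

State = Tuple[int, int, int]  # (coumaroyl, caffeoyl, feruloyl) ring counts


def conjugate_mass(state: State) -> int:
    c, h, f = state
    return SPERMIDINE + c * RESIDUE["coumaroyl"] + h * RESIDUE["caffeoyl"] + f * RESIDUE["feruloyl"]


def successors(state: State) -> List[State]:
    c, h, f = state
    nxt = []
    # Assumption: meta-hydroxylation (coumaroyl -> caffeoyl) only while no ring is methylated.
    if c > 0 and f == 0:
        nxt.append((c - 1, h + 1, f))
    # O-methylation of a released intermediate (caffeoyl -> feruloyl).
    if h > 0:
        nxt.append((c, h - 1, f + 1))
    return nxt


def terminal_products(start: State = (3, 0, 0)) -> Dict[State, int]:
    """Breadth-first search over reachable states; return dead-end states with their masses."""
    seen = {start}
    queue = deque([start])
    terminals = {}
    while queue:
        state = queue.popleft()
        nxt = successors(state)
        if not nxt:
            terminals[state] = conjugate_mass(state)
        for s in nxt:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return terminals


for state, mass in sorted(terminal_products().items(), key=lambda item: item[1]):
    print(state, mass)
```

Under these assumptions the only end products are the (2, 0, 1), (1, 0, 2), and (0, 0, 3) conjugates, that is, feruloyl-dicoumaroyl-, coumaroyl-diferuloyl-, and triferuloylspermidine at nominal masses 613, 643, and 673, the three species reported for the double mutant. The model does not capture relative amounts or the further hydroxylations performed by CYP98A8.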

CYP98A3 was previously reported to catalyze the meta-hydroxylation of p-coumaroyltyramine with low efficiency (19), so its activity on CYP98A8 and CYP98A9 substrates was tested (fig. S7). As compared with its preferred substrate, p-coumaroylshikimate, CYP98A3 had no activity on triferuloylspermidine but had weak activity on tri(p-coumaroyl)spermidine. Both CYP98A8 and CYP98A9 have completely lost shikimate/quinate ester hydroxylase activities (6). This confirms that CYP98A8 and CYP98A9 activities result from adaptive evolution.

On the basis of coexpression, CYP98A8 was recently suggested to act in concert with AT2G19070 and AT1G67990 to form an overlooked phenolic pathway (5). Our findings validate this prediction, because both the spermidine hydroxycinnamoyl transferase (SHT) and the O-methyltransferase AtTSM1 have recently been shown to be involved in the formation of N1,N5-di(hydroxyferuloyl)-N10-sinapoylspermidine (16, 17). This novel pathway (Fig. 4) provides a putative new source of guaiacyl and syringyl precursors, from p-coumaroyl coenzyme A (CoA), that are available for the formation of biopolymers and phenolic esters. Polyamine oxidation products may, in addition, contribute to the maturation of cell walls (20). Such an alternative route could explain ectopic lignification and conflicting guaiacyl and syringyl end products that accumulate in various mutants of the classical pathway (21, 22). This phenolamide route is independent of the shikimate pool but is more likely to be influenced by nitrogen supply.
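The "cascade of six successive hydroxylations" can be checked by bookkeeping on the three acyl rings. The sketch below assumes a ring-by-ring route from p-coumaroyl through caffeoyl, feruloyl, and 5-hydroxyferuloyl to sinapoyl, and a final product carrying two 5-hydroxyferuloyl groups and one sinapoyl group, as named in the text. The exact interleaving of steps in Fig. 4 may differ, but the step counts do not depend on it. All names are hypothetical.

```python
from collections import Counter
from typing import List

# Assumed ring-by-ring route: each ring state maps to (next state, step type).
NEXT_STEP = {
    "coumaroyl": ("caffeoyl", "hydroxylation"),        # 3-hydroxylation (CYP98A8/CYP98A9)
    "caffeoyl": ("feruloyl", "O-methylation"),
    "feruloyl": ("hydroxyferuloyl", "hydroxylation"),  # 5-hydroxylation (CYP98A8 in vitro)
    "hydroxyferuloyl": ("sinapoyl", "O-methylation"),
}
MASS_SHIFT = {"hydroxylation": 16, "O-methylation": 14}  # nominal: +O and +CH2


def ring_steps(start: str, target: str) -> Counter:
    """Count the steps needed to convert one ring from start to target."""
    steps = Counter()
    state = start
    while state != target:
        if state not in NEXT_STEP:
            raise ValueError(f"{target} is not reachable from {start}")
        state, step = NEXT_STEP[state]
        steps[step] += 1
    return steps


def pathway_steps(start_rings: List[str], final_rings: List[str]) -> Counter:
    total = Counter()
    for start, target in zip(start_rings, final_rings):
        total += ring_steps(start, target)
    return total


steps = pathway_steps(["coumaroyl"] * 3, ["hydroxyferuloyl", "hydroxyferuloyl", "sinapoyl"])
# 583 = nominal mass of tri(p-coumaroyl)spermidine.
mass = 583 + sum(MASS_SHIFT[step] * n for step, n in steps.items())
print(dict(steps), mass)
```

It reports six hydroxylations and four O-methylations, and 583 + 6 × 16 + 4 × 14 = 735 agrees with the mass given for N1,N5-di(hydroxyferuloyl)-N10-sinapoylspermidine.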

Fig. 4

Proposed catalytic sequence in the novel phenolamide pathway. The formation of the final metabolite without the release of intermediates requires the presence of both CYP98A8 and CYP98A9. HCT, hydroxycinnamoyl transferase; SHT, spermidine hydroxycinnamoyl transferase.

Recent evolution argues for the presence of the phenolamide pathway only in Brassicaceae, where hydroxyferuloylspermidines are major pollenkitt constituents, associated with sporopollenin (16, 17). However, phenolamides are antioxidants, ultraviolet (UV) screens, and antimicrobial compounds (23, 24) that have been found in pollen and inflorescences of most flowering plants and are associated with fertility and fruit development (25–27). A pathway leading to pollen phenolamides thus exists in other families, but was replaced in the lineage leading to the Brassicaceae. Hydroxylation of p-coumaroylspermidine observed with CYP98A3 raises the possibility that other members of the family evolved the same way. Recent characterization of two acyltransferases of the so-called BAHD family that respectively catalyze the formation of di(sinapoyl) and di(coumaroyl)spermidine in Arabidopsis seeds and roots (28) may also indicate an original route involving the transfer of readily hydroxylated phenolics to polyamines. The emergence of a novel pathway in Brassicaceae is possibly related to their specific pollen coat composition and formation (29).

CYP98A8 and CYP98A9 provide an example of adaptive gene evolution via retroposition, positive Darwinian selection, and subsequent duplication that led to novel enzymes and plant metabolic pathways (fig. S8). Both genes acquired predominant expression in the anthers and, via positive selection, evolved a novel function in pollen development. Likewise, most of the successful retrogenes in primates are specifically expressed in the testis and have evolved functional roles in spermatogenesis (1). CYP98A8 and CYP98A9 may be a paradigm example of a similar process operating in plants. Genes mediating sexual reproduction are known to evolve faster than others (30, 31). Their adaptive evolution is thought to be essential for population fitness and to contribute to the establishment of fertilization barriers leading to speciation.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S8

Tables S1 and S2


  • * These authors contributed equally to this work.

References and Notes

  11. Materials and methods are available as supporting material on Science Online.
  33. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; D, Asp; E, Glu; G, Gly; H, His; K, Lys; L, Leu; P, Pro; R, Arg; S, Ser; and T, Thr.
  34. We thank J. J. Bourguignon, M. Legrand, E. Grienenberger, D. Banner, and S. J. Perlman for helpful advice; D. Nelson for naming P450s; and N. Abdulrazzak for his contribution. M.M. and J.E. are grateful for funding of European Marie Curie actions; D.W.-R., M.M., V.C., and J.-E.B. acknowledge support from the Human Frontier Science program; and J.E. acknowledges support from the Natural Sciences and Engineering Research Council of Canada.
