Metabolic Diversification—Independent Assembly of Operon-Like Gene Clusters in Different Plants


Science  25 Apr 2008:
Vol. 320, Issue 5875, pp. 543-547
DOI: 10.1126/science.1154990


Operons are clusters of unrelated genes with related functions that are a feature of prokaryotic genomes. Here, we report on an operon-like gene cluster in the plant Arabidopsis thaliana that is required for triterpene synthesis (the thalianol pathway). The clustered genes are coexpressed, as in bacterial operons. However, despite the resemblance to a bacterial operon, this gene cluster has been assembled from plant genes by gene duplication, neofunctionalization, and genome reorganization, rather than by horizontal gene transfer from bacteria. Furthermore, recent assembly of operon-like gene clusters for triterpene synthesis has occurred independently in divergent plant lineages (Arabidopsis and oat). Thus, selection pressure may act during the formation of certain plant metabolic pathways to drive gene clustering.

Triterpenes protect plants against pests and diseases and are also important drugs and anticancer agents (1–4). Like sterols, these compounds are synthesized from the isoprenoid pathway by cyclization of 2,3-oxidosqualene (1, 3). The Arabidopsis genome contains 13 predicted oxidosqualene cyclase (OSC) genes (3, 5). Of these, one encodes cycloartenol synthase (CAS), which is required for sterol biosynthesis, and another encodes lanosterol synthase (LAS), which is conserved across the eudicots and whose function in plants is unknown (Fig. 1A). The 11 remaining OSCs fall into two major clades (I and II) (Fig. 1A). These OSCs produce various triterpenes when expressed in yeast. However, their function in Arabidopsis is unknown. The OSCs in clade I have close homologs in other eudicots; those in clade II appear to be restricted to the Brassicaceae family and show homology to a single OSC from Brassica napus.

Fig. 1.

(A) Neighbor-joining tree of Arabidopsis and oat OSC enzymes (percent bootstrap support indicated). Arabidopsis OSCs are indicated with Arabidopsis Genome Initiative gene codes. Those genes residing in candidate metabolic gene clusters are starred. Oat OSC National Center for Biotechnology Information accession numbers: SAD1 (CAC84558), AsCS1 (CAC84559). (B) Map of the triterpene gene cluster on Arabidopsis chromosome 5. T-DNA insertion mutants are indicated. (C) Microarray expression profiles of the genes in Fig. 1B and the two immediately flanking genes At5g47970 and At5g48020 (neither of which is implicated in secondary metabolism). Absolute expression values are shown with identical scales (± SE, n ≥ 3). Data were retrieved from Genevestigator (26). The genes in Fig. 1B were highly coexpressed across 392 microarray experiments [average correlation coefficient (r) = 0.86]. The flanking genes were not coexpressed with genes in the cluster region or with one another (r < 0.15). Coexpression analysis was performed at the Bio-Array Resource (27) with data from NASCArrays (

Oat (Avena spp.), a monocot that diverged from the eudicots ∼180 million years ago, produces defense-related triterpenes known as avenacins. The first committed step in avenacin synthesis is catalyzed by the OSC β-amyrin synthase (encoded by Sad1) (6). Sad1 is hypothesized to have arisen from a duplicated monocot CAS-like gene after the separation of wheat and oat ∼25 million years ago (6, 7). The second step in avenacin biosynthesis is mediated by SAD2, a member of the newly described monocot-specific CYP51H subfamily of cytochrome P450 enzymes (CYP450s) (8). Sad1 and Sad2 are embedded in a gene cluster that includes genes required for acylation, glucosylation, and other steps in the pathway (2, 7). The avenacin biosynthesis genes are tightly regulated and expressed only in the root epidermis, the site of accumulation of the pathway end product (6, 8). The avenacin gene cluster lies within a region of the oat genome lacking synteny with rice and other cereals (7).

We examined the genomic regions around each of the 13 OSC genes in the Arabidopsis genome to establish whether genes for triterpene synthesis might be clustered (9). Four OSC genes are flanked by genes predicted to encode other classes of enzymes implicated in secondary metabolism. These four OSCs all belong to clade II, which appears to have undergone accelerated evolution compared with other Arabidopsis OSCs (Fig. 1A). We focused on a region containing four contiguous genes predicted to encode an OSC (At5g48010), two CYP450s (At5g48000 and At5g47990), and a BAHD family acyltransferase (ACT) (At5g47980) (Fig. 1B). The expression of all four genes is highly correlated (Fig. 1C) and occurs primarily in the root epidermis (fig. S1), which suggests that the genes are functionally related (10).
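The coexpression summary reported for the cluster (an average correlation coefficient across microarray experiments) can be illustrated with a minimal sketch. It assumes the statistic is the mean Pearson r over all unordered gene pairs, which the text does not state explicitly. The gene labels and expression values below are hypothetical toy data, not the values retrieved from Genevestigator or NASCArrays, and the function names are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_r(profiles):
    """Average Pearson r over all unordered pairs of genes in `profiles`.

    `profiles` maps a gene label to its expression values, one value per
    experiment, with experiments in the same order for every gene.
    """
    pairs = list(combinations(sorted(profiles), 2))
    if not pairs:
        raise ValueError("need at least two genes")
    total = sum(pearson(profiles[a], profiles[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical expression values (arbitrary units) across five experiments.
# Assumption: toy numbers chosen so that all profiles rise together.
toy_cluster = {
    "gene_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_C": [1.5, 2.5, 3.5, 4.5, 5.5],
    "gene_D": [3.0, 5.0, 7.0, 9.0, 11.0],
}
cluster_r = mean_pairwise_r(toy_cluster)
```

With real data, each profile would hold one value per microarray experiment, and the same function could be applied to the flanking genes to check that their correlation with the cluster genes is low.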

The OSC gene within this region, At5g48010, converts 2,3-oxidosqualene to the triterpene thalianol when expressed in yeast (11). However, thalianol has not been reported in plants. We detected low levels of thalianol in roots but not leaves of wild-type Arabidopsis (Fig. 2, C and D), consistent with the expression of At5g48010. Thalianol was not detectable in root extracts of a null insertion mutant of At5g48010 (thas1-1) (Fig. 2E), which indicated that the At5g48010 gene product [hereafter, named thalianol synthase (THAS)] is required for synthesis of thalianol in Arabidopsis roots. Overexpression of THAS in Arabidopsis resulted in thalianol accumulation in leaves (Fig. 2F) and in dwarfing (Fig. 3A). Mutations in the gibberellin, brassinosteroid, and primary sterol pathways can result in dwarfing (12–14). However, the dwarf phenotype of the THAS-overexpressing lines was not rescued by application of gibberellin or brassinosteroid, and the sterol content of these plants was not significantly altered (P > 0.05, Student's t test, n = 3), which suggests that thalianol may be detrimental to plant growth.

Fig. 2.

Extracts from yeast and Arabidopsis were analyzed for triterpene content by GC-MS: TIC, total ion chromatograms; EIC 229, extracted ion chromatograms at a mass-charge ratio (m/z) of 229. Data are representative of at least two separate experiments, each with triplicate samples. (A) Yeast empty-vector control; (B) yeast expressing the At5g48010 cDNA; (C and D) from wild-type Arabidopsis; (E) from an At5g48010 (thas1-1) knockout line, (F) from an Arabidopsis line overexpressing THAS. The ion fragmentation pattern of thalianol was as reported (fig. S2) (11). Thalianol epoxide, a product of bis-oxidosqualene cyclization, was detected in yeast only. The chromatograms are scaled to the highest peak. Unlabeled peaks are sterols.

Fig. 3.

(A) Plants overexpressing thalianol synthase (THAS) are dwarfed. (B) Roots from 7-day-old plants that accumulate thalianol (thah1-1) or thalian-diol (thad1-1) are significantly longer than those of the wild type or thas1-1 (which lacks the entire pathway). Plants that overexpress THAS and, thus, have elevated levels of thalianol also have significantly longer roots than the control. Error bars are ± SE, n = 68 to 90 for three replicate experiments.

Gas chromatography–mass spectrometry (GC-MS) analysis of wild-type root extracts revealed additional peaks that were absent in THAS mutants [(Fig. 4A); wild type and thas1-1]. Because these peaks were dependent on THAS, this suggested that thalianol (peak 1) is converted to unknown downstream products in wild-type Arabidopsis plants (peaks 2a, 2b, 2c, 3a, and 3b). Therefore, the coexpressed genes adjacent to At5g48010 (THAS) were examined to determine whether they function in thalianol modification. We analyzed transferred DNA (T-DNA) insertion mutants in At5g48000, which is immediately adjacent to THAS. At5g48000 is predicted to encode a CYP450 (CYP708A2) belonging to the functionally uncharacterized CYP708 family, a CYP450 family specific to the Brassicaceae (15). Root extracts of mutants affected at At5g48000 (thah1-1 and thah1-2) showed increased thalianol levels compared with the wild type (Fig. 4A and fig. S3, A and B), which suggests that the CYP450 encoded by At5g48000 is required for conversion of thalianol to a downstream product. Furthermore, we observed that peaks 2a to 3b were absent in root extracts of thah1-1 (Fig. 4A) and, so, may correspond to downstream pathway intermediates.

Fig. 4.

(A) Detection of thalianol (1) and other pathway intermediates in root extracts from wild-type and T-DNA insertion lines. TIC, total ion chromatograms; EIC 227 and 229, extracted ion chromatograms at m/z of 227 and 229, respectively. (B) Scheme of the thalianol pathway showing the structures of 2,3-oxidosqualene, thalianol (1), thalian-diol (2a, 2b, and 2c), and desaturated thalian-diol (3a and 3b) (see figs. S2, S4, and S5, for respective ion fragmentation patterns). The hydroxyl group introduced to thalianol by THAH to give thalian-diol is drawn in red. GC-MS ionization data indicate that this hydroxyl group is located at one of the four available carbon positions in rings B or C. Peaks 2a to 2c are isomers of thalian-diol and are likely to differ in the position of the hydroxyl group. Because of the low levels of these compounds in Arabidopsis root extracts, we were unable to determine the precise position of the hydroxyl group in these isomers by nuclear magnetic resonance. The chromatograms are scaled to the highest peak. The peaks between 26 and 28 min (TIC/EIC) are plant sterols. The data are representative of at least two separate experiments, each with triplicate samples.

The second CYP450 in this region (At5g47990) (Fig. 1B) belongs to the CYP705 family, another functionally uncharacterized Brassicaceae-specific CYP450 family (16). The CYP705 and CYP708 families belong to the CYP71 and CYP85 clans, respectively, which demonstrates that At5g47990 is not a tandem duplicate of At5g48000. Null insertion mutants and RNA interference (RNAi) knockdown lines for At5g47990 had enhanced levels of peaks 2a and 2b relative to wild-type plants (Fig. 4A and fig. S3, C and D). Peaks 2a, 2b, and 2c, with similar ion fragmentation patterns, were identified as thalian-diol [(3S, 13S, 14R)-malabarica-8,17,21-trien-3,?-diol] (fig. S4) and are likely to represent different thalian-diol isomers. Peaks 2a to 2c were not present in root extracts of thas1-1, which confirmed that production of thalian-diol depends on THAS (Fig. 4A). These peaks were also absent from root extracts of thah1-1 (Fig. 4A), which implicated At5g48000 in thalian-diol biosynthesis. Overexpression of THAH with THAS in Arabidopsis leaves resulted in conversion of thalianol to thalian-diol (fig. S6A), and overexpression of THAH in thah1-1 restored the thalianol pathway (fig. S6B). On the basis of these data, we concluded that At5g48000 encodes thalianol hydroxylase (hereafter, referred to as THAH). Plants that overaccumulate thalian-diol are dwarfed (fig. S6B), which suggests that thalian-diol, like thalianol, is detrimental to growth when produced in the above-ground parts of the plant.

The increased levels of thalian-diol in thad1-1 suggested that At5g47990 was required for conversion of thalian-diol to a further downstream product. We observed that peaks 3a and 3b, which are present in wild-type plants, are absent in thas1-1, thah1-1, and thad1-1 (Fig. 4A). These two peaks were identified as isomers of desaturated thalian-diol [(3S, 13S, 14R)-malabarica-8,15,17,21-tetraen-3,?-diol] (fig. S5). Conversion of thalian-diol to desaturated thalian-diol involves introduction of a double bond at carbon 15 (Fig. 4). CYP450 enzymes can catalyze desaturation reactions of this kind [see ref. (17)]. On the basis of these data, we concluded that At5g47990 encodes thalian-diol desaturase (hereafter, referred to as THAD).

These data show that THAS, THAH, and THAD are contiguous coexpressed genes encoding biosynthetic enzymes required for three consecutive steps in the synthesis and modification of thalianol (Fig. 4B). The fourth gene in the cluster (At5g47980) is predicted to encode a BAHD acyltransferase (ACT). As for THAS, THAH, and THAD, this enzyme also belongs to a Brassicaceae-specific enzyme subgroup. Because At5g47980 has an expression pattern very similar to that of THAS, THAH, and THAD and is implicated in secondary metabolism, it is likely to be required for modification of desaturated thalian-diol. However, we have not detected acylated desaturated thalian-diol in Arabidopsis root extracts. This may be because this compound is further modified or sequestered or is present at very low levels.
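The mutant logic behind this pathway assignment can be sketched as a simple linear model: blocking one step removes every downstream product, and the substrate of the blocked step is expected to build up. The code below is an illustrative sketch only, not the authors' analysis. The three steps and compound names come from the text, whereas the function name and the decision to make no accumulation prediction for 2,3-oxidosqualene (which is also used for sterol synthesis) are assumptions made here.

```python
# Linear pathway as described in the text: (enzyme, substrate, product).
PATHWAY = [
    ("THAS", "2,3-oxidosqualene", "thalianol"),
    ("THAH", "thalianol", "thalian-diol"),
    ("THAD", "thalian-diol", "desaturated thalian-diol"),
]


def predict_knockout(enzyme):
    """Predict the metabolite pattern when `enzyme` is non-functional.

    Returns a tuple (absent, accumulating):
      absent       - products of the blocked step and all later steps
      accumulating - substrate of the blocked step, or None for the first
                     step (assumption: 2,3-oxidosqualene is shared with
                     sterol synthesis, so no prediction is made for it)
    """
    names = [name for name, _, _ in PATHWAY]
    if enzyme not in names:
        raise ValueError(f"unknown enzyme: {enzyme!r}")
    idx = names.index(enzyme)
    absent = [product for _, _, product in PATHWAY[idx:]]
    accumulating = PATHWAY[idx][1] if idx > 0 else None
    return absent, accumulating


# Predicted patterns for loss of each enzyme, keyed by enzyme name.
predictions = {name: predict_knockout(name) for name, _, _ in PATHWAY}
```

Under this model, loss of THAH is predicted to raise thalianol and remove both diol forms, and loss of THAD to raise thalian-diol and remove only the desaturated form, which matches the peak patterns described for thah1-1 and thad1-1 above.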

The avenacin gene cluster in oat (Avena spp.) confers broad-spectrum resistance to fungal pathogens (2). We tested whether the Arabidopsis thalianol gene cluster was also defense-related. We challenged the roots of mutant and wild-type plants with strains of fungal and bacterial plant pathogens (Alternaria brassicicola, Botrytis cinerea, and Pseudomonas syringae pv tomato DC3000) but saw no discernible differences in disease progression (fig. S7). However, examination of data from a recent survey of genome-wide polymorphisms in Arabidopsis (18) revealed that the thalianol pathway genes represent one of the most conserved regions of the genome. This is the hallmark of a recent selective sweep and implies that this gene cluster confers an important (and as yet unidentified) selective advantage.

Genes for most metabolic pathways are not clustered in plants. However, clustering facilitates the inheritance of beneficial combinations of genes (7, 19, 20); furthermore, disruption of metabolic gene clusters can lead to accumulation of deleterious intermediates (21). The observations that ectopic overaccumulation of thalianol (Fig. 3A) or thalian-diol (fig. S6B) lead to severe dwarfing in Arabidopsis are consistent with a need for tight coordinate regulation of the pathway. It is noteworthy that lines that accumulate elevated levels of either of these compounds have significantly longer roots than the wild type (Fig. 3B), which suggests distinct and organ-specific effects of thalianol and thalian-diol on plant growth. We note that the four coexpressed genes within the thalianol gene cluster have marked histone H3 lysine 27 trimethylation, whereas the immediate flanking genes do not (22), which suggests that clustering may also facilitate coordinate regulation of the gene cluster at the chromatin level.

Although oats and Arabidopsis both contain gene clusters for triterpene synthesis (the avenacin and thalianol clusters, respectively), these two gene clusters are unlikely to share a common origin. This is supported by the fact that the oat Sad1 and Sad2 genes do not have orthologs in Arabidopsis and are monocot-specific (6–8). Furthermore, there is no evidence for horizontal transfer of either gene cluster from microbes or elsewhere. Phylogenetic analysis suggests that an ancestral gene cluster formed in Arabidopsis around the progenitor of the lineage-specific OSC clade II (Fig. 5). Sequential rearrangements, duplications, and gene loss presumably led to formation of the present-day thalianol cluster. Cluster formation may have been accompanied by the rapid expansion and functional diversification of the lineage-specific OSC clade II, along with the lineage-specific CYP702/708, CYP705, and ACT gene families. In addition, whereas THAS makes thalianol, the other OSCs in clade II produce different triterpene products when expressed in yeast (3, 5). Some of these OSCs may be components of other functional triterpene gene clusters (Fig. 1A), as suggested by genome-wide coexpression of CYP450s (23).

Fig. 5.

Model of thalianol cluster evolution. The OSC tree is as in Fig. 1. Colored circles next to each OSC indicate the presence of adjacent genes encoding specific classes of other biosynthetic enzymes (see key). The colored diamonds indicate points at which common ancestors of genes for these other specific classes of enzyme are hypothesized to have become located in the vicinity of an ancestral OSC gene. The reconstruction minimizes the number of rearrangement and gene loss events required to reach the present-day chromosomal arrangement. The product of At5g48010 (the OSC gene that lies within the functional gene cluster reported in this paper) is indicated in bold. The existence of other triterpene gene clusters is inferred by association of other clade II OSCs with genes for other enzymes implicated in secondary metabolism.

An obvious assumption may be that gene clusters of the kind that we have observed were inherited from early evolutionary progenitor species. However, our data clearly indicate that the thalianol and avenacin gene clusters are the products of separate and recent evolutionary events. This finding suggests that eukaryotic genomes are capable of remarkable plasticity and can assemble operon-like gene clusters de novo, which raises intriguing questions about the molecular mechanisms and evolutionary pressures that have acted to cause these gene clustering arrangements to assemble and become fixed. Comparative genomics will now enable us to trace the origins of such gene clusters and so to gain insights into the mechanisms that drive their formation. A further intriguing question is why genes for some metabolic pathways are clustered, whereas others are not. Our identification of two triterpene gene clusters [for thalianol in Arabidopsis (this paper) and for avenacins in oat (2, 6–8, 10)] implies that triterpene pathways may be predisposed to gene clustering. There are two other examples of gene clusters for plant defense compounds (for rice diterpenes and maize benzoxazinoids) (19, 24, 25). As we learn more about why genes for some metabolic pathways are clustered and others are not, we may need to redefine our understanding of plant metabolism.

Supporting Online Material

Materials and Methods

Figs. S1 to S8

Table S1


References and Notes
