The opium poppy genome and morphinan production

See allHide authors and affiliations

Science  19 Oct 2018:
Vol. 362, Issue 6412, pp. 343-347
DOI: 10.1126/science.aat4096

Poppy genome reveals evolution of opiates

The opium poppy has been a source of painkillers since Neolithic times. Attendant risks of addiction threaten many today. Guo et al. now deliver a draft of the opium poppy genome, which encompasses 2.72 gigabases assembled into 11 chromosomes and predicts more than 50,000 protein-coding genes. A particularly complex gene cluster contains many critical enzymes in the metabolic pathway that generates the alkaloid drugs noscapine and morphinan.

Science, this issue p. 343


Morphinan-based painkillers are derived from opium poppy (Papaver somniferum L.). We report a draft of the opium poppy genome, with 2.72 gigabases assembled into 11 chromosomes with contig N50 and scaffold N50 of 1.77 and 204 megabases, respectively. Synteny analysis suggests a whole-genome duplication at ~7.8 million years ago and ancient segmental or whole-genome duplication(s) that occurred before the Papaveraceae-Ranunculaceae divergence 110 million years ago. Syntenic blocks representative of phthalideisoquinoline and morphinan components of a benzylisoquinoline alkaloid cluster of 15 genes provide insight into how this cluster evolved. Paralog analysis identified P450 and oxidoreductase genes that combined to form the STORR gene fusion essential for morphinan biosynthesis in opium poppy. Thus, gene duplication, rearrangement, and fusion events have led to evolution of specialized metabolic products in opium poppy.

Throughout history, opium poppy (Papaver somniferum L.) has been both a friend and foe of human civilization. In use since Neolithic times (1), the sap, known as opium, contains various alkaloids, including morphine and codeine, with effects ranging from pain relief and cough suppression to euphoria, sleepiness, and addiction. Opioid-based analgesics remain among the most effective and cheap treatments for the relief of severe pain and palliative care, but, owing to their addictive properties, careful medical prescription is essential to avoid misuse. Access to morphine equivalents to alleviate serious health-related suffering is unequal: In the United States and Canada, more than 3000% of the estimated need is met and in Western Europe, 870% is met, whereas in China, Russia, India, and Nigeria, 16%, 8%, 4%, and 0.2%, respectively, is met (2). Addressing the lack of access to pain relief or palliative care, especially among poor people in low to middle income countries, has been recognized as a global health and equity imperative (2).

Chemical synthesis or synthetic biology approaches are not yet commercially viable for any of the morphinan subclass of benzylisoquinoline alkaloids (BIAs) (35), and opium poppy remains the only commercial source. Genome rearrangements have been important in the evolution of BIA metabolism in opium poppy. For example, a cluster of 10 genes encodes enzymes for production of the antitussive and anticancer compound noscapine, which belongs to the phthalideisoquinoline subclass of BIAs (6), and a P450 oxidoreductase gene fusion (4, 7, 8) is responsible for the key gateway reaction directing metabolites toward the morphinan branch and away from the noscapine branch. Here we present the sequence assembly of the opium poppy genome to aid investigation into the evolution of BIA metabolism and provide a foundation for further yield improvement of this medicinal plant.

Large, complex plant genomes with an abundance of repeated sequences still pose challenges for genome analysis. In this study, we combined sequencing technologies (fig. S1), including Illumina paired-end and mate-pair (214X), 10X Genomics linked reads (40X), and Pacific Biosciences long-read (66.8X) sequencing and, for quality checking, Oxford Nanopore and Illumina sequencing of bacterial artificial chromosomes (Table 1 and tables S1 and S2). The final genome assembly of 2.72 Gb covers 94.8% of the estimated genome size (figs. S2 to S4 and table S3), and 81.6% of sequences have been assigned into individual chromosomes (fig. S5 and table S4) using a linkage map generated by sequence-based genotyping of 84 F2 plants (tables S5 and S6). We annotated the genome by using the MAKER pipeline (9) to incorporate protein homolog and transcriptome data from seven tissues (table S7). This annotation predicted 51,213 protein-coding genes (Table 1 and fig. S6) and 9494 noncoding RNAs (table S8). Benchmarking Universal Single-Copy Orthologs (BUSCO) (10) analysis based on plant gene models estimates 95.3% completeness (fig. S7). All predicted protein-coding genes are supported by RNA sequencing (RNA-seq) data or homologs, whereas 68.8% have significant hits in the InterPro database (Table 1). Repetitive elements make up 70.9% of the genome. Of the repetitive elements, 45.8% are long terminal repeat retrotransposons (fig. S8).

Table 1 Assembly and annotation statistics of opium poppy genome.

View this table:

Synteny analysis revealed a relatively recent whole-genome duplication (WGD) event and traces of what we consider to be ancient segmental duplications, although an ancient WGD cannot be ruled out (Fig. 1C and figs. S9 to S14). Distribution of synonymous substitutions per synonymous site (Ks) for P. somniferum paralogous genes and syntenic blocks confirmed a major WGD peak (Fig. 1C; figs. S13, A and D, and S15; and table S9) and a minor segmental duplication peak (Fig. 1C and fig. S13D). Intergenomic colinearity analysis indicated that P. somniferum did not experience γ, the hexaploidization event shared in core eudicots, as demonstrated by a 3:2 syntenic relationship between grape (Vitis vinifera) (11) and P. somniferum (fig. S9, C and F). Syntenic blocks (>132 kb) account for 86% total coverage across the whole genome (Fig. 1A and tables S10 to S13). Of the 25,744 genes arising from the WGD, 89.3% are present as two copies and 10.7% are present in more than two copies (fig. S10). Gene ontology analysis suggests that gene duplicates from the WGD are enriched with terms such as “cell redox homeostasis” and “positive regulation of transcription” (fig. S16). Comparison of P. somniferum with an ancestral eudicot karyotype (AEK) genome (12) (Fig. 1B) and with grape also supported the P. somniferum WGD (figs. S9, S11, and S12). We used OrthoFinder (13) to identify 48 single-copy orthologs shared across 11 angiosperm species, and phylogenetic analysis of these using BEAST (14) indicated that P. somniferum diverged from Aquilegia coerulea (Ranunculaceae) and Nelumbo nucifera (Nelumbonaceae) around 110 million years (Ma) ago and 125 Ma ago, respectively (Fig. 1D). Using divergence time and mean Ks values of syntenic blocks between P. somniferum and A. coerulea, we estimated the synonymous substitutions per site per year as 6.98 × 10−9 for Ranunculales, which led to the estimated time of the WGD at around 7.8 Ma ago (Fig. 1C, figs. S13 and S17, and table S9). By applying a phylogenomic approach that traces the history of paralog pairs using phylogenetic trees (15), we constructed 95 rooted maximum likelihood trees containing a pair of opium poppy paralogs from under the Ks peak around 1.5 as anchors. Additionally, we found that 65% of the trees support segmental duplications originating from multiple events occurring before the Papaveraceae-Ranunculaceae divergence at 110 Ma ago, whereas 35% support events occurring after the divergence (table S14 and data S1 and S2). After applying a more stringent approach with a bootstrap threshold cutoff at 50% for ancient gene pairs (15), we determined that 21% of trees support duplication events occurring before the Papaveraceae-Ranunculaceae divergence. The Ks distributions of A. coerulea paralogs and syntenic genes support a WGD event in this representative of the Ranunculaceae at 111.3 ± 35.6 Ma ago (Fig. 1C and table S9). A synteny dot plot of the opium poppy genome assembly with the A. coerulea genome assembly (16) revealed a 2:2 syntenic relationship (fig. S9D), which suggests that both the opium poppy and A. coerulea WGD events happened after the Papaveraceae-Ranunculaceae divergence, as shown in Fig. 1D.

Fig. 1 Opium poppy genome features and whole-genome duplication.

(A) Characteristics of the 11 chromosomes of P. somniferum. Tracks a to c represent the distribution of gene density, repeat density, and GC density, respectively, with densities calculated in 2-Mb windows. Track d shows syntenic blocks. Band width is proportional to syntenic block size. (B) Comparison with ancestral eudicot karyotype (AEK) chromosomes reveals synteny. The syntenic AEK blocks are painted onto P. somniferum chromosomes. (C) Synonymous substitution rate (Ks) distributions of syntenic blocks for P. somniferum paralogs and orthologs with other eudicots are represented with colored lines as indicated. (D) Inferred phylogenetic tree with 48 single-copy orthologs of 11 species identified by OrthoFinder (13). Posterior probabilities for all branches exceed 0.99. The timing of the P. somniferum and A. coerulea whole-genome duplications (WGDs) was estimated in this study, and other reported whole-genome triplication (WGT)/WGD timings are superimposed on the tree. Divergence timings are estimated using BEAST (14) and are indicated by light blue bars at the internodes with 95% highest posterior density (HPD).

The genome assembly allowed us to locate all of the functionally characterized genes of BIA metabolism in opium poppy, as well as their closely related homologs, to either chromosomes or unplaced scaffold positions (table S15). The noscapine gene cluster is located within a 584-kb region on chromosome 11, along with the (S)- to (R)-reticuline (STORR) gene fusion plus the remaining four genes in the biosynthetic pathway to production of the morphinan alkaloid, thebaine (Fig. 2, A and B). These genes are coexpressed in stems (Fig. 2C and fig. S18), and we refer to them as the BIA gene cluster. None of the other genes known to be associated with BIA metabolism—including BERBERINE BRIDGE ENZYME (BBE); TETRAHYDROPROTOBERBERINE N-METHYLTRANSFERASE (TNMT); and the bifurcated morphinan branch pathway genes CODEINE 3-O-DEMETHYLASE (CODM), THEBAINE 6-O-DEMETHYLASE (T6ODM), and CODEINONE REDUCTASE (COR)—are in a biosynthetic gene cluster (table S15). We used the plantiSMASH genome mining algorithm (17) to search the 11 chromosomes and unplaced scaffolds that encode annotated genes for additional gene clusters predicted to be associated with plant specialized metabolism. This approach detected the BIA gene cluster and a number of functionally uncharacterized genes across the same region, among which a cytochrome P450 (PS1126530.1) and a methyltransferase (PS1126590.1) exhibited a similar expression pattern to that of the 15 genes of BIA metabolism (Fig. 2A and table S16). Expression of genes immediately outside the 584-kb BIA gene cluster boundaries was low in aerial tissue (table S16). These genes include METHYLSTYLOPINE 14-HYDROXYLASE (MSH), which is involved in the sanguinarine branch of BIA metabolism (18). MSH is expressed in root tissue together with other sanguinarine pathway genes, which are dispersed across the genome (table S15). The plantiSMASH algorithm also found 49 other possible gene clusters across the 11 chromosomes and 34 on unplaced scaffolds, several of which show tissue-specific expression patterns (table S16). Paralogs of the morphinan pathway genes SALUTARIDINE SYNTHASE (SALSYN), SALUTARIDINE REDUCTASE (SALR), and SALUTARIDINOL-7-O-ACETYL TRANSFERASE (SALAT) and the recently discovered THEBAINE SYNTHASE (19) were identified on an unplaced scaffold in synteny with the BIA biosynthesis gene cluster on chromosome 11 (Fig. 2A and fig. S19 and S20). The expression pattern of these genes matches that of the genes in the BIA cluster (Fig. 2C and table S16).

Fig. 2 Genomic arrangement of key genes of BIA metabolism.

(A) Arrangement and chromosomal position as indicated of the 584-kb BIA gene cluster on chromosome 11 (chr11) encoding 15 coexpressed genes involved in noscapine and morphine biosynthesis (cluster 49, table S16). Below the BIA gene cluster is shown a syntenic block from chr2 (clusters 11 and 12, table S16) associated with the noscapine component of the cluster and a syntenic block from unplaced scaffold 21 associated with the morphinan component (cluster 70, table S16). Syntenic gene pairs are indicated by dashed lines. (B) Schematic representation of noscapine and morphinan branch pathways with the reactions associated with BIA gene cluster highlighted in green boxes. “spont.” indicates spontaneous reactions. 4-HPA, 4-hydroxyphenylacetaldehyde; fpkm, fragments per kilobase of transcript per million mapped reads. Gene abbreviations are defined in table S15. (C) Tissue-specific expression of noscapine and morphinan biosynthesis genes across different tissues. AtA, at anthesis; 5DPA, 5 days postanthesis. BIA pathway genes and their expression values (converted to z scores) across different tissues are displayed as a heat map. Genes located in the BIA gene cluster are shown in bold. (D) Schematic structure of STORR on chr11 and the genomic region on chr2 containing its closest paralogs corresponding to the P450 and reductase modules. Dashed lines denote exon-intron boundaries. (E) Arrangement of locally duplicated copies of CODEINE 3-O-DEMETHYLASE (CODM), THEBAINE 6-O-DEMETHYLASE (T6ODM), and CODEINONE REDUCTASE (COR) genes (other annotated genes in the associated genomic regions are not shown).

To investigate the evolutionary history of the BIA gene cluster, we used MCScanX (20) to perform two rounds of synteny analysis with either the “all BLASTp” result as input for blocks with distant homology or the default “top 5 BLASTp” result for blocks with close homology. We found that the top-ranked syntenic block with the noscapine branch genes has distant homology and is on chromosome 2 [expect value (E-value) = 8.1× 10−15, all BLASTp], whereas the top-ranked syntenic block for morphinan pathway genes has close homology and is on unplaced scaffold 21 (E-value = 0, top 5 BLASTp) (Fig. 2A and tables S17 and S18). Ks and amino acid identity of syntenic gene pairs (fig. S21 and table S17) demonstrate that the syntenic block associated with the noscapine branch component of the BIA gene cluster is due to an ancient duplication with a median Ks value of 3.9, whereas the syntenic block associated with the morphinan branch component is due to a much more recent duplication occurring in the same time frame as the WGD event around 7.8 Ma ago.

The fusion event resulting in STORR was key for evolution of morphinan biosynthesis in the Papaveraceae (Fig. 2B) (7). Gene family analysis of P450 and reductase modules of STORR revealed their closest paralogs located 865 base pairs (bp) apart on chromosome 2 (Fig. 2D and fig. S22). These paralogs have the same gene orientation and exon-intron boundaries as the STORR modules, and on this basis we propose that the STORR gene fusion involved an 865-bp deletion after a duplication (Fig. 2D and table S19). STORR and its closest paralogs show amino acid sequence identity of 75 and 82% for the P450 and oxidoreductase modules, respectively, which suggests that the duplication leading to the STORR gene fusion occurred earlier than the WGD event (fig. S21).

From thebaine, the bifurcated morphinan branch giving rise to codeine and morphine requires three enzymatic reactions, two catalyzed by the 2-oxoglutarate/Fe(II)-dependent dioxygenases, CODM and T6ODM, and a third catalyzed by COR (Fig. 2B). The genome assembly reveals that CODM and T6ODM are encoded by colocalized gene copies on chromosomes 1 and 2, respectively (Fig. 2E and table S19). Phylogenetic analysis shows that copies of both CODM and T6ODM share more than 97% protein sequence identity, whereas the closest paralogs of CODM and T6ODM share 75.6 and 88.6%, respectively (figs. S21 and S22 and table S19). There are four copies of COR, two dispersed on chromosome 7 and two adjacent on unplaced scaffold 107 (Fig. 2E). Copies of COR share more than 95% protein sequence identity, and the closest paralogs share ~74% (figs. S21 and S22 and table S19). This closest-paralog analysis indicates that—as is the case with STORRT6ODM, CODM, and COR emerged before the WGD (fig. S21). Because T6ODM, CODM, and COR use thebaine and downstream intermediates, we assume that the ability to produce thebaine had evolved before the WGD. The near sequence identity between the copies within each of the T6ODM, CODM, and COR gene families indicates that the increase in copy number of these genes occurred more recently than the WGD event. On the basis of the above timing of events, we speculate that the BIA gene cluster was assembled before the WGD event and, after duplication, underwent deletion of the noscapine component and STORR, leaving the morphinan component on unplaced scaffold 21.

The presence of genes exclusively associated with biosynthesis of both phthalideisoquinolines and morphinans in the BIA gene cluster implies a selection pressure that favors clustering of genes associated with these classes of alkaloids. BBE and TNMT functions are not exclusive for noscapine biosynthesis: Both are required for sanguinarine biosynthesis, which occurs predominantly in root rather than aerial tissues where noscapine and morphine biosynthesis occurs. Selective pressure on BBE and TNMT associated with their involvement in the biosynthesis of sanguinarine in root tissue may have kept them from being part of the BIA gene cluster even though they are also both expressed in stem tissue (Fig. 2C). Coordinate regulation of gene expression is considered to be part of the selective pressure resulting in gene cluster formation (21). In opium poppy, the exclusivity of gene function and complexity of the gene expression pattern could have determined which genes are clustered.

Supplementary Materials

Materials and Methods

Figs. S1 to S23

Tables S1 to S19

References (2276)

Data S1 and S2

References and Notes

  1. The Aquilegia coerulea v3.1 genome data (released 8 August 2014; were produced by the U.S. Department of Energy Joint Genome Institute.
Acknowledgments: We thank Y. Jiao and L. Chen for helpful discussions on genome analysis and J. Hai, J. Mitchell, S. Donninger, and J. Hudson for administrative and technical support. This work involved the use of the York Bioscience Technology Facility. Funding: K.Y., L.G., and X.Y. are supported by the National Science Foundation of China (31671372, 31701739, and 61702406), the National Key R&D Program of China (2017YFC0907500 and 2018YFC0910400), “the Thousand Talents Plan,” and the Key Construction Program of the National “985” Project. Z.N. is funded by the Wellcome Trust (WT098051), and I.A.G. received support from the UK Biotechnology and Biological Sciences Research Council (BB/K018809/1) and the Garfield Weston Foundation. Author contributions: K.Y., Z.N. and I.A.G. designed and supervised research. T.W., R.T., T.A.B., and Y.Li collected materials for sequencing and generated transcriptome data and linkage maps. Z.N. performed the genome assembly. X.Y., L.G., Y.Lu, T.W., and Y.Li performed genome annotation and synteny analysis. L.G., X.Y., Y.Li, and T.W. performed transcriptome and phylogenomic analysis. T.W., Y.Li, Z.H., X.Y., L.G., K.Y., and I.A.G. interpreted evolution of BIA metabolism. X.Y., L.G., Z.N., Y.Li, and T.W. prepared figures and tables. I.A.G., L.G., T.W., X.Y., Y.Li, Z.N., and K.Y. wrote the paper. Competing interests: None declared. Data and materials availability: All raw reads and assembled sequence data have been uploaded to NCBI under BioProjectID PRJNA435796. The Whole Genome Shotgun project has been deposited under accession number PUWZ00000000. The nine transcriptome datasets have accession numbers GSM3022361 to GSM3022369, and the 15 BAC sequences have accession numbers MG995579 to MG995593. HN1 is available from the University of York on behalf of Sun Pharmaceutical Industries Australia for experimental purposes, subject to a material transfer agreement.
View Abstract

Stay Connected to Science

Navigate This Article