Research Article

Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE


Science  24 Dec 2010:
Vol. 330, Issue 6012, pp. 1787-1797
DOI: 10.1126/science.1198374
  • Fig. 1

    Overview of Drosophila modENCODE data sets. Range of genomic elements and trans factors studied, with relevant techniques and resulting genome annotations. hnRNA, heterogeneous nuclear RNA.

  • Fig. 2

    Coding and noncoding genes and structures. (A) Extended region of male-specific expression in chromosome 2R including new protein-coding and noncoding transcripts. MIP03715 contains two short ORFs of 23 and 21 codons, respectively. ORF multispecies alignments (color coded) show abundant synonymous (bright green) and conservative (dark green) substitutions and a depletion of nonsynonymous substitutions (red), indicative of protein-coding selection [ratio of nonsynonymous to synonymous substitutions (dN/dS) < 1 for both, P < 10^−7 and P < 10^−11, respectively, likelihood ratio test]. Surrounding regions show abundant stop codons (blue, magenta, yellow) and frame-shifted positions (orange). (B) A transcribed region in chromosome 3R (26,572,290 to 26,573,456), identified by RNA-seq and supported by promoter-specific and transcription-associated chromatin marks, shows RNA secondary-structure conservation in eight Drosophila species. (C) Example of a new miRNA derived from a protein-coding exon of CG6700, with 21- to 23-nt RNAs indicative of Drosha/Dicer-1 processing and also recovered in AGO1-immunoprecipitate libraries from S2 cells and adult heads indicative of Argonaute loading. Evolutionary evidence suggests protein-coding constraint, no conservation for the mature arm, and conservation of the star arm. Red boxes indicate 8-mer “seed” sequence potentially mediating 3′ UTR targeting.

  • Fig. 3

    Chromatin-based annotation of functional elements. (A) Average enrichment profiles of histone marks, chromosomal proteins, and physical chromatin properties at genes, origins of replication, insulator proteins, and TF binding positions. Each panel shows 4 kb centered at a specified location, either proximal to TSS (prox.) or distal (dist.). (B) Example of a transcript predicted by chromatin signatures associated with promoter (red trace) and gene bodies (blue box) and supported by cDNA evidence. Strong RNA Pol II and H3K4me3 peaks in the promoter region and strong H2B ubiquitination extending toward the previously annotated luna gene are confirmed by RNA-seq junction reads that were not used in the prediction. (C) Intergenic H3K36me1 chromatin signatures predict replication activity. Enrichment of multiple chromatin marks was used to identify putative large (>10 kbp) intergenic H3K36me1/H3K18ac domains located outside of annotated genes. Although these marks generally correspond to long introns within transcripts, their intergenic domains were enriched for replication activity (fig. S5). In this example from BG3 cells, such a domain was found upstream of the bi locus and is associated with early replication, contains an early origin, is enriched for ORC binding, and is further supported by NippedB binding.

  • Fig. 4

    Discovery and characterization of chromatin states and their functional enrichments. Combinatorial patterns of chromatin marks in S2 and BG3 cells reveal chromatin states associated with different classes of functional elements. A discrete model (states d1 to d30) captures the presence/absence information, and a continuous model (states c1 to c9) also incorporates mark intensity information (22). States were learned solely from mapped locations of marks (left) and were associated with modENCODE-defined elements (right) with most pronounced patterns in euchromatin (green) and heterochromatin (blue) shown here (additional variations shown in fig. S6).

  • Fig. 5

    High-occupancy TF binding regions and their relation to motifs, ORC, and chromatin. (A) Enrichment of known motifs for regions bound by corresponding TF, sorted by average complexity, denoting the number of distinct TFs bound in the same region. For eight TFs, motifs are depleted (blue) for higher-complexity regions, suggesting non–sequence-specific recruitment. In seven of eight cases, known motifs were enriched in bound regions (Enrich), suggesting sequence-specific recruitment in lower-complexity regions. For each factor, binding sites were highly reproducible between replicates (Reprod). (B) ORC versus TF complexity. The relation between HOT spot complexity (x axis) and enrichment in ORC binding (y axis). (C) Discovered motifs in high- or low-complexity regions (boxed range) and their enrichment in regions of higher (red) or lower (blue) complexity. M1 to M5 are candidate “drivers” of HOT region establishment.

  • Fig. 6

    Genome coverage by modENCODE data sets. (A) Unique (bars) and cumulative (lines) coverage of nonrepetitive (blue line) and conserved (red line) genomes. (B) Multiple coverage for data sets grouped into transcribed elements (red), bound regulators (blue), and chromatin domains (green) (17). Across all three classes (black), 10.8% of the genome is covered 15 or more times, and 69.5% is covered at least twice. (C) Increased coverage in a Chr2R region with no prior annotation (left half), now showing multiple overlapping data sets. Coverage by different tracks is highly clustered (fig. S11), with some regions showing little coverage and others densely covered by many types of data.
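
The multiple-coverage statistics in Fig. 6B (the share of the genome covered at least k times) can be illustrated with a minimal sketch. It assumes each data set has been reduced to half-open [start, end) intervals on a single toy sequence; the function names and the toy intervals are hypothetical and are not taken from the modENCODE pipeline.

```python
def coverage_depth(genome_length, intervals):
    """Per-base depth for half-open [start, end) intervals (difference-array method)."""
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        diff[start] += 1   # coverage begins at start
        diff[end] -= 1     # and stops just before end
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def fraction_covered_at_least(depth, k):
    """Fraction of positions covered by k or more intervals."""
    return sum(1 for d in depth if d >= k) / len(depth)


# Toy example: three overlapping "data sets" on a 10-base sequence (hypothetical data).
toy_intervals = [(0, 6), (2, 8), (4, 10)]
depth = coverage_depth(10, toy_intervals)
frac_twice = fraction_covered_at_least(depth, 2)
frac_thrice = fraction_covered_at_least(depth, 3)
```

On real data the same calculation would be run per chromosome and restricted to the nonrepetitive or conserved portion of the genome, as in panel A.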

  • Fig. 7

    Properties of the physical regulatory network. (A) Hierarchical view of mixed ChIP-based/miRNA physical regulatory network that combines transcriptional regulation by 76 TFs (green) from ChIP experiments and posttranscriptional regulation by 52 miRNAs (red). TFs are organized in a five-level hierarchy on the basis of their relative proportion of TF targets versus TF regulators. miRNAs are separated into two groups: the ones that are regulated by TFs (left) and the ones that only regulate TFs (right). The horizontal position of the TFs in each level shows whether they regulate miRNAs (left), have no regulation to or from miRNAs (middle), or do not regulate but are targeted by miRNAs (right). Different shades of green and red represent the total number of target genes for TFs and miRNAs, respectively (darker nodes indicate more targets). Ninety-two percent of TF regulatory connections are downstream connections from higher levels to lower levels (green), and only 8% are upstream (blue). miRNA regulatory connections are red. (B) Highly enriched network motifs in a mixed physical regulatory network including TFs (green), miRNAs (red), and target genes (black). For each motif, five examples are shown. Known activators, blue; known repressors, red; other TFs, black.
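
The downstream/upstream split reported for Fig. 7A can be sketched as follows, assuming each TF has already been assigned a hierarchy level. The caption does not describe how levels are assigned, so the level numbers (larger means higher in the hierarchy), TF names, and edges below are purely illustrative.

```python
def downstream_fraction(levels, edges):
    """Fraction of cross-level edges that run from a higher level to a lower one.

    levels: dict mapping TF name -> integer level (assumption: larger = higher).
    edges:  iterable of (regulator, target) pairs.
    Edges between TFs on the same level are ignored.
    """
    down = up = 0
    for regulator, target in edges:
        if levels[regulator] > levels[target]:
            down += 1
        elif levels[regulator] < levels[target]:
            up += 1
    total = down + up
    return down / total if total else float("nan")


# Hypothetical four-TF network: three downstream edges and one upstream edge.
toy_levels = {"tfA": 3, "tfB": 2, "tfC": 2, "tfD": 1}
toy_edges = [("tfA", "tfB"), ("tfA", "tfC"), ("tfB", "tfD"), ("tfD", "tfA")]
frac_down = downstream_fraction(toy_levels, toy_edges)
```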

  • Fig. 8

    Gene function prediction from coexpression and co-regulation patterns. Receiver operator characteristic curves for GO terms with predicted new members and area-under-the-curve statistics. False negatives for each GO term are predictions for genes previously annotated for “incompatible” GO terms, defined as pairs of GO terms that have less than 10% common genes relative to the union of their gene sets.
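
The "incompatible" GO term criterion in the Fig. 8 caption (less than 10% common genes relative to the union of the two gene sets) amounts to a threshold on the overlap ratio of the sets. A minimal sketch, with a hypothetical function name and toy gene sets:

```python
def are_incompatible(genes_a, genes_b, threshold=0.10):
    """True if the two GO terms share fewer than `threshold` of their combined genes."""
    union = genes_a | genes_b
    if not union:
        # Assumption: two empty terms are not treated as incompatible.
        return False
    shared_fraction = len(genes_a & genes_b) / len(union)
    return shared_fraction < threshold


# Toy gene sets (hypothetical identifiers).
term_a = {f"g{i}" for i in range(0, 10)}    # g0..g9
term_b = {f"g{i}" for i in range(9, 20)}    # g9..g19, shares only g9 with term_a
term_c = {f"g{i}" for i in range(0, 5)}     # g0..g4, a subset of term_a

incompatible_ab = are_incompatible(term_a, term_b)   # 1 shared of 20 total
incompatible_ac = are_incompatible(term_a, term_c)   # 5 shared of 10 total
```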

  • Fig. 9

    Predictive models of regulator, region, and gene activity. (A) Dynamic regulatory map produced by DREM predicts stage-specific regulators associated with expression changes (y axis, log space relative to first time point) across developmental stages (x axis) (17). Each path (colored lines) indicates the average expression of a group of genes (solid circles) and its standard deviation (size of circle). Predicted bifurcation events, or splits (open circles), are numbered 1 through 19. The colored insets show the expression level of each individual gene going through the split and ranked regulators from the physical (black) or functional (blue) regulatory network associated with the higher (H), lower (L), or middle (M) path. The uncolored inset shows the expression of repressor SU(HW), whose expression decrease coincides with an expression increase of its targets (red asterisk). (B) Predicted S2 activators (top group) or repressors (bottom group), based on the coherence between relative expression of the TF in S2 (yellow) versus BG3 (green) and the relative motif enrichment (red) or depletion (blue) in S2 versus BG3 for activating (left columns) or repressive marks (right columns). (C) True (top of shaded area) and predicted (dotted blue line) expression levels for target genes, from the expression levels of inferred activators (red) and repressors (green). Only the top five positive and negative regulators are shown, ranked by their contribution to the expression prediction (weight of linear-regression model). Examples are shown from 8 of 1487 predictable genes, ranked by prediction quality scores (rank in upper right corner), evaluated as the averaged squared error between predicted and true expression levels across the time course. An expanded set of examples is shown in fig. S23.
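
The prediction quality score described for Fig. 9C (averaged squared error between predicted and true expression across the time course, with predictions from a linear-regression model over regulator expression) can be sketched as below. The weights, regulator names, and expression values are invented for illustration; the caption does not specify how the model was fitted, so fitting is omitted here.

```python
def predict_expression(weights, regulator_expr, intercept=0.0):
    """Linear prediction per time point: intercept + sum(weight * regulator expression)."""
    n_points = len(next(iter(regulator_expr.values())))
    return [
        intercept + sum(w * regulator_expr[reg][t] for reg, w in weights.items())
        for t in range(n_points)
    ]


def mean_squared_error(predicted, true):
    """Averaged squared error across the time course."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)


# Hypothetical regulators: a positive weight acts as an activator, a negative one as a repressor.
regulator_expr = {"act": [1.0, 2.0, 3.0], "rep": [1.0, 1.0, 2.0]}
weights = {"act": 1.0, "rep": -1.0}
predicted = predict_expression(weights, regulator_expr)

# Hypothetical true expression for two target genes; a lower error gives a better rank.
true_expr = {"geneA": [0.0, 1.0, 1.0], "geneB": [0.0, 1.0, 3.0]}
errors = {g: mean_squared_error(predicted, y) for g, y in true_expr.items()}
ranked = sorted(errors, key=errors.get)
```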

Additional Files

  • Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE
    The modENCODE Consortium, Sushmita Roy, Jason Ernst, Peter V. Kharchenko, Pouya Kheradpour, Nicolas Negre, Matthew L. Eaton, Jane M. Landolin, Christopher A. Bristow, Lijia Ma, Michael F. Lin, Stefan Washietl, Bradley I. Arshinoff, Ferhat Ay, Patrick E. Meyer, Nicolas Robine, Nicole L. Washington, Luisa Di Stefano, Eugene Berezikov, Christopher D. Brown, Rogerio Candeias, Joseph W. Carlson, Adrian Carr, Irwin Jungreis, Daniel Marbach, Rachel Sealfon, Michael Y. Tolstorukov, Sebastian Will, Artyom Alekseyenko, Carlo Artieri, Benjamin W. Booth, Angela N. Brooks, Qi Dai, Carrie A. Davis, Michael O. Duff, Xin Feng, Andrey Gorchakov, Tingting Gu, Jorja G. Henikoff, Philipp Kapranov, Renhua Li, Heather MacAlpine, John Malone, Aki Minoda, Jared Nordman, Katsutomo Okamura, Marc Perry, Sara Powell, Nicole C. Riddle, Akiko Sakai, Anastasia Samsonova, Jeremy E. Sandler, Yuri Schwartz, Noa Sher, Rebecca Spokony, David Sturgill, Marijke van Baren, Kenneth H. Wan, Li Yang, Charles Yu, Elise Feingold, Peter Good, Mark Guyer, Rebecca Lowdon, Kami Ahmad, Justen Andrews, Bonnie Berger, Steven E. Brenner, Michael R. Brent, Lucy Cherbas, Sarah C. R. Elgin, Thomas R. Gingeras, Robert Grossman, Roger A. Hoskins, Thomas C. Kaufman, William Kent, Mitzi Kuroda, Terry Orr-Weaver, Norbert Perrimon, Vincenzo Pirrotta, James W. Posakony, Bing Ren, Steven Russell, Peter Cherbas, Brenton R. Graveley, Suzanna Lewis, Gos Micklem, Brian Oliver, Peter J. Park, Susan E. Celniker, Steven Henikoff, Gary H. Karpen, Eric C. Lai, David M. MacAlpine, Lincoln D. Stein, Kevin P. White, Manolis Kellis

    Supporting Online Material

    This supplement contains:
    Materials and Methods
    SOM Text
    Figs. S1 to S23
    Tables S1 to S8

    This file is in Adobe Acrobat PDF format.

    Other Supporting Online Material for this manuscript includes the following:
    Data Sets S1 to S17 (available at
