Functional Architecture and Evolution of Transcriptional Elements That Drive Gene Coexpression


Science  14 Sep 2007:
Vol. 317, Issue 5844, pp. 1557-1560
DOI: 10.1126/science.1145893


Transcriptional coexpression of interacting gene products is required for complex molecular processes; however, the function and evolution of cis-regulatory elements that orchestrate coexpression remain largely unexplored. We mutagenized 19 regulatory elements that drive coexpression of Ciona muscle genes and obtained quantitative estimates of the cis-regulatory activity of the 77 motifs that comprise these elements. We found that individual motif activity ranges broadly within and among elements, and among different instantiations of the same motif type. The activity of orthologous motifs is strongly constrained, although motif arrangement, type, and activity vary greatly among the elements of different co-regulated genes. Thus, the syntactical rules governing this regulatory function are flexible but become highly constrained evolutionarily once they are established in a particular element.

Gene products that are involved in the same molecular process must be coordinately expressed. Transcriptional coexpression is achieved by regulatory proteins and their target cis-regulatory elements that promote gene transcription in overlapping spatiotemporal distributions (1–7). The function of a cis-element is encoded in its molecular architecture: a nucleotide sequence with instantiations (motifs) of the binding sites (motif types) for one or more transcription factors arranged with functionally significant motif combinations, orientations, or spacing. We address the molecular architecture of cis-elements driving gene coexpression and how the sequence and function of such elements evolve.

Because motifs are the functional units within a cis-element, analysis of a cis-element's molecular architecture and evolution requires an experimental system that allows quantification of each motif's activity, in a sufficiently large number of coexpressed genes whose functions have been maintained throughout evolution. The urochordate Ciona harbors such a system in the form of 19 genes that are coexpressed in the 36 muscle cells of the developing embryo (6). Of these 19 genes, 17 function in the same macromolecular complex, underscoring the requirement for tight coexpression. The genes include six single-copy loci from C. savignyi and their six orthologs in the sister species, C. intestinalis: α-tropomyosin 1 (AT1), α-tropomyosin 2 (AT2), myosin binding protein (MBP), troponin I (TI), troponin T (TT), and creatine kinase (CK). The remaining seven genes comprise two or three paralogs each of the multicopy gene families muscle actin (MA), myosin light chain (MLC), and myosin regulatory light chain (MRLC) from C. savignyi (fig. S1). The three motifs that mediate muscle-specific transcription in Ciona in general, and of these loci in particular, are the cyclic adenosine 5′-monophosphate response element (CRE) (6, 8), the MyoD motif (9–11), and the Tbx6 motif (12) (fig. S2).

We investigated the functional architecture of the 19 cis-elements by a comprehensive mutagenesis effort coupled with a whole-embryo expression assay (13–15). Each reporter construct harboring specifically mutagenized sequences was transfected into hundreds of developing embryos. Activity of a mutagenized element was measured as the percentage of muscle cells expressing the reporter, which we show to correlate with average transcript levels [fig. S4 and supporting online material (SOM) text, section 3]. A first few hundred constructs, assayed in over 2000 transfections, defined the cis-elements responsible for the majority of function of each locus (Fig. 1 and table S1). We then dissected each cis-element using 220 constructs with small deletions [5 to 10 base pairs (bp)] or site-directed mutants that removed putative motifs in isolation or different combinations. Our quantitative results are based on 1237 transfections (five biological replicates per construct), which yielded a total of 85,506 transgenic embryos (table S2).

Fig. 1.

Experimental design. For each of 19 loci, initial deletion series (truncated lines) locate regions of concentrated function (gray bar). Fine-scale deletions and site-directed mutations (open circles) target putative motifs (solid circles). Set of constructs is represented as matrix X of categorical explanatory variables (ones and zeroes) whose replicated transfections yield expression measurements (matrix Y of y_{n,i}). Functional contributions of individual motifs (B1 to B5) are estimated with regression models. YFG, your favorite gene.

A quantitative and biologically meaningful representation of the functional architecture of each cis-element required an analysis framework to estimate the activity of each motif. To choose a framework, we needed to assess the relative importance of genetic interactions between motifs. Using the subset of the data that was appropriate for interaction analyses, we determined that most of the cis-elements examined functioned with little epistasis (SOM text, section 4, and fig. S5). Further evidence for motif independence was obtained by motif substitution experiments (SOM text, section 5, and fig. S6). This indicated that we could use multivariate regression (16) as the analytical framework to quantify motif activity.
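The notion of "little epistasis" can be made concrete with a small calculation. The following is a minimal sketch, not the analysis described in the SOM: it assumes a hypothetical two-motif element and measures how far the double-motif construct departs from the sum of the single-motif effects. The function name `epistasis` and all numeric values are illustrative assumptions, not data from the study.

```python
def epistasis(y_none, y_a_only, y_b_only, y_both):
    """Deviation from additivity for two motifs, A and B.

    Each argument is the fraction of muscle cells expressing the reporter
    for a construct carrying neither motif, A only, B only, or both.
    A value near 0 is what a purely additive (independent) model predicts.
    """
    effect_a = y_a_only - y_none
    effect_b = y_b_only - y_none
    expected_both = y_none + effect_a + effect_b
    return y_both - expected_both


# Made-up values for illustration only (hypothetical, not from the paper).
additive_case = epistasis(0.10, 0.30, 0.25, 0.45)  # motifs act independently
synergy_case = epistasis(0.10, 0.15, 0.15, 0.60)   # motifs reinforce each other

print(f"additive case deviation: {additive_case:.3f}")
print(f"synergy case deviation:  {synergy_case:.3f}")
```

A deviation close to zero across motif pairs is the situation in which an additive model without interaction terms is a reasonable choice.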

Model predictions approximate the observed data well and are robust to several analytical scenarios (SOM text, section 6, fig. S7, and tables S1 and S3). After considering the advantages and drawbacks of these scenarios, we chose to continue our analyses with additive regression models (SOM text, section 6). These models explain 30 to 89% (mean 67%) of the variance of expression at each element (table S1), again underscoring that genetic independence explains most of the data well. Consistent with additivity, we express motif function in “expression frequency units” (efu), meaning that a motif with an inferred activity of x efu increases by x the percentage of muscle cells in which expression is detected (SOM text, section 3). The mean per-element fraction of activity attributed to the motifs is 83%. The 77 motifs affect element activity by –0.14 to 0.45 efu (Fig. 2A), with 39 motifs having significantly nonzero activating function (partial regression coefficient t statistic, P < 0.05).
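To illustrate how an additive regression model assigns activity to individual motifs, here is a minimal sketch in plain Python. It assumes a design matrix in the style of Fig. 1 (1 = motif intact, 0 = motif mutated) and fits ordinary least squares through the normal equations. The helper names (`solve_linear`, `fit_additive`), the three-motif element, and the coefficients are hypothetical; the synthetic measurements are generated to be exactly additive, so this shows the mechanics only and does not reproduce the authors' models or data.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_additive(presence, y):
    """Least-squares fit of y_i = b0 + sum_j b_j * presence[i][j].

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in presence]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical element with three motifs; each row is one construct.
presence = [
    [1, 1, 1],  # all motifs intact
    [0, 1, 1],  # motif 1 mutated
    [1, 0, 1],  # motif 2 mutated
    [1, 1, 0],  # motif 3 mutated
    [0, 0, 1],  # only motif 3 intact
    [0, 0, 0],  # all motifs mutated
]
# Synthetic, exactly additive measurements (assumed values, in efu).
true_beta = [0.05, 0.30, 0.10, 0.02]
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], row))
     for row in presence]

beta, r_squared = fit_additive(presence, y)
print([round(b, 3) for b in beta], round(r_squared, 3))
```

With real replicate data the fit would not be exact, and the fraction of variance explained would indicate how well motif independence describes a given element.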

Fig. 2.

Molecular architecture of muscle co-regulation. Motifs are depicted as circles, and color indicates motif type: CRE (red), MyoD (green), and Tbx6 (blue). (A) Cis-regulatory function of 77 individually resolved motifs. Activity and standard error are plotted on the y axis; motifs are sorted along the x axis by increasing activity. (B) Distribution of cis-regulatory function at the 19 loci of this study. Cs, C. savignyi; Ci, C. intestinalis. Labels below axes indicate distance to transcription start site. Area of circle is proportional to estimated motif activity.

Having obtained quantitative estimates of motif activity, we examined each element's functional architecture. Apart from the obvious clustering of functional motifs, we were unable to discern any features (such as spacing, order, and relative orientation of motifs) that might explain the functions of individual motifs or of the elements as a whole, which would have shed light on organizational principles of regulatory elements. Indeed, there is notable heterogeneity among the loci: Elements are built from motifs of widely varying activity, from different combinations of motif types, and in diverse arrangements (Fig. 2B). For example, the cis-element at CK spans 31 bp and consists of one intermediate and one strong Tbx6 motif, whereas the AT1 cis-element consists of two weak CRE motifs, followed by two intermediate Tbx6 motifs and a strong MyoD motif, across 35 bp. Although motif independence is prevalent, elements do somewhat differ in how much genetic interaction exists. At MBP and AT2, for example, the additive model explains the data very well with high correlations between the predictions and the actual data (r² = 0.83 for Cs-MBP, r² = 0.77 for Cs-AT2; tables S1 and S3) and with little function unexplained by the model. Function at MA1, by contrast, is not described as well by models without interactions (table S3).

Conceivably, the heterogeneous regulatory architectures specify subtle differences in expression pattern or timing during developmental stages or physiological conditions not assayed here. It is clear, however, that the genes' tight coexpression in the embryonic tail muscle is achieved by a common and restricted set of three transcription factors acting upon vastly different cis-element architectures, the diversity of which defies the expectation that commonalities in design underlie co-regulation.

In stark contrast to the apparent flexibility of regulatory architecture, we observed little change in motif activity, order, or composition between orthologous elements of C. intestinalis and C. savignyi. (Neutral sequence divergence between the two Ciona species is approximately equivalent to that between mammals and birds, ruling out the possibility that these sequences have not been afforded enough time to accumulate change.) At single-copy genes, 26 of the 27 motifs with statistically significant activity have a clearly orthologous counterpart. Orthologous motifs drive very similar, in many cases indistinguishable, amounts of activity (Fig. 3A). For example, both MBP orthologs are regulated by a strong MyoD, a weak Tbx6, and a weak CRE motif, with less than 0.039 efu average deviation in individual motif activity. In total, the activity of orthologous regulatory motif pairs is highly correlated between the two species (Spearman's ρ = 0.61, P < 0.005; Fig. 3B). Thus, co-regulated gene expression at these loci has been maintained by conserving the locus-specific ancestral cis-regulatory architectures, with purifying selection tolerating little functional flexibility.
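The correlation reported above is a rank-based statistic. As a minimal sketch of how Spearman's ρ is computed for paired activity estimates, the plain-Python example below ranks each list (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the five activity pairs are hypothetical and chosen for illustration; they are not the measurements plotted in Fig. 3B.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up activities (efu) for five hypothetical orthologous motif pairs.
cs_activity = [0.02, 0.10, 0.35, 0.20, 0.05]  # C. savignyi
ci_activity = [0.04, 0.08, 0.30, 0.25, 0.01]  # C. intestinalis

rho = spearman_rho(cs_activity, ci_activity)
print(f"Spearman's rho = {rho:.2f}")
```

Because only ranks matter, the statistic is insensitive to differences in scale between the two species' assays, which suits a comparison of estimated activities.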

Fig. 3.

Evolution of motif activity. (A) Motif-level distribution of regulatory activity at six orthologous gene pairs. Distance from transcription start site and motif activity are plotted along the x and y axes, respectively. Open and solid circles represent individually resolved C. savignyi and C. intestinalis motifs, respectively. Colors are same as in Fig. 2. (B) Conservation of orthologous motif activity. C. intestinalis and C. savignyi motif activity plotted against each other. (C) Compensatory evolution of AT2 regulatory elements for C. savignyi (top) and C. intestinalis (bottom). Arrow direction and thickness represent Tbx6 motif orientation and strength of match to its position-specific scoring matrix. Bar plots depict activity of each Tbx6 motif, as estimated from additive regression models. (D) Functional turnover in paralogous motifs. Plotted as in (B). Error bars in all panels depict standard error.

Strong constraint is evident at the sequence level as well. Functional regulatory motifs exhibit far fewer substitutions than the genome-wide average (P < 3.8 × 10⁻¹⁰) (SOM text, section 7, and fig. S8A). The pairwise identity between orthologous functional motifs is 79%, whereas the genome-wide background identity is <20% (including insertions and deletions). Sequence identity markedly drops off outside the boundaries of the functional motifs (Spearman's ρ = –0.57, P < 0.05), reaching genome-wide background levels within 12 bp (fig. S8B). Finally, there is less population genetic variation in the motifs than on average in the genome (17) (SOM text, section 7, and fig. S10). This demonstrates that the motif sequences are subject to far greater evolutionary constraint than flanking sequence and that the functional motifs themselves are the units maintained by purifying selection.
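For readers unfamiliar with pairwise identity that counts insertions and deletions, the following minimal sketch shows one common way to compute it from two pre-aligned sequences: gapped columns count as mismatches. This convention, the function name `percent_identity`, and the two short sequences are assumptions for illustration; the exact definition used in the SOM may differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with identical bases.

    Columns where one sequence has a gap count as mismatches, so
    insertions and deletions lower the identity. Columns where both
    sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned sequences (not real Ciona motifs).
identity = percent_identity("ACGT-GCA", "ACGTAGTA")
print(f"{identity:.1f}% identity")
```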

The sequence changes that have occurred are not distributed evenly among the orthologous motifs, with functionally strong motifs having accumulated fewer substitutions than weak motifs. Notably, motif activity is significantly correlated with percent identity (Spearman's ρ = 0.35, P < 0.05). This is likely due to strong regulatory motifs being responsible for a larger fraction of the total function of a regulatory element; substitutions in them will therefore result in greater phenotypic consequences and be subject to stronger levels of purifying selection, decreasing the evolutionary rate of the element.

Further emphasizing that the activity of an element is controlled tightly by evolutionary pressures is a case of compensatory evolution in AT2. Two Tbx6 motifs (Fig. 3, A and C) are functionally strong in one species and weak in the other, in a complementary pattern. These activity differences correlate with substitutions away from or toward the motif consensus (Fig. 3C). The functional differentiation of these cis-regulatory motifs did not result from motif gain or loss but from base substitutions that modified motif activity.

Comparison of the cis-regulatory architectures of the three groups of paralogs from C. savignyi presents a strong contrast to the highly constrained orthologous cis-elements. The paralogous architectures show a high degree of differentiation in the form of element and motif-level sequence turnover as well as functional divergence of well-aligned motifs (Figs. 2B and 3D). Thus, whereas purifying selection acting on orthologous motifs in single-copy genes is strong enough to maintain conservation of regulatory motif sequence and function over long evolutionary distances, the paralogous motifs of clusters of genes that encode the same protein exhibit far greater rates of turnover. We speculate that this greater flexibility in elements of clustered multicopy genes is tolerated because changes in the activity of one element have a small effect on the total function of the cluster.

The two most prominent developmental mechanisms that build a multicellular organism are pattern formation and cellular differentiation. Previous studies of regulatory architecture and evolution were conducted in pattern formation systems, either by leveraging sequence comparisons and broad functional genomic data (18–20) or by studying a single regulatory element in detail (21–24). By contrast, we dissected the regulation of coexpression during cellular differentiation and introduced a quantitative framework for targeted experimental analysis of motif function. Using the Ciona muscle system, we demonstrated that coexpression is driven by regulatory motifs of broadly varying activity assembled into a diverse array of cis-elements. Despite this flexibility in cis-regulatory architecture, motif-level sequence and function are exquisitely maintained in distantly related orthologs. Thus, whereas a diversity of cis-regulatory architectures can generate nearly identical phenotypic outputs, the fitness landscapes separating them appear to be sufficiently rugged to constrain their evolution (25).

Our findings have implications for understanding genetic variation in such co-regulatory systems. Polymorphisms in cis-elements will range in phenotype, depending on the amount of activity that the affected motif contributes to the function of its element; the most direct evidence for this view is the wide range of effects on cis-element function by the individual motif mutants we tested. Similarly, a polymorphism in a trans-acting factor will not affect expression of all targets equally but will instead have a target-specific effect whose magnitude is determined by the architecture of the target's cis-element. These conclusions highlight the challenges that lie ahead for interpretation of genetic variation in gene regulatory systems, including those of vertebrates, Ciona's most advanced close relatives.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S11

Tables S1 to S3

Marker (Java image annotation tool)

