The Robustness and Evolvability of Transcription Factor Binding Sites


Science, 21 Feb 2014:
Vol. 343, Issue 6173, pp. 875-877
DOI: 10.1126/science.1249046

Robust to Change

The variation in genetic sequences that determine the binding of transcription factors is believed to be an important facet of evolution. However, the degree to which a genome is robust (that is, able to withstand change) and how robustness affects evolution are unclear. Payne and Wagner (p. 875) investigated the empirical support for mutational robustness by examining the binding sites of mouse and yeast transcription factors (TFs). A network analysis of the degree of variation revealed that the sites with the highest affinity for TF binding exhibit the greatest tolerance for nucleotide substitutions, whereas low-affinity sites are more sensitive to mutation. Thus, although mutational robustness and evolvability are antagonistic at the genotypic level, they are synergistic at the phenotypic level.


Robustness, the maintenance of a character in the presence of genetic change, can help preserve adaptive traits but may also hinder evolvability, the ability to bring forth novel adaptations. We used genotype networks to analyze the binding site repertoires of 193 transcription factors from mice and yeast, providing empirical evidence that robustness and evolvability need not be conflicting properties. Network vertices represent binding sites, and two sites are connected if they differ in a single nucleotide. We show that the binding sites of larger genotype networks are not only more robust, but that the sequences adjacent to such networks can also bind more transcription factors, demonstrating that robustness can facilitate evolvability.

Changes in gene expression via mutations in cis-regulatory regions can explain much of life’s diversity (1). Of particular importance are mutations in the specific sequences that determine transcription factor (TF) binding sites and coordinate gene expression in both space and time. Such mutations may change the identity of the cognate TF or alter the affinity with which a site is bound (2). This may, in turn, change the structure or logic of the transcriptional regulatory circuits in which these sites are embedded (3) and lead to adaptations in the form of novel gene expression patterns (4). Such adaptations may eventually lead to evolutionary innovations, such as new pigmentation patterns (5) or body structures (6).

Transcription factor binding sites are typically between 6 and 10 nucleotides long, which may reflect a tradeoff between the specificity of a site and its robustness to mutation (7). TF binding sites can be degenerate, with some TFs binding hundreds of different sequences, whereas others bind merely dozens (8). It is not known how this degeneracy contributes to the mutational robustness of TF binding sites or to their evolvability, defined here as the ability to bind different TFs after mutation.

Many recent studies have attempted to elucidate the robustness and evolvability of living systems [reviewed in (9)]. These studies tend to use computation to map genotypes to phenotypes, facilitating the systematic characterization of vast genotype spaces. Several of these modeling efforts suggest that genotype networks [neutral networks (10)] are responsible for the robustness and evolvability of living systems. A genotype network is a set of genotypes that have the same phenotype, where two genotypes are connected by an edge if they differ by a single mutation. Large genotype networks confer robustness because genetic perturbations are unlikely to drive a genotype off the network (11), and these networks confer evolvability because they extend throughout genotype space, providing mutational access to a diversity of genotypes that have different phenotypes (12).
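The genotype-network definition above can be made concrete in a few lines. The sketch below is illustrative only: it links genotypes that differ at exactly one position (point mutations only), and the toy sequences and the helper names `hamming` and `genotype_network` are hypothetical, not taken from the study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def genotype_network(genotypes):
    """Adjacency map: genotypes sharing a phenotype are linked if they
    differ by a single point mutation (Hamming distance 1)."""
    adj = {g: set() for g in genotypes}
    for a, b in combinations(genotypes, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical toy genotypes assumed to share one phenotype.
toy = ["ACGT", "ACGA", "TCGA", "GGGG"]
net = genotype_network(toy)
print({g: sorted(n) for g, n in net.items()})
# {'ACGT': ['ACGA'], 'ACGA': ['ACGT', 'TCGA'], 'TCGA': ['ACGA'], 'GGGG': []}
```

Here "GGGG" is isolated: it shares the phenotype but cannot be reached by single mutations, which is exactly what a disconnected genotype network looks like.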

Although for most biological systems it is currently not possible to experimentally determine an exhaustive genotype-to-phenotype map, recent advances in microarray technologies (13, 14) have made such a mapping possible for TF binding sites. We used protein binding microarray data to characterize the genotype networks of TF binding sites for 104 mouse (8) and 89 yeast (15) TFs (16). For each TF, we used the enrichment score (E-score), a proxy for binding affinity (8, 13), of each of the 32,896 possible contiguous binding sites (eight nucleotides in length) to categorize a site as bound or unbound (fig. S1) (16). We then assessed the genotype networks for the robustness and evolvability of individual TF binding sites and of the complete binding repertoires of TFs (Fig. 1). For example, the mouse TF Foxa2 (Fig. 1A), a key regulator of developmental processes (17), serves as a representative of the TFs we consider (database S1 and fig. S2) and is used to illustrate how sites are connected in the genotype network (Fig. 1B) and how mutation can change a site’s cognate TFs (Fig. 1C). Where possible, we complement our analysis with in vivo binding data generated by genome-wide chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq).
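The count of 32,896 follows from treating an 8-mer and its reverse complement as one site: there are 4^8 = 65,536 ordered 8-mers, of which 4^4 = 256 equal their own reverse complement, giving (65,536 + 256) / 2 = 32,896. The sketch below checks that count and shows how a bound/unbound call could be made from E-scores. The cutoff of 0.35, the helper names, and the E-score values are hypothetical illustrations; the study's actual criterion is given in its Materials and Methods (16).

```python
from itertools import product

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(seq, revcomp(seq))


def distinct_kmers(k: int) -> int:
    """Number of k-mers when a sequence and its reverse complement count once."""
    return len({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def bound_sites(escores: dict, cutoff: float = 0.35) -> set:
    """Hypothetical rule: call a site bound if its E-score exceeds the cutoff.
    Keys are assumed to already be canonical 8-mers."""
    return {site for site, e in escores.items() if e > cutoff}


print(distinct_kmers(8))  # 32896

# Made-up E-scores for three 8-mers.
toy_escores = {"AAAACCCC": 0.49, "AAAACCCG": 0.41, "ACGTTGCA": -0.12}
print(sorted(bound_sites(toy_escores)))  # ['AAAACCCC', 'AAAACCCG']
```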

Fig. 1 Genotype networks of TF binding sites.

(A) Genotype network for the mouse TF Foxa2. Each vertex represents a DNA sequence bound by a TF (false discovery rate Q < 0.001), its color captures binding affinity (darker = higher), and its size indicates the number of neighboring binding sites (bigger = more neighbors). The latter is proportional to mutational robustness (see text). (B) Vertices are neighbors and are connected by an edge if they represent sites that are separated by a single small mutation. We consider two kinds of such mutations, namely point mutations and indels that shift an entire, contiguous binding site by a single base (16). (C) Some mutations transform a DNA sequence that is on the genotype network into one that is not (black dotted lines). In these cases, the mutant sequence may bind another TF (hypothetical new cognate TFs indicated by black, white, and gray circles).
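The two kinds of single-step mutation in panel (B) can be enumerated directly. The sketch below is an illustration under assumptions, not the study's code: it generates the 24 point mutants of an 8-mer and the 8 one-base shifts (drop a base at one end, add any base at the other), then collapses reverse complements with a hypothetical `canonical` helper. How the study counted possible neighbors is specified in its Materials and Methods (16).

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seq: str) -> str:
    """Lexicographically smaller of a sequence and its reverse complement."""
    return min(seq, seq.translate(COMPLEMENT)[::-1])


def point_mutants(seq: str):
    """Every sequence that differs from seq at exactly one position."""
    for i, old in enumerate(seq):
        for b in BASES:
            if b != old:
                yield seq[:i] + b + seq[i + 1:]


def shift_mutants(seq: str):
    """Site shifted by one base: drop one end base, add any base at the other end."""
    for b in BASES:
        yield seq[1:] + b   # shift left
        yield b + seq[:-1]  # shift right


def neighbors(seq: str) -> set:
    """Canonical forms of all single-step mutants, excluding the site itself."""
    mutants = set(point_mutants(seq)) | set(shift_mutants(seq))
    result = {canonical(m) for m in mutants}
    result.discard(canonical(seq))
    return result


print(len(list(point_mutants("ACGTACGT"))), len(list(shift_mutants("ACGTACGT"))))  # 24 8
print(len(neighbors("AAAAAAAA")))  # 24
```

For "AAAAAAAA" every shift mutant either duplicates a point mutant or reproduces the site itself, so only the 24 point mutants remain; other sites can have more distinct neighbors.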

For 98% of the TFs (103 of 104 in mice and 87 of 89 in yeast), the majority of bound sites were part of a single connected genotype network, which we refer to as the dominant genotype network. Moreover, for 60% of the TFs (65 of 104 in mice and 51 of 89 in yeast), the dominant genotype network comprised all of the bound sites (e.g., Foxa2) (Fig. 1 and database S1). We also observed that the number of disconnected genotype networks per TF decreased as the number of bound sites increased (fig. S3), indicating that decreasing TF specificity promotes genotype network connectivity. The basic structural properties of these genotype networks did not differ between mouse and yeast TFs (fig. S4), but were significantly different from what was expected under a null model [Permutation test, Pnull < 0.005 (16) (fig. S5)] and exhibited variation both within and among DNA binding domain structural classes (fig. S6 and database S1).
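Identifying the dominant genotype network amounts to finding connected components. A minimal breadth-first-search sketch, assuming the network is given as an adjacency map (for example, one built from single-step neighbors) and taking the largest component as the dominant one; the function names and the toy network are hypothetical.

```python
from collections import deque


def connected_components(adj: dict) -> list:
    """Connected components of an undirected graph, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def dominant_network(adj: dict) -> set:
    """Largest connected component (empty if there are no bound sites)."""
    components = connected_components(adj)
    return components[0] if components else set()


# Hypothetical toy network: three linked sites plus one isolated site.
toy_adj = {"ACGT": {"ACGA"}, "ACGA": {"ACGT", "TCGA"}, "TCGA": {"ACGA"}, "GGGG": set()}
comps = connected_components(toy_adj)
print(len(comps), sorted(dominant_network(toy_adj)))  # 2 ['ACGA', 'ACGT', 'TCGA']
print(len(dominant_network(toy_adj)) / len(toy_adj))  # 0.75 of bound sites
```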

We quantified the robustness of each of a TF’s binding sites as the site’s number of neighbors in the genotype network, divided by its number of possible neighbors (16). Because the timing, location, and level of gene expression are important for many biological functions, their disruption through mutations in TF binding sites can be deleterious. Thus, the mutational robustness of a TF binding site can be an important factor in the resilience of gene expression to genetic change. On average, a Foxa2 binding site (Fig. 2A) can tolerate 37% of possible mutations [significantly more than expected under the null model (16); Permutation test, Pnull < 0.005], but robustness varies substantially among sites, from sites that tolerate only 3% of all possible single mutations to sites that tolerate the majority of them (72%). For all other TFs, average mutational robustness ranged from 7 to 48% (Fig. 2B). We additionally found that mutational robustness and binding affinity were positively correlated (fig. S7) (18) and that high-affinity sites were often enriched in vivo (fig. S7A) (18), both genome-wide (table S1) and within putative enhancers (table S2), suggesting that in vivo binding sites are often mutationally robust.
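The robustness measure is a simple ratio. The sketch below is simplified under stated assumptions: it considers only point mutations of short toy sequences and ignores reverse complements, whereas the study's set of possible neighbors also includes one-base shifts (16). The bound set and helper names are made up for illustration.

```python
from statistics import mean

BASES = "ACGT"


def point_neighbors(seq: str) -> set:
    """All sequences one point mutation away from seq (simplified neighborhood)."""
    return {seq[:i] + b + seq[i + 1:]
            for i, old in enumerate(seq) for b in BASES if b != old}


def robustness(site: str, bound: set) -> float:
    """Fraction of a site's possible neighbors that are also bound."""
    nbrs = point_neighbors(site)
    return len(nbrs & bound) / len(nbrs)


# Hypothetical set of bound 4-mers for one TF.
bound = {"ACGT", "ACGA", "ACGC", "TCGT", "GGGG"}
print(robustness("ACGT", bound))  # 0.25 (3 of 12 neighbors are bound)
print(robustness("GGGG", bound))  # 0.0
print(mean(robustness(s, bound) for s in bound))  # average over the repertoire
```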

Fig. 2 The mutational robustness and evolvability of TF binding sites.

(A) Distribution of mutational robustness for all sites bound by Foxa2. (B) Distribution of the average mutational robustness across all sites bound by each of the 104 mouse (dark red) and 89 yeast (light beige) TFs. An intermediate shade indicates that the bars are overlapping. (C) Distribution of evolvability for all sites bound by Foxa2. (D) Distribution of the average evolvability across all sites bound by each of the mouse and yeast TFs.

Many of the morphological differences between closely related organisms are caused by mutations in cis-regulatory regions (4, 6, 19-21). These mutations often comprise only one or a few base pair changes that may result in the gain or loss of one or more TF binding sites. To assess how mutations in the binding sites of specific TFs may bring about novel regulation, we measured a binding site’s evolvability as the proportion of TFs in our data set that bind the sequences that lie within a single mutation of the binding site but are not themselves part of the TF’s genotype network (Figs. 1C and 2, C and D). Although such mutations are likely often deleterious, they also have the potential to generate novel gene expression patterns that may be adaptive. The estimate of evolvability for all binding sites of Foxa2 demonstrates that every site is within a single mutation of at least one sequence that binds a TF other than Foxa2, as expected under the null model (Permutation test, Pnull = 1). This suggests that the binding preferences of the TFs considered here are so highly intertwined that any sequence, whether or not it is part of a large genotype network, will neighbor at least one sequence that binds another TF. On average, the sites bound by Foxa2 were separated by a single mutation from sequences that bind 26% of the other 103 mouse TFs (Permutation test, Pnull = 0.069). Similar observations hold for all of the mouse and yeast TFs that we considered (Fig. 2D and database S1).
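Site evolvability can be computed in the same spirit. This sketch reuses a simplified point-mutation neighborhood of toy 4-mers (an assumption; the study's neighborhood also includes shifts). A site's off-network neighbors are those not in the focal TF's own genotype network, and evolvability is the fraction of the other TFs that bind at least one of them, matching the "other 103 mouse TFs" denominator in the text. The TF names and repertoires are hypothetical.

```python
BASES = "ACGT"


def point_neighbors(seq: str) -> set:
    """All sequences one point mutation away from seq (simplified neighborhood)."""
    return {seq[:i] + b + seq[i + 1:]
            for i, old in enumerate(seq) for b in BASES if b != old}


def site_evolvability(site: str, own_tf: str, own_network: set, repertoires: dict) -> float:
    """Fraction of other TFs that bind a sequence one mutation from site
    that lies outside the focal TF's own genotype network."""
    off_network = point_neighbors(site) - own_network
    others = [tf for tf in repertoires if tf != own_tf]
    hits = [tf for tf in others if off_network & repertoires[tf]]
    return len(hits) / len(others)


# Hypothetical bound-site repertoires for three TFs.
repertoires = {"TF1": {"ACGT", "ACGA"}, "TF2": {"ACGC"}, "TF3": {"TTTT"}}
print(site_evolvability("ACGT", "TF1", repertoires["TF1"], repertoires))  # 0.5
```

Only TF2 binds a sequence ("ACGC") one mutation from "ACGT", so one of the two other TFs counts.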

Theoretical studies suggest that both robustness and evolvability are facilitated by the existence of large genotype networks (9). We provide empirical evidence for this theory through the measurement of repertoire robustness, defined as the average mutational robustness of each binding site in the repertoire (see Fig. 2B), and repertoire evolvability, defined as the proportion of TFs in our data set that bind sequences within a single mutation of any binding site in the repertoire. These measurements show that large genotype networks confer repertoire robustness (Fig. 3A) (Spearman’s correlation coefficient r = 0.90, P < 1.0 × 10⁻⁵⁰) and repertoire evolvability (Fig. 3B) (Spearman’s r = 0.65, P = 1.84 × 10⁻²⁴). Although repertoire robustness increased gradually with genotype network size and was significantly higher than expected for all TFs (Permutation test, Pnull < 0.005), repertoire evolvability increased more abruptly and was significantly higher than expected for 91% of TFs, as compared with the null model (Permutation test, Pnull < 0.005 for 96 of 104 in mouse and 80 of 89 in yeast) (database S1), such that even small genotype networks had high evolvability. This stems from the diversity of TFs that bind the sequences one mutation away from any two sites (fig. S8) (18). We also found that the binding sites of TFs with large genotype networks were more likely to arise de novo in DNA sequences (fig. S9) (18). These results show that whereas a tradeoff may exist between robustness and evolvability at the level of an individual binding site (18), the organization of these sites as a connected genotype network facilitates a synergistic relation between robustness and evolvability at the level of the binding repertoire. These results are insensitive to the threshold used to define a site as bound (figs. S10 to S12) (18), as well as to an alternative evolvability measure that considers only TFs with differing DNA binding domains (fig. S13) (18).
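The two repertoire-level measures aggregate the site-level ones. A hedged sketch under the same toy assumptions (point mutations only, hypothetical TFs): repertoire robustness averages site robustness over the network, and repertoire evolvability counts the other TFs that bind any sequence on the network's one-mutation boundary. Excluding the TF's own network from that boundary mirrors the site-level definition and is an assumption here.

```python
from statistics import mean

BASES = "ACGT"


def point_neighbors(seq: str) -> set:
    """All sequences one point mutation away from seq (simplified neighborhood)."""
    return {seq[:i] + b + seq[i + 1:]
            for i, old in enumerate(seq) for b in BASES if b != old}


def repertoire_robustness(network: set) -> float:
    """Average, over sites in the network, of the fraction of neighbors also in it."""
    return mean(len(point_neighbors(s) & network) / len(point_neighbors(s))
                for s in network)


def repertoire_evolvability(own_tf: str, network: set, repertoires: dict) -> float:
    """Fraction of other TFs binding any sequence one mutation off the network."""
    boundary = set().union(*(point_neighbors(s) for s in network)) - network
    others = [tf for tf in repertoires if tf != own_tf]
    return sum(1 for tf in others if boundary & repertoires[tf]) / len(others)


# Hypothetical repertoires; TF1's genotype network has two sites.
net = {"ACGT", "ACGA"}
repertoires = {"TF1": net, "TF2": {"ACGC"}, "TF3": {"TTTT"}, "TF4": {"TCGA"}}
print(repertoire_robustness(net))  # each site has 1 of 12 neighbors in the network
print(repertoire_evolvability("TF1", net, repertoires))  # TF2 and TF4 of the 3 others
```

Pooling the boundary over all sites is what lets evolvability rise faster than robustness: each added site can expose neighbors bound by TFs that no other site reaches.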

Fig. 3 Large genotype networks confer repertoire robustness and evolvability.

Repertoire (A) robustness and (B) evolvability are shown as a function of the number of TF binding sites in the dominant genotype network for all 104 mouse and 89 yeast TFs.
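The correlations reported for Fig. 3 are Spearman rank correlations, that is, Pearson correlations computed on ranks, with tied values sharing their average rank. A self-contained sketch follows; the network sizes and robustness values are made up, and the sketch computes only the coefficient, not the P value.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie block order[i..j]
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up dominant-network sizes and repertoire robustness for five TFs.
sizes = [12, 150, 40, 900, 300]
robust = [0.08, 0.21, 0.15, 0.44, 0.30]
print(spearman(sizes, robust))  # 1.0: the two rankings agree exactly
```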

On the basis of in vitro and in vivo measurements of TF-DNA interactions, our observations imply that it is almost always possible to transform one bound site into any other via a series of small mutations that preserve TF binding. This suggests that the mutational robustness of TF binding sites can be fine-tuned via mutation. The broad distributions of binding site robustness and evolvability are consistent with in vivo studies of TF binding sites, which have reported that the number of point mutations with a regulatory effect can vary greatly among sites (2, 22), and with comparative studies of binding site turnover in closely related species (19-21). Our analysis of TF binding repertoires indicates that decreased TF specificity yields large connected genotype networks that confer robustness and evolvability to the binding sites they harbor. Although our findings have several caveats (18), they are in line with studies of genotype networks in biological systems, including the existence of a large dominant genotype network (10), the tradeoff between robustness and evolvability for individual genotypes (23), and the observation that large genotype networks confer robustness and evolvability (12). As high-throughput technologies continue to advance, it may become possible to exhaustively study not only TF binding sites, but also entire regulatory circuits (24), paving the way for a more complete understanding of the robustness and evolvability of living systems.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S15

Tables S1 and S2

References (25-71)

Database S1

References and Notes

  16. Materials and methods are available as supplementary materials on Science Online.
  18. See supplementary text on Science Online.
  Acknowledgments: We thank A. Barve, D. Pechenick, J. Aguilar-Rodríguez, K. Sprouffske, and D. Urbach for discussions. J.L.P. acknowledges support from the International Research Fellowship Program of the NSF. A.W. acknowledges support through Swiss National Science Foundation grant 315230-129708 and the University Priority Research Program in Evolutionary Biology at the University of Zurich. Database S1 is available as supplementary material on Science Online.