Research Article

A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation


Science  17 Jul 2015:
Vol. 349, Issue 6245, aab2276
DOI: 10.1126/science.aab2276

Protein partners for chromosome silencing

Female mammals have two X chromosomes, one of which is almost completely shut down during development. The long noncoding Xist RNA plays a role in this process. To understand how a whole chromosome can be stably inactivated, Minajigi et al. identified many of the proteins that bind to the Xist RNA, which include cohesins. Paradoxically, the interaction between Xist and cohesin subunits resulted in repulsion of cohesin complexes from the inactive X chromosome, changing the three-dimensional shape of the whole chromosome.

Science, this issue 10.1126/science.aab2276

Structured Abstract

INTRODUCTION

The mammal has evolved an epigenetic mechanism to silence one of two X chromosomes in the XX female to equalize gene dosages with the XY male. Once established, the inactivated X chromosome (Xi) is extremely stable and is maintained through the lifetime of the female mammal. The principal regulator, Xist, is a long noncoding RNA that orchestrates the silencing process along the Xi. Xist is believed to operate as a scaffold to recruit and spread repressive complexes, such as Polycomb Repressive Complex 2, along the X chromosome. The identities of crucial interacting factors, however, have remained largely unknown.

RATIONALE

Although the Xi’s epigenetic stability is a necessary homeostatic property, an ability to unlock this epigenetic state is of great current interest. The X chromosome is home to nearly 1000 genes, at least 50 of which have been implicated in X-linked diseases, such as Rett syndrome and fragile X syndrome. The Xi is therefore a reservoir of functional genes that could be tapped to replace expression of a disease allele on the active X (Xa). A major gap in current understanding is the lack of a comprehensive Xist interactome. Progress toward a full interactome would advance knowledge of epigenetic regulation by long noncoding RNA and potentially inform treatment of X-linked diseases.

RESULTS

We have developed an RNA-centric proteomic method called iDRiP (identification of direct RNA-interacting proteins).

Using iDRiP, we identified 80 to 200 proteins in the Xist interactome. The interactors fall into several functional categories, including cohesins, condensins, topoisomerases, RNA helicases, chromatin remodelers, histone modifiers, DNA methyltransferases, nucleoskeletal factors, and nuclear matrix proteins. Targeted inhibition demonstrates that Xi silencing can be destabilized by disrupting multiple components of the interactome, consistent with the idea that these factors synergistically repress Xi transcription. Triple-drug treatments lead to a net increase of Xi expression and up-regulation of ~100 to 200 Xi genes. We then carry out a focused study of X-linked cohesin sites. Chromatin immunoprecipitation sequencing analysis demonstrates three types of cohesin sites on the X chromosome: Xi-specific sites, Xa-specific sites, and biallelic sites. We find that the Xa-specific binding sites represent a default state. Ablating Xist results in restoration of Xa-specific sites on the Xi. These findings demonstrate that, while Xist attracts repressive complexes to the Xi, it actively repels chromosomal architectural factors such as the cohesins from the Xi. Finally, we examine how Xist and the repulsion of cohesins affect Xi chromosome structure. In wild-type cells, the Xa is characterized by ~112 topologically associated domains (TADs) and the Xi by two megadomains. Intriguingly, loss of Xist and restoration of cohesin binding result in a reversion of the Xi to an Xa-like chromosome conformation. Hi-C analysis shows that TADs return to the Xi in a manner correlated with the reappearance of cohesins and with a transcriptionally permissive state.

CONCLUSION

Our study unveils many layers of Xi repression and demonstrates a central role for RNA in the topological organization of mammalian chromosomes. Our study also supports a model in which Xist RNA simultaneously acts as (i) a scaffold for the recruitment of repressive complexes to establish and maintain the inactive state and (ii) a repulsion mechanism to extrude architectural factors such as cohesins to avoid acquisition of a transcription-favorable chromatin conformation. Finally, our findings indicate that the stability of the Xi can be perturbed by targeted inhibition of multiple components of the Xist interactome.

An operational model for how Xist RNA orchestrates the Xi state.

Xist is a multitasking RNA that brings many layers of repression to the Xi. Although Xist RNA recruits repressive complexes (such as PRC1, PRC2, DNMT1, macroH2A, and SmcHD1) to establish and maintain the inactive state, it also actively repels activating factors and architectural proteins (such as the cohesins and CTCF) to avoid acquisition of a transcription-favorable chromatin conformation.

Abstract

The inactive X chromosome (Xi) serves as a model to understand gene silencing on a global scale. Here, we perform “identification of direct RNA interacting proteins” (iDRiP) to isolate a comprehensive protein interactome for Xist, an RNA required for Xi silencing. We discover multiple classes of interactors—including cohesins, condensins, topoisomerases, RNA helicases, chromatin remodelers, and modifiers—that synergistically repress Xi transcription. Inhibiting two or three interactors destabilizes silencing. Although Xist attracts some interactors, it repels architectural factors. Xist evicts cohesins from the Xi and directs an Xi-specific chromosome conformation. Upon deleting Xist, the Xi acquires the cohesin-binding and chromosomal architecture of the active X. Our study unveils many layers of Xi repression and demonstrates a central role for RNA in the topological organization of mammalian chromosomes.

The mammalian X chromosome is unique in its ability to undergo whole-chromosome silencing. In the early female embryo, X-chromosome inactivation (XCI) enables mammals to achieve gene dosage equivalence between the XX female and the XY male (1–3). XCI depends on Xist RNA, a 17-kb long noncoding RNA (lncRNA) that is expressed only from the inactive X chromosome (Xi) (4) and implements silencing by recruiting repressive complexes (5–8). Whereas XCI initiates only once during development, the female mammal stably maintains the Xi through her lifetime. In mice, a germline deletion of Xist results in peri-implantation lethality due to a failure of Xi establishment (9), whereas a lineage-specific deletion of Xist causes a lethal blood cancer due to a failure of Xi maintenance (10). Thus, both the de novo establishment and proper maintenance of the Xi are crucial for viability and homeostasis. There are therefore two critical phases of XCI: (i) A one-time initiation phase in peri-implantation embryonic development that is recapitulated by differentiating embryonic stem (ES) cells in culture, and (ii) a lifelong maintenance phase that persists in all somatic lineages.

Once established, the Xi is extremely stable and difficult to disrupt genetically and pharmacologically (11–13). In mice, X reactivation is programmed to occur only twice: once in the blastocyst to erase the imprinted XCI pattern and a second time in the germ line before meiosis (14, 15). Although the Xi’s epigenetic stability is a homeostatic asset, an ability to unlock this epigenetic state is of great current interest. The X chromosome is home to nearly 1000 genes, at least 50 of which have been implicated in X-linked diseases, such as Rett syndrome and fragile X syndrome. The Xi is therefore a reservoir of functional genes that could be tapped to replace expression of a disease allele on the active X (Xa). A better understanding of Xi repression would inform both basic biological mechanisms and treatment of X-linked diseases.

It is believed that Xist RNA silences the Xi through conjugate protein partners. A major gap in current understanding is the lack of a comprehensive Xist interactome. Despite multiple attempts to define the complete interactome, only four directly interacting partners have been identified over the past two decades, including PRC2, ATRX, YY1, and HNRPU: Polycomb repressive complex 2 (PRC2) is targeted by Xist RNA to the Xi; the ATRX RNA helicase is required for the specific association between Xist and PRC2 (16, 17); YY1 tethers the Xist-PRC2 complex to the Xi nucleation center (18); and the nuclear matrix factor, HNRPU/SAF-A, enables stable association of Xist with the chromosomal territory (19). Many additional interacting partners are expected, given the large size of Xist RNA and its numerous conserved modular domains. We have developed an RNA-based proteomic method and implement an unbiased screen for Xist’s comprehensive interactome. We identify a large number of high-confidence candidates, demonstrate that it is possible to destabilize Xi repression by inhibiting multiple interacting components, and then delve into a focused set of interactors with the cohesins.

Results

iDRiP identifies multiple classes of Xist-interacting proteins

A systematic identification of interacting factors has been challenging because of Xist’s large size, the expected complexity of the interactome, and the persistent problem of high background with existing biochemical approaches (20). A high background could be particularly problematic for chemical cross-linkers that create extensive covalent networks of proteins, which could in turn mask specific and direct interactions. We therefore developed iDRiP (identification of direct RNA interacting proteins) using the zero-length cross-linker, ultraviolet (UV) light, to implement an unbiased screen of directly interacting proteins in female mouse fibroblasts expressing physiological levels of Xist RNA (Fig. 1A). We performed in vivo UV cross-linking, prepared nuclei, and solubilized chromatin by DNase I digestion. Xist-specific complexes were captured using nine complementary oligonucleotide probes spaced across the 17-kb RNA, with a 25-nucleotide probe length designed to maximize RNA capture while reducing nonspecific hybridization. The complexes were washed under denaturing conditions to eliminate factors not covalently linked by UV to Xist RNA. To minimize background due to DNA-bound proteins, a key step was inclusion of DNase I treatment before elution of complexes (see the supplementary text). We observed substantial enrichment of Xist RNA over highly abundant cytoplasmic and nuclear RNAs (U6, Jpx, and 18S ribosomal RNA) in eluates of female fibroblasts (Fig. 1B). Enrichment was not observed in male eluates or with luciferase capture probes. Eluted proteins were subjected to quantitative mass spectrometry (MS), with spectral counting (21) and multiplexed quantitative proteomics (22) yielding similar enrichment sets (table S1).

Fig. 1 iDRiP-MS reveals a large Xist interactome.

(A) iDRiP schematic. (B) RT-qPCR demonstrated the specificity of Xist pulldown by iDRiP. Xist and control luciferase probes were used for pulldown from UV–cross-linked female and control male fibroblasts. Efficiency of Xist pulldown was calculated by comparing to a standard curve generated using 10-fold dilutions of input. Mean ± standard error (SE) of three independent experiments shown. P determined by Student’s t test. (C) Selected high-confidence candidates from three biological replicates are grouped into functional classes. Additional candidates are shown in table S1. (D) UV-RIP-qPCR validation of candidate interactors. Enrichment calculated as percentage input, as in (B). Mean ± SE of three independent experiments shown. P determined by Student’s t test. (E) RNA immunoFISH to examine localization of candidate interactors (green) in relation to Xist RNA (red). Immortalized MEF cells are tetraploid and harbor two Xi.
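The standard-curve calculation mentioned in (B) can be sketched as follows. This is a minimal illustration, assuming a linear fit of Ct against log10 of the input fraction; the dilution series, the Ct values, and the function names are hypothetical and are not taken from the study.

```python
import math

def fit_standard_curve(fractions, cts):
    """Least-squares fit of Ct = slope * log10(fraction of input) + intercept."""
    xs = [math.log10(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def percent_input(ct, slope, intercept):
    """Invert the standard curve to express a pulldown Ct as percentage of input."""
    return 100.0 * 10 ** ((ct - intercept) / slope)

# Hypothetical 10-fold dilutions of input and their measured Ct values.
fractions = [1.0, 0.1, 0.01, 0.001]
cts = [20.0, 23.32, 26.64, 29.96]
slope, intercept = fit_standard_curve(fractions, cts)
pulldown = percent_input(23.32, slope, intercept)
```

With these made-up values the fitted slope is about -3.32 cycles per 10-fold dilution, so a pulldown with a Ct of 23.32 corresponds to roughly 10% of input.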

From three independent replicates, iDRiP-MS revealed a large Xist protein interactome (Fig. 1C and table S1). Recovery of known Xist interactors, PRC2, ATRX, and HNRPU, provided a first validation of the iDRiP technique. Also recovered were macrohistone H2A (mH2A), RING1 (PRC1), and the condensin component, SmcHD1, all proteins known to be enriched on the Xi (19, 23, 24) but not previously shown to interact directly with Xist. More than 80 proteins were found to be ≥threefold enriched over background; >200 proteins were ≥twofold enriched (table S1). In many cases, multiple subunits of the epigenetic complex were identified, boosting our confidence in them as interactors. We verified select interactions by performing a test of reciprocity: By baiting with candidate proteins in an antibody capture, RIP-qPCR (RNA immunoprecipitation–quantitative polymerase chain reaction) of UV-cross-linked cells reciprocally identified Xist RNA in the pulldowns (Fig. 1D). Called on the basis of high enrichment values, presence of multiple subunits within a candidate epigenetic complex, and tests of reciprocity, novel high-confidence interactors fell into several functional categories: (i) Cohesin complex proteins SMC1a, SMC3, RAD21, WAPL, and PDS5a/b, as well as CCCTC-binding factor (CTCF) (25), which are collectively implicated in chromosome looping (26–28); (ii) histone modifiers such as aurora kinase B (AURKB), a serine/threonine kinase that phosphorylates histone H3 (29); RING1, the catalytic subunit of polycomb repressive complex 1 (PRC1) for H2A-K119 ubiquitylation (23); SPEN and RBM15, which associate with histone deacetylases; (iii) switch/sucrose non-fermentable (SWI/SNF) chromatin remodeling factors; (iv) topoisomerases TOP2a, TOP2b, and TOP1, which relieve torsional stress during transcription and DNA replication; (v) miscellaneous transcriptional regulators like MYEF2 and ELAV1; (vi) nucleoskeletal proteins that anchor chromosomes to the nuclear envelope, SUN2, lamin-B receptor (LBR), and LAP2; (vii) nuclear matrix proteins hnRPU/SAF-A, hnRPK, and MATRIN3; and (viii) the DNA methyltransferase DNMT1, known as a maintenance methylase for CpG dinucleotides (30).

To study their function, we first performed RNA immunofluorescence in situ hybridization (immunoFISH) of female cells and observed several patterns of Xi coverage relative to the surrounding nucleoplasm (Fig. 1E). Like PRC2, RING1 (PRC1) has been shown to be enriched on the Xi (23) and is therefore not pursued further. TOP1 and TOP2a/b appeared neither enriched nor depleted on the Xi (100%, n > 50 nuclei). AURKB showed two patterns of localization: pericentric enrichment (20%, n > 50) and a more diffuse localization pattern (80%, data not shown), consistent with its cell-cycle-dependent chromosomal localization (29). On the other hand, whereas SUN2 was depleted on the Xi (100%, n = 52), it often appeared as pinpoints around the Xi in both day 7 differentiating female ES cells (establishment phase; 44%, n = 307) and in fibroblasts (maintenance phase; 38.5%, n = 52), consistent with SUN2’s function in tethering telomeres to the nuclear envelope. Finally, the cohesins and SWI/SNF remodelers unexpectedly showed a depletion relative to the surrounding nucleoplasm (100%, n = 50 to 100). These patterns suggest that the Xist interactors operate in different XCI pathways.

To ask if the factors intersect the PRC2 pathway, we stably knocked down (KD) top candidates using short-hairpin RNAs (shRNAs) (table S2) and performed RNA immunoFISH to examine trimethylation of histone H3-lysine 27 (H3K27me3) (Fig. 2, A and B). No major changes to Xist localization or H3K27me3 were evident in d7 ES cells (fig. S1). There were, however, long-term effects in fibroblasts: The decrease in H3K27me3 enrichment in shSMARCC1 and shSMARCA5 cells (Fig. 2, A and B) indicated that SWI/SNF interaction with Xist is required for proper maintenance of PRC2 function on the Xi. Steady-state Xist levels did not change by more than twofold (Fig. 2C) and were therefore unlikely to be the cause of the polycomb defect. Knockdowns of other factors (cohesins, topoisomerases, SUN2, and AURKB) had no obvious effects on Xist localization and H3K27me3. Thus, whereas the SWI/SNF factors intersect the PRC2 pathway, other interactors do not overtly affect PRC2.

Fig. 2 Effect of depleting Xist interactors on H3K27me3.

(A) RNA immunoFISH of Xist (red) and H3K27me3 (green) after shRNA KD of interactors in fibroblasts (tetraploid; two Xist clouds). KD efficiencies (fraction remaining): SMC1a-0.48, SMC3-0.39, RAD21-0.15, AURKB-0.27, TOP2b-0.20, TOP2a-0.42, TOP1-0.34, CTCF-0.62, SMARCA4-0.52, SMARCA5-0.18, SMARCC1-0.25, SMARCC2-0.32, SMARCB1-0.52, and SUN2-0.72. Some factors are essential; therefore, high-percentage KD may be inviable. All images are presented at the same photographic exposure and contrast. (B) Quantitation of RNA immunoFISH results. n, sample size. Percentages of aberrant Xist/H3K27me3 associations are shown. (C) RT-qPCR of Xist levels in KD fibroblasts, normalized to shControls. Means ± SD of two independent experiments are shown.

Xi reactivation via targeted inhibition of synergistic interactors

Given the large number of interactors, we created a screen to analyze effects on Xi gene expression. We derived clonal fibroblast lines harboring a transgenic green fluorescent protein (GFP) reporter on the Xi (fig. S2) and shRNAs against Xist interactors. Knockdown of any one interactor did not reactivate GFP by more than fourfold (Fig. 3A, shControl + none, and fig. S3A). Suspecting synergistic repression, we targeted multiple pathways using a combination of drugs. To target DNMT1, we employed the small molecule 5′-azacytidine (aza) (30) at a nontoxic concentration of 0.3 μM [≤median inhibitory concentration (IC50)], which minimally reactivated GFP (Fig. 3A, shControl + aza). To target TOP2a/b (31), we employed etoposide (eto) at 0.3 μM (≤IC50), which also minimally reactivated GFP (Fig. 3A, shControl + eto). Combining 0.3 μM aza + eto led to an 80- to 90-fold reactivation—a level that was almost half of GFP levels on the Xa (Xa-GFP) (Fig. 3A), suggesting strong synergy between DNMT1 and TOP2 inhibitors. Using aza + eto as priming agents, we designed triple-drug combinations inclusive of shRNAs for proteins that have no specific small-molecule inhibitors. In various shRNA + aza + eto combinations, we achieved up to 230-fold GFP reactivation—levels that equaled or exceeded Xa-GFP levels (Fig. 3A). The greatest effects were observed for combinations using shSMARCC2 (227x), shSMARCA4 (180x), and shRAD21 (211x). shTOP1 and shCTCF were also effective (175x and 154x, respectively). Combinations involving remaining interactors yielded 63 to 94x reactivation.

Fig. 3 Derepression of Xi genes by targeting Xist interactors.

(A) Relative GFP levels by RT-qPCR analysis in female fibroblasts stably knocked down for indicated interactors ± 0.3 μM 5′-azacytidine (aza) ± 0.3 μM etoposide (eto). Xa-GFP, control male fibroblasts with X-linked GFP. Means ± SE of two independent experiments are shown. P, determined by Student’s t test. (B) Allele-specific RNA-seq analysis: Number of up-regulated Xi genes for each indicated triple-drug treatment (aza + eto + shRNA). Blue, genes specifically reactivated on Xi [fold change (FC) > 2]; red, genes also up-regulated on Xa (FC > 1.3). (C) RNA-seq heat map indicating that a large number of genes on the Xi were reactivated. X-linked genes reactivated in at least one of the triple-drug treatments (aza + eto + shRNA) are shown in the heat map. Color key, Log2 FC. Cluster analysis performed based on similarity of KD profiles (across) and on the sensitivity and selectivity of various genes to reactivation (down). (D) Chromosomal locations of Xi reactivated genes (colored ticks) for various aza + eto + shRNA combinations. (E) Read coverage of four reactivated Xi genes after triple-drug treatment. Xi, mus reads (scale, 0 to 2). Comp, total reads (scale, 0 to 6). Red tags appear only in exons with SNPs.

We then performed allele-specific RNA sequencing (RNA-seq) to investigate native Xi genes. In an F1 hybrid fibroblast line in which the Xi is of Mus musculus (mus) origin and the Xa of Mus castaneus (cas) origin, >600,000 X-linked sequence polymorphisms enabled allele-specific calls (32). Two biological replicates of each of the most promising triple-drug treatments showed good correlation (figs. S4 to S6). RNA-seq analysis showed reactivation of 75 to 100 Xi-specific genes in one replicate (Fig. 3B) and up to 200 in a second replicate (fig. S3B), representing a large fraction of expressed X-linked genes, considering that only ~210 X-linked genes have a fragments per kilobase of transcript per million mapped reads (FPKM) value ≥ 1.0 in this hybrid fibroblast line. Heat map analysis demonstrated that, for individual Xi genes, reactivation levels ranged from 2 to 80x for various combinatorial treatments (Fig. 3C). There was a net increase in expression level (ΔFPKM) from the Xi in the triple-drug-treated samples relative to the shControl + aza + eto, whereas the Xa and autosomes showed no obvious net increase, thereby suggesting preferential effects on the Xi due to targeting synergistic components of the Xist interactome. Reactivation was not specific to any one Xi region (Fig. 3D). Most effective were shRAD21, shSMC3, shSMC1a, shSMARCA4, shTOP2a, and shAURKB drug combinations. Genic examination confirmed increased representation of mus-specific tags (red) relative to the shControl (Fig. 3E). Such allelic effects were not observed at imprinted loci and other autosomal genes (fig. S7), further suggesting Xi-specific allelic effects. The set of reactivated genes varied among drug treatments, although some genes (e.g., Rbbp7, G6pdx, and Fmr1) appeared more prone to reactivation. Thus, the Xi is maintained by multiple synergistic pathways, and Xi genes can be reactivated preferentially by targeting two or more synergistic Xist interactors.
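The gene categories counted in Fig. 3B can be illustrated with a short sketch. It applies the two cutoffs given in the figure legend (FC > 2 on the Xi; FC > 1.3 on the Xa) to per-gene allelic fold changes. The function name, the category labels, and the example values are hypothetical, and how the fold changes themselves are computed upstream is not specified here.

```python
def classify_gene(xi_fc, xa_fc, xi_cutoff=2.0, xa_cutoff=1.3):
    """Classify one X-linked gene from its allelic fold changes.

    xi_fc: fold change of mus (Xi) expression, treated versus control.
    xa_fc: fold change of cas (Xa) expression, treated versus control.
    The cutoffs follow the Fig. 3B legend; the labels are illustrative.
    """
    if xi_fc <= xi_cutoff:
        return "not reactivated"
    if xa_fc > xa_cutoff:
        return "also up on Xa"
    return "Xi-specific"

# Hypothetical fold changes: gene -> (Xi fold change, Xa fold change).
example = {"geneA": (3.0, 1.0), "geneB": (3.0, 1.5), "geneC": (1.5, 1.0)}
calls = {gene: classify_gene(xi, xa) for gene, (xi, xa) in example.items()}
```

Separating the two reactivated categories is what allows a treatment's effect on the Xi to be distinguished from a general up-regulation that also affects the Xa allele.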

Xist interaction leads to cohesin repulsion

To investigate the mechanism, we focused on one group of interactors, the cohesins, because they were among the highest-confidence hits and their knockdowns consistently destabilized Xi repression. To obtain Xa and Xi binding patterns, we performed allele-specific chromatin immunoprecipitation sequencing (ChIP-seq) for two cohesin subunits, SMC1a and RAD21, and for CTCF, which works together with cohesins (28, 33–35). In wild-type cells, CTCF binding was enriched on Xa (cas) but also showed a number of Xi (mus)-specific sites (Fig. 4A) (25, 36). Allelic ratios ranged from equal to nearly complete Xa or Xi skewing (Fig. 4A). For the cohesins, 1490 SMC1a and 871 RAD21 binding sites were mapped onto the X chromosome in total, of which allelic calls could be made on ~50% of sites (Fig. 4, B and C). Although the Xa and Xi each showed significant cohesin binding, Xa-specific sites greatly outnumbered Xi-specific sites. For SMC1a, 717 sites were called on Xa, of which 589 were Xa-specific; 203 sites were called on Xi, of which 20 were Xi-specific. For RAD21, 476 sites were called on Xa, of which 336 were Xa-specific; 162 sites were called on Xi, of which 18 were Xi-specific. Biological replicates showed similar trends (fig. S8, A and B).

Fig. 4 Ablating Xist in cis restores cohesin binding on the Xi.

(A) Allele-specific ChIP-seq results: Violin plots of allelic skew for CTCF, RAD21, and SMC1a in wild-type (WT) and XiΔXist/XaWT (ΔXist) fibroblasts. Fraction of mus reads [mus/(mus + cas)] is plotted for every peak with ≥10 allelic reads. P values are determined by the Kolmogorov-Smirnov (KS) test. (B) Differences between SMC1a or RAD21 peaks on the XiWT versus XaWT. Black diagonal, 1:1 ratio. Plotted are read counts normalized for all SMC1a or RAD21 peaks. Allele-specific skewing is defined as ≥threefold skew toward either Xa (cas, blue dots) or Xi (mus, red dots). Biallelic peaks, gray dots. (C) Table of total, Xa-specific, and Xi-specific cohesin binding sites in WT versus ΔXist (XiΔXist/XaWT) cells. Significant SMC1a and RAD21 allelic peaks with ≥5 reads were analyzed. Allele-specific skewing is defined as ≥threefold skew toward Xa or Xi. Sites were considered “restored” if XiΔXist’s read counts were ≥50% of Xa’s. X-total, all X-linked binding sites; allelic peaks, sites with allelic information; Xa-total, all Xa sites; Xi-total, all sites; Xa-spec, Xa-specific; Xi-spec, Xi-specific; Xi-invariant, Xi-specific in both WT and XiΔXist/XaWT cells. There is a net gain of 96 sites on the Xi in the mutant, a number different from the number of restored sites (106). This difference arises because restored peaks are defined as sites that are heavily Xa-skewed in WT but acquire substantial Xi-binding after Xist deletion; thus, the number of restored sites is not simply the net change in Xi sites. (D) Partial restoration of SMC1a or RAD21 peaks on the XiΔXist to an Xa pattern. Plotted are peaks with read counts with ≥threefold skew toward XaWT (Xa-specific). x axis, normalized XaWT read counts; y axis, normalized XiΔXist read counts; black diagonal, 1:1 XiΔXist/XaWT ratio; red diagonal, 1:2 ratio. (E) Xi-specific SMC1a or RAD21 peaks remained on XiΔXist. Plotted are read counts for SMC1a or RAD21 peaks with ≥threefold skew toward XiWT (“Xi-specific”). 
(F) Comparison of fold changes for CTCF, RAD21, and SMC1a binding in XΔXist cells relative to WT cells. Shown are fold changes for Xi versus Xa. The Xi showed significant gains in RAD21 and SMC1a binding, but not in CTCF binding. Method: XWT and XΔXist ChIP samples were normalized by scaling to equal read counts. Fold changes for Xi were computed by dividing the normalized mus read count in XΔXist by the mus read count in XWT; fold changes for Xa were computed by dividing the normalized cas read count in XΔXist by the cas read count in XWT. To eliminate noise, peaks with <10 allelic reads were eliminated from analysis. P values were determined by a paired Wilcoxon signed rank test. (G) Representative examples of cohesin restoration on XiΔXist. Arrowheads, restored peaks. (H) Allele-specific cohesin binding profiles of Xa, XiWT, and XiΔXist. Shown below restored sites are regions of Xi reactivation after shSMC1a and shRAD21 triple-drug treatments, as defined in Fig. 3.
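The allelic definitions used in panels (A) to (D) can be sketched in a few lines. The thresholds (≥10 allelic reads, ≥threefold skew, ≥50% of Xa for restoration) come from the legend. The function names, the handling of peaks with no reads, and the assumption that restoration compares XiΔXist counts with Xa counts from the same mutant cells are illustrative choices, not details confirmed by the source.

```python
def mus_fraction(mus, cas, min_reads=10):
    """Fraction of mus reads, mus / (mus + cas); None if too few allelic reads."""
    total = mus + cas
    if total < min_reads:
        return None
    return mus / total

def classify_peak(xi_reads, xa_reads, fold=3.0):
    """Call a peak Xa-specific, Xi-specific, or biallelic by a >= threefold skew."""
    if xi_reads == 0 and xa_reads == 0:
        return "uncalled"  # assumption: peaks without allelic reads get no call
    if xa_reads >= fold * xi_reads:
        return "Xa-specific"
    if xi_reads >= fold * xa_reads:
        return "Xi-specific"
    return "biallelic"

def is_restored(wt_xi, wt_xa, mut_xi, mut_xa, fold=3.0, min_ratio=0.5):
    """A site is restored if it is Xa-specific in WT and the Xi count in the
    mutant reaches at least min_ratio of the Xa count (assumed: mutant Xa)."""
    was_xa_specific = classify_peak(wt_xi, wt_xa, fold) == "Xa-specific"
    return was_xa_specific and mut_xa > 0 and mut_xi >= min_ratio * mut_xa
```

For example, a peak with 2 Xi and 12 Xa reads in wild-type cells is Xa-specific; if the mutant shows 7 Xi and 12 Xa reads, the site counts as restored under these assumptions, whereas 4 Xi reads would not suffice.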

Cohesin’s Xa preference was unexpected in light of Xist’s physical interaction with cohesins—an interaction suggesting that Xist might recruit cohesins to the Xi. We therefore conditionally ablated Xist from the Xi (XiΔXist) and repeated ChIP-seq analysis in the XiΔXist/XaWT fibroblasts (37). Surprisingly, XiΔXist acquired 106 SMC1a and 48 RAD21 sites in cis at positions that were previously Xa-specific (Fig. 4, C and D). Biological replicates trended similarly (figs. S8 and S9). In nearly all cases, acquired sites represented a restoration of Xa sites, rather than binding to random positions. By contrast, sites that were previously Xi-specific remained intact (Fig. 4, C and E, and fig. S8B), suggesting that they do not require Xist for their maintenance. The changes in cohesin peak densities were Xi-specific and significant (Fig. 4F). Cohesin restoration occurred throughout XiΔXist, resulting in domains of biallelic binding (Fig. 4G and figs. S10 to S12), and often favored regions that harbor genes that escape XCI (e.g., Bgn) (38, 39). There were also shifts in CTCF binding, more noticeable at a locus-specific level than at a chromosomal level (Fig. 4, A and G), suggesting that CTCF and cohesins do not necessarily track together on the Xi. The observed dynamics were X chromosome–specific and were not observed on autosomes (fig. S13). To determine whether there were restoration hotspots, we plotted restored SMC1a and RAD21 sites (Fig. 4H, purple) on XiΔXist and observed clustering within gene-rich regions. We conclude that Xist does not recruit cohesins to the Xi-specific sites. Instead, Xist actively repels cohesins in cis to prevent establishment of the Xa pattern.
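The fold-change comparison behind Fig. 4F can be sketched as below, following the normalization described in the figure legend (scaling samples to equal read counts, then dividing mutant by wild-type allelic counts). The data layout, the choice to apply the ≥10-read filter to the wild-type sample, and the skipping of peaks with a zero denominator are assumptions made for illustration.

```python
def allelic_fold_changes(wt, mut, wt_total, mut_total, min_allelic_reads=10):
    """Per-peak (Xi, Xa) fold changes of mutant over wild type.

    wt, mut: dict mapping peak id -> (mus_reads, cas_reads).
    wt_total, mut_total: total read counts of the two ChIP samples.
    """
    scale = wt_total / mut_total  # rescale mutant counts to the WT library size
    result = {}
    for peak, (wt_mus, wt_cas) in wt.items():
        if peak not in mut:
            continue
        # Assumed filters: drop low-coverage peaks and undefined ratios.
        if wt_mus + wt_cas < min_allelic_reads or wt_mus == 0 or wt_cas == 0:
            continue
        mut_mus, mut_cas = mut[peak]
        xi_fc = mut_mus * scale / wt_mus  # mus allele corresponds to the Xi
        xa_fc = mut_cas * scale / wt_cas  # cas allele corresponds to the Xa
        result[peak] = (xi_fc, xa_fc)
    return result

# Hypothetical counts for two peaks in libraries of different depth.
wt_peaks = {"p1": (4, 16), "p2": (1, 2)}
mut_peaks = {"p1": (24, 32), "p2": (5, 5)}
fc = allelic_fold_changes(wt_peaks, mut_peaks, 1_000_000, 2_000_000)
```

In this toy example peak p1 gains threefold on the Xi while the Xa is unchanged, and p2 is discarded for low coverage. The paired Xi and Xa values per peak are the kind of input a paired Wilcoxon signed rank test would take.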

Xist RNA directs an Xi-specific chromosome conformation

Cohesins and CTCF have been shown to facilitate formation of large chromosomal domains called TADs (topologically associated domains) (27, 28, 34, 35, 40–42). The function of TADs is currently not understood, because TADs are largely invariant across development. However, X-linked domains are exceptions to this rule and are therefore compelling models to study function of topological structures (43–46). By carrying out allele-specific Hi-C, we asked whether cohesin restoration altered the chromosomal architecture of XiΔXist. First, we observed that, in wild-type cells, our TADs called on autosomal contact maps at 40-kb resolution resembled published composite (nonallelic) maps (27) (Fig. 5A, bottom). Our X chromosome contact maps were also consistent, with TADs being less distinct due to a summation of Xa and Xi reads in the composite profiles (Fig. 5A, top). Using the 44% of reads with allelic information, our allelic analysis yielded high-quality contact maps at 100-kb resolution by combining replicates (Fig. 5B and fig. S14A) or at 200-kb resolution with a single replicate. In wild-type cells, we deduced 112 TADs at 40-kb resolution on the X chromosome using the method of Dixon et al. (27). We attempted TAD calling for the Xi on the 100-kb contact map but were unable to obtain obvious TADs, suggesting that the 112 TADs are present only on the Xa. The Xi instead appeared to be partitioned into two megadomains at the DXZ4 region (fig. S14A) (46). Thus, although the Xa is topologically organized into structured domains, the Xi is devoid of such megabase-scale structures across its full length.

Fig. 5 Ablating Xist results in Xi reversion to an Xa-like chromosome conformation.

(A) Chr13 and X chromosome contact maps showing triangular domains representative of TADs. Purple shades correspond to varying interaction frequencies (dark, greater interactions). TADs called from our composite (nonallelic) HiC data at 40-kb resolution (blue bars) are highly similar to those (gray bars) called previously (27). (B) Allele-specific HiC-seq analysis: Contact maps for three different X chromosome regions at 100-kb resolution comparing XiΔXist (red) to XiWT (orange), and XiΔXist (red) versus Xa (blue) of the mutant cell line. Our TAD calls are shown with reference sequence (RefSeq) genes. (C) Fraction of interaction frequency per TAD on the Xi (mus) chromosome. The positions of our TAD borders were rounded to the nearest 100 kb, and submatrices were generated from all pixels between the two end points of the TAD border for each TAD. We calculated the average interaction score for each TAD by summing the interaction scores for all pixels in the submatrix defined by a TAD and dividing by the total number of pixels in the TAD. We then averaged the normalized interaction scores across all bins in a TAD in the Xi (mus) and Xa (cas) contact maps and computed the fraction of averaged interaction scores from mus chromosomes. The X chromosome and a representative autosome, Chr5, are shown for the WT cell line and the XistΔXist/+ cell line. P values were determined by paired Wilcoxon signed rank test. (D) Violin plots showing that TADs overlapping restored peaks have larger increases in interaction scores relative to all other TADs. We calculated the fold change in average interaction scores on the Xi for all X-linked TADs and intersected the TADs with SMC1a sites (XiΔXist/XiWT). Thirty-two TADs occurred at restored cohesin sites; 80 TADs did not overlap restored cohesin sites. Violin plot shows distributions of fold change average interaction scores between XiWT and XiΔXist. P values were determined by Wilcoxon ranked sum test. 
(E) Restored TADs overlap regions with restored cohesins across XiΔXist. Several data sets were used to call restored TADs, each producing similar results. Restored TADs were called in two separate replicates (Rep1 and Rep2) where the average interaction score was significantly higher on XiΔXist than on XiWT. We also called restored TADs based on merged Rep1 + Rep2 data sets. Finally, a consensus between Rep1 and Rep2 was derived. Method: We calculated the fold change in mus or cas for all TADs on the X chromosome and on a control, Chr5, then defined a threshold for significant changes based on either the autosomes or the Xa. We treated Chr5 as a null distribution (few changes expected on autosomes) and found the fraction of TADs that crossed the threshold for several thresholds. These fractions corresponded to a FDR for each given threshold. An FDR of 0.05 was used.
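The per-TAD interaction score described in (C) can be sketched as follows, assuming the contact maps are square matrices (lists of rows) binned at 100 kb and that a TAD spans a half-open range of bins. The function names and the half-up rounding of border positions are illustrative assumptions.

```python
def to_bin(position_bp, resolution=100_000):
    """Round a TAD border (in bp) to the nearest bin; ties round up (assumed)."""
    return (position_bp + resolution // 2) // resolution

def tad_average_score(contact_map, start_bin, end_bin):
    """Sum all pixels of the TAD submatrix and divide by the pixel count."""
    rows = contact_map[start_bin:end_bin]
    total = sum(sum(row[start_bin:end_bin]) for row in rows)
    n_pixels = (end_bin - start_bin) ** 2
    return total / n_pixels

def mus_interaction_fraction(mus_map, cas_map, start_bin, end_bin):
    """Fraction of the averaged interaction score contributed by the mus (Xi) map."""
    mus = tad_average_score(mus_map, start_bin, end_bin)
    cas = tad_average_score(cas_map, start_bin, end_bin)
    return mus / (mus + cas)

# Toy 4 x 4 allelic contact maps; the first two bins form one hypothetical TAD.
mus_map = [[4, 2, 0, 0], [2, 4, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
cas_map = [[10, 8, 0, 0], [8, 10, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
fraction = mus_interaction_fraction(mus_map, cas_map, 0, 2)
```

In this toy case the TAD averages are 3 on the mus map and 9 on the cas map, giving a mus fraction of 0.25. A value near 0.5 would indicate comparable interaction strength on the two alleles.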

When Xist was ablated, however, TADs were restored in cis, and the Xi reverted to an Xa-like conformation (Fig. 5B and fig. S14B). In mutant cells, ~30 TADs were gained on XiΔXist in each biological replicate. Where TADs were restored, XiΔXist patterns (red) became nearly identical to those of the Xa (blue), with similar interaction frequencies. These XiΔXist regions now bore little resemblance to the Xi of wild-type cells (XiWT, orange). Overall, the difference in the average interaction scores between XiWT and XiΔXist was highly significant (Fig. 5C and fig. S15A). Intersecting TADs with SMC1a sites on XiΔXist revealed that 61 restored cohesin sites overlapped restored TADs (61 did not overlap). In general, restored cohesin sites occurred both within TADs and at TAD borders. TADs overlapping restored peaks had larger increases in interaction scores relative to all other TADs (Fig. 5D and fig. S15B), and we observed an excellent correlation between the restored cohesin sites and the restored TADs (Fig. 5E and fig. S15C), consistent with a role of cohesins in reestablishing TADs after Xist deletion. Taken together, these data uncover a role for RNA in establishing topological domains of mammalian chromosomes and demonstrate that Xist must actively and continually repulse cohesins from the Xi, even during the maintenance phase, to prevent formation of an Xa chromosomal architecture.

Discussion

Using iDRiP, we have identified a comprehensive Xist interactome and revealed multiple synergistic pathways to Xi repression (Fig. 6). With Xist physically contacting 80 to 250 proteins at any given time, the Xist ribonucleoprotein particle may be as large as the ribosome. Our study supports a model in which Xist RNA simultaneously acts as (i) scaffold for the recruitment of repressive complexes (such as PRC1, PRC2, ATRX, mH2A, and SmcHD1) to establish and maintain the inactive state; and as (ii) a repulsion mechanism to extrude architectural factors such as cohesins to avoid acquisition of a transcription-favorable chromatin conformation. Without Xist, cohesins return to their default Xa binding state. Repulsion could be based on eviction, with Xist releasing cohesins as it extrudes them, or on sequestration, with Xist sheltering cohesins to prevent Xi binding. Our study shows that the Xi harbors three types of cohesin sites: (i) Xi-specific sites that do not depend on Xist; (ii) biallelic sites that are also Xist-independent; and (iii) Xa-specific sites, many of which cannot be established on the Xi because of active repulsion by Xist. The type (i) and type (iii) sites likely explain the paradoxical observations that, on the one hand, depleting cohesins leads to Xi reactivation but, on the other, loss of Xist-mediated cohesin recruitment leads to an Xa-like chromosome conformation that is permissive for transcription. In essence, modulating the type (i) and type (iii) sites both have the effect of destabilizing the Xi, rendering the Xi more accessible to transcription. Disrupting type (i) sites by cohesin knockdown would change the repressive Xi structure, while ablating Xist would restore the type (iii) sites that promote an Xa-like conformation. Our study has focused on cohesins, but RNA-mediated repulsion may be an outcome for other Xist interactors and may be as prevalent an epigenetic mechanism as RNA-mediated recruitment (47).

Fig. 6 The Xi is suppressed by multiple synergistic mechanisms.

Xist RNA (red) suppresses the Xi by either recruiting repressive factors (e.g., PRC1 and PRC2) or expelling architectural factors (e.g., cohesins).

The robustness of Xi silencing is demonstrated by the observation that we destabilized the Xi only after pharmacologically targeting two or three distinct pathways. The fact that the triple-drug treatments varied with respect to reactivated loci and depth of derepression creates the possibility of treating X-linked disease in a locus-specific manner by administering unique drug combinations. Given the existence of many other disease-associated lncRNAs, the iDRiP technique could be applied systematically toward identifying new drug targets for other diseases and generally for elucidating mechanisms of epigenetic regulation by lncRNA.

Materials and methods

Identification of Direct RNA interacting Proteins (iDRiP)

Mouse embryonic fibroblasts (MEFs) were irradiated with UV light at 200 mJ energy (Stratagene 2400) after rinsing with phosphate-buffered saline (PBS). The pellets were resuspended in cytoskeleton buffer with 0.5% Triton X-100 (CSKT)-0.5% [10 mM piperazinediethanesulfonic acid, pH 6.8, 100 mM NaCl, 3 mM MgCl2, 0.3 M sucrose, 0.5% Triton X-100, 1 mM phenylmethylsulfonyl fluoride (PMSF)] for 10 min at 4°C followed by a spin. The pellets were again resuspended in nuclear isolation buffer (10 mM Tris pH 7.5, 10 mM KCl, 0.5% Nonidet-P 40, 1x protease inhibitors, 1 mM PMSF), and rotated at 4°C for 10 min (optional step). The pellets were collected after a spin, weighed, flash frozen in liquid nitrogen, and stored at –80°C until use.

Approximately equal amounts of female and male UV cross-linked pellets were thawed and resuspended for treatment with Turbo DNase I in the DNase I digestion buffer (50 mM Tris pH 7.5, 0.5% Nonidet-P 40, 0.1% sodium lauroyl sarcosine, 1x protease inhibitors, SuperaseIn). The tubes were rotated at 37°C for 45 min with intermittent mixing or vortexing. The nuclear lysates were further solubilized by adding 1% sodium lauroyl sarcosine, 0.3 M lithium chloride, 25 mM EDTA, and 25 mM EGTA to final concentrations. After brief vortexing, incubation was continued at 37°C for 15 min. The lysates were mixed with biotinylated DNA probes (table S3) prebound to the streptavidin magnetic beads (MyOne streptavidin C1 Dyna beads, Invitrogen) and incubated at 55°C for 1 hour before overnight incubation at 37°C in the hybridization chamber. The beads were washed three times in wash buffer (10 mM Tris, pH 7.5, 0.3 M LiCl, 1% LDS, 0.5% Nonidet-P 40, 1x protease inhibitor) at room temperature followed by treatment with Turbo DNase I in DNase I digestion buffer with the addition of 0.3 M LiCl, protease inhibitors, and SuperaseIn at 37°C for 20 min. Then, beads were resuspended and washed two more times in the wash buffer. For MS analysis, elution was done in elution buffer (10 mM Tris, pH 7.5, 1 mM EDTA) at 70°C for 4 min followed by brief sonication in a Covaris instrument. For the quantification of pulldown efficiency, MEFs without cross-linking were used and elution was done at 95°C. The eluate was used for RNA isolation and reverse transcription qPCR (RT-qPCR). When cross-linked MEFs were used, the eluate was subjected to proteinase K treatment (50 mM Tris pH 7.5, 100 mM NaCl, 0.5% SDS, 10 μg proteinase K) for 1 hour at 55°C. RNA was isolated by Trizol and quantified with SYBR green qPCR. Input samples were used to make a standard curve from 10-fold dilutions, against which the RNA pulldown efficiencies were calculated. The efficiency of Xist pulldown was relatively lower after UV cross-linking, consistent with previous reports (48, 49).
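
The standard-curve calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a linear fit of Ct against log10(input dilution), and the function names and all Ct values are hypothetical.

```python
import math

def fit_standard_curve(dilutions, ct_values):
    """Least-squares fit of Ct = slope * log10(dilution) + intercept."""
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def fraction_of_input(ct_sample, slope, intercept):
    """Interpolate a pulldown Ct on the curve; the result is a fraction of input."""
    return 10 ** ((ct_sample - intercept) / slope)

# Hypothetical 10-fold input dilution series (1.0 = undiluted input) and Ct values.
dilutions = [1.0, 0.1, 0.01, 0.001]
input_ct = [20.0, 23.3, 26.6, 29.9]

slope, intercept = fit_standard_curve(dilutions, input_ct)
pulldown = fraction_of_input(25.0, slope, intercept)  # hypothetical pulldown Ct
print(f"slope={slope:.2f}, pulldown={pulldown:.4f} of input")
```

A slope near -3.3 cycles per 10-fold dilution corresponds to roughly 100% amplification efficiency, which is a convenient sanity check on the dilution series before any pulldown value is read off the curve.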

Quantitative proteomics

Proteins co-enriched with Xist from female or male cells were quantitatively analyzed either using a label-free approach based on spectral counting (21) or by multiplexed quantitative proteomics using tandem-mass tag (TMT) reagents (50, 51) on an Orbitrap Fusion mass spectrometer (Thermo Scientific). Disulfide bonds were reduced with dithiothreitol (DTT) and free thiols alkylated with iodoacetamide as described previously (22). Proteins were then precipitated with trichloroacetic acid, resuspended in 50 mM HEPES (pH 8.5) and 1 M urea, and digested first with endoproteinase Lys-C (Wako) for 17 hours at room temperature and then with sequencing-grade trypsin (Promega) for 6 hours at 37°C. Peptides were desalted over Sep-Pak C18 solid-phase extraction (SPE) cartridges (Waters), and the peptide concentration was determined using a bicinchoninic acid (BCA) assay (Thermo Scientific). For the label-free analysis, peptides were then dried and resuspended in 5% formic acid (FA) and 5% acetonitrile (ACN), and 5 μg of peptides were analyzed by mass spectrometry as described below. For the multiplexed quantitative analysis, a maximum of 50 μg of peptides were labeled with one of the available TMT-10plex reagents (Thermo Scientific) (50). To achieve this, peptides were dried and resuspended in 50 μl of 200 mM HEPES (pH 8.5) and 30% ACN, and 10 μg of the TMT reagent in 5 μl of anhydrous ACN was added to the solution, which was incubated at room temperature (RT) for 1 hour. The reaction was then quenched by adding 6 μl of 5% (w/v) hydroxylamine in 200 mM HEPES (pH 8.5) and incubated for 15 min at RT. The labeled peptide mixture was then subjected to fractionation using basic pH reversed-phase liquid chromatography (bRPLC) on an Agilent 1260 Infinity high-performance liquid chromatography (HPLC) system equipped with an Agilent Extend-C18 column (4.6 x 250 mm; particle size, 5 μm), basically as described previously (52). 
Peptides were fractionated using a gradient from 22 to 35% ACN in 10 mM ammonium bicarbonate over 58 min at a flow rate of 0.5 ml/min. Fractions of 0.3 ml were collected into a 96-well plate and then pooled into a total of 12 fractions (A1 to A12, B1 to B12, etc.) that were dried and resuspended in 8 μl of 5% FA and 5% ACN, 3 μl of which were analyzed by microcapillary liquid chromatography tandem mass spectrometry on an Orbitrap Fusion mass spectrometer. A recently introduced multistage (MS3) method was used to provide highly accurate quantification (53).

The mass spectrometer was equipped with an EASY-nLC 1000 integrated autosampler and HPLC pump system. Peptides were separated over a 100-μm inner diameter microcapillary column in-house packed with first 0.5 cm of Magic C4 resin (5 μm, 100 Å, Michrom Bioresources), then with 0.5 cm of Maccel C18 resin (3 μm, 200 Å, Nest Group) and 29 cm of GP-C18 resin (1.8 μm, 120 Å, Sepax Technologies). Peptides were eluted applying a gradient of 8 to 27% ACN in 0.125% formic acid over 60 min (label-free) and 165 min (TMT) at a flow rate of 300 nl/min. For label-free analyses, we applied a tandem-MS method where a full-MS spectrum [MS1; mass-to-charge ratio (m/z) 375 to 1500; resolution 6 × 10⁴; automatic gain control (AGC) target, 5 × 10⁵; maximum injection time, 100 ms] was acquired using the Orbitrap, after which the most abundant peptide ions were selected for linear ion trap CID-MS2 in an automated fashion. MS2 scans were done in the linear ion trap using the following settings: quadrupole isolation at an isolation width of 0.5 Th; fragmentation method, CID; AGC target, 1 × 10⁴; maximum injection time, 35 ms; normalized collision energy, 30%. The number of acquired MS2 spectra was defined by setting the maximum time of one experimental cycle of MS1 and MS2 spectra to 3 s (top speed). To identify and quantify the TMT-labeled peptides, we applied a synchronous precursor selection MS3 method (22, 53, 54) in a data-dependent mode. The scan sequence was started with the acquisition of a full MS or MS1 spectrum acquired in the Orbitrap (m/z range, 500 to 1200; other parameters were set as described above), and the most intense peptide ions detected in the full MS spectrum were then subjected to MS2 and MS3 analysis, while the acquisition time was optimized in an automated fashion (top speed, 5 s). MS2 scans were performed as described above. Using synchronous precursor selection, the 10 most abundant fragment ions were selected for the MS3 experiment after each MS2 scan. The fragment ions were further fragmented using the higher-energy collisional dissociation (HCD) fragmentation (normalized collision energy, 50%), and the MS3 spectrum was acquired in the Orbitrap (resolution, 60,000; AGC target, 5 × 10⁴; maximum injection time, 250 ms).

Data analysis was performed on an in-house generated SEQUEST-based (55) software platform. RAW files were converted into the mzXML format using a modified version of ReAdW.exe. MS2 spectra were searched against a protein sequence database containing all protein sequences in the mouse UniProt database (downloaded 4 February 2014), as well as that of known contaminants such as porcine trypsin. This target component of the database was followed by a decoy component containing the same protein sequences but in flipped (or reversed) order (56). MS2 spectra were matched against peptide sequences, with both termini consistent with trypsin specificity and allowing two missed trypsin cleavages. The precursor ion m/z tolerance was set to 50 parts per million; TMT tags on the N terminus and on lysine residues (229.162932 Da, only for TMT analyses), as well as carbamidomethylation (57.021464 Da) on cysteine residues were set as static modification; and oxidation (15.994915 Da) of methionines was set as variable modification. Using the target-decoy database search strategy (56), a spectra assignment false discovery rate (FDR) of less than 1% was achieved through using linear discriminant analysis, with a single discriminant score calculated from the following SEQUEST search score and peptide sequence properties: mass deviation, XCorr, dCn, number of missed trypsin cleavages, and peptide length (57). The probability of a peptide assignment to be correct was calculated using a posterior error histogram, and the probabilities for all peptides assigned to a protein were combined to filter the data set for a protein FDR of less than 1%. Peptides with sequences that were contained in more than one protein sequence from the UniProt database were assigned to the protein with the most matching peptides (57).
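
The target-decoy filtering step can be illustrated with a small sketch. It is deliberately simplified and is not the in-house platform described above: it assumes a single score per spectrum match (the actual workflow combines several properties by linear discriminant analysis) and estimates FDR as decoy hits divided by target hits, which is one common estimator. The function name and the toy data are hypothetical.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose accepted set has estimated FDR <= max_fdr.

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    FDR is estimated as (#decoy hits) / (#target hits) among accepted matches.
    Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accepting everything down to this score gives this estimated FDR.
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best

# Hypothetical matches: mostly targets, with two decoys (reversed sequences) mixed in.
psms = [(1000 - i, False) for i in range(150)]   # scores 1000..851
psms += [(850.5, True)]
psms += [(850 - i, False) for i in range(20)]    # scores 850..831
psms += [(830.5, True)]
psms += [(830 - i, False) for i in range(10)]    # scores 830..821

cutoff = score_cutoff_at_fdr(psms, max_fdr=0.01)
print(cutoff)
```

In this toy set the second decoy pushes the estimate above 1%, so the cutoff settles just above it.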

For a quantitative estimation of protein concentration using spectral counts, we counted the number of MS2 spectra assigned to a given protein (table S1). TMT reporter ion intensities were extracted as that of the most intense ion within a 0.03 Th window around the predicted reporter ion intensities in the collected MS3 spectra. Only MS3 spectra with an average signal-to-noise value greater than 28 per reporter ion, as well as with an isolation specificity (22) greater than 0.75, were considered for quantification. Reporter ions from all peptides assigned to a protein were summed to define the protein intensity. A two-step normalization of the protein TMT intensities was performed by first normalizing the protein intensities over all acquired TMT channels for each protein based on the median average protein intensity calculated for all proteins. To correct for slight mixing errors of the peptide mixture from each sample, a median of the normalized intensities was calculated from all protein intensities in each TMT channel, and the protein intensities were normalized to the median value of these median intensities.
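
One reading of the two-step normalization is sketched below. This is an interpretation under stated assumptions rather than the authors' implementation: step 1 rescales each protein so that its mean across channels equals the median of all per-protein means, and step 2 rescales each channel so that its median equals the median of the channel medians. The function name and the example intensities are hypothetical.

```python
from statistics import median

def two_step_normalize(intensities):
    """intensities: dict protein -> list of summed reporter intensities, one per TMT channel."""
    # Step 1: bring every protein to a common scale across channels.
    row_means = {p: sum(v) / len(v) for p, v in intensities.items()}
    target = median(row_means.values())
    step1 = {p: [x * target / row_means[p] for x in v] for p, v in intensities.items()}

    # Step 2: correct per-channel loading (mixing) differences.
    n_channels = len(next(iter(step1.values())))
    channel_medians = [median(v[c] for v in step1.values()) for c in range(n_channels)]
    grand = median(channel_medians)
    return {
        p: [x * grand / channel_medians[c] for c, x in enumerate(v)]
        for p, v in step1.items()
    }

# Hypothetical data: channel 2 was loaded at twice the amount of channel 1.
# Proteins A, B, and C are unchanged between samples; D is enriched in channel 1.
raw = {"A": [10, 20], "B": [100, 200], "C": [5, 10], "D": [30, 20]}
norm = two_step_normalize(raw)
print(norm["A"], norm["D"])
```

After normalization the unchanged proteins are flat across channels, while protein D keeps a 3:1 ratio, which is what the raw 30:20 ratio becomes once the twofold loading difference is removed.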

UV RIP

The protocol followed is similar to the one described in (18). Briefly, MEFs were cross-linked with UV light at 200 mJ and collected by scraping in PBS. Cell pellets were resuspended in CSKT-0.5% for 10 min at 4°C followed by a spin. The nuclei were resuspended in the UV RIP buffer [PBS buffer containing 300 mM NaCl (total), 0.5% Nonidet-P 40, 0.5% sodium deoxycholate, 200 U Protector RNase Inhibitor and 1x protease inhibitors] with Turbo DNase I 30 U/IP for 30 min at 37°C. Supernatants were collected after a spin and incubated with 5 μg specific antibodies prebound to 40 μl protein-G magnetic beads (Invitrogen) at 4°C overnight. Beads were washed three times with cold UV RIP buffer. The beads were resuspended in 200 μl Turbo DNase I buffer with 20 U Turbo DNase, SuperaseIN, 1x protease inhibitors for 30 min at 37°C. The beads were resuspended and washed three more times in the UV RIP washing buffer containing 10 mM EDTA. The final three washes were given after threefold dilution of UV RIP washing buffer. The beads were resuspended in 200 μl proteinase-K buffer with 10 μg proteinase-K and incubated at 55°C for 1 hour. RNA was isolated by Trizol, and pulldown efficiencies were calculated by SYBR qPCR using input for the standard curve.

Generation of Xi-TgGFP clonal fibroblasts

Xi-TgGFP (68-5-11) tail-tip fibroblasts (TTF) were initially derived from a single female pup, a daughter of a cross between a M. castaneus male and a M. musculus female, homozygous for an X-linked GFP transgene driven by a strong, ubiquitous promoter (58). The fibroblasts were immortalized by SV40 transformation, and clonal lines were derived from individual GFP-negative cells selected by fluorescence-activated cell sorting. In our experience, occasional clones with undetectable GFP expression nevertheless have the transgene located on the active X chromosome. Thus, we confirmed the GFP transgene location on the inactive X for the particular clone used here, 68-5-11 (see fig. S2).

Generation of stable KD of Xi-TgGFP TTF and 16.7 ES cells

The protocol is as described in http://www.broadinstitute.org/rnai/public/resources/protocols

A cocktail of three shRNA viruses was used for infections (table S2), followed by puromycin selection. In all the experiments, nonclonal knockdown cells were used.

Assay for the reactivation of Xi-TgGFP

About 125,000 to 150,000 Xi-TgGFP (68-5-11) cells were plated along with control (shControl) cells treated with dimethyl sulfoxide or stable KD cells treated with 0.3 μM azacytidine and 0.3 μM etoposide for 3 days in six-well plates. RNA was isolated by Trizol twice, with an intermittent TurboDNase treatment after the first isolation for 30 min at 37°C. One microgram of RNA was used for each of the RT+ and RT– reactions (Superscript III, Invitrogen) followed by SYBR green qPCR using the primers listed in table S3, with an annealing temperature of 60°C for 45 cycles. The relative efficiency of Xi-TgGFP reactivation was calculated by comparing to U1 snRNA as the internal control.
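
As an illustration of normalizing to an internal control, the sketch below uses the common 2^-ΔCt convention. The text does not state which formula was used, so this is an assumption, as is the implied amplification efficiency of about 100%. Function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference gene (2^-dCt)."""
    return 2.0 ** -(ct_target - ct_reference)

def fold_reactivation(ct_gfp_treated, ct_u1_treated, ct_gfp_control, ct_u1_control):
    """Fold change of U1-normalized GFP signal in treated versus control cells."""
    treated = relative_expression(ct_gfp_treated, ct_u1_treated)
    control = relative_expression(ct_gfp_control, ct_u1_control)
    return treated / control

# Hypothetical Ct values: GFP signal appears 5 cycles earlier after treatment,
# while U1 snRNA is unchanged.
fold = fold_reactivation(28.0, 15.0, 33.0, 15.0)
print(fold)
```

The RT– reactions mentioned above serve as a control for residual genomic DNA and would be checked separately, before any fold change is computed.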

ImmunoFISH

Cells were grown on coverslips, rinsed in PBS, pre-extracted in 0.5% CSKT on ice, and washed once in CSK, followed by fixation with 4% paraformaldehyde in PBS at room temperature. After blocking for 20 min in 1% bovine serum albumin in PBS supplemented with 10 mM ribonucleoside-vanadyl complex (VRC) (New England Biolabs) and RNase inhibitor (Roche), incubation was carried out with primary antibodies (table S3) at room temperature for 1 hour. Cells were washed three times in PBST-0.02% Tween-20. After incubating with secondary antibody at room temperature for 30 min, cells were washed three times in PBS/0.02% Tween-20. Cells were fixed again in 4% paraformaldehyde and dehydrated in an ethanol series. RNA FISH was performed using a pool of Cy3B- or Alexa 568–labeled Xist oligonucleotides for 4 to 6 hours at 42°C in a humid chamber. Cells were washed three times in 2X SSC, and nuclei were counterstained with Hoechst 33342. Cells were observed under a Nikon 90i microscope equipped with a 60X/1.4 N.A. objective lens, an Orca ER charge-coupled device camera (Hamamatsu), and Volocity software (Perkin Elmer). Xist RNA FISH probes, a set of 37 oligonucleotides in total with 5′ amine modification (IDT), were labeled with NHS-Cy3B (GE Healthcare) overnight at room temperature followed by ethanol precipitation. To confirm Xi-TgGFP cells, probes were made by nick translation of a GFP PCR product with Cy3-dUTP and of a plasmid containing the first exon of the mouse Xist gene with FITC-dUTP.

Allelic ChIP-seq

Allele-specific ChIP-seq was performed according to the method of Kung et al. (25), in two biological replicates. To increase available read depth, we pooled together two technical replicates for XiΔxist/XaWT Rad21 replicate 1 sequenced on a 2 x 50 bp HiSeq 2500 rapid run, and we also pooled two technical replicates of wild-type Rad21 replicate 1, one sequenced on a HiSeq 2 x 50 bp run and one on a MiSeq 2 x 50 bp run. All other libraries were sequenced using 2 x 50 bp HiSeq 2500 rapid runs. To visualize ChIP binding signal, we generated fragments per million (fpm)-normalized bigWig files from the raw ChIP read counts for all reads (comp), mus-specific reads (mus), and cas-specific reads (cas) separately. For Smc1a, CTCF, and Rad21, peaks were called using macs2 with default settings. To generate consensus peak sets for all three epitopes, peaks for the two wild-type and XiΔxist/XaWT replicates were pooled, and peaks present in at least two experiments were used as the common peak set. To compare allelic read counts between different experiments, we defined a scaling factor as the ratio of the total read numbers for the two experiments and multiplied the allelic reads for each peak in the larger sample by the scaling factor. We plotted the number of reads on Xi versus Xa in wild-type for all peaks on the X chromosome to determine whether there is a general bias toward binding to the Xa or the Xi. To evaluate allelic skew on an autosome, we generated plots of mus read counts versus cas read counts for all peaks on chromosome 5 from 1 to 140,000,000. We used this particular region of chromosome 5 because XiΔxist/XaWT is not fully hybrid, and this is a large region of an autosome that is fully hybrid based on even numbers of read counts from input and from our Hi-C data over this region in XiΔxist/XaWT (data not shown). 
To identify peaks that are highly Xa-skewed in wild-type but bind substantially to the Xi in XiΔxist/XaWT (restored peaks), we plotted, for Xa-skewed peaks in wild-type, normalized read counts on Xi in XiΔxist/XaWT versus read counts on Xa in wild-type. We defined restored peaks as peaks that (i) are more than 3X Xa-skewed in wild-type, (ii) have at least five allelic reads in wild-type, and (iii) exhibit normalized read counts on Xi in XiΔxist/XaWT that are at least half the level of Xa in wild-type. This threshold ensures that all restored peaks have at least a 2X increase in binding to the Xi in XiΔxist/XaWT relative to wild-type. We identified restored peaks using these criteria in both replicates of Smc1a and Rad21 ChIP separately, and to merge these calls into a consensus set for each epitope, we took all peaks that met criteria for restoration in at least one replicate and had at least 50% wild-type Xa read counts on Xi in XiΔxist/XaWT in both replicates.
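
The three criteria for a restored peak, together with the depth scaling described above, can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that "more than 3X Xa-skewed" means Xa reads exceed three times Xi reads, that the five-read minimum applies to raw wild-type allelic counts, and that the deeper library is scaled down by the ratio of total read numbers. The function name and counts are hypothetical.

```python
def is_restored_peak(xi_wt, xa_wt, xi_mut, total_wt, total_mut):
    """Apply criteria (i) to (iii) to one peak.

    xi_wt, xa_wt: allelic read counts on Xi and Xa in wild-type.
    xi_mut: allelic read count on Xi in the Xist-deleted line.
    total_wt, total_mut: total read numbers of the two experiments.
    """
    # Scale the larger sample down so that counts are comparable.
    if total_wt > total_mut:
        factor = total_mut / total_wt
        xa_wt_n, xi_mut_n = xa_wt * factor, float(xi_mut)
    else:
        factor = total_wt / total_mut
        xa_wt_n, xi_mut_n = float(xa_wt), xi_mut * factor

    xa_skewed = xa_wt > 3 * xi_wt               # (i) Xa-skewed in wild-type
    enough_reads = (xi_wt + xa_wt) >= 5         # (ii) minimum allelic coverage
    xi_restored = xi_mut_n >= 0.5 * xa_wt_n     # (iii) Xi reaches half of wild-type Xa
    return xa_skewed and enough_reads and xi_restored

# Hypothetical peaks, equal sequencing depth.
print(is_restored_peak(2, 20, 12, 10_000_000, 10_000_000))   # restored
print(is_restored_peak(2, 20, 4, 10_000_000, 10_000_000))    # Xi binding still low
print(is_restored_peak(10, 20, 15, 10_000_000, 10_000_000))  # not Xa-skewed in wild-type
```

A consensus call across replicates, as described in the text, would then combine these per-replicate results.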

Allele-specific RNA-seq

Xi-TgGFP TTFs (68-5-11) with the stable knockdown of candidates were treated with 5′-azacytidine and etoposide at 0.3 μM each for 3 days. Strand-specific RNA-seq library preparation, deep sequencing, and data analysis were performed as described in (59). Two biological replicates of each drug treatment were produced. All libraries were sequenced with Illumina HiSeq 2000 or 2500 using 50 cycles to obtain paired-end reads. To determine the allelic origin of each sequencing read from the hybrid cells, reads were first depleted of adaptor dimers and PCR duplicates, followed by alignment to custom mus/129 and cas genomes to separate mus and cas reads. After removal of PCR duplicates, ~90% of reads were mappable. Discordant pairs and multimapped reads were discarded. Reads were then mapped back to the reference mm9 genome using Tophat v2.0.10 (-g 1 --no-coverage-search --read-edit-dist 3 --read-mismatches 3 --read-gap-length 3 --b2-very-sensitive --mate-inner-dist 50 --mate-std-dev 50 --library-type fr-firststrand), as previously described (25, 32, 59). After alignment, gene expression levels within each library were quantified using Homer v4.7 (rna mm9 -count genes -strand + -noadj -condenseGenes) (59), and the normalized differential expression analyses across samples were performed by using EdgeR (60).

Hi-C library preparation and analysis

Hi-C libraries were generated according to the protocol in Lieberman-Aiden et al. (61). Two biological replicate libraries were prepared for wild-type and XiΔxist/XaWT fibroblasts each. We obtained 150 to 220 million 2 x 50 bp paired-end reads per library. The individual ends of the read-pairs were aligned to the mus and cas reference genomes separately using novoalign with default parameters for single-end alignments, and the quality score of the alignment was used to determine whether each end could be assigned to either the mus or the cas haplotype (62). The single-end alignments were merged into a Hi-C summary file using custom scripts. Reads were filtered for self-ligation events and short fragments (less than 1.5X the estimated insert length) likely to be random shears using Homer (59, 63). Hi-C contact maps were generated using Homer. “Comp” maps were made from all reads. “Xi” and “Xa” maps were made from reads in which at least one read-end could be assigned to the mus or cas haplotype, respectively. A small fraction of reads (~5% of all allelic reads) aligned such that one end aligned to mus, the other to cas. These “discordant” reads were excluded from further analysis, because they are likely to be noise arising from random ligation events and/or improper single-nucleotide polymorphism (SNP) annotation (46, 64). All contact maps were normalized using the matrix balancing algorithm of Knight and Ruiz (65), similar to iterative correction (46, 66), using the MATLAB script provided at the end of their paper. We were able to generate robust contact maps using the comp reads in one replicate at 40-kb resolution, but because only ~44% of reads align allele-specifically, we were only able to generate contact maps for the cas and mus haplotypes at 200 kb. To increase our resolution, we pooled together both biological replicates and analyzed the comp contact map at 40-kb resolution and the mus and cas contact maps at 100 kb. 
We called TADs at 40 kb on the X chromosome, Chr5, and Chr13 using the method of Dixon et al. (27). Specifically, we processed the normalized comp 40-kb contact maps separately into a vector of directionality indices using DI_from_matrix.pl with a bin size of 40,000 and a window size of 200,000. We used this vector of directionality indices as input for the HMM_calls.m script, and after HMM_generation, we processed the HMM and generated TAD calls by passing the HMM output to file_ends_cleaner.pl, converter_7col.pl, hmm_probablity_correcter.pl, hmm-state_caller.pl, and, finally, hmm-state_domains.pl. We used parameters of min = 2, prob = 0.99, binsize = 40,000 as input to the HMM probability correction script.
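
For orientation, the directionality index that feeds the HMM can be sketched in a few lines. This is a plain-Python illustration of the index as defined by Dixon et al. (27), not a reimplementation of DI_from_matrix.pl; the function name and the toy matrix are hypothetical. For each bin, A is the sum of contacts with the upstream window, B the sum with the downstream window, and E = (A + B) / 2.

```python
def directionality_index(matrix, window_bins):
    """Return one directionality index per bin of a square contact matrix."""
    n = len(matrix)
    di = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window_bins), i))           # upstream
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window_bins + 1)))   # downstream
        e = (a + b) / 2.0
        if e == 0 or a == b:
            di.append(0.0)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di

# Toy map with two 3-bin domains: strong contacts within a domain, weak between.
n_bins = 6
toy = [[10.0 if (i // 3) == (j // 3) else 1.0 for j in range(n_bins)] for i in range(n_bins)]

di = directionality_index(toy, window_bins=2)
print([round(x, 2) for x in di])
```

The index switches from negative (upstream bias) to positive (downstream bias) at the domain boundary between bins 2 and 3, which is the signal the HMM uses to place TAD borders. With the parameters quoted above, a 200,000-bp window at 40,000-bp bins corresponds to `window_bins=5`.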

To create a general metric describing interaction frequencies within TADs at the resolution available in the allele-specific interaction maps, for each TAD on the X chromosome and Chr5, we averaged the normalized interaction scores for all bins within the TAD, excluding the main diagonal. To compare interaction frequency over TADs between the cas (Xa) and mus (Xi) haplotypes at the resolution available with our current sequencing depth, we defined the “fraction mus” as the average interaction score for a TAD in the mus contact map divided by the sum of the average interaction scores in the mus and cas contact maps.
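
The “fraction mus” metric follows directly from this definition. A minimal sketch, with hypothetical function names and toy matrices, assuming TADs are given as half-open bin ranges:

```python
def average_tad_score(matrix, start_bin, end_bin):
    """Mean normalized score over bins start_bin..end_bin-1, excluding the main diagonal."""
    total = 0.0
    count = 0
    for i in range(start_bin, end_bin):
        for j in range(start_bin, end_bin):
            if i != j:
                total += matrix[i][j]
                count += 1
    return total / count if count else 0.0

def fraction_mus(mus_matrix, cas_matrix, start_bin, end_bin):
    """Average mus score divided by the sum of the average mus and cas scores."""
    mus = average_tad_score(mus_matrix, start_bin, end_bin)
    cas = average_tad_score(cas_matrix, start_bin, end_bin)
    return mus / (mus + cas) if (mus + cas) else float("nan")

# Toy 3-bin TAD: off-diagonal contacts are threefold weaker on mus (Xi) than on cas (Xa).
mus_map = [[9.0, 1.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 9.0]]
cas_map = [[9.0, 3.0, 3.0], [3.0, 9.0, 3.0], [3.0, 3.0, 9.0]]

frac = fraction_mus(mus_map, cas_map, 0, 3)
print(frac)
```

A value of 0.5 indicates equal interaction frequency on the two haplotypes; values below 0.5 indicate weaker interactions on the mus (Xi) copy.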

To discover TADs that show significantly increased interaction frequency in XiΔxist/XaWT, we generated a null distribution of changes in average normalized interaction scores for all TADs on chromosome 5, 1 to 140 Mb using the cas and mus contact maps. We reasoned that there would be few changes in interaction frequency on an autosome between the mus or cas contact maps for wild-type and XiΔxist/XaWT; thus, the distribution of fold changes in interaction score on an autosome constitutes a null distribution. Using this distribution of fold changes allowed us to calculate a threshold fold change for an empirical FDR of 0.05, and all TADs that had a greater increase in average normalized interaction score on Xi between wild-type and XiΔxist/XaWT were considered restored TADs. We performed this analysis of restored TADs separately in each biological replicate using the 200-kb contact maps to generate interaction scores over TADs and using the combined data at 100-kb resolution.
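
The empirical threshold can be sketched as follows, assuming the FDR at a given threshold is taken as the fraction of autosomal (null) TADs whose fold change exceeds it. The function names, candidate thresholds, and fold-change values are hypothetical and serve only to illustrate the logic.

```python
def threshold_for_fdr(null_fold_changes, candidate_thresholds, fdr=0.05):
    """Return the smallest candidate threshold exceeded by at most `fdr` of null TADs."""
    n = len(null_fold_changes)
    for t in sorted(candidate_thresholds):
        fraction_over = sum(fc > t for fc in null_fold_changes) / n
        if fraction_over <= fdr:
            return t
    return None

def call_restored_tads(x_fold_changes, threshold):
    """Return names of X-linked TADs whose fold change on Xi exceeds the threshold."""
    return [tad for tad, fc in x_fold_changes.items() if fc > threshold]

# Hypothetical null distribution: fold changes for 40 TADs on chromosome 5.
null_fc = [1.0] * 30 + [1.1] * 8 + [1.3, 1.4]
candidates = [1.0, 1.1, 1.2, 1.3, 1.5]

threshold = threshold_for_fdr(null_fc, candidates, fdr=0.05)

# Hypothetical fold changes on Xi (Xist-deleted versus wild-type) for X-linked TADs.
x_fc = {"TAD1": 1.05, "TAD2": 1.6, "TAD3": 1.2}
restored = call_restored_tads(x_fc, threshold)
print(threshold, restored)
```

Running the same procedure separately on each replicate and on the merged data, as described above, yields the per-replicate and combined sets of restored TADs.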

Supplementary Materials

www.sciencemag.org/content/349/6245/aab2276/suppl/DC1

Supplementary Text

Figs. S1 to S15

Tables S1 to S3

References (67, 68)

References and Notes

  1. ACKNOWLEDGMENTS: We thank S. Gygi for access to computational resources for proteomic analysis and all members of the Lee and Haas laboratories for valuable discussions. This work was supported by grants from NIH (R01-DA-38695 and R03-MH97478), the Rett Syndrome Research Trust, and the International Rett Syndrome Foundation to J.E.T.L.; a National Science Foundation predoctoral award to J.E.F.; and the MGH Fund for Medical Discovery to H.S. J.T.L. is an Investigator of the HHMI. J.T.L., A.M., J.E.F., C.W., and the Massachusetts General Hospital have filed patent applications (USSN 62/144,219 and 62/168,528) that relate to leveraging the Xist interactome to reactivate the Xi. The GEO accession code for data in the paper is GSE67516. RNA-seq, ChIP-seq, and HiC-seq data are deposited in GEO.