Report

Three-dimensional genome structures of single diploid human cells


Science 31 Aug 2018:
Vol. 361, Issue 6405, pp. 924-928
DOI: 10.1126/science.aat5641

The structure of the genome

Beyond the sequence of the genome, its three-dimensional structure is important in regulating gene expression. To understand cell-to-cell variation, the structure needs to be understood at a single-cell level. Chromatin conformation capture methods have allowed characterization of genome structure in haploid cells. Now, Tan et al. report a method called Dip-C that allows them to reconstruct the genome structures of single diploid human cells. Their examination of different cell types highlights the tissue dependence of three-dimensional genome structures.

Science, this issue p. 924

Abstract

Three-dimensional genome structures play a key role in gene regulation and cell functions. Characterization of genome structures necessitates single-cell measurements. This has been achieved for haploid cells but has remained a challenge for diploid cells. We developed a single-cell chromatin conformation capture method, termed Dip-C, that combines a transposon-based whole-genome amplification method called META (multiplex end-tagging amplification), which detects many chromatin contacts, with an algorithm that imputes the two chromosome haplotypes linked by each contact. We reconstructed the genome structures of single diploid human cells from a lymphoblastoid cell line and from primary blood cells with high spatial resolution, locating specific single-nucleotide and copy number variations in the nucleus. The two alleles of imprinted loci and the two X chromosomes were structurally different. Cells of different types displayed statistically distinct genome structures. Such structural cell typing is crucial for understanding cell functions.

The nucleus of a human diploid cell contains 46 chromosomes, 23 maternal and 23 paternal, together carrying 6 Gb of genomic DNA. The three-dimensional (3D) genome structure is thought to be crucial for the regulation of gene expression and other cellular functions (1). For example, the nuclei of sensory neurons assume unusual architectures in the mouse visual (2) and olfactory systems (3). Chromatin conformation capture assays, such as 3C (4) and Hi-C (5), allow for studies of 3D genome structures in bulk samples through proximity ligation of DNA (6). However, differences between cells can be observed only by single-cell measurements. Single-cell chromatin conformation capture methods avoid ensemble averaging (7–12) and have yielded 3D genome structures of haploid mouse cells (10, 11). Characterizing the 3D genome structures of diploid mammalian cells, however, remains challenging (13). Here, we used an improved chromatin conformation capture method and phased (haplotype-resolved) single-nucleotide polymorphisms (SNPs) to distinguish between the two haplotypes of each chromosome. This allowed us to examine the cell type dependence of 3D genome structures of diploid cells.

Obtaining high-resolution 3D genome structures of single diploid cells requires resolving a large number of chromatin “contacts”—pairs of genomic loci that are joined by proximity ligation. We developed a chromatin conformation capture method, termed Dip-C (Fig. 1A), that can detect more contacts than existing methods with minimal false positives. In particular, we omitted biotin pulldown (8, 9) and conducted high-coverage whole-genome amplification with multiplex end-tagging amplification (META), which introduced few artifactual chimeras (14, 15). We detected a median of 1.04 million contacts per single cell (n = 17, minimum = 0.71 million, maximum = 1.48 million) from GM12878, a female human lymphoblastoid cell line, and a median of 0.84 million contacts (n = 18, minimum = 0.67 million, maximum = 1.08 million) from peripheral blood mononuclear cells (PBMCs) of a male human donor (16). This exceeds the medians achieved with existing methods by a factor of ~5 (fig. S4 and table S1). Most cells were in the G1 or G0 phase of the cell cycle. In addition, we simultaneously detected copy number variations (CNVs), losses of heterozygosity (LOHs), DNA replication, and V(D)J recombination with a 10-kb bin size (figs. S2 and S3).

Fig. 1 Single-cell chromatin conformation capture and haplotype imputation by Dip-C.

(A) Schematics of the chromatin conformation capture protocol. The 3D information of chromatin structure was encoded in the linear genome through proximity ligation of chromatin fragments, as in 3C (4) and Hi-C (5, 19). The ligation product was then amplified by META (15) and sequenced. Colors represent genomic coordinates. Note that ligation products may be linear (illustrated here) or circular (not shown). (B) Imputation of the two chromosome haplotypes linked by each chromatin “contact” (red dots) in a representative single cell.

Another challenge in reconstructing diploid genomes is to determine which haplotypes are involved in each chromatin contact (17–20) (table S1). To assign haplotypes, we developed an imputation algorithm (Fig. 1B). We reasoned that unknown haplotypes can be imputed from "neighboring" (in terms of genomic distances) contacts by assuming that the two homologs would typically contact different partners. Using a statistical property of interchromosomal and long-range intrachromosomal contacts (15), we defined a contact neighborhood as a superellipse with an exponent of 0.5 and a radius of 10 Mb, where haplotypes of nearby contacts were weighted in imputing the haplotypes of each contact (fig. S7). In the Dip-C algorithm, after removing 3C/Hi-C artifacts [contacts with few neighbors (11)] and initial imputation, haplotypes can optionally be refined through a series of draft 3D models (15) (fig. S5). Imputation accuracy was estimated to be ~96% for each haplotype by cross-validation (15) (table S1). Regions harboring CNVs or LOHs, as well as an apparently damaged GM12878 cell, were excluded from reconstruction (table S1).
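The neighborhood vote described above can be illustrated with a toy sketch. It is not the Dip-C implementation (the actual code is in the dip-c and hickit repositories listed under Data and materials availability); the contact tuple layout, the uniform vote weight, and the `min_fraction` calling threshold are hypothetical simplifications. Only the superellipse exponent (0.5) and radius (10 Mb) come from the text.

```python
from itertools import product

RADIUS = 10_000_000  # 10-Mb neighborhood radius (from the text)
EXPONENT = 0.5       # superellipse exponent (from the text)


def in_neighborhood(c, d, radius=RADIUS, exponent=EXPONENT):
    """True if contact d lies inside the superellipse centered on contact c.

    A contact is a hypothetical tuple (chrom1, pos1, hap1, chrom2, pos2, hap2),
    with haplotypes 0, 1, or None (unknown). Only contacts between the same
    pair of chromosomes are compared.
    """
    if (c[0], c[3]) != (d[0], d[3]):
        return False
    dx = abs(c[1] - d[1]) / radius
    dy = abs(c[4] - d[4]) / radius
    return dx ** exponent + dy ** exponent <= 1.0


def impute_contact(contact, phased, min_fraction=0.9):
    """Impute unknown haplotypes of one contact by voting among phased neighbors.

    `phased` holds contacts whose two haplotypes are both known. Returns the
    winning (hap1, hap2) pair, or None if no candidate reaches `min_fraction`
    of the votes (a hypothetical threshold).
    """
    # Candidate haplotype pairs consistent with whichever legs are already known.
    opts1 = [contact[2]] if contact[2] is not None else [0, 1]
    opts2 = [contact[5]] if contact[5] is not None else [0, 1]
    votes = {cand: 0 for cand in product(opts1, opts2)}
    for other in phased:
        # Skip the contact itself (identity check) and anything outside the neighborhood.
        if other is contact or not in_neighborhood(contact, other):
            continue
        key = (other[2], other[5])
        if key in votes:
            votes[key] += 1  # uniform weight: a simplification, not the paper's weighting
    total = sum(votes.values())
    if total == 0:
        return None  # no phased neighbors, so the contact stays unresolved
    best = max(votes, key=votes.get)
    return best if votes[best] / total >= min_fraction else None
```

Because the superellipse exponent is below 1, the neighborhood is star-shaped and narrow along the diagonal directions, so a contact offset in both coordinates must be closer than one offset along a single axis to count as a neighbor.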

We reconstructed the 3D diploid human genomes at 20-kb resolution. Reconstruction was successful without supervision for 94% (15 of 16) of the GM12878 cells and 67% (12 of 18) of the PBMCs, and after removal of small problematic regions for a further 6% (1 of 16) of the GM12878 cells and 22% (4 of 18) of the PBMCs (table S1 and fig. S8) (15). Note that chromatin conformation capture (the process of converting 3D coordinates to chromatin contacts) is intrinsically lossy and noisy, and our 3D structures harbored additional uncertainties, including perturbations of chromatin structures during the experiments, inaccuracies in the energy function used by 3D modeling, and nuclear volumes inaccessible to DNA sequencing (e.g., centromeres, nucleoli, and nuclear speckles). These uncertainties are common to all 3C/Hi-C studies and are difficult to estimate; in addition, imputation may be less successful when two homologs are nearby or adopt similar shapes. Therefore, other problematic regions might persist even after manual removal.

Figure 2A shows a representative cell. Each particle, displayed as a colored point, represents 20 kb of chromatin and has a radius of ~100 nm. A lower bound for reconstruction uncertainty was estimated from the median deviation of ~0.4 particle radii (~40 nm) across all 20-kb particles between three replicates (fig. S9 and table S1). Well-known nuclear morphologies were observed in an M/G1-phase GM12878 cell, where chromosomes retained their characteristic V shapes after recent mitosis, and in several PBMCs, where multiple nuclear lobes were reminiscent of the partially segmented nuclei of low-density neutrophils and other blood cell types (Fig. 2B).
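Comparing replicate structures requires superimposing them first, because each run of 3D modeling returns the genome in an arbitrary position and orientation. The sketch below assumes that alignment is a rigid-body Kabsch superposition and that the per-particle deviation is the root-mean-square distance of the replicates from their mean structure; both choices and the function names are assumptions rather than the documented procedure (fig. S9 describes the actual one). Coordinates are in particle radii, so a result of 0.4 would correspond to ~40 nm.

```python
import numpy as np


def kabsch_align(mobile, target):
    """Rigidly superimpose `mobile` onto `target` (both N x 3) with the Kabsch algorithm."""
    mobile_c = mobile - mobile.mean(axis=0)
    target_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mobile_c.T @ target_c)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mobile_c @ rotation + target.mean(axis=0)


def median_deviation(replicates):
    """Median over particles of the RMS distance of aligned replicates from their mean.

    `replicates` is a list of N x 3 arrays with matching particle order
    (a hypothetical definition of the replicate deviation).
    """
    ref = replicates[0]
    aligned = np.stack([ref] + [kabsch_align(r, ref) for r in replicates[1:]])
    mean_structure = aligned.mean(axis=0)
    sq = ((aligned - mean_structure) ** 2).sum(axis=2)  # shape (replicates, particles)
    per_particle = np.sqrt(sq.mean(axis=0))
    return float(np.median(per_particle))
```

Aligning every replicate to the first one is the simplest choice; an iterative alignment to the running mean would reduce the dependence on which replicate is used as the reference.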

Fig. 2 3D genome structures of single diploid human cells.

(A) 3D genome structure of a representative GM12878 cell. Each particle represents 20 kb of chromatin, or a radius of ~100 nm. (B) Peculiar nuclear morphology in a cell that recently exited mitosis (top) and in a cell with multiple nuclear lobes (bottom). (C) Serial cross sections of a single cell showing compartmentalization of euchromatin (green) and heterochromatin (magenta), visualized by CpG frequency as a proxy (21). (D) Radial preferences across the human genome, as measured by average distances to the nuclear center of mass. Our results (black dots, smoothed by 1-Mb windows) agree well with published DNA FISH data (gray lines) on whole chromosomes (22) (shifted and rescaled) and provide fine-scale information. Lower and upper axis limits were 20 and 50 particle radii, respectively, for the black dots. GM12878 cell 4 (extensive chromosomal aberrations) and cell 16 (M/G1 phase) were excluded. (E) Example radial preferences of two chromosomes. The gene-rich chromosome 19 preferred the nuclear interior (left), whereas the gene-poor chromosome 18 almost always resided on the nuclear surface (right). (F) Stochastic fractal organization of chromatin was quantified by a matrix of radii of gyration of all possible subchains of each chromosome (heat maps). We identified a hierarchy of single-cell domains across genomic scales (black trees). A subtree was simplified as a black triangle if either of its two subtrees was below a certain size (from left to right: 10 Mb, 2 Mb, 500 kb, 100 kb). In each panel, the region from the previous panel is shown in transparent gray. In the rightmost panel, thick sticks (top) and circles (bottom) highlight the formation of a known CTCF loop (19). Spheres with arrows (top) indicate the positions and orientations of the two converging CTCF sites. Genomic coordinates are for the human genome assembly hg19.

Despite fewer contacts (~0.3 million per cell, or ~0.2 million under our definition) (table S1), we also reconstructed 3D diploid mouse genomes from published data on mouse embryonic stem cells (mESCs) (10), because the mouse line harbored more SNPs than human cells do (15).

Similar to previously described haploid mouse genomes (10, 11), the diploid human genomes exhibited chromosome territories (Fig. 2A) and chromatin compartments [visualized by CpG frequency as a proxy (21)], with the heterochromatic compartment B (5) concentrated at the nuclear periphery and around foci in the nuclear center (Fig. 2C). Spatial clustering of DNA sequences with similar CpG frequencies suggests a correlation between primary sequence features and 3D genome folding (1).
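CpG frequency serves here as a sequence-based proxy for compartment identity. A minimal sketch of the two quantities follows, assuming CpG frequency is the fraction of dinucleotide positions reading "CG" (the paper's exact definition is in the supplementary materials) and using the Fig. 3C definition of a single-cell compartment: the average CpG frequency of particles within 3 particle radii. The brute-force neighbor search is for clarity only, and the function names are hypothetical.

```python
import math


def cpg_frequency(seq):
    """Fraction of dinucleotide positions that read 'CG' (an assumed definition)."""
    seq = seq.upper()
    n = len(seq) - 1
    if n <= 0:
        return 0.0
    return sum(1 for i in range(n) if seq[i:i + 2] == "CG") / n


def single_cell_compartments(coords, cpg, radius=3.0):
    """Average CpG frequency of all particles (self included) within `radius` of each particle.

    `coords` are particle centers in units of particle radii; `cpg` holds the
    CpG frequency of each particle's 20-kb sequence.
    """
    out = []
    for p in coords:
        near = [cpg[j] for j, q in enumerate(coords) if math.dist(p, q) <= radius]
        out.append(sum(near) / len(near))
    return out
```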

Our 3D structures revealed different radial preferences across the human genome (black dots in Fig. 2D). Our results agree well with whole-chromosome painting data by DNA fluorescence in situ hybridization (FISH) (22) (gray lines in Fig. 2D). Both methods show that the gene-rich chromosome 19 prefers the nuclear interior, while the gene-poor chromosome 18 prefers the nuclear periphery (Fig. 2E). Within each chromosome, different segments could have distinctly different radial preferences, which were correlated with chromatin compartments (fig. S11A). For example, the CpG-rich euchromatic end (left) of chromosome 1 was heavily biased toward the nuclear center, whereas some other regions on the same chromosome were biased toward the nuclear periphery (Fig. 2D). Such fine-scale information cannot be obtained from whole-chromosome painting (22, 23) experiments.
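Radial preference, as plotted in Fig. 2D, is the average distance of a locus from the nuclear center of mass. A sketch under simplifying assumptions: every particle is weighted equally when computing the center of mass, loci are matched across cells by a label such as (chromosome, bin, haplotype), and the 1-Mb smoothing applied to the black dots is omitted. The function names are hypothetical.

```python
import math
from collections import defaultdict


def radial_distances(coords):
    """Distance of each particle from the center of mass of all particles (equal weights)."""
    n = len(coords)
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return [math.dist(c, center) for c in coords]


def radial_preference(cells):
    """Average radial distance of each locus across cells.

    `cells` is a list of dicts mapping a locus label to its (x, y, z) coordinates.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for cell in cells:
        loci = list(cell)
        for locus, d in zip(loci, radial_distances([cell[l] for l in loci])):
            sums[locus] += d
            counts[locus] += 1
    return {locus: sums[locus] / counts[locus] for locus in sums}
```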

Our Dip-C results provide a holistic view of the stochastic, fractal organization of chromatin across different genomic scales. Bulk Hi-C suggests that chromatin forms a “fractal globule” with compartments (5, 19) and domains such as topologically associating domains (TADs) (24) and CCCTC-binding factor (CTCF) loop domains (19). However, such fractal organization has not been visualized in single human cells in a genome-wide manner. We observed spatial clustering (globules) and segregation (insulation) of consecutive chromatin particles along each chromosome (Fig. 2F, upper panels). Such organization could be quantified by a matrix of radii of gyration of all possible subchains in each chromosome (Fig. 2F, lower panels). Single-cell domains could then be identified as squares that had relatively small radii [partly similar to (8)] (15). We found single-cell domains across all genomic scales and therefore identified them through hierarchical merging, yielding a tree of domains [partly similar to (25, 26) in bulk Hi-C] (Fig. 2F). On the smallest scale, some domains coincided with CTCF loop domains from bulk Hi-C (19) (rightmost panels in Fig. 2F). Single-cell domains were highly heterogeneous between cells, frequently breaking and merging bulk domains (fig. S19), consistent with a recent study on tetraploid mouse cells (8).
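Computing the radius of gyration for every subchain directly costs time proportional to the subchain length per matrix entry. The sketch below uses prefix sums and the identity Rg^2 = mean(|x|^2) - |mean(x)|^2 so that each entry costs constant time. It assumes one chromosome haplotype is given as an ordered list of particle coordinates; the domain-calling and hierarchical-merging steps are not shown, and the function name is hypothetical.

```python
import math


def gyration_matrix(coords):
    """Radius of gyration of every contiguous subchain coords[i..j] (inclusive)."""
    n = len(coords)
    # prefix[k] holds the sums of x, y, z and |x|^2 over coords[0:k].
    prefix = [(0.0, 0.0, 0.0, 0.0)]
    for x, y, z in coords:
        px, py, pz, pq = prefix[-1]
        prefix.append((px + x, py + y, pz + z, pq + x * x + y * y + z * z))
    rg = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m = j - i + 1
            sx, sy, sz, sq = (b - a for a, b in zip(prefix[i], prefix[j + 1]))
            var = sq / m - (sx * sx + sy * sy + sz * sz) / (m * m)
            rg[i][j] = rg[j][i] = math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error
    return rg
```

Blocks of small values along the diagonal mark compact stretches of chromatin, which is how single-cell domains appear in the lower panels of Fig. 2F.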

Traditional methods such as bulk Hi-C and two-color DNA FISH are pairwise measurements and thus cannot capture multichromosome intermingling. In our 3D models, we quantified multichromosome intermingling by the diversity of chromosomes (Shannon index) near each 20-kb particle (fig. S20A), revealing genomic regions that frequently contacted multiple chromosomes (fig. S20B). These regions were similar between the human cell types despite their different average extents of intermingling (fig. S10), and they were mostly euchromatic (CpG-rich) (fig. S11B) for two reasons: (i) Euchromatin more frequently resided on the surface of chromosomes than did heterochromatin [consistent with (7)] (fig. S11D), and (ii) even when heterochromatin resided on the surface, it tended to face the nuclear periphery (11) (fig. S11A) and thus had no partners to intermingle with. The intermingling regions partially overlapped with "hubs" identified by a recent report (27).
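The multichromosome intermingling index can be sketched as the Shannon index of chromosome labels among each particle's spatial neighbors. The neighborhood radius (3 particle radii here), counting the particle itself, and whether the two homologs receive separate labels are assumptions; fig. S20A holds the actual definition, and the function name is hypothetical.

```python
import math
from collections import Counter


def intermingling_index(coords, chrom_labels, radius=3.0):
    """Shannon diversity of chromosome labels among particles within `radius` of each particle."""
    out = []
    for p in coords:
        counts = Counter(lab for q, lab in zip(coords, chrom_labels) if math.dist(p, q) <= radius)
        total = sum(counts.values())
        out.append(-sum((c / total) * math.log(c / total) for c in counts.values()))
    return out
```

A particle surrounded only by its own chromosome scores 0, while one with equal numbers of neighbors from k chromosomes scores ln k.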

We examined the structural relationship between the maternal and paternal alleles, which can only be studied in diploid cells. Our data captured the structural difference between the two alleles caused by genomic imprinting. At imprinted loci, the two alleles can differ drastically in transcriptional activity (28). Near the maternally transcribed H19 gene and the paternally transcribed IGF2 gene, bulk Hi-C identified different contact profiles and different use of CTCF loops between the two homologs (19). We directly visualized this ~0.6-Mb region in single cells (Fig. 3A). Despite cell-to-cell heterogeneity, the maternal allele more frequently separated IGF2 from both H19 and the nearby HIDAD site and disrupted the IGF2-HIDAD CTCF loop, whereas the paternal allele more frequently stayed fully intermingled.

Fig. 3 Distinct 3D structures of the maternal and paternal alleles.

(A) Structural difference between the two alleles of the imprinted H19/IGF2 locus. Despite cell-to-cell heterogeneity, the maternal allele more frequently separated IGF2 from both H19 and the nearby HIDAD site and disrupted the IGF2-HIDAD CTCF loop (white and red circles). Spheres highlight three CTCF sites from bulk Hi-C. Heat maps show the root-mean-square average pairwise distances between all 20-kb particles. Haplotype-resolved bulk Hi-C (black heat map with 25-kb bins) is adapted from figure 7C of (19). (B) Active (red) and inactive (blue) X chromosomes prefer extended and compact morphologies, respectively, as shown by cross sections of two representative cells. (C) Individual active and inactive X chromosomes can be distinguished by PCA of single-cell chromatin compartments, defined for each 20-kb particle as the average CpG frequency of nearby (within 3 particle radii) particles. (D) The inactive X chromosome tends to form the previously reported “superloops,” 27 very-long-range (5 to 74 Mb) chromatin loops identified by bulk Hi-C (19, 20, 29). Superloops are sorted by size. (E) Haplotype-resolved contact maps (red dots) and 3D structures of the two X chromosomes in an example cell. Black circles denote all superloops (19). White spheres denote four example superloop anchors (DXZ4, x75, ICCE, and FIRRE). GM12878 cells 4 and 16 are excluded from (C) and (D).

X chromosome inactivation (XCI) presents a striking example of the difference between two homologs (28). As expected, we found in the female GM12878 cell line that the active X chromosome [the maternal allele based on RNA expression (15)] tended to exhibit an extended morphology, and the inactive X a compact one (Fig. 3B), although in some cells this morphological difference was not obvious. More consistently, the two X chromosomes in each cell were characterized by their distinct patterns of chromatin compartments. The active X featured clear compartmentalization of euchromatin and heterochromatin, resembling that of the male X (in PBMCs); in contrast, compartments along the inactive X were more uniform (fig. S12E). Individual X chromosomes could be clearly separated into active and inactive clusters by principal components analysis (PCA) of single-cell compartments (Fig. 3C). Our conclusion held if single-cell compartments were defined on the basis of contacts [partly similar to (10)] rather than 3D structures (fig. S15, A and B). We also visualized the simultaneous formation of multiple “superloops” (19, 20, 29) in the inactive X chromosome (Fig. 3, D and E). Averaged contact matrices of the inactive and active X chromosomes agreed well with bulk Hi-C (19) (fig. S15, C and D).

In contrast to XCI, it was unknown whether single-cell compartments of two autosomal alleles vary in a coordinated manner. By decomposing the variability of single-cell compartments into between-cell and within-cell differences (fig. S12A), we found that autosomal alleles fluctuate (with respect to their median compartments) almost independently of each other, exhibiting on average near-zero Spearman correlation (fig. S12D). Our conclusion held if compartments were defined on the basis of contacts (fig. S16).
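One way to read this decomposition is sketched below: subtract each locus's median compartment (over all cells and both alleles) to remove the between-locus signal, then correlate the maternal and paternal deviations across loci within each cell and average over cells. This interpretation of fig. S12A, the (maternal, paternal) input layout, and the function names are assumptions. Spearman correlation is implemented as the Pearson correlation of average ranks.

```python
from statistics import median


def ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation (undefined, i.e. division by zero, for constant input)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def mean_within_cell_correlation(cells):
    """Average over cells of the correlation between the two alleles' deviations.

    cells[c][l] is a hypothetical (maternal, paternal) pair of compartment values
    for locus l in cell c.
    """
    n_loci = len(cells[0])
    medians = [median([v for cell in cells for v in cell[l]]) for l in range(n_loci)]
    rhos = []
    for cell in cells:
        dev_mat = [cell[l][0] - medians[l] for l in range(n_loci)]
        dev_pat = [cell[l][1] - medians[l] for l in range(n_loci)]
        rhos.append(spearman(dev_mat, dev_pat))
    return sum(rhos) / len(rhos)
```

Without the median subtraction the two alleles would appear strongly correlated simply because both inherit the typical compartment of their locus; removing it isolates the cell-to-cell fluctuations.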

We can pinpoint genomic changes, such as SNPs and CNVs, to their precise spatial locations in the cell nucleus. The donor of the GM12878 cell line carried a heterozygous G-to-A mutation (rs4244285) in the cytochrome P450 gene CYP2C19, leading to a truncated, nonfunctional protein variant CYP2C19*2 and affecting metabolism of hormones and drugs (30). Figure S18A shows the 3D localization of this drug-response SNP on the paternally inherited chromosome 10 of a GM12878 cell. In addition to inherited mutations, single cells also harbor somatic changes. In lymphocytes, somatic V(D)J recombination generates diversity of immunoglobulins and T cell receptors by DNA deletions and inversions. Figure S18B shows the 3D localization of two V(D)J recombinations at a T cell receptor locus, leading to two different DNA deletions on the two alleles of chromosome 14 of a T lymphocyte. The capability to spatially localize genomic changes is important for studying cancers and inherited diseases, where mutations can have severe consequences and may disrupt the chromatin structure of nearby regions.
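Placing a variant in 3D amounts to mapping its genomic coordinate onto the reconstructed chain of the correct haplotype (for rs4244285 in fig. S18A, the paternal chromosome 10). A minimal sketch, assuming each 20-kb particle is assigned a single genomic coordinate (for example, its bin center) and that positions between particles are linearly interpolated; this interpolation scheme and the function name are hypothetical.

```python
import bisect


def locate(position, particle_positions, particle_coords):
    """Linearly interpolate (x, y, z) for a genomic `position` along one haplotype.

    particle_positions: sorted genomic coordinates (bp) assigned to particles.
    particle_coords: matching (x, y, z) tuples. Positions beyond the ends are clamped.
    """
    if position <= particle_positions[0]:
        return particle_coords[0]
    if position >= particle_positions[-1]:
        return particle_coords[-1]
    # Index of the first particle lying strictly downstream of `position`.
    k = bisect.bisect_right(particle_positions, position)
    p0, p1 = particle_positions[k - 1], particle_positions[k]
    t = (position - p0) / (p1 - p0)
    a, b = particle_coords[k - 1], particle_coords[k]
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
```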

We also examined the cell type dependence of 3D genome structures. Similar to haploid mESCs (11), chromosomes in diploid mESCs preferred the Rabl configuration (centromeres pointing toward one side of the nucleus and telomeres toward the other), albeit to a different extent in each cell (Fig. 4A). In contrast, we found the Rabl configuration to be weak in most GM12878 cells and PBMCs. Most PBMCs pointed their centromeres toward the nuclear periphery and telomeres toward the nuclear center, consistent with previously reported arrangements in human lymphocytes (31). By contrast, the M/G1-phase GM12878 cell pointed centromeres toward the outer rim of a characteristic mitotic rosette.
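The two axes of Fig. 4A can be written down directly from the caption definitions. The sketch assumes one centromere-to-telomere vector per chromosome arm, with the centromere and telomere represented by the terminal particles of that arm, and defines the outward score as centromere distance minus telomere distance from the nuclear center of mass, so that positive values mean centromeres point outward. These representational choices and the function name are assumptions.

```python
import math


def centromere_telomere_indices(arms, center, n_particles):
    """Return (rabl_index, outward_index) for one cell.

    arms: list of (centromere_xyz, telomere_xyz) pairs, one per chromosome arm.
    center: nuclear center of mass.
    n_particles: total particle number used for normalization.
    """
    summed = [0.0, 0.0, 0.0]
    outward = 0.0
    for cen, tel in arms:
        for k in range(3):
            summed[k] += tel[k] - cen[k]
        # Positive when the centromere lies farther from the center than the telomere.
        outward += math.dist(cen, center) - math.dist(tel, center)
    rabl = math.sqrt(sum(v * v for v in summed)) / n_particles
    return rabl, outward / n_particles
```

In a Rabl configuration the arm vectors point the same way and add up, while randomly oriented arms largely cancel, which is why the length of the summed vector works as a Rabl score.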

Fig. 4 Cell type–specific chromatin structures.

(A) Quantification of the organization of centromeres and telomeres. The mESCs exhibit stronger Rabl configuration (horizontal axis; the length of summed centromere-to-telomere vectors normalized by the total particle number, which differs between human and mouse; axis limit = 0.005 particle radii), whereas the PBMCs tend to point centromeres outward relative to telomeres (vertical axis; the summed centromere-to-telomere difference in distance from the nuclear center of mass normalized by the total particle number; axis limit = 0.007 particle radii). Each marker represents a single cell; for PBMCs, the cell type denoted by each marker was inferred from V(D)J recombination (table S1 and fig. S3B). (B) Quantification of chromosome intermingling (vertical axis; the average fraction of nearby particles that are not from the same chromosome) and chromatin compartmentalization (horizontal axis; Spearman correlation between each particle's own CpG frequency and the average of nearby particles). (C) Example cross sections of three cell types, colored according to chromosome (left) or by the multichromosome intermingling index (right). (D) Among the human cells, four cell type clusters (shaded)—B lymphoblastoid cells, presumable T lymphocytes, B lymphocytes, and presumable monocytes/neutrophils (PBMC cells 9, 14, and 18)—could be distinguished from the differential formation (defined as end-to-end distance ≤ 3 particle radii) of known cell type–specific promoter-enhancer loops from published bulk promoter capture Hi-C (35). (E) The same four clusters could also be distinguished by unsupervised clustering via PCA of single-cell chromatin compartments, without the need for bulk data. The two alleles of each locus were treated as two different loci. GM12878 cell 16 was excluded from (D) and (E). (F) An example region that was differentially compartmentalized between two cell types (black, B lymphoblastoid cells; red, presumable T lymphocytes). Right panels visualize the configuration of the ~0.5-Mb region (chr 13: 62.5 to 63 Mb, thick yellow sticks) with respect to the rest of the genome (transparent, colored by CpG frequencies) in two representative cells. Only the paternal alleles are shown. Bulk Hi-C (black heat map with 50-kb bins) is from (19, 41). GM12878 cell 4 was excluded.

The overall extent of chromosome intermingling also differed among the cell types. Chromosomes tended to intermingle less in mESCs and more in PBMCs, with GM12878 intermediate between them (Fig. 4, B and C), consistent with previous reports that chromosomes intermingle less in the pluripotent mESCs than in terminally differentiated fibroblasts (32) and that chromosomes intermingle more in resting human lymphocytes than in activated ones (which resembled GM12878) (33). As expected (10, 34), the M/G1-phase cell exhibited a low level of chromosome intermingling and the lowest level of chromatin compartmentalization.
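At 20-kb resolution a diploid genome of 6 Gb has roughly 300,000 particles, so comparing every pair of particles is impractical. The sketch below computes the Fig. 4B intermingling measure, the average fraction of nearby particles that are not from the same chromosome, with a uniform spatial grid so that each particle only examines the 27 surrounding grid cells. The 3-particle-radius neighborhood, skipping particles without neighbors, and the function name are assumptions.

```python
import math
from collections import defaultdict
from itertools import product


def intermingling_fraction(coords, chrom_labels, radius=3.0):
    """Average fraction of neighbors (within `radius`) that belong to a different chromosome.

    Particles are bucketed into a grid whose cell size equals `radius`, so every
    neighbor of a particle lies in its own grid cell or one of the 26 adjacent ones.
    """
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        grid[(math.floor(x / radius), math.floor(y / radius), math.floor(z / radius))].append(idx)
    fractions = []
    for i, p in enumerate(coords):
        gx, gy, gz = (math.floor(c / radius) for c in p)
        same = other = 0
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((gx + dx, gy + dy, gz + dz), ()):
                if j == i or math.dist(p, coords[j]) > radius:
                    continue
                if chrom_labels[j] == chrom_labels[i]:
                    same += 1
                else:
                    other += 1
        if same + other:  # particles with no neighbors are skipped
            fractions.append(other / (same + other))
    return sum(fractions) / len(fractions) if fractions else 0.0
```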

Cell type–dependent promoter-enhancer looping has been suggested to underlie differential gene expression (35). Among the human cells, differential formation of known cell type–specific promoter-enhancer loops [based on cell type–purified bulk Hi-C (15, 35)] clearly separated the single cells into four cell type clusters: B lymphoblastoid cells (GM12878), presumable T lymphocytes, B lymphocytes, and presumable monocytes/neutrophils (Fig. 4D). Defining loop formation on the basis of contacts rather than 3D structures yielded similar results (fig. S17A).
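The loop-formation criterion from the Fig. 4D caption (end-to-end distance of at most 3 particle radii) turns each cell into a binary feature vector. A minimal sketch, assuming anchors are looked up by a hypothetical label and that a cell is summarized by the fraction of each cell type's specific loops it forms; how the paper maps these features onto the axes of Fig. 4D is not specified here, so that summary is an assumption.

```python
import math


def loop_formation(coords_by_locus, loops, threshold=3.0):
    """1 for each (anchor_a, anchor_b) loop whose anchors lie within `threshold` particle radii, else 0."""
    return [int(math.dist(coords_by_locus[a], coords_by_locus[b]) <= threshold) for a, b in loops]


def loop_scores(coords_by_locus, loops_by_type, threshold=3.0):
    """Fraction of each cell type's specific loops that are formed in one cell."""
    return {
        cell_type: sum(loop_formation(coords_by_locus, loops, threshold)) / len(loops)
        for cell_type, loops in loops_by_type.items()
    }
```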

Cell type clusters could be equally well separated in an unsupervised manner, without prior knowledge of the cell types. Unlike ensemble-averaged structures such as protein crystal structures, single-cell 3D genomes are intrinsically stochastic and dynamic. Statistical characterization such as PCA is therefore necessary to distinguish different cell types; in such analyses, clusters of single cells correspond to valleys in a Waddington landscape (36) of certain cellular phenotypes. This kind of cell typing has been carried out using phenotype variables such as single-cell transcriptomes (37) and open chromatin regions (38, 39), each of which must have underlying structural differences in the 3D genome.

With Dip-C, we are in a position to carry out cell typing with genome structure as the sole variable. Given the high information content of 3D structures, many possible features might be used in cluster analysis. Here, we chose single-cell chromatin compartments as the input variable of PCA. The four cell type clusters were clearly separated (Fig. 4E), with one of the most differentially compartmentalized regions shown in Fig. 4F. Our conclusion held if compartments were defined on the basis of contacts (fig. S17, B and C). Previous reports (7, 8, 10–12) had focused on defining the width (or spread) of a single Waddington valley, studying, for example, cell cycle dynamics within a cell type and domain stochasticity within a cell cycle phase. Our PCA result, in contrast, highlighted the consistent difference among cell types, signifying the separation between Waddington valleys.
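A minimal PCA sketch using NumPy's singular value decomposition, assuming a cells-by-features matrix in which each column is the single-cell compartment of one locus and, following the Fig. 4E caption, the two alleles of a locus occupy separate columns. Handling of missing particles and any feature scaling are left out, and the function name is hypothetical.

```python
import numpy as np


def pca_scores(matrix, n_components=2):
    """Project cells (rows) onto the top principal components of their features (columns)."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)  # center each feature across cells
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # Principal component scores: U * S, equivalent to projecting X onto the right singular vectors.
    return u[:, :n_components] * s[:n_components]
```

The sign of each principal component is arbitrary, so clusters should be compared by their relative positions rather than by the sign of the scores.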

Our initial examination of only a handful of cell types has clearly shown the tissue dependence of 3D genome structures. A systematic survey of more cell types under various conditions will likely lead to new discoveries in cell differentiation, carcinogenesis, learning and memory, and aging.

Supplementary Materials

www.sciencemag.org/content/361/6405/924/suppl/DC1

Materials and Methods

Figs. S1 to S20

Tables S1 and S2

References (42–44)

References and Notes

15. See supplementary materials.
Acknowledgments: We thank C. Zong (Harvard University, currently Baylor College of Medicine) for his involvement at the early stage; E. Lieberman Aiden (Baylor College of Medicine) and E. Stamenova (Broad Institute) for advice about their in situ Hi-C protocol; T. Stevens (University of Cambridge) for help on his simulated annealing software (“nuc_dynamics”); M. Yang (Peking University) for published phased genotypes of the blood donor; T. Nagano and S. Wingett (Babraham Institute) for answering questions about published raw data; and Y. Gao, L. Meng, and S. Liu (Peking University) for helpful discussion. Funding: Supported by the Beijing Advanced Innovation Center for Genomics at Peking University, an NIH Director’s Pioneer Award (DP1 CA186693), a Harvard Brain Initiative (HBI) Collaborative Seed Grant, and two grants from the National Science Foundation of China (21390412 and 21327808) (X.S.X.); an HHMI International Student Research Fellowship (L.T.); and NHGRI grant R01 HG010040 (H.L.). Author contributions: L.T., D.X., C.-H.C., and X.S.X. designed the experiments; L.T. and D.X. performed the experiments; L.T. and H.L. analyzed the data. L.T., D.X., and X.S.X. wrote the manuscript. Competing interests: L.T., D.X., C.-H.C., and X.S.X. are inventors on a provisional patent application US 62/509,981 filed by Harvard University that covers META and Dip-C. Data and materials availability: Raw sequencing data were deposited at the National Center for Biotechnology Information with accession number SRP149125 at www.ncbi.nlm.nih.gov/sra/SRP149125. Processed data were deposited with GEO Series accession number GSE117876 at www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117876. Code is available at GitHub (https://github.com/tanlongzhi/dip-c and https://github.com/lh3/hickit). Chromatin contacts can be viewed interactively in the Juicebox web browser (40); please see the manual page of our codes.