Research Article

Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis


Science  02 Feb 2018:
Vol. 359, Issue 6375, pp. 550-555
DOI: 10.1126/science.aan8690

Brain mutations, young and old

Most neurons that make up the human brain are postmitotic, living and functioning for a very long time without renewal (see the Perspective by Lee). Bae et al. examined the genomes of single neurons from the prenatal developing human brain. Both the type of mutation and the rate of accumulation changed between gastrulation and neurogenesis. These early mutations could generate useful neuronal diversity or could predispose individuals to later dysfunction. By sequencing single neurons from subjects aged 4 months to 82 years, Lodato et al. found that neurons also acquire somatic mutations as they age. Somatic mutations accumulated with increasing age and accumulated faster in individuals affected by inborn errors in DNA repair. A postmitotic mutation might affect only one neuron, but the accumulated divergence of genomes across the brain could affect function.

Science, this issue p. 550, p. 555; see also p. 521


Somatic mosaicism in the human brain may alter function of individual neurons. We analyzed genomes of single cells from the forebrains of three human fetuses (15 to 21 weeks postconception) using clonal cell populations. We detected 200 to 400 single-nucleotide variations (SNVs) per cell. SNV patterns resembled those found in cancer cell genomes, indicating a role of background mutagenesis in cancer. SNVs with a frequency of >2% in brain were also present in the spleen, revealing a pregastrulation origin. We reconstructed cell lineages for the first five postzygotic cleavages and calculated a mutation rate of ~1.3 mutations per division per cell. Later in development, during neurogenesis, the mutation spectrum shifted toward oxidative damage, and the mutation rate increased. Both neurogenesis and early embryogenesis exhibit substantially more mutagenesis than adulthood.

Somatic mutagenesis is an emerging area of vertebrate genome biology. Several studies have revealed extensive genomic mosaicism, marked by hundreds of single-nucleotide variants (SNVs) per cell, in somatic tissues of the human body such as skin fibroblasts, intestine, liver, and colon (1–3). Mosaic copy-number alterations are also common, and insertions of retrotransposable elements have been detected (4–10). Mosaicism is prominent in the central nervous system, with implications for brain evolution and for the genomic underpinnings of human neuropsychiatric disorders (11, 12). Mature neurons from the adult human cortex might each carry roughly 1500 SNVs that are detectable only in the analyzed cell and are thought to be related to transcriptional activity (13). However, when during development these SNVs arise is unknown. Furthermore, in vitro whole-genome amplification (WGA) of DNA from single nuclei is prone to experimental artifacts that mimic SNVs (14, 15). Here we describe the discovery and analysis of mosaic SNVs in neuronal progenitor cells from three fetal human brains. Individual progenitor cells were allowed to proliferate into clonal cell populations, which yielded insights into the genomes of the founder cells (fig. S1) and allowed us to estimate the frequency and mutation spectrum of mosaic mutations in human development while avoiding WGA-associated artifacts.

Discovery and validation of mosaic SNVs

Brains were collected from phenotypically normal postmortem human fetuses ranging in age from 15 to 21 weeks postconception. Based on a comparison of counts of germline SNVs (3,809,591 for subject 316; 4,316,547 for subject 320; and 3,746,847 for subject 275) to those derived by the 1000 Genomes Project across different human subpopulations, we concluded that subjects 316 and 275 were of non-African origin, whereas subject 320 (male, 17 weeks postconception) was of African descent.

From a bulk culture of dissociated cells of the ventricular and subventricular zones (VZ-SVZ) of the frontal region of the cerebral cortex, parietal cortex, or basal ganglia (BG), we grew 31 single-cell–derived clonal cell populations, each containing a few thousand cells, using the limiting dilution approach (fig. S1). The few divisions that may have occurred before dilution are unlikely to contribute notably to the mutation landscape of each cell. DNA extracted from the individual clones, the source tissue of germinal zones, and the spleen was sequenced to a minimum of 30x genome coverage (fig. S1). For three clones, we could not derive enough cells; hence, their DNA was amplified by multiple displacement amplification before sequencing.
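Limiting dilution relies on Poisson statistics: if cells are seeded at an average of λ cells per well, most occupied wells receive a single cell only when λ is small. The sketch below illustrates that trade-off. The seeding densities are hypothetical (the study's dilution parameters are not stated in this section), and the function names are our own.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a well receives exactly k cells at a mean of lam cells per well."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def monoclonal_fraction(lam: float) -> float:
    """Among wells receiving at least one cell, the fraction that received exactly one."""
    return poisson_pmf(1, lam) / (1.0 - math.exp(-lam))


# Hypothetical seeding densities (cells per well), for illustration only.
for lam in (0.1, 0.3, 0.5, 1.0):
    occupied = 1.0 - math.exp(-lam)
    print(f"lambda={lam:.1f}  occupied wells={occupied:.1%}  "
          f"single-cell among occupied={monoclonal_fraction(lam):.1%}")
```

Lower λ raises the chance that a growing well is truly clonal, at the cost of many empty wells. This is why limiting dilution typically seeds well below one cell per well.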

Mosaic SNVs present in the founder cell of the clones were discovered by comparing genomes of clones both to each other and to genomes of the germinal zone tissue and spleen (table S1). We selected calls with a variant-allele frequency (VAF) greater than 35% in clones as candidate mosaic SNVs. This threshold was chosen to exclude mutations arising during culture. Such a mutation arises at the earliest in one of the two daughters of the founder cell, so it is heterozygous in at most half of the clone's cells and should have a VAF of 25% or less. The VAF distribution of the SNV discovery data set is centered around 50%, as expected for true mosaic variants present in the founder cell (figs. S1 and S2).
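The sketch below makes the 35% threshold concrete. It computes the expected VAF of a heterozygous mutation that arises at the k-th division within a clone, assuming all descendants proliferate equally, and then applies the VAF cutoff to read counts. The tuple format of the calls and the helper names are hypothetical. A real pipeline would also filter on depth and mapping quality, and on the clone-to-clone and clone-to-tissue comparisons described here.

```python
MIN_VAF = 0.35  # candidate threshold from the text


def expected_culture_vaf(k: int) -> float:
    """VAF of a heterozygous mutation arising in one daughter cell at the k-th division
    inside the clone (k = 1 is the founder's first division), assuming equal proliferation:
    it marks 1/2**k of the clone's cells, each carrying one mutant allele out of two."""
    return 0.5 / 2 ** k


def candidate_mosaic(calls):
    """Return IDs of calls whose VAF in the clone exceeds MIN_VAF.

    calls: iterable of (variant_id, alt_reads, total_reads); a hypothetical format.
    """
    return [vid for vid, alt, depth in calls if depth > 0 and alt / depth > MIN_VAF]


print([expected_culture_vaf(k) for k in (1, 2, 3)])  # [0.25, 0.125, 0.0625]
toy_calls = [("snv_a", 16, 30), ("snv_b", 7, 31), ("snv_c", 12, 33)]
print(candidate_mosaic(toy_calls))  # ['snv_a', 'snv_c']
```

At ~30x coverage, a true 50% variant can also fall below 35% by sampling alone, so any fixed cutoff trades some sensitivity for specificity.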

When comparing clones to each other, we filtered the resulting calls according to whether their pattern of recurrence matched the clones that were, and were not, expected to carry the same mosaic SNVs (fig. S3). Calls from such clone-to-clone comparisons were 98.9% concordant with calls from comparing clones to VZ-SVZ brain tissue or spleen (Fig. 1A). However, among 68 calls made exclusively from clone-to-clone comparisons, 31 (46%) were missing from clone-to-brain tissue or clone-to-spleen comparisons. These calls corresponded to SNVs present in tissues at high frequency (Fig. 1C), so they were masked whenever that tissue served as the comparison, which demonstrates the advantage of the clone-to-clone approach. The clone-to-clone comparison therefore represents an alternative to the use of familial trios (1) for the study of mosaicism.

Fig. 1 SNV discovery in brains.

(A) Three approaches of discovering mosaic SNVs were contrasted: comparing clones to the VZ-SVZ tissue of origin, comparing clones to the spleen, and comparing clones with each other (see fig. S3). The three approaches give largely concordant calls. The comparison is for calls from the brains of all three subjects. (B) Calls specific to clone-to-original tissue (in blue) and clone-to-spleen (in red) discovery approaches are notably enriched for bases with less confident calling (as defined by the mask of the 1000 Genomes Project). These residual calls were not included in the final call set. Colors as in (A). (C) VAF of genotyped SNVs from deep resequencing in all three brains. The clone-to-clone discovery approach allows for finding high-frequency mosaic SNVs in brain tissue (green line) that are missed from clone-to-tissue or clone-to-spleen comparisons. Colors as in (A). (D) Counts of mosaic SNVs (for subjects 316, 320, and 275) per clone increase linearly with fetal age (w, weeks; d, days). BG, basal ganglia; FR, frontal region; PA, parietal region. (E) Contribution of each substitution type to the mutation spectrum is not different between different fetuses and brain regions. Colors as in (D). Bars in (D) and (E) indicate mean ± SEM.

Eight randomly selected SNVs were all confirmed in the clones by polymerase chain reaction (PCR) and Sanger sequencing (table S2). As an additional validation strategy, we designed an oligonucleotide library complementary to the loci of all 6288 SNVs in the discovery data set and performed capture and deep resequencing (to ~1000x coverage) of DNA from 10 clones. This confirmed the 50%-centered VAF distribution for the majority of SNVs. A minority (5.1%) had a VAF lower than 35%, possibly because these variants arose during cell culture (figs. S1 and S4). Accordingly, we estimated our false-positive rate at around 5%. From an in silico comparison of our clones with the unrelated, well-characterized cell line NA12878, we estimated the sensitivity for discovering mosaic SNVs in the clones at ~83% (fig. S5).

Mosaic SNV counts, mutation spectra, and distributions across brain regions

Mosaic SNV counts ranged from 108 to 572 per clone, with clones from older brains containing more variants (Fig. 1D). After adjustment for discovery sensitivity and false positives, this corresponds to an average of 200 to 400 SNVs per clone. We noticed no differences in SNV counts between clones from frontal and parietal cortex, or between clones from frontal cortex and basal ganglia, of the same brain (Fig. 1D). Similarly, the relative contributions of substitution types to the mutation spectrum were the same for clones from different brains and from different brain regions (Fig. 1E). Overall, the transition-to-transversion ratio (Ti/Tv) was 0.6, and the most frequent substitution type was a C:G→A:T transversion. This perhaps reflects oxidative DNA damage producing 8-oxoguanine, which is later fixed as thymine through incorrect base pairing with adenine (16). The second most common substitution was a C:G→T:A transition, which is thought to be caused by deamination of cytosine and 5-methylcytosine (16). A linear fit of the increase in SNV counts over time yielded an estimated mutation rate of 5.1 (95% CI, 1.5 to 9) SNVs per day per progenitor during neurogenesis. Assuming a cell-cycle length of 27 to 54 hours for cortical progenitors, based on studies in primates (17), this projects to roughly 8.6 (95% CI, 1.6 to 20) SNVs per division per progenitor. The wide interval reflects uncertainty in both the per-day increase in SNVs and the length of the cell cycle, and interindividual variability may widen it further.
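The arithmetic behind the per-division estimate can be sketched as follows. Converting a per-day rate into a per-division rate multiplies it by the cell-cycle length in days, and the midpoint of the 27 to 54 hour range appears to reproduce the quoted central value of ~8.6. The count adjustment is one simple correction: remove the ~5% expected false positives, then divide by the ~83% sensitivity. This procedure is an assumption for illustration, not necessarily the authors' exact method.

```python
def adjust_count(observed: float, sensitivity: float = 0.83, fp_rate: float = 0.05) -> float:
    """Assumed correction: drop the expected false positives, then scale up for missed calls."""
    return observed * (1.0 - fp_rate) / sensitivity


def per_division(rate_per_day: float, cycle_hours: float) -> float:
    """Convert SNVs per day into SNVs per cell division for a given cell-cycle length."""
    return rate_per_day * cycle_hours / 24.0


RATE_PER_DAY = 5.1  # linear-fit estimate from the text

# Lower bound, midpoint, and upper bound of the assumed cell-cycle length.
for hours in (27.0, (27.0 + 54.0) / 2, 54.0):
    print(f"cycle {hours:.1f} h -> {per_division(RATE_PER_DAY, hours):.2f} SNVs/division")

# Extremes of the raw per-clone counts, corrected under the assumed scheme.
for observed in (108, 572):
    print(observed, "->", round(adjust_count(observed)))
```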

To genotype variants across brains, we used the capture library described above to conduct deep sequencing (~1000x coverage) of the source tissue (dorsal and basal germinal layers), the corresponding outer layers containing mature neurons (frontal and parietal cortex and basal ganglia), other brain regions (occipital cortex, cerebellum), and a peripheral tissue, the spleen (table S3). A total of 144 SNVs were reliably genotyped, 11 to 68 in each tissue (including spleen), with VAFs between 0.3 and 30% (Fig. 2A and fig. S6). High VAFs for seven of these SNVs were further confirmed with an orthogonal technique, droplet digital PCR (table S2 and figs. S7 to S17). However, for hundreds of SNVs with much lower VAFs (typically below 1%), the evidence for their presence in each tissue was not significant, likely because at such low VAFs their signal could not be distinguished from background noise (Fig. 2A).
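A one-sided binomial test of alt-read counts against a per-base error rate illustrates why low-VAF variants can escape genotyping even at ~1000x. The error rate (0.1%) and significance level are hypothetical placeholders, not parameters reported by the study. A real genotyper would model errors per site and per strand, and a lower error rate makes lower VAFs detectable.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def is_detected(alt_reads: int, depth: int, error_rate: float, alpha: float = 0.01) -> bool:
    """Call a variant present if that many alt reads are unlikely from sequencing error alone."""
    return binom_sf(alt_reads, depth, error_rate) < alpha


DEPTH = 1000          # ~1000x capture resequencing, as in the text
ERROR_RATE = 0.001    # hypothetical per-base error rate

for vaf in (0.001, 0.003, 0.01, 0.05):
    expected_alt = round(vaf * DEPTH)
    print(f"VAF {vaf:.1%}: ~{expected_alt} alt reads, "
          f"detected={is_detected(expected_alt, DEPTH, ERROR_RATE)}")
```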

Fig. 2 Genotyping of SNVs in original tissues.

(A) Several dozens of mosaic SNVs with VAFs of 0.3 to 30% in tissues from various brain regions and the spleen are genotyped by the capture-resequencing approach (green line). For hundreds more SNVs, the evidence for presence in tissue is indistinguishable from background noise (blue line). BG, basal ganglia; FR and PA, frontal and parietal region of the cerebral cortex, respectively; VZ-SVZ, ventricular germinal layers; CX, outer cortical layer (see fig. S1). (B) Venn diagram of genotyped mosaic SNVs across brain regions and spleen for subject 316. Almost 60% of mosaic SNVs could be genotyped in one or more brain regions and spleen, and 44% could be genotyped in all brain regions and spleen. (C and D) Comparative VAFs for mosaic SNVs across different brain regions and spleen for the same subject. Many SNVs are shared by multiple brain regions and by brain and spleen with similar VAFs (SNVs shared across two tissues are indicated by green, red, and blue circles, whereas SNVs shared across three tissues are indicated by magenta circles). Black and gray circles indicate SNVs genotyped in only one region.

Almost 60% of the genotyped SNVs (1.4% of total SNVs) and 92% of the SNVs with VAF above 2% in at least one brain region had a nonzero VAF in the spleen (Fig. 2, B and C, and figs. S6 and S18). Because the brain is of neuroectodermal origin and the spleen is of mesodermal origin, these shared SNVs likely occurred before gastrulation, when the mesoderm, ectoderm, and endoderm differentiate from a single-layer epithelium. This interpretation is consistent with the range of VAFs of these shared SNVs. There are about 12 cell divisions before gastrulation (18), which corresponds to expected VAFs from 0.03 to 25% in somatic tissues, depending on how early in embryogenesis a variant occurred. Some SNVs were shared between the spleen and only some of the brain regions (Fig. 2, B and C, and figs. S6 and S18). This could indicate regional sublineages, i.e., that nonmixing sets of progenitors generate neurons in different brain regions and that these distinct progenitor populations may not share a common ancestor with cells in the spleen. However, more sensitive assays are needed to exclude the possibility of incomplete genotyping. Another observation points to an early origin of the mutations. The overlap of SNVs in the basal ganglia VZ-SVZ and in the cortical VZ-SVZ with their respective differentiated regions was smaller than the overlap among the four regions together (fig. S18). This suggests that most of the genotyped mutations found in the brain arose before the split between the basal and dorsal regions of the telencephalon, i.e., at around the neural plate stage or earlier.

Genotyped mosaic SNVs clearly cluster by similar VAFs in each brain (Fig. 3 and fig. S19). On the basis of the average VAF of each cluster and the sharing of such SNVs between clones, we concluded that these clusters likely represent variants created during sequential postzygotic cleavages. Assuming equal contribution of dividing cells to tissues, we reconstructed the cell-progeny tree and determined the precise origin of 84 mutations during the first five postzygotic cleavages (Fig. 3B and fig. S19). These mutations typically had VAFs above 1%, whereas the remaining ones, typically with VAFs below 1%, were assigned to later divisions. Only two SNVs had conflicting assignments between clusters and clones, which could perhaps be explained by misclustering or by incorrect discovery and/or genotyping in clones. Alternatively, these conflicts, along with the high VAF of the very first SNV in the tree (Fig. 3A), may indicate an unequal lineage contribution to tissues due to asymmetric division, unequal proliferation, or positive or negative selection (19).
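Under the stated assumption of equal contribution of dividing cells to tissues, a heterozygous mutation fixed in one daughter of the k-th cleavage marks 1/2^k of all cells and so has an expected VAF of 1/2^(k+1). The sketch below tabulates these values and assigns a cluster's mean VAF to the nearest cleavage on a log2 scale. The indexing convention and the assignment rule are illustrative assumptions, not the authors' clustering procedure. Under this convention the fifth cleavage gives ~1.6%, in line with early SNVs typically exceeding 1%.

```python
import math


def expected_vaf(cleavage: int) -> float:
    """Expected VAF of a heterozygous mutation fixed in one daughter of the given
    postzygotic cleavage (1 = first), assuming all progeny contribute equally to tissue:
    the lineage makes up 1/2**cleavage of cells and carries one mutant allele of two."""
    return 1.0 / 2 ** (cleavage + 1)


def assign_cleavage(mean_vaf: float, max_cleavage: int = 5, slack: float = 0.5):
    """Hypothetical rule: nearest expected VAF on a log2 scale; clusters clearly below
    the last modeled cleavage (by more than `slack` log2 units) are left as later (None)."""
    if mean_vaf <= 0:
        return None
    x = math.log2(mean_vaf)
    if x < math.log2(expected_vaf(max_cleavage)) - slack:
        return None
    return min(range(1, max_cleavage + 1),
               key=lambda d: abs(x - math.log2(expected_vaf(d))))


for d in range(1, 6):
    print(f"cleavage {d}: expected VAF {expected_vaf(d):.2%}")
print([assign_cleavage(v) for v in (0.11, 0.055, 0.002)])  # [2, 3, None]
```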

Fig. 3 Reconstruction of mosaic SNV mutations during early development of subject 316.

(A) Hierarchical clustering of SNVs genotyped in the different brain regions and spleen by their VAFs revealed grouping consistent with SNV sharing between clones (white squares represent zero VAF). Black and gray squares denote, respectively, SNVs discovered in clones and SNVs missed during discovery but genotyped afterwards. For completeness, five SNVs (marked with *) were included in the analysis if present in multiple clones but the corresponding VAF estimation from capture resequencing was not available. Their VAFs were estimated from whole-genome tissue sequencing. On the basis of the corresponding average VAF (shown underneath each cluster), each cluster was assigned to consecutive postzygotic divisions: D1 (no SNVs observed), D2, D3, D4, and D5. The # notation indicates the ID number of each clone. (B) The reconstructed cell-progeny tree during those divisions had only two conflicts of SNV assignment, denoted by “?”, between clusters and clones. “Expected VAF” denotes VAF of mutations arising at each stage, assuming equal contribution of all progenies to tissues. (C) Mutational spectra of likely early mosaic SNVs (darker color shades) and presumably later arising SNVs (lighter color shades) are different. The difference in the spectra is due to the shift in frequency of C:G→T:A transitions, particularly in CpG motifs, and C:G→A:T transversions. The spectrum of early SNVs is much closer to the spectrum for de novo SNVs in the human population (triangle with Pearson’s correlation coefficient r values). Random distribution represents correlation coefficients when randomly, but proportionally, subsampling early and late mutations.

Using the trees, we estimated the average mutation rate per division per daughter cell in the early human embryo as 1.66 ± 0.24, 1.18 ± 0.33, and 1.05 ± 0.22 for the brains of subjects 316, 320, and 275, respectively. The inverse-variance weighted average of the three measurements is 1.3 ± 0.15. This is consistent with the rate of 1.2 estimated from analysis of de novo SNVs in familial trios (20) and lower than the lower-bound estimate of mutability for neuronal progenitors (1.6 SNVs per division), suggesting that the mutation rate during neurogenesis is higher than that in the early embryo.
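The combined estimate can be reproduced with inverse-variance weighting, in which each subject's rate is weighted by 1/σ². Treating the reported ± values as standard errors is an assumption. With that reading, the sketch prints 1.30 ± 0.15, matching the combined value in the text.

```python
import math


def inverse_variance_mean(values, errors):
    """Weighted mean with weights 1/error**2, and the standard error of that mean."""
    weights = [1.0 / e ** 2 for e in errors]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    return mean, se


rates = [1.66, 1.18, 1.05]  # subjects 316, 320, 275 (SNVs per division per daughter cell)
errs = [0.24, 0.33, 0.22]   # reported uncertainties, assumed here to be standard errors
mean, se = inverse_variance_mean(rates, errs)
print(f"{mean:.2f} ± {se:.2f}")  # 1.30 ± 0.15
```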

We then split the set of mosaic SNVs into those of early origin (genotyped in tissues by capture experiments) and those of likely late origin (not genotyped). The spectrum of early mosaic mutations (i.e., the frequency of nucleotide substitutions in the context of trinucleotide motifs) bears little resemblance to the spectrum of SNVs occurring later in neuronal progenitors, revealing a shift in mutagenesis during development (Fig. 3C). Early mutations had a Ti/Tv ratio of 2.2, the same as germline variants. They also had a larger fraction of C:G→T:A transitions overall (P value of 2.2 × 10^−16, Fisher's exact test), particularly in CpG motifs (P value of 4.3 × 10^−5, Fisher's exact test), implicating the spontaneous deamination of 5-methylcytosine as a contributor to the mutagenic process (16). The signature of early mutations was also similar to the signature of de novo mutations in the human population (20). Because some early mutations can be passed to the next generation through the germline lineage, the convergence of their spectrum with that of de novo and germline variants is expected. Later mutations, on the other hand, had a larger contribution of C:G→A:T transversions (P value of 8.0 × 10^−12, Fisher's exact test), implicating oxidative damage as a significant contributor to the mutagenic process (16). Furthermore, the mutation spectrum for these transversions was most similar (Pearson's correlation coefficient r = 0.90) to the spectrum observed in colorectal cancers that result from a deficiency in the DNA glycosylase MUTYH, in which repair of 8-oxoguanine, the product of oxidative damage, is compromised (21).
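Spectrum analyses of this kind typically report each substitution on the pyrimidine strand of the reference, together with its 5′ and 3′ neighbors, giving 96 trinucleotide channels. Under this convention, G>T is counted as C>A, the C:G→A:T class. The sketch below shows that collapsing step and a Ti/Tv calculation on a toy set of mutations. The input format and helper names are hypothetical, and the toy mutations are not data from the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# After collapsing to a pyrimidine reference, the only transitions are C>T and T>C.
TRANSITIONS = {("C", "T"), ("T", "C")}


def channel(context: str, alt: str) -> str:
    """Label an SNV as e.g. 'A[C>T]G'.

    context: 3-base reference sequence centred on the SNV (hypothetical input format).
    Purine reference bases (G, A) are reverse-complemented so every SNV is reported
    on the pyrimidine strand, e.g. G>T is counted as C>A (the C:G->A:T class).
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "GA":
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_transition(label: str) -> bool:
    return (label[2], label[4]) in TRANSITIONS


def is_cpg_transition(label: str) -> bool:
    """C>T whose 3' neighbour is G, i.e. a C:G->T:A transition at a CpG."""
    return label[2:5] == "C>T" and label[6] == "G"


def ti_tv(labels) -> float:
    ti = sum(is_transition(lab) for lab in labels)
    tv = len(labels) - ti
    return ti / tv if tv else float("inf")


# Toy mutations (not study data): (reference trinucleotide, alternate base)
toy = [("ACG", "T"), ("CGT", "A"), ("TGA", "T"), ("ATC", "C"), ("GCA", "A")]
labels = [channel(ctx, alt) for ctx, alt in toy]
print(labels)  # ['A[C>T]G', 'A[C>T]G', 'T[C>A]A', 'A[T>C]C', 'G[C>A]A']
print(ti_tv(labels), sum(is_cpg_transition(lab) for lab in labels))  # 1.5 2
```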

Properties of mosaic SNVs

For the following analyses, we used all 6288 mosaic SNVs in the discovery data set from all clones, because the mutation spectra of the three brains were extremely similar (fig. S20). The distribution of distances between neighboring mosaic SNVs was consistent with uniform random placement across the genome (fig. S21). In line with this, no enrichment of exonic or intronic SNVs in any gene ontology category was observed, whether assuming a uniform background mutation rate or using mosaic SNVs from liver, intestine, colon, or fibroblasts as background (1, 2). Roughly 3% of SNVs may have a functional consequence by affecting either protein-coding sequence or gene regulation (fig. S21 and table S4). This projects to about 12 nonbenign SNVs per progenitor cell at 20 postconception weeks. We observed a significant depletion of mosaic SNVs in deoxyribonuclease (DNase)–hypersensitive sites relative to flanking regions (Fig. 4A). The depletion was more pronounced (10 versus 5%) when using DNase-hypersensitive sites for fetal brain rather than for a lymphoblastoid cell line, suggesting a relation between a cell's epigenome and the genesis of mutations. Because no such depletion was observed in coding relative to intronic gene regions, the depletion is unlikely to result from negative selection. It more likely reflects better repair efficiency in open DNA regions, as observed for somatic mutations in cancers (22, 23).
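One way to check whether SNVs are placed uniformly at random is to compare inter-SNV distances with the exponential distribution expected for uniformly scattered points. The sketch below does this with a Kolmogorov–Smirnov-style statistic on simulated positions. Treating the genome as a single 3 Gb line is a simplification; a real analysis would work per chromosome and exclude unmappable sequence. The simulation is illustrative and is not the study's data.

```python
import math
import random


def gaps(positions):
    """Distances between consecutive sorted positions."""
    s = sorted(positions)
    return [b - a for a, b in zip(s, s[1:])]


def ks_vs_exponential(gap_list) -> float:
    """Max distance between the empirical CDF of gaps and an exponential CDF
    with the same mean (the expected gap distribution under uniform placement)."""
    n = len(gap_list)
    mean = sum(gap_list) / n
    d = 0.0
    for i, g in enumerate(sorted(gap_list)):
        cdf = 1.0 - math.exp(-g / mean)
        d = max(d, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return d


rng = random.Random(0)
GENOME = 3_000_000_000  # simplification: one ~3 Gb coordinate line
sim = [rng.randrange(GENOME) for _ in range(6288)]
print(f"KS-like distance for uniform positions: {ks_vs_exponential(gaps(sim)):.4f}")
```

Small distances support uniform placement, whereas clustered mutations (for example, hotspots) produce an excess of short gaps and a large distance.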

Fig. 4 Properties of mosaic SNVs in brain.

(A) Depletion of mosaic SNVs in DNase-hypersensitive sites, possibly indicating a better efficiency of DNA-repair pathways in those regions (22, 23). Kb, kilobase. (B) Density of mosaic SNVs correlates negatively with histone marks in embryonic stem cells (ESCs) and fetal brain, revealing similarity to somatic SNVs in cancers. (C) Mutational signatures 8 and 18 (orange) found in brain cancers have the highest two correlations with the mutation spectrum of mosaic SNVs. (D) Exhaustive combinations of pairs of signatures consistently show that signatures 1B and 5 also contribute to the description of the mutation spectrum in combination with signature 18. Thus, signature 18 is the best descriptor of mosaic SNVs in developing brain.

Similar to somatic SNVs in cancers (24) and mosaic SNVs in skin fibroblasts (1), the density of mosaic SNVs in the brain correlates negatively with most histone marks in fetal brain and embryonic stem cells (Fig. 4B). Comparison of our mosaic SNVs with mutational signatures found in cancer (25, 26) revealed that signatures 18 and 8 (found in neuroblastoma and medulloblastoma, respectively), as well as their combination, are the best descriptors of the spectrum of mosaic mutations in the developing brain (Fig. 4, C and D). Mosaic SNVs were equally well described by the combination of signature 5 with 18 and of signature 1B with 18. Therefore, signature 18, with a suspected etiology of oxidative damage (27), consistently contributed to the mutation spectrum of mosaic SNVs in fetal brain progenitors. This signature was similar mainly to late SNVs, whereas signature 1B was similar mainly to early ones (fig. S22).
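The pairwise signature comparison can be sketched as a two-component non-negative least-squares fit, repeated over every pair of reference signatures, with Pearson's r as a single-signature similarity score. The four-channel signatures below are hypothetical toy vectors standing in for the 96-channel cancer signatures. The authors' exact fitting procedure may differ.

```python
import itertools
import math


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def fit_pair(spectrum, s1, s2):
    """Non-negative least squares for spectrum ~ a*s1 + b*s2 (a, b >= 0).

    With two columns the optimum is either the unconstrained solution (if both
    coefficients are non-negative) or lies on a boundary with one coefficient at zero,
    so we evaluate those candidates and keep the one with the smallest residual.
    """
    a11, a22, a12 = dot(s1, s1), dot(s2, s2), dot(s1, s2)
    b1, b2 = dot(s1, spectrum), dot(s2, spectrum)
    candidates = [(max(b1 / a11, 0.0), 0.0), (0.0, max(b2 / a22, 0.0))]
    det = a11 * a22 - a12 ** 2
    if det > 1e-12:
        a = (b1 * a22 - b2 * a12) / det
        b = (b2 * a11 - b1 * a12) / det
        if a >= 0 and b >= 0:
            candidates.append((a, b))

    def rss(coef):
        return sum((y - coef[0] * p - coef[1] * q) ** 2 for y, p, q in zip(spectrum, s1, s2))

    best = min(candidates, key=rss)
    return best, rss(best)


def rank_pairs(spectrum, signatures):
    """Fit every pair of signatures and return (rss, name1, name2, a, b), best first."""
    results = []
    for (n1, s1), (n2, s2) in itertools.combinations(signatures.items(), 2):
        (a, b), err = fit_pair(spectrum, s1, s2)
        results.append((err, n1, n2, a, b))
    return sorted(results)


# Hypothetical 4-channel toy signatures (real ones have 96 channels).
sigs = {
    "sigA": [0.7, 0.1, 0.1, 0.1],
    "sigB": [0.1, 0.7, 0.1, 0.1],
    "sigC": [0.1, 0.1, 0.1, 0.7],
}
spectrum = [0.6 * a + 0.4 * c for a, c in zip(sigs["sigA"], sigs["sigC"])]
print({name: round(pearson(spectrum, s), 2) for name, s in sigs.items()})
print(rank_pairs(spectrum, sigs)[0])
```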

Implications for development and disease

Our study uncovered extensive mosaicism in the human fetal brain, with 200 to 400 SNVs present per brain progenitor cell at 15 to 21 weeks postconception. Cortical postmitotic neurons likely inherit this amount of mosaicism, as neurogenesis ends at around 20 weeks in humans (28). Indeed, our estimate agrees well with the estimate that postmitotic neurons carry ~300 to 900 mosaic SNVs within one year of birth (29). There is an order-of-magnitude difference between the numbers of mosaic SNVs and de novo single-nucleotide polymorphisms (SNPs) (20), suggesting a potentially greater effect of mosaic SNVs on normal brain development and disease. Indeed, we estimate that up to 12 nonbenign mutations can be present in neuronal progenitors and consequently transmitted to a sizable fraction of daughter neurons. It is conceivable that, in rare cases, some of these mutations have a strong deleterious effect, for example, initiating overgrowth (30, 31) or neoplastic transformation by knocking out key genes. The resemblance of mosaic SNVs in fetal brain to somatic mutations in brain cancers, particularly medulloblastoma, supports the theory that cancer-driving mutations can arise by chance during background mutagenesis (32).

Because dozens of the discovered mutations arose before gastrulation, our study demonstrates that early postzygotic mutations can be reconstructed from the analysis of a handful of clones and tissues, opening an avenue for charting individualized mosaicism maps. Mosaic variants can contribute to interindividual phenotypic differences and have been implicated in an individual's disease risk. We therefore suggest that knowing an individual's “mosaicome” could be as important as knowing the individual germline genome. This is particularly so given the much stronger selection acting on germline variants and the lower penetrance of mosaic variants, which is likely to translate into milder phenotypes.

We also discovered a shift in mutagenesis during development that is characterized by an increased mutation rate and a change in the frequency of substitution types. We cannot rule out that interindividual variation partially explains the increased mutation rate, although we have no evidence for such variability. The shift occurs sometime between early cleavages and neurogenesis and may be a consequence of physiological, biochemical, and gene-expression changes related to the generation of neurons from neural stem cells. Alternatively, the shift may reflect more general developmental processes common to all tissues during organogenesis. Given the increased counts of mutations related to oxidative damage, it could be coupled to a higher availability of reactive oxygen species after development of the embryo's cardiovascular system. If this is the case, we predict that mutation spectra and rates per division undergo a similar shift during development across all somatic lineages.

Our estimated average mutation rate of 5.1 SNVs per day per neuronal progenitor during neurogenesis implies that neurons generated at early and later stages of neurogenesis will carry different burdens of mosaic variants. This rate is three orders of magnitude higher than 0.4 to 2 mutations per year accumulated in the germline lineage of adults (20, 33, 34). It is also 50 times higher than the rate in postnatal stem cells of the small intestine, colon, and liver, estimated to be 36 mutations per year (2). Therefore, our results show that the prenatal period is intrinsically highly mutagenic, likely the consequence of oxidative damage coupled with more frequent cell divisions.
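The fold differences quoted above follow from annualizing the neurogenesis rate, as in the sketch below. This is an extrapolation. The 5.1 SNVs per day applies only during neurogenesis, so the per-year figure supports a rate comparison but does not predict a year's accumulation.

```python
DAYS_PER_YEAR = 365.25
NEUROGENESIS_RATE_PER_DAY = 5.1   # SNVs per day per progenitor (text)
GERMLINE_PER_YEAR = (0.4, 2.0)    # adult germline range cited in the text
ADULT_STEM_PER_YEAR = 36.0        # postnatal intestine, colon, and liver stem cells (text)

# Annualized for comparison only: the rate applies during neurogenesis, not a full year.
prenatal_per_year = NEUROGENESIS_RATE_PER_DAY * DAYS_PER_YEAR
print(f"~{prenatal_per_year:.0f} SNVs per year-equivalent")
print("fold vs germline:", [round(prenatal_per_year / g) for g in GERMLINE_PER_YEAR])
print(f"fold vs adult stem cells: {prenatal_per_year / ADULT_STEM_PER_YEAR:.1f}")
```

This gives roughly 1860 SNVs per year-equivalent. That is about 930- to 4700-fold the germline range (three orders of magnitude) and about 52-fold the adult stem-cell rate.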

We found no difference in SNV counts between progenitors from cortex and from basal ganglia, implying that mosaicism accumulates at similar rates across the brain during neurogenesis. With the observed rate of 5.1 SNVs per day per neuronal progenitor, one can project that cells in the forebrain subventricular zone and the hippocampal subgranular zone, where neurogenesis and gliogenesis continue for longer periods (35–37), would accumulate about 1000 mosaic mutations by the time of birth. This estimate is consistent with estimates of about 1000 mosaic SNVs in skin fibroblasts and in stem cells of the colon and intestine in children (1, 2); indeed, mutation rates in all proliferative somatic cell lineages during prenatal development may be similar.
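The projection to birth is a linear extrapolation. The sketch below hypothetically assumes ~300 SNVs at 20 weeks postconception (within the observed 200 to 400 range) and birth at ~38 weeks postconception. With a constant 5.1 SNVs per day, it gives roughly 940 SNVs, in line with the estimate of about 1000.

```python
def projected_count(count_now: float, weeks_now: float, weeks_at_birth: float,
                    rate_per_day: float = 5.1) -> float:
    """Linear projection: current SNV burden plus a constant daily rate over the
    remaining days until birth (assumes the rate stays constant)."""
    return count_now + rate_per_day * 7 * (weeks_at_birth - weeks_now)


# Hypothetical inputs: ~300 SNVs at 20 weeks postconception, birth at ~38 weeks postconception.
print(round(projected_count(300, 20, 38)))
```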

Supplementary Materials

Materials and Methods

Figs. S1 to S24

Tables S1 to S4

References (38–41)

References and Notes

Acknowledgments: This work was supported by the high-performance computing (HPC) facilities operated by the Yale Center for Research Computing and Yale's W. M. Keck Biotechnology Laboratory, as well as their respective staff. This work is also supported by NIH grants RR19895 and RR029676-01, which helped fund the cluster. The sequencing data from this study have been deposited to the NIH National Institute of Mental Health (NIMH) Data Archive under collection ID #2330 and DOI: 10.15154/1410419. This work was funded by the Mayo Clinic Center for Individualized Medicine and by NIH grants R01 MH100914 (F.M.V.), U01 MH106876 (F.M.V., A.A., A.E.U.), U01 MH106874 (N.S.), P50 MH106934 (N.S.), and R03 CA191421 (A.A.). A.A. is also a Visiting Professor at Yale Child Study Center. The supplementary materials contain additional data.

