Transgenerational Epigenetic Instability Is a Source of Novel Methylation Variants


Science  21 Oct 2011:
Vol. 334, Issue 6054, pp. 369-373
DOI: 10.1126/science.1212959


Epigenetic information, which may affect an organism’s phenotype, can be stored and stably inherited in the form of cytosine DNA methylation. Changes in DNA methylation can produce meiotically stable epialleles that affect transcription and morphology, but the rates of spontaneous gain or loss of DNA methylation are unknown. We examined spontaneously occurring variation in DNA methylation in Arabidopsis thaliana plants propagated by single-seed descent for 30 generations. We identified 114,287 CG single methylation polymorphisms and 2485 CG differentially methylated regions (DMRs), both of which show patterns of divergence compared with the ancestral state. Thus, transgenerational epigenetic variation in DNA methylation may generate new allelic states that alter transcription, providing a mechanism for phenotypic diversity in the absence of genetic mutation.

Cytosine methylation is a DNA base modification with roles in development and disease in animals as well as in silencing transposons and repetitive sequences in plants and fungi (1). In plants, CG methylation is commonly found within gene bodies (2–5), whereas non-CG methylation, CHG and CHH (where H is A, C, or T), is enriched in transposons and repetitive sequences (1). The RNA-directed DNA methylation (RdDM) pathway targets both CG and non-CG sites for methylation and is commonly associated with transcriptional silencing (6). This pathway can also target and silence protein-coding genes, giving rise to epigenetic alleles or so-called epialleles that can be heritable through mitosis and/or meiosis (7, 8) and can be dependent on the methylation of a single CG dinucleotide (9).

Two meiotically heritable epialleles resulting in morphological variation are the peloric (Linaria vulgaris) and colorless non-ripening (Solanum lycopersicum) loci (10, 11). Both show spontaneous epigenetic silencing events within their respective populations (10, 12). However, the frequency at which such spontaneous meiotically heritable epialleles naturally arise in populations is unknown. Although epiallelic variation has been identified between genetically diverse populations within Arabidopsis thaliana (13), it is unclear whether these identified epialleles are due to underlying genetic variation. Epialleles have also been artificially generated after mutagenesis or because of mutations in the cellular components required for the maintenance of DNA methylation (14–16).

An A. thaliana (Columbia-0) population, the MA lines, derived by single-seed descent for 30 generations (17) was used to examine the extent of naturally occurring variation in DNA methylation and the frequency at which spontaneous epialleles emerge over time. We used the MethylC-Seq method (3) to determine the whole-genome base resolution DNA methylomes for three ancestral MA lines (numbers 1, 12, and 19) and five descendant MA lines (numbers 29, 49, 59, 69, and 119) (fig. S1). We refer to lines 1, 12, and 19 as ancestors throughout this study, although they are not direct ancestors because they are three generations removed from the original founder line (fig. S1). These specific descendant lines were selected because their genomes have been sequenced and they have a known level of spontaneous mutation (18). Biological replicates (sibling plants) for each leaf methylome were sequenced to an average of ~34-fold coverage, which allowed for an average per line examination of 39,897,093 (96.35%) uniquely mapped cytosines and 5,307,077 (98.39%) uniquely mapped CGs (table S1).

A total of 1,730,761 CGs were methylated (mCGs) in at least one MA line (Fig. 1A), and about 91% of the covered mCGs were invariably methylated across all eight lines (19). The variable mCGs revealed a set of 114,287 high-confidence CG single methylation polymorphisms (SMPs) that showed a consensus of the methylation status of CG dinucleotides between biological replicates (Fig. 1A). Next, a reference MA founder DNA methylome was created by pooling the completely conserved mCG site calls for all ancestral MA lines and used to determine the frequency of discordant CG-SMP sites within the descendant population (Fig. 1B). Within the descendant lines, ~1.62% of the CG methylome shows susceptibility to dynamic acquisitions and losses of mCGs over time (table S2). On average, ~66,000 methylated CG-SMPs (mCG-SMPs) were identified for each ancestral and descendant line (fig. S2). Although the total number of mCG-SMPs was similar between all lines, the conservation of these polymorphisms among and between ancestral and descendant populations was different (Fig. 1C and table S3). A pairwise comparison of both populations for methylation conservation, estimated by global similarity of mCG-SMP sites (19), revealed that all of the ancestral lines are highly similar (table S4). Descendant lines showed greater similarity in CG-SMPs methylation status to ancestral lines than to other descendant lines (table S4).
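The comparison described above can be pictured with a minimal sketch. It assumes each line's methylome is reduced to a set of CG positions called methylated; the line names, positions, and helper functions are hypothetical and are not the authors' pipeline (their procedure is described in the supporting material, 19). The sketch builds a founder reference from sites methylated in every ancestral line, lists reference sites lost in a descendant, and counts the sites that differ between each pair of lines, as in a pairwise difference heatmap.

```python
from itertools import combinations

# Hypothetical toy calls: line name -> set of CG positions called methylated.
# These positions are invented for illustration and are not data from the study.
calls = {
    "anc_1": {101, 205, 330, 412},
    "anc_12": {101, 205, 330, 598},
    "anc_19": {101, 205, 330, 412},
    "desc_29": {101, 205, 412, 777},
    "desc_49": {101, 330, 412},
}
ANCESTORS = ("anc_1", "anc_12", "anc_19")


def founder_reference(calls, ancestors):
    """Return the sites methylated in every ancestral line."""
    return set.intersection(*(calls[name] for name in ancestors))


def lost_from_reference(reference, line_calls):
    """Return reference sites that are not methylated in the given line."""
    return reference - line_calls


def pairwise_differences(calls):
    """Count sites whose methylation call differs between each pair of lines."""
    return {
        (a, b): len(calls[a] ^ calls[b])  # symmetric difference of the two sets
        for a, b in combinations(sorted(calls), 2)
    }


reference = founder_reference(calls, ANCESTORS)
differences = pairwise_differences(calls)
```

Representing calls as sets keeps gains and losses symmetric in the pairwise count; a real analysis would also need to track which sites were covered in both lines before counting a difference.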

Fig. 1

Epigenetic variation of CG-SMPs. (A) An example of a CG-SMP. Gold lines indicate CG methylation, maroon rectangles indicate the untranslated regions, and green rectangles indicate exons. (B) A breakdown of the methylation distribution of CG dinucleotides among all samples. (C) A heatmap indicating the number of CG-SMPs that differ between two samples (table S3).

We calculated an estimate of the epimutation rate per generation in this population by using linear regression and TREE PUZZLE, which revealed 704 and 2876 methylation changes each generation, respectively (19). We estimated a lower bound of the epimutation rate with the linear regression results, which revealed 4.46 × 10⁻⁴ methylation polymorphisms per CG site per generation (P < 0.0000216) (table S5). This finding contrasts with the previously reported spontaneous genetic mutation rate of 7 × 10⁻⁹ base substitutions per site per generation for these same MA lines (18). The TREE PUZZLE analysis revealed higher estimated epimutation rates in earlier generations (19). This variation could be due to seed age, storage, and/or selection for seed survival. Therefore, although DNA methylation is predominantly static over relatively long periods of time, changes in cytosine methylation do occur and at a frequency greater than that of mutation observed at the DNA sequence level.
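To make the scale of this contrast concrete, the two per-site, per-generation rates quoted above can be compared directly. The short sketch below only divides the reported lower-bound epimutation rate by the reported mutation rate; the variable names are illustrative, and the comparison ignores that one rate is per CG site and the other per base.

```python
import math

# Rates as reported in the text (per site per generation).
EPIMUTATION_RATE = 4.46e-4  # lower-bound estimate from linear regression
MUTATION_RATE = 7e-9        # spontaneous base substitutions in the same MA lines

# How many times more frequent a methylation change is than a base substitution.
fold_difference = EPIMUTATION_RATE / MUTATION_RATE

# The same gap expressed in orders of magnitude.
orders_of_magnitude = math.log10(fold_difference)

print(f"fold difference: {fold_difference:,.0f}")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
```

Under these figures, a methylation change at a CG site is roughly 64,000 times more frequent than a base substitution at a given site, a gap of almost five orders of magnitude.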

By using CG-SMPs derived from both ancestral and descendant populations, we carried out a genome-wide analysis of differentially methylated regions (DMRs) and identified 2485 CG-DMRs that ranged in size from 11 to 1110 base pairs (bp) (Fig. 2A and table S6). Hierarchical clustering of CG-DMRs in this population, calculated solely on the basis of the methylation density, revealed that the ancestral lines segregate as an independent cluster from the descendant lines (Fig. 2B and fig. S3). Multivariate distance-based regression (MDMR) (20, 21) confirmed this finding, indicating a statistically significant (P < 0.00005) association between ancestor or descendant status and methylation density of the CG-DMR profiles. The ancestor or descendant status explained 47% of the variance in the dissimilarity in methylation density of CG-DMRs between pairs of samples, indicating that, over time, there is a divergence of DNA methylation patterns in both formation and elimination of CG-DMRs. Furthermore, the genome-wide locations of these CG-DMRs were not uniformly distributed (P < 2.20 × 10⁻¹⁶), because 60.5% (1504/2485) were found in genic regions compared with 3.3% (82/2485) and 36.2% (899/2485) located in intergenic regions and transposons, respectively (Fig. 2B).

Fig. 2

CG-DMRs diverge over time and are enriched in gene bodies. (A) Example CG-DMR present in an unmethylated state in both replicates of line 69. (B) A heatmap representation of a two-dimensional hierarchical clustering based on DMRs. Columns represent samples. Rows indicate DMRs. The column to the left of the heatmap indicates the genomic location of the DMR (blue, gene body; gold, transposon; gray, intergenic; red, transposon in gene body). (C) The average distribution of CG-DMRs (red) and nonCG-DMRs (blue) across gene bodies (from the start of the 5′ UTR to the end of the 3′ UTR, including 500 bp up- and downstream). (D) CG gene-body DMRs are specifically depleted in exons. (E) Genome-wide distributions of mCG (red), CG-SMPs (green), and CG-DMRs (blue) across chromosome I. (F) Genome-wide distributions of methylated nonCGs (mnonCG, red) and nonCG-DMRs (green) across chromosome I. The centromere is indicated by the pink vertical bar for (E) and (F).

Next, we performed a genome-wide survey for nonCG-DMRs and uncovered a total of 284 among all eight lines (table S7). In general, the nonCG-DMRs were largely localized to intergenic regions (141/284) of the genome; only 57/284 overlapped with genes and 86/284 overlapped with transposons. The size range of the nonCG-DMRs was similar to that of the CG-DMRs, with the vast majority occurring in small segments of the genome (10 to 682 bp). Therefore, variation in DNA methylation appears to occur in all three methylation sequence contexts.
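The genomic distributions of the two DMR classes are easier to compare side by side as percentages. The sketch below recomputes them from the counts reported in the text; the dictionary layout and the `percentages` helper are illustrative choices, not part of the original analysis.

```python
# DMR counts per genomic category, as reported in the text.
CG_DMR_COUNTS = {"genic": 1504, "intergenic": 82, "transposon": 899}
NONCG_DMR_COUNTS = {"genic": 57, "intergenic": 141, "transposon": 86}


def percentages(counts):
    """Convert category counts to percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


cg_pct = percentages(CG_DMR_COUNTS)
noncg_pct = percentages(NONCG_DMR_COUNTS)

for category in CG_DMR_COUNTS:
    print(f"{category:<11} CG: {cg_pct[category]:5.1f}%   nonCG: {noncg_pct[category]:5.1f}%")
```

The recomputed CG-DMR shares match the 60.5%, 3.3%, and 36.2% quoted for genic, intergenic, and transposon regions, whereas about half of the nonCG-DMRs fall in intergenic regions.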

CG methylation is present within gene bodies and is enriched toward the 3′ end (2–5), whereas CG and nonCG methylation is associated with heterochromatin, transposons, and repetitive sequences (1). In agreement with these findings, we observed that the 3′ portion of genes contained the greatest source of CG-DMRs and that the majority of nonCG-DMRs were enriched outside of the gene bodies (Fig. 2C). Furthermore, we observed a ~twofold depletion of CG-DMRs in exons compared with introns (Fig. 2D). The genome-wide distributions of CG-SMPs, CG-DMRs, and nonCG-DMRs were depleted in heterochromatic regions in the genome (Fig. 2, E and F). These depletions were mostly observed at the pericentromeres and centromeres (Fig. 2, E and F, and figs. S4 and S5). CG-DMRs are enriched in transposons located in euchromatin but depleted in transposons present near the centromere. Because the centromeric regions of the genome contain the highest density of DNA methylation (Fig. 2, E and F), these observations, combined with the observation that CG-DMRs are enriched in intron sequences, may indicate that DNA methylation associated with nucleosomes (22) (i.e., in exons or in tightly packaged chromatin in the pericentromeres and centromeres) is maintained at a higher fidelity, whereas DNA methylation not associated with nucleosomes may undergo greater epigenetic drift.

A genome-wide screen for DMRs simultaneously occurring in all three methylation sequence contexts (C-DMRs are CG, CHG, and CHH) was performed to assess the extent of epiallelic variation that is characteristic of RdDM across the MA population. In total, 72 C-DMRs were identified, of which functional categorization revealed that two-thirds overlapped with transposon and intergenic sequences whereas about one-third overlapped with gene bodies and promoters (Fig. 3A and table S8). To determine whether transposition-induced methylation could potentially give rise to the methylated C-DMRs (mC-DMRs) (23), genomic DNA encompassing all C-DMRs was amplified and compared in all ancestral and descendant lines. In every case, the observed amplicon size was identical for all MA lines and was equal to the expected size of the locus (table S8), indicating that these C-DMRs are unlinked to cis-genetic variation located within 500 bp, a distance that would be expected to reveal methylation induced by transposon insertions at these loci (23). Additionally, none of the genetic variants identified by genome resequencing of this population (18) overlapped with any of these C-DMRs. Lastly, restriction enzyme digestion and Southern blot analyses were performed to rule out the possibility that copy number variants were the cause of spontaneous epiallele formation, as is the case for the PAI epialleles (24). In all cases examined, the observed hybridization pattern and gene copy number were identical for each of the MA lines (fig. S6). Therefore, we conclude that the 72 C-DMRs represent a set of spontaneously occurring epialleles within the MA lines, because they were not associated with any genetic variation.

Fig. 3

Epiallelic variation at protein-coding loci is associated with transcriptional variation. (A) Classification of C-DMRs and their genomic locations. (B) The number of descendant lines discordant with the ancestral C-DMR state and the C-DMR methylation status. The black portions of the bar indicate the descendant C-DMRs that became methylated, whereas the white portions indicate regions that became unmethylated, compared with the ancestral population. (C) The 24-nt smRNA levels are associated with increasing methylation density. The 24-nt smRNA RPKCMs for all 576 C-DMRs (8 MA lines by 72 C-DMRs) were ranked and binned into 10% quantiles, and then the average mC densities were plotted. (D) A representative C-DMR at At5g24240 in which both biological replicates of descendant line 59 were unmethylated. (E) qRT-PCR analysis of At5g24240 reveals >50-fold increase in mRNA abundance in unmethylated line 59. Error bars indicate SEM. (F) The 24-nt smRNAs are enriched specifically in the MA lines that are transcriptionally silenced in (E) for the At5g24240 locus with the exception of line 59, which is abundantly expressed in (E).

By using a set of C-DMRs that exhibited an identical methylation status (fig. S7), we determined the frequency of discordance of the ancestral state with the descendant lines and found that 29 of the C-DMRs were highly variable (>1 descendant line was discordant with the ancestral state) (Fig. 3B). C-DMRs discordant in only one of the five descendant lines were the most frequent class, but there was an unexpectedly high number of C-DMRs (63%) that were discordant in more than one descendant (Fig. 3B). Within the set of 576 C-DMRs identified (eight lines by 72 C-DMRs), 7 were discordant between the biological replicates (table S8). These data suggest that, although many C-DMRs represent the formation of spontaneous epialleles, a small subset may reflect the presence of “hotspots” (metastable epialleles).

We sequenced small RNA (smRNA) populations for all eight lines and found that smRNAs [represented as RPKCMs (reads per kilobase of each C-DMR per million reads) in tables S9 to S12] were associated with an increase in the average methylation density of C-DMRs (Fig. 3C). Furthermore, this association resembled a binary switch, because the most densely methylated C-DMRs contained abundant 24-nucleotide (nt) smRNAs (Fig. 3C).
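The RPKCM normalization and the quantile binning described in the Fig. 3C legend can be sketched as follows. This is a minimal illustration assuming the usual reads-per-kilobase-per-million arithmetic; the function names and the rank-based binning rule are assumptions, and the authors' exact procedure is given in their supporting material.

```python
def rpkcm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of a C-DMR per million mapped reads."""
    kilobases = region_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return reads_in_region / kilobases / millions


def decile_means(rpkcm_values, mc_densities):
    """Rank regions by RPKCM, split the ranks into 10 bins, average mC density per bin."""
    ranked = sorted(zip(rpkcm_values, mc_densities))
    n = len(ranked)
    bins = [[] for _ in range(10)]
    for rank, (_, density) in enumerate(ranked):
        bins[rank * 10 // n].append(density)  # ranks 0..n-1 map onto bins 0..9
    # A bin stays empty (None) only when fewer than 10 regions are supplied.
    return [sum(b) / len(b) if b else None for b in bins]


# Hypothetical example: 50 smRNA reads over a 500-bp C-DMR in a 10-million-read library.
example_value = rpkcm(50, 500, 10_000_000)

# Hypothetical toy input: 20 regions whose methylation density rises with RPKCM.
toy_rpkcm = [float(i) for i in range(20)]
toy_density = [i / 20 for i in range(20)]
toy_means = decile_means(toy_rpkcm, toy_density)
```

Normalizing by region length and library size makes smRNA abundance comparable across C-DMRs of different sizes and across lines sequenced to different depths, which is what allows the 576 values to be ranked together.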

Of the eight previously documented plant epialleles resulting in phenotypic variation, all affected transcriptional output of the differentially methylated locus (9–11, 23–28). mRNA abundance was measured in all eight lines with quantitative reverse transcription polymerase chain reaction (qRT-PCR) at eight C-DMRs that overlapped with protein-coding regions. In four of these genes, the gain or loss of DNA methylation was correlated with a large decrease or increase in mRNA abundance, respectively, and with the presence of 24-nt smRNAs at each silenced epiallele (Fig. 3, D to F, and fig. S8). These findings reveal that changes in epiallelic state can lead to major effects on transcriptional output (fig. S9).

We also observed that the methylation status of one C-DMR resulted in alternative promoter usage of ACTIN RELATED PROTEIN 9 (At5g43500) (fig. S10C). The loss of DNA methylation within the 5′ untranslated region (UTR) of the At5g43500.1 isoform led to an increase in mRNA expression, whereas expression of isoform At5g43500.2, with a transcriptional start site located further downstream, was unaffected (fig. S10, D and E).

Although epialleles can have major impacts on phenotypic diversity, until now their identification was not trivial. Even more puzzling is the origin of “pure” epialleles, which are defined by their formation in the absence of any genetic variation in cis or trans (8). One route to epiallele formation may be the failure to correctly maintain the proper methylation status throughout epigenetic reprogramming that occurs postfertilization (29, 30). It is noteworthy that 63 of the 72 C-DMRs overlap with regions previously shown to have altered methylation patterns in methylation enzyme mutants (Fig. 4) (3). Of the 14 C-DMRs that overlap with genes, 5 become reexpressed in met1-3 and 1 transcript becomes silenced in rdd (3). These results suggest that a failure to faithfully maintain genome-wide methylation patterns by MET1 and/or RDD is likely one source of spontaneous epiallele formation.

Fig. 4

Methylation status of all 72 epialleles in methylation and demethylation mutant backgrounds. Most of the epialleles become unmethylated in met1-3, whereas a smaller number become remethylated in the DNA demethylase triple mutant rdd.

Regardless of their origin, the majority of epialleles identified in this study are meiotically stable and heritable across many generations in this population. Understanding the basis for such transgenerational instability and the mechanism(s) that trigger and/or release these epiallelic states will be of great importance for future studies.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S11

Tables S1 to S16


References and Notes

  1. Additional experiments and descriptions of methods used to support our conclusions are presented as supporting material on Science Online.
  2. Acknowledgments: We thank M. White, R. Lister, M. Galli, and R. Amasino for discussions; R. Shaw and E. Darmo for seeds; J. Nery for sequencing operations; and M. Axtell for Southern blot protocol. R.J.S. was supported by an NIH National Research Service Award postdoctoral fellowship (F32-HG004830). M.D.S. was supported by an NSF Integrative Graduate Education and Research Traineeship grant (DGE-0504645). M.G.L. was supported by a European Union Framework Programme 7 Marie Curie International Outgoing Fellowship (project 252475). O.L. and N.J.S. are supported by NIH/National Center for Research Resources grant number UL1 RR025774. This work was supported by the Mary K. Chapman Foundation, the NSF (grants MCB-0929402 and MCB-1122246), the Howard Hughes Medical Institute, and the Gordon and Betty Moore Foundation (GBMF) to J.R.E. J.R.E. is a HHMI–GBMF Investigator. Analyzed data sets can be viewed at Sequence data can be downloaded from National Center for Biotechnology Information Sequence Read Archive (SRA035939). Correspondence and requests for materials should be addressed to J.R.E. (ecker{at}