Widespread Monoallelic Expression on Human Autosomes


Science  16 Nov 2007:
Vol. 318, Issue 5853, pp. 1136-1140
DOI: 10.1126/science.1148910


Monoallelic expression with random choice between the maternal and paternal alleles defines an unusual class of genes comprising X-inactivated genes and a few autosomal gene families. Using a genome-wide approach, we assessed allele-specific transcription of about 4000 human genes in clonal cell lines and found that more than 300 were subject to random monoallelic expression. For a majority of monoallelic genes, we also observed some clonal lines displaying biallelic expression. Clonal cell lines reflect an independent choice to express the maternal, the paternal, or both alleles for each of these genes. This can lead to differences in expressed protein sequence and to differences in levels of gene expression. Unexpectedly widespread monoallelic expression suggests a mechanism that generates diversity in individual cells and their clonal descendants.

In diploid eukaryotic organisms, it is generally assumed that the maternally and paternally derived copies of each gene are simultaneously expressed at comparable levels. However, there are exceptions where only one of the alleles is expressed. Monoallelically expressed genes fall into three distinct classes. One class is the autosomal imprinted genes (such as IGF2 and H19) whose monoallelic expression is regulated in a parent-of-origin–specific manner (1). A second class is X-inactivated genes regulated by a random process: Early in development, around the time of implantation, half of the cells inactivate the maternal X chromosome and half inactivate the paternal X chromosome (2). A third class is autosomal genes subject to random monoallelic expression (3–9), including the odorant receptor genes, as well as genes encoding the immunoglobulins, T cell receptors, interleukins, and natural killer cell receptors. For genes in this class, some cells express the maternal allele and other cells express the paternal allele. For some genes, cells expressing both alleles are also present [e.g., members of the interleukin gene family (5, 6)]. This third class was considered to comprise isolated examples of genes involved in the immune or nervous systems. Here, we present the development of a method for genome-wide identification of such genes, which revealed that more than 5% of assessed genes were subject to random monoallelic expression.

To carry out a genome-wide search for genes subject to random monoallelic expression, we used the Affymetrix Human Mapping 500 K array set, modifying the protocol to allow examination of RNA rather than DNA (10) (fig. S1). The locations of single-nucleotide polymorphisms (SNPs) on the Affymetrix 500 K SNP array are arbitrary with respect to locations within genes, with ∼1% falling within exonic sequence and ∼36% within intronic sequence. To allow the use of intronic SNPs, we purified nuclei so as to enrich intronic RNA. This RNA was then converted into double-stranded cDNA and used in place of genomic DNA in an Affymetrix genotyping protocol (figs. S1 to S4 and table S1). The search also took advantage of the observation that once a cell has decided to express one of two alleles (both in the case of X-inactivation and in the examples of autosomal monoallelic expression), the clonal descendants of this cell maintain the choice (2, 5, 6, 8, 9, 11). Because normal human B-lymphoblastoid cell lines are polyclonal, we derived clonal cell lines using single-cell cloning.

The overall approach thus generated “transcriptome-derived genotypes,” which we could compare to the regular genotype obtained from genomic DNA from the same clonal cell line. A homozygous transcriptome-derived genotype in the context of the same SNP yielding a heterozygous genomic DNA-derived genotype served to identify monoallelic expression. We developed an algorithm that used genotyping calls from independent replicate hybridizations of cDNA from each clonal cell line. To be conservative, we applied a number of filters to the data (12). These filters discarded potentially interesting observations (imprinting and allelic imbalance due to cis regulatory sequence polymorphism) but were essential to avoid possible cDNA genotyping artifacts. Overall, ∼10% of SNPs were reliably called from cDNA; this was expected, because most of the other SNPs are likely present in regions of the genome with insufficient transcription in the B cell lines we analyzed (13).
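The core comparison between genomic and transcriptome-derived genotypes can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the algorithm described in (10, 12): genotypes are represented as two-letter strings such as "AB", the function names and category labels are hypothetical, and the replicate-hybridization logic and conservative filters are omitted.

```python
from typing import Optional


def is_heterozygous(genotype: str) -> bool:
    """Return True for a two-allele call such as 'AB' (assumed encoding)."""
    return len(genotype) == 2 and genotype[0] != genotype[1]


def classify_snp(gdna: Optional[str], cdna: Optional[str]) -> str:
    """Classify one SNP in one clone from its gDNA and cDNA genotype calls.

    Hypothetical labels:
      no_call        - either genotype is missing
      uninformative  - gDNA is homozygous, so alleles cannot be told apart
      biallelic      - gDNA and cDNA are both heterozygous
      monoallelic_X  - gDNA heterozygous, cDNA homozygous for allele X
      discordant     - cDNA allele is not present in the gDNA genotype
    """
    if gdna is None or cdna is None:
        return "no_call"
    if not is_heterozygous(gdna):
        return "uninformative"
    if is_heterozygous(cdna):
        return "biallelic"
    if cdna[0] in gdna:
        return "monoallelic_" + cdna[0]
    return "discordant"


if __name__ == "__main__":
    # Illustrative calls only; these are not data from the study.
    for gdna, cdna in [("AB", "AA"), ("AB", "AB"), ("AA", "AA"), ("AB", None)]:
        print(gdna, cdna, classify_snp(gdna, cdna))
```

Only SNPs that are heterozygous in genomic DNA are informative in this scheme, which is why a homozygous genomic genotype is set aside rather than counted as monoallelic.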

In a proof-of-principle analysis, our approach detected the chromosome-wide nature of X-inactivation in female clonal cell lines [including its subtle properties (14–16 and supporting online text)] while detecting biallelic expression in nonclonal cells (Fig. 1). The X chromosome is represented by 5710 SNPs on the Nsp 250 K array. Analyzing individual H, 1294 of 5710 X-linked SNPs were heterozygous, and 135 of these heterozygous SNPs were called in cDNA from two or more clonal lines (17). X-inactivation was also observed in clonal cell lines from a second female (fig. S5). The majority (88 of 135) of these X-linked SNPs were within known annotated genes, whereas the rest were in areas currently labeled as intergenic regions. Nonetheless, these intergenic SNPs correctly report X-inactivation. Although allele-specific reverse transcription polymerase chain reaction (RT-PCR) experiments confirmed monoallelic expression for intergenic SNPs both on the X chromosome and on the autosomes, we focus here on the SNPs that were within genes.

Fig. 1.

Proof-of-principle: detection of X-inactivation. (A) SNP-level view of chromosome-wide X-inactivation in individual clonal lines. Each column corresponds to cDNA genotypes from one cell line from the same female; each row, to one SNP. Only informative SNPs are shown; top to bottom, in the order of ascending coordinate on the X chromosome [National Center for Biotechnology Information (NCBI) 35 assembly]. SNPs located within known genes are marked black in the rightmost column; the rest are in the intergenic regions. Ψ denotes pseudoautosomal region. H is the line prior to subcloning. (B) Gene-level view of X-inactivation in individual clonal lines. Each oval represents a gene. The active allele was assigned based on complete agreement of all informative SNPs within the gene. The center of the oval corresponds to position of the gene on the NCBI 35 assembly, and all ovals are of equal size. Brown boxes denote centromeres.

Having tested the cDNA genotyping approach on the X chromosome, we set out to systematically search for autosomal genes displaying random monoallelic expression. Examination of all the SNPs within a gene allowed an assignment of allele-specific expression to each gene in each clone (Fig. 1B and fig. S6). Across the entire genome, for the genes whose transcription we could assess, an average of 2.75 informative SNPs were called per gene. The overall agreement between SNPs present within a single gene was >98% (based on analyses of >10,000 instances in clones from each of three people). The agreement among SNPs present in a given gene extended to both exonic SNPs and intronic SNPs (fig. S5).
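The step from SNP-level calls to a gene-level assignment can be illustrated as follows. This sketch assumes that each informative SNP in a clone has already been labeled "maternal", "paternal", or "biallelic" (which requires phasing against parental genotypes), and it applies the complete-agreement rule stated in the legend of Fig. 1B. The function names and the "discordant" and "not_assessable" labels are hypothetical.

```python
from typing import Dict, List


def assign_gene(snp_calls: List[str]) -> str:
    """Assign an expression state to one gene in one clone.

    The state is reported only if every informative SNP in the gene agrees;
    otherwise the gene is flagged as 'discordant'.
    """
    if not snp_calls:
        return "not_assessable"
    if len(set(snp_calls)) == 1:
        return snp_calls[0]
    return "discordant"


def summarize_clone(gene_to_calls: Dict[str, List[str]]) -> Dict[str, str]:
    """Apply assign_gene to every gene assessed in a single clone."""
    return {gene: assign_gene(calls) for gene, calls in gene_to_calls.items()}


if __name__ == "__main__":
    # Illustrative input only; gene names are reused from the text,
    # but the calls are invented for the example.
    clone = {
        "APP": ["maternal", "maternal", "maternal"],
        "EBF": ["biallelic", "biallelic"],
        "GENE_X": ["maternal", "paternal"],
        "GENE_Y": [],
    }
    print(summarize_clone(clone))
```

Requiring complete agreement is a conservative choice: a gene with conflicting SNPs is set aside rather than resolved by majority vote.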

For a given gene subject to monoallelic expression, there are three possible types of expressing clones: monoallelic-paternal, monoallelic-maternal, and (for some genes) clones expressing both alleles. The APP (amyloid precursor protein) gene serves as an example of how we interpreted the cDNA genotyping data in general. Additionally, APP is important because of its involvement in Alzheimer's disease. Analyses of cDNA from clones of individual H revealed monoallelic expression of APP (Fig. 2, A and B); some clonal cell lines displayed monoallelic expression (either the maternal or the paternal allele), whereas another clonal cell line revealed biallelic expression. Individual A also displayed clear evidence of monoallelic expression of the APP gene, whereas individual M had only two clonal cell lines, each of which displayed biallelic expression. Although this might imply that in individual M the APP gene is expressed exclusively biallelically, it could also be due to the small number of clones we were able to analyze for this individual. Direct sequencing of RT-PCR products, and genotyping of RT-PCR products using primer-extension, both served to confirm monoallelic expression of APP (Fig. 2C). These experiments also showed that the active allele has at least 50 times as much expression as the silent allele (10). SNPs reporting monoallelic expression of APP were located in multiple introns along most of the length of the gene (Fig. 2D).

Fig. 2.

Monoallelic expression in autosomal genes. (A) Monoallelic expression of APP. Columns, uncloned cell line (H) or one of the independent clones; rows, SNPs, identified by color as in Fig. 1; gray, homozygous genomic DNA. (B) Shows only the SNPs heterozygous in genomic DNA and with cDNA calls. Data from two additional individuals (A and M) are included. (C) Sequencing of RT-PCR products surrounding an intronic SNP, RS1783026 (left). Mass spectrograms of single-base extension reactions on PCR products surrounding the same SNP (Sequenom, right). “Pause” denotes the position of a peak distinct from either allele. (D) Positions of assessed SNPs within APP; homozygous (blue) or heterozygous (red) in individual H. Exons demarcated below the line in black. (E) Relative levels of APP expression in monoallelic and biallelic clonal lines as measured by real-time RT-PCR, normalized to TFRC transcription; an average of triplicate measurement ± SD. (F) A clone derived from primary lung fibroblasts (WI38.4) also showed monoallelic expression of the EBF gene. (See also fig. S7.) Parental genotypes for WI38 line were not available. (G) Confirmation of monoallelic expression in mature EBF mRNA, using intron-spanning primers for amplification; SNP RS1368298 assessed. (H) Monoallelic expression of EBF and DAPK1 in fresh PBMCs. Sequential in situ hybridization with probes for nascent RNA (red) and DNA (green) shows a large fraction of nuclei with transcription from only one allele (n = 50). Nuclei counterstained with 4′,6′-diamidino-2-phenylindole (blue). (I) Mosaic of monoallelic expression in fresh tissue. Small (∼1 mm³) samples, dissected from female placenta, were essentially clonal judging by X chromosome methylation assay (fig. S10). Examples of genes that are monoallelic in lymphoblasts, assessed in placenta by Nsp 250 K array, are shown. Direct sequencing of cDNA flanking RS1440284 (TEAD1) and RS2208798 (MYO6) confirmed allelic bias in expression (arrows), compared with gDNA and cDNA prepared from a larger sample of the same tissue.

How is the level of expression of a given gene affected by its transcription from one allele or both? One possibility is that the transcriptional activity of each allele is regulated independently, such that having both alleles active leads to higher overall transcript level. The alternative possibility is that the cell maintains a desired level of transcript, whether one or two alleles are transcribed. When assessed by quantitative real-time RT-PCR, clones expressing APP from one allele, either paternal or maternal, had a lower level of expression than the biallelic clones (Fig. 2E). In the case of APP, this is especially pertinent because higher levels of expression and APP gene duplication are associated with early-onset Alzheimer's disease (18). Therefore, by contributing to diversity in levels of expression in individual cells and their clonal descendants, monoallelic expression could play a role in pathogenesis.

Another example of a gene displaying monoallelic expression is the early B cell factor (EBF) gene. Although lymphoblasts are relatively easy to subclone and thus are a natural subject for our analysis, we were also able to analyze a clone from the WI38 primary fibroblast line. This clone also revealed monoallelic expression of EBF and a number of other genes (Fig. 2F, fig. S7, and table S5). Analyses of EBF also included a control experiment showing that, as expected, the mature mRNA and the unspliced RNA display monoallelic expression of the same allele (Fig. 2G and fig. S8).

To verify our results in vivo, we used RNA fluorescence in situ hybridization (RNA-FISH) to detect nascent transcripts in nuclei of monocytes from fresh peripheral blood mononuclear cells (PBMCs). DNA-FISH allowed the localization of the two alleles. We analyzed the EBF gene as well as the death-associated protein kinase 1 (DAPK1) gene (Fig. 2H and fig. S9). For both of these genes, about one-third of the cells revealed monoallelic transcription, consistent with the results from clonal cell lines, indicating that a fraction of clones display monoallelic expression.

Another in vivo experiment analyzed small (∼1 mm³) patches of tissue from a female placenta. Analyses of large samples showed the expected biallelic expression of X-linked and autosomal genes. Patches displaying complete skewing of X-inactivation (fig. S10) were tentatively considered “clonal,” and RNA from these patches was analyzed with the Affymetrix 500 K SNP platform. Figure 2I shows examples of genes displaying monoallelic expression in small patches of placental tissue. The data obtained from these in vivo “clones” were qualitatively similar to the data from the B lymphoblast clones (fig. S10 and table S6). These data provide another independent, in vivo confirmation of random monoallelic expression.

Having presented a few autosomal examples and control experiments, we now turn to a genome-wide view of monoallelic expression (Fig. 3A). On the Nsp 250 K array, there are SNPs present within 11,401 genes. Overall, we were able to assess allele-specific transcription for 3939 genes in two or more clonal cell lines; to be assessable, a given gene had to have at least one heterozygous SNP that gave a genotype call from cDNA. We devised a metric that assigns a score to each gene, reflecting the extent to which the data support the conclusion that the gene is monoallelically expressed (10).

Fig. 3.

Widespread monoallelic expression on the human autosomes. (A) Of 11,401 genes, 3939 were assessable; a metric G (10) assigned a monoallelic quality score for each gene for each individual. Genes with high (Gmax > 1, dark green) or medium (Gmax = 1, light green) scores are shown and are included in further analyses. Genes with only biallelic expressing clones are in gold. The complete list of assessed genes is in database S1. (B) Distribution along autosomes of randomly monoallelically expressed and biallelic genes we identified. Each marker corresponds to the position of the gene on the NCBI 35 assembly. Brown boxes mark centromeres. (C) Gene-level view of allele-specific expression on chromosome 18 in clonal cell lines from three individuals.

Of the 3939 genes, 2.2% (85) were called as monoallelically expressed with multiple informative SNPs per gene per clone; these genes were grouped together as class I. Eight arbitrarily selected genes from class I were checked by Sequenom genotyping and were confirmed (table S2). An additional 7.3% of assessed genes (286) were called as monoallelically expressed based on a single informative SNP per gene per clone, or their score was reduced for other reasons described in (10); these we group together as class II. Of six genes from class II checked by Sequenom genotyping, five were confirmed (table S2). The genes from class I and II include both B cell–specific genes and ubiquitously expressed genes expressed at typical levels in B cells (Table 1). Although extrapolating to the entire transcriptome is complicated because the arrays we used are biased toward larger genes (as they have more SNPs; see fig. S11), conservative interpretation of our data still suggests that more than 1000 human genes are subject to random monoallelic expression.
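The percentages quoted above follow directly from the gene counts, and the short calculation below reproduces them. It uses only numbers stated in the text; the variable and function names are arbitrary.

```python
ASSESSED_GENES = 3939  # genes assessable in two or more clonal lines
CLASS_I = 85           # monoallelic, multiple informative SNPs per gene per clone
CLASS_II = 286         # monoallelic, single informative SNP or reduced score


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


class_i_pct = percent(CLASS_I, ASSESSED_GENES)
class_ii_pct = percent(CLASS_II, ASSESSED_GENES)
combined = CLASS_I + CLASS_II
combined_pct = percent(combined, ASSESSED_GENES)

if __name__ == "__main__":
    print(f"class I:  {CLASS_I} genes, {class_i_pct}%")
    print(f"class II: {CLASS_II} genes, {class_ii_pct}%")
    print(f"combined: {combined} genes, {combined_pct}%")
```

The combined count of 371 genes is consistent with the statement that more than 300 assessed genes were subject to random monoallelic expression; the combined percentage computed from the counts is marginally lower than the sum of the two rounded percentages.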

Table 1.

Partial list of genes subject to random monoallelic expression. Gmax is the highest score for the metric G among the three assessed individuals. A measure of expression level in lymphoblasts is given in Sanger expression: Average raw expression level for lymphoblasts in HapMap set (28). Tissue specific: Ratio of the level of expression of the highest expressing tissue to the median expression for all tissues in GNF Gene Expression Atlas (29). Using the averages for all probes representing a given gene, gene expression was determined to be tissue-specific if the ratio was >4. B cell specific: Ratio of the expression (averaged over all probes for the gene and over all replicate samples) in peripheral blood CD19+ B cells and B721 cells to the median expression of the gene in all tissues. See also table S4 and database S1.


For greater than four-fifths of the monoallelic genes, some clonal cell lines displayed biallelic expression. These observations are consistent with prior studies of other monoallelic genes [for example, interleukins 2 and 4 (6, 8), as well as p120 catenin (9)]. Thus, the genes that display only monoallelic expression, such as odorant receptors and immunoglobulins, are the exception rather than the rule.

The genes subject to monoallelic expression were scattered throughout the genome (Fig. 3B). Within a given clonal cell line, the choice of the expressed allele (maternal, paternal, or both) was made independently for each gene (Fig. 3C). This is in contrast to the chromosome-wide coordination observed in X-inactivation; it is also interesting given the chromosome-wide coordination of asynchronous replication observed for autosomes (19, 20). Chromosome-wide coordination of asynchronous replication timing therefore does not lead to chromosome-wide coordination of random monoallelic expression. Thus, individual clones displayed epigenetic heterogeneity, in terms of which autosomal genes were monoallelic and which were biallelic (as well as the parent-of-origin of the expressed allele in the cases of monoallelic expression). Given the large number of genes involved, numerous combinations are potentially generated by monoallelic expression (fig. S12).
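To give a rough sense of the scale of this combinatorial diversity, the sketch below computes an upper bound on the number of distinct expression patterns. It assumes, purely for illustration, that every gene can take any of three states (maternal, paternal, or both) and that choices are fully independent between genes; it is not the calculation shown in fig. S12, and the function name is hypothetical.

```python
def max_expression_patterns(n_genes: int, states_per_gene: int = 3) -> int:
    """Upper bound on distinct per-clone patterns for independently chosen genes."""
    if n_genes < 0 or states_per_gene < 1:
        raise ValueError("n_genes must be >= 0 and states_per_gene >= 1")
    return states_per_gene ** n_genes


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, "genes ->", max_expression_patterns(n), "possible patterns")
```

Because the bound grows exponentially with the number of genes, even a few dozen independently regulated genes would be enough to give essentially every clone a distinct pattern under these assumptions.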

The newly identified monoallelically expressed genes encode proteins of widely varying functions and tissue specificities, and some of them have been studied for roles in human disease (Table 1; see also supporting online text). This diversity notwithstanding, a disproportionately large fraction of these monoallelically expressed genes encode cell surface proteins (table S3). For example, in the Gene Ontology category “transmembrane receptor,” analyzed with GOstat (21), the expected number of monoallelically expressed genes would be 8 (3.0%), whereas 24 (8.8%) were observed (P = 2 × 10⁻⁶ after Benjamini correction for multiple hypothesis testing). The overrepresentation of receptors and other surface proteins suggests a role for monoallelic expression in each given cell's interactions with other nearby cells.

The evolutionary history and species specificity of random monoallelic expression remain to be determined. It is thus intriguing that the genes we identified are more than twice as likely as biallelic genes to be located near presumed regulatory conserved noncoding sequences that were recently suggested to undergo human lineage-specific accelerated evolution (22) (P < 10⁻⁶; fig. S13). This observation suggests the possibility of human-lineage–specific random monoallelic expression.

Using a general strategy for genome-wide analysis of monoallelic expression, we showed that ∼5 to 10% of assessed autosomal genes display random monoallelic transcription in human cells with stability in each clonal cell line. Corroborating data were obtained from in vivo experiments examining fresh individual blood cells with RNA-FISH and from allele-specific RT-PCR analyses of small patches of apparently clonally derived tissue from the placenta. RNA-FISH analyses of clonal cell lines biallelic for a given gene revealed biallelic expression in individual cells, ruling out a rapid switching mechanism. Further evidence for stability comes from our previous demonstration in mouse clonal cell lines that the choice of a single active allele is highly stable and that biallelic clonal cell lines give rise only to biallelic subclones (9). A number of unusual regulatory mechanisms have been observed for monoallelically expressed genes (20, 23, 24), the most famous of which is immunoglobulin gene DNA rearrangement (25). For the immunoglobulin gene and a few other monoallelically expressed genes, allele-specific DNA methylation has been observed (26, 27). Given the heterogeneity in the new genes we have identified, diverse mechanisms likely affect their allele-specific expression. Conservative extrapolation from our data suggests that at least 1000 autosomal human genes are subject to random monoallelic expression. This monoallelic expression can affect biological function by creating three distinct cell states for each given gene when the two alleles encode functionally different proteins. These states (for each given gene) would be defined by expression of the maternal allele, the paternal allele, or both alleles. Monoallelic expression can also contribute to cellular (or clonal) diversity even without polymorphism, as demonstrated by our observation of higher expression levels in clones expressing both alleles as compared with clones with monoallelic expression. Previous examples wherein monoallelic expression has been shown to be essential for proper functioning include the immune system's antigen receptors (4) and the olfactory system's odorant receptors (7). The stability of the allele-specific choice in a given clone, taken together with clonal expansion to form tissues, can lead to macroscopic patches of tissue with subtly different properties, as we have observed in analyses of placental tissue. For any given tissue, the size of patches will depend on the time in development at which the allelic choice is made. Thus, in the brain and other tissues, each individual would be predicted to be a mosaic with respect to allele-specific expression of numerous autosomal genes, providing an epigenetic basis for functional differences among individuals.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S13

Tables S1 to S6

Database S1

References and Notes
