Single-Cell Gene Expression Profiling


Science  02 Aug 2002:
Vol. 297, Issue 5582, pp. 836-840
DOI: 10.1126/science.1072241


A key goal of biology is to relate the expression of specific genes to a particular cellular phenotype. However, current assays for gene expression destroy the structural context. By combining advances in computational fluorescence microscopy with multiplex probe design, we devised technology in which the expression of many genes can be visualized simultaneously inside single cells with high spatial and temporal resolution. Analysis of 11 genes in serum-stimulated cultured cells revealed unique patterns of gene expression within individual cells. Using the nucleus as the substrate for parallel gene analysis, we provide a platform for the fusion of genomics and cell biology: “cellular genomics.”

The first step in the translation of genomic sequence into physiology or pathophysiology is transcription. Transcriptional regulation has been studied almost exclusively on nucleic acids extracted from cultured cells or tissues by Northern blot (1), differential display (2), serial analysis of gene expression (SAGE) (3), or forms of microarray (4, 5). Here, we describe a complementary approach that monitors mRNA synthesis by visualizing specific sites of transcription (6, 7). Using transcription sites for expression profiling allows analysis of coordinated transcription events and of the organization of gene expression. To achieve sensitive and specific detection of RNAs by fluorescence in situ hybridization (FISH), we used oligomer DNA probes, each labeled at multiple sites with a single fluorophore species (8). To detect many mRNAs simultaneously, we used combinations of these probes labeled with spectrally distinct colors (Fig. 1A) (9). A combinatorial approach to labeling probes (Fig. 1B) provided a large number of virtual “colors” for distinguishing many transcripts (supporting online text and table S1). Spectral “barcodes” with a minimum of two distinct fluorophores were used to increase specificity. Schemes based on color combinations have been applied to the detection of entire chromosomes for cytogenetics (10, 11) and to the analysis of subchromosomal regions of DNA (12). However, the targets in these genomic assays are orders of magnitude larger than transcripts and carry no functional information.
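The number of barcodes available from a color set follows from simple combinatorics. The sketch below assumes the four fluorophores named in the Fig. 2 caption and the two-color minimum stated above; it only counts the possible codes and does not reproduce the actual gene-to-barcode assignments, which are given in table S1.

```python
from itertools import combinations

# The four fluorophores named in the Fig. 2 caption.
FLUOROPHORES = ("FITC", "Cy3", "Cy3.5", "Cy5")


def barcodes(fluors, min_colors=2):
    """Enumerate every combination of at least `min_colors` distinct fluorophores."""
    codes = []
    for k in range(min_colors, len(fluors) + 1):
        codes.extend(frozenset(combo) for combo in combinations(fluors, k))
    return codes


codes = barcodes(FLUOROPHORES)
print(len(codes))                      # 6 two-color + 4 three-color + 1 four-color = 11
print(len(barcodes(FLUOROPHORES, 1)))  # allowing single colors: 2**4 - 1 = 15
```

With four dyes and at least two per barcode there are 11 possible codes; permitting single-color codes would add only four more, at the cost of the protection against false positives that colocalization of independent hybridizations provides.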

Figure 1

Conceptual diagram of barcoding. (A) Schematic representation of the use of color groups to encode unique gene identities. The combinations shown use only two colors chosen from the total of four used throughout our experiments. A minimum of two colors is used to reduce the chance of single-color false-positive signals. Each detected site therefore represents two or more independent, spatially colocalized hybridization events, which increases fidelity. (B) Schema of the actual probe hybridizations at the transcription site leading to detection by barcode. The gene locus is represented (green) with several polymerases (blue) transcribing nascent messages (pink). A shotgun approach is used such that each gene-specific probe may carry any of the colors in the barcode (red, yellow, magenta). The transcription site is shown below with hybridization data against the nuclear background (blue). Scale bar, 3 μm. (C) An example of the signal readout interpreted by the transcription detection algorithm, showing the pixel intensities in the area of the transcription site in all five color bandwidths. In this example, there is signal for each of the probe components used (red, yellow, and magenta) and only background levels for the color not in the barcode (purple).
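The readout in panel (C) amounts to a per-channel presence call followed by a lookup of the resulting color set. The sketch below is a hypothetical illustration, not the authors' detection algorithm: it assumes a channel is scored positive when its intensity exceeds the local background mean by three standard deviations, and the gene names and barcodes are invented.

```python
# Hypothetical gene-to-barcode table; the real assignments are in table S1.
BARCODES = {
    "gene_A": frozenset({"Cy3", "Cy3.5", "Cy5"}),  # hypothetical three-color code
    "gene_B": frozenset({"FITC", "Cy3"}),          # hypothetical two-color code
}


def positive_channels(intensities, background, k=3.0):
    """Channels whose intensity exceeds background mean + k * SD (assumed rule)."""
    return frozenset(
        channel
        for channel, value in intensities.items()
        if value > background[channel][0] + k * background[channel][1]
    )


def decode_site(intensities, background, barcodes, k=3.0):
    """Return the gene whose barcode exactly matches the positive channels, else None."""
    positive = positive_channels(intensities, background, k)
    if len(positive) < 2:        # single-color spots are rejected as likely false positives
        return None
    for gene, code in barcodes.items():
        if positive == code:     # all barcode colors present and no extra colors
            return gene
    return None


background = {ch: (100.0, 10.0) for ch in ("FITC", "Cy3", "Cy3.5", "Cy5")}  # (mean, SD)
site = {"FITC": 105.0, "Cy3": 400.0, "Cy3.5": 350.0, "Cy5": 380.0}
print(decode_site(site, background, BARCODES))  # gene_A
```

Requiring an exact match rather than a best partial match mirrors the caption's point that colors outside the barcode should show only background.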

Cell culture and preparation, fluorescence microscopy, and image acquisition were performed, and the images were subjected to computational analysis to detect sites of transcription (9). Transcription sites have a discernible three-dimensional volume and shape, as well as the highest mRNA signal, because multiple nascent transcripts are present at the location of the transcription unit in chromatin (Fig. 1B). The color combination, or spectral barcode, defined transcript identity (Fig. 1C). Colocalization of multiple colors in the image represented independent hybridization events that signified the presence of a transcription site. Signals were assigned to individual cells on the basis of contiguity of the 4′,6-diamidino-2-phenylindole (DAPI) counterstain (9). We simultaneously identified transcription sites of 10 genes in starved and serum-stimulated human colon adenocarcinoma (DLD-1) cells (13) (Fig. 2, A to C). Uniform probe accessibility was indicated by the presence of transcription sites in more than 97% of cells (176/181). Sites of the most prevalent gene, γ-actin, occurred in 80% of nuclei (144/181). Before the probes for all 10 genes were mixed, the probes for each gene were first hybridized individually to ensure that measured expression was independent of multiplexing (14).
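Assigning sites to cells by DAPI contiguity is a connected-component labeling problem. The sketch below works on a single 2D binary mask and assumes the DAPI image has already been thresholded; the actual analysis used 3D image stacks (9), and the 4-connectivity, function names, and toy mask here are illustrative assumptions.

```python
from collections import deque


def label_nuclei(mask):
    """Label 4-connected foreground regions of a 2D binary mask; returns (labels, count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1                      # start a new nucleus
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def assign_sites(sites, labels):
    """Group decoded sites (row, col, gene) by the nucleus under them; label 0 is background."""
    per_nucleus = {}
    for r, c, gene in sites:
        nucleus = labels[r][c]
        if nucleus:
            per_nucleus.setdefault(nucleus, []).append(gene)
    return per_nucleus


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, n = label_nuclei(mask)
sites = [(0, 0, "g1"), (1, 1, "g2"), (2, 3, "g1"), (2, 0, "g3")]  # last one lies outside nuclei
per_nucleus = assign_sites(sites, labels)
print(n, per_nucleus)  # 2 {1: ['g1', 'g2'], 2: ['g1']}
```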

Figure 2

Simultaneous detection of many genes. (A) A single G2 nucleus (cell-cycle stage assessed by DAPI signal) of a human colon adenocarcinoma (DLD-1) cell, with a pseudo-colored representation of 17 transcription sites detected in situ. The image is “flattened” such that all twelve 0.5-μm Z-sections are displayed on the background, which is the DAPI counterstain from the middle image of the stack. Gene identity is denoted by color, and the Z-location is recorded by the adjoining number; lower numbers represent closer proximity to the cover slip. Scale bar, 3 μm. (B) Three G1 DLD-1 nuclei from the same field, which together express all 10 genes assayed. Arrows indicate sites that are shown magnified below from the original data. From left to right, the 10 marked transcription sites are IL-8, MCL-1, DUSP-1, cyclin D1, γ-actin, EGR-1, TIEG-1, β-actin, c-myc, and c-jun. Scale bar, 3 μm. (C) Chart of the 10 genes detected in (B). The “Pseudo” column shows the arbitrary pseudo-color used to denote the gene identity of each transcription site in the renderings above. Columns at right show the actual signal recorded at the appropriate Z-section for the transcription sites marked with arrows in (B). Data from each band (fluorescein isothiocyanate [FITC], Cy3, Cy3.5, and Cy5) are shown, with the positively scored signals highlighted by surrounding boxes. Each area of the unprocessed image shown is 1 μm². Observed misalignment is due to chromatic shift between filter sets.

Assaying expression by transcription site visualization preserves population heterogeneity. For the DLD-1 cell line, the number of transcription sites per cell was not uniform throughout the population. The number of sites correlated strongly with nuclear cross-sectional area and with total DNA signal, measured as the fluorescence intensity of the DAPI counterstain (Fig. 3A; r² = 0.56). One additional transcription site was observed for each 11.9 μm² of nuclear area [95% confidence interval (CI), 10.5 to 13.7 μm²]. In contrast, the number of sites in starved and stimulated normal fibroblasts did not correlate with nuclear size: data from 162 fibroblast nuclei fit a linear model poorly (r² = 0.0025). In normal fibroblasts, starvation abolished the increase in transcription sites associated with increased DNA content; in the cancer cells, it did not do so completely. Although transcription was greatly reduced in starved DLD-1 cells and their cell-cycle state was roughly synchronized, differences among cells were still detectable in situ. The ability to assay this heterogeneity is a clear advantage in the evaluation of mixed tissue samples, which are common in clinical pathology.
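The per-site area quoted above is the reciprocal of a regression slope: fitting sites = a + b × area by ordinary least squares gives b in sites per μm², and 1/b is the nuclear area associated with one additional site. The following sketch computes the fit and r² from first principles; the input numbers are synthetic and are not the measured DLD-1 data.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return a, b, r_squared


# Synthetic example: nuclear areas (μm²) and site counts per nucleus.
areas = [60.0, 75.0, 90.0, 110.0, 130.0]
site_counts = [4, 6, 7, 9, 10]
a, b, r2 = linear_fit(areas, site_counts)
print(f"slope = {b:.4f} sites/μm², area per site = {1 / b:.1f} μm², r² = {r2:.2f}")
```

Because the reported quantity is 1/b, its interval is naturally asymmetric; the reported 10.5 to 13.7 μm² range is consistent with inverting a roughly symmetric interval on the slope.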

Figure 3

Population statistics from single cells. (A) Scatterplot showing the correlation between nuclear cross-sectional area and number of detected transcription sites in 181 DLD-1 nuclei. The linear trend shows that for each additional 11.9 μm² of nuclear area, another transcription site is detected (r² = 0.56). (B) Histogram of the numbers of genes and alleles per nucleus in a population of expressing cells. For the 10 genes assayed, the most frequent values were five genes and six alleles. Cells show considerable heterogeneity in their gene patterns. (C) High and low levels of transcription activity as measured by transcription site detection. The γ-actin, TIEG, and EGR-1 genes showed a similar pattern of activation, with about 35% of cells having one site and 35% having both alleles active. In contrast, the less robustly transcribed c-myc, cyclin D1, and MCL-1 genes showed lower levels, with roughly 15% of cells containing a single site and fewer than 5% containing two.

Population profiles were determined for cells that had previously been subjected to microarray and other molecular analyses (15, 16). This provided a direct comparison of data based on extracted RNA with single-cell expression levels, a process we termed “FISH & Chips.” Thirty minutes after serum stimulation, the average DLD-1 nucleus had detectable expression of five individual genes and roughly six to seven transcription sites, including both alleles of some genes (Fig. 3B). Genes with frequently occurring sites often had two active alleles, with an approximately 1:1 ratio of two-allele to one-allele nuclei. Genes with lower levels usually had only a single allele expressed (Fig. 3C). Thus, variable expression levels can be visualized in situ and scored for intensity, number of active alleles, or dosage. Possible applications include monitoring allele silencing and determining the ploidy of heterogeneous samples.
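The gene and allele counts in Fig. 3B and the two-allele to one-allele ratios in Fig. 3C are simple tallies over nuclei. In the sketch below, each nucleus is assumed to be represented as a mapping from gene name to the number of active sites detected (0, 1, or 2, assuming a diploid locus); the example nuclei are invented for illustration.

```python
from collections import Counter


def gene_and_site_histograms(nuclei):
    """Histograms of expressed genes per nucleus and of transcription sites per nucleus."""
    genes = Counter(sum(1 for n_sites in nucleus.values() if n_sites > 0) for nucleus in nuclei)
    sites = Counter(sum(nucleus.values()) for nucleus in nuclei)
    return genes, sites


def two_to_one_allele_ratio(nuclei, gene):
    """Ratio of nuclei with two active alleles of `gene` to nuclei with one (inf if none have one)."""
    one = sum(1 for nucleus in nuclei if nucleus.get(gene, 0) == 1)
    two = sum(1 for nucleus in nuclei if nucleus.get(gene, 0) == 2)
    return two / one if one else float("inf")


# Invented example: four nuclei, gene -> number of active sites.
nuclei = [
    {"gamma-actin": 2, "EGR-1": 1},
    {"gamma-actin": 1, "c-myc": 0},
    {"gamma-actin": 2, "EGR-1": 2},
    {"gamma-actin": 1},
]
genes_hist, sites_hist = gene_and_site_histograms(nuclei)
print(genes_hist, sites_hist)
print(two_to_one_allele_ratio(nuclei, "gamma-actin"))  # 1.0
```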

The ability to collect a binary “snapshot” of which genes are “on” or “off” at a single moment in many nuclei offered new insights into transcriptional regulation. Of the 10 mRNAs analyzed, several pairs of genes were coexpressed with higher probability than would be predicted by random association (table S2). Relative to β-actin–negative cells, nuclei positive for β-actin expression were 5.2 times as likely to express γ-actin (99% CI, 1.6 to 16.7) and 3.2 times as likely to express EGR-1 (99% CI, 1.3 to 7.7). Remarkably, all three of these genes contain one or more binding sequences for serum response factor (17–20). This implies that genes with similar promoter elements exhibit correlated activation at the single-cell level. Such correlation could result from nonhomogeneous distribution of a transcription factor among cells in the population or from increased accessibility of similarly regulated genes to their activating factors. Apart from these genes, cell expression profiles were variable, without a consistent pattern of inclusion and exclusion. If early serum-activated transcription followed a distinctly ordered pathway, subsets of cells would be at different stages of the response, and genes acting at different stages would tend to appear in mutually exclusive sets of cells. This was not the case. Pairwise analyses of transcription sites showed no mutual exclusions or anticorrelations (for each of the 45 measured gene pairs, the upper bound of the 99% CI for the odds ratio exceeded 1). Hierarchical log-linear models showed no higher-order associations; cells were positive for different sets of transcripts. At the single-cell level, physiological and random variation, possibly in the level or activity of a transcription factor, may cause the serum-induced genes to be expressed at different times.
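Each pairwise test compares the odds of expressing one gene in nuclei positive versus negative for another. The sketch below computes an odds ratio from a 2 × 2 table of nucleus counts with a Woolf (log-based) confidence interval; the text does not specify which interval method was used, so Woolf's method, the z value for a two-sided 99% interval, and the example counts are all assumptions.

```python
import math

Z_99 = 2.5758  # two-sided 99% standard normal quantile


def odds_ratio_ci(a, b, c, d, z=Z_99):
    """Odds ratio and Woolf CI for a 2 x 2 table of nucleus counts.

    a: gene X on, gene Y on     b: gene X on, gene Y off
    c: gene X off, gene Y on    d: gene X off, gene Y off
    All cells must be nonzero (a common fix is adding 0.5 to every cell).
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def classify(lower, upper):
    """Positive association if the CI lies above 1, anticorrelation if it lies below 1."""
    if lower > 1:
        return "positive"
    if upper < 1:
        return "anticorrelated"
    return "no significant association"


print(classify(*odds_ratio_ci(40, 10, 10, 40)[1:]))  # positive
```

Under this scheme, "no anticorrelation" for a pair means its upper bound is not below 1.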

To assess the timing of expression activation, we analyzed the transcription of 11 serum-responsive genes at 14 time points in normal human fibroblasts (21). We profiled 2199 nuclei that contained 10,134 transcription sites. At each time point, 55 gene-gene comparisons were made to test whether transcription was temporally ordered (a total of 770 pairwise comparisons). As in the DLD-1 experiments, there were no anticorrelated or mutually exclusive gene pairs, indicating that a linear pathway of gene induction among the early response genes is highly unlikely.
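The comparison counts follow from choosing unordered gene pairs. A short check, using the 10 DLD-1 genes listed in the Fig. 2 caption; the 11 fibroblast genes are not all named in this section, so only their number is used.

```python
from itertools import combinations
from math import comb

# The 10 genes assayed in DLD-1 cells (Fig. 2B).
DLD1_GENES = ["IL-8", "MCL-1", "DUSP-1", "cyclin D1", "γ-actin",
              "EGR-1", "TIEG-1", "β-actin", "c-myc", "c-jun"]

dld1_pairs = list(combinations(DLD1_GENES, 2))
print(len(dld1_pairs))  # 45 gene pairs

pairs_per_time_point = comb(11, 2)            # 11 fibroblast genes -> 55 pairs
total_comparisons = pairs_per_time_point * 14  # 14 time points
print(total_comparisons)  # 770
```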

All the induced genes showed similar kinetics of transcription site activation and deactivation over a 90-min period (Fig. 4). Several transcripts that we readily visualized showed no change by microarray [less than a factor of 2.2, the detection limit of that technology (16)]. These include β-actin (Fig. 4A), γ-actin, and c-jun, all of which have been shown to be serum-responsive genes (22). Incremental changes in transcription may not be detectable in total mRNA. With transcription site analysis, however, small changes in expression can be observed irrespective of the total abundance of the RNA. Microarray and in situ assays therefore yield complementary information. By analogy, transcription site profiling measures the transcriptional “thermostat,” whereas microarrays read the “ambient temperature” of whole-cell RNA levels.
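Why a clear transcriptional response can be invisible to a microarray can be illustrated with a toy calculation in which newly made transcripts simply add to a pre-existing pool. The copy numbers below are hypothetical, degradation is ignored, and only the 2.2-fold detection limit comes from the text.

```python
ARRAY_FOLD_LIMIT = 2.2  # microarray detection limit cited in the text (16)


def fold_change(basal_copies, new_copies):
    """Fold change in total mRNA when new transcripts add to an existing pool."""
    return (basal_copies + new_copies) / basal_copies


def detectable_on_array(basal_copies, new_copies, limit=ARRAY_FOLD_LIMIT):
    """True if the change in total mRNA reaches the array's fold-change limit."""
    return fold_change(basal_copies, new_copies) >= limit


# Hypothetical abundant transcript: a large induced burst is diluted by the basal pool.
print(fold_change(1000, 500), detectable_on_array(1000, 500))  # 1.5 False
# Hypothetical rare transcript: a smaller burst still exceeds the limit.
print(fold_change(10, 50), detectable_on_array(10, 50))        # 6.0 True
```

A transcription-site readout responds to the new transcripts themselves rather than to their ratio with the existing pool, which is the sense of the thermostat analogy.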

Figure 4

Transcriptional response of fibroblasts to serum. (A) The kinetics of transcriptional activation for the 11 genes assayed were relatively similar, in contrast with results obtained by microarray analysis. Here, two serum-responsive genes, β-actin and c-fos, show similar kinetics, yet only c-fos activation was detected by microarray (16). (B) Increases in visualized transcription sites 10 min after serum induction, relative to starved controls. (C) Activation curve for all assayed genes over a time course of serum induction, showing how allele and gene prevalence change as the cascade progresses. In (A) and (C), extrapolated time points are shown in brackets.

All genes showed measurable activation in the population of cells at 5 or 10 min after induction, relative to starved, unstimulated cells (Fig. 4B). Over the first 10 min of induction, genes were activated predominantly on one allele: the average cell had 3.2 different genes activated with a total of only 4.0 transcription sites, and single-allele expression outnumbered biallelic expression by a ratio of 3 to 1. By 40 min of induction, the distribution was even: cells averaged 3.4 genes with a single active allele and 2.8 with both alleles active, a total of 9.0 sites per nucleus. Between 60 and 90 min, the average number of sites per nucleus dropped to 2.9. By 120 min, nearly all sites had returned to baseline prevalence (Fig. 4C). This time course agreed well with the earlier results from a single time point in DLD-1 cells: at 30 min of stimulation, DLD-1 cells showed an average of 6.5 transcription sites and fibroblasts showed 6.1. This implies that serum-responsive transcription follows similar kinetics in different cell types. Differences were also noted: DLD-1 cells expressed considerably more β-actin (1.2 sites per cell versus 0.33) and somewhat more γ-actin (0.69 sites per cell versus 0.45) than fibroblasts.
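The per-nucleus averages above are linked by simple bookkeeping if each gene is assumed to have at most two active alleles: sites = single-allele genes + 2 × biallelic genes, and genes = single-allele genes + biallelic genes. The sketch below applies these relations to the reported averages as a consistency check.

```python
def total_sites(single_allele_genes, biallelic_genes):
    """Sites per nucleus, assuming one site per single-allele gene and two per biallelic gene."""
    return single_allele_genes + 2 * biallelic_genes


def split_alleles(genes, sites):
    """Recover (single-allele genes, biallelic genes) from mean genes and mean sites."""
    biallelic = sites - genes
    return genes - biallelic, biallelic


# 40 min: 3.4 single-allele + 2.8 biallelic genes
print(round(total_sites(3.4, 2.8), 2))  # 9.0

# First 10 min: 3.2 genes with 4.0 sites
single, both = split_alleles(3.2, 4.0)
print(round(single, 2), round(both, 2), round(single / both, 2))  # 2.4 0.8 3.0
```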

Lags were apparent between the kinetics of transcription site activation and its effect on cellular transcript levels as assessed by microarray (14). Such lags depend on transcription rate, message stability, and transcript abundance at the start of induction. Previous analyses measured a combination of these factors, whereas FISH of transcription sites detects only new transcription. Levels of cyclin D1 transcripts reportedly increase later than those of the early immediate response genes, and cyclin D1 was therefore classified as a delayed immediate response gene (23). In our studies, cyclin D1 was activated as early as the other genes, with maximal activation at 50 min after stimulation compared with the 11-gene average of 46 min. The c-myc gene showed fairly low levels of induction on microarray (16), although we observed frequent c-myc sites (14). Levels of c-myc transcripts are therefore likely to be controlled posttranscriptionally.
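The dependence of such lags on transcription rate, message stability, and starting abundance can be illustrated with a standard one-compartment model, dm/dt = k(t) - δm, in which transcription runs at an elevated rate for a fixed window and mRNA decays at rate δ. This is a toy model with hypothetical parameters (time in minutes), not a fit to the data: total mRNA keeps rising for as long as transcription is elevated and peaks only when the burst ends, and a stable message (small δ, hence a large basal pool) shows a smaller fold change than an unstable one.

```python
def simulate_mrna(k_basal, k_induced, decay_rate, t_on, t_off, t_end, dt=0.1):
    """Euler integration of dm/dt = k(t) - decay_rate * m, starting from basal steady state.

    k(t) equals k_induced for t_on <= t < t_off and k_basal otherwise.
    Returns (times, levels).
    """
    m = k_basal / decay_rate  # steady state before induction
    times, levels = [0.0], [m]
    for i in range(int(round(t_end / dt))):
        t = i * dt
        k = k_induced if t_on <= t < t_off else k_basal
        m += (k - decay_rate * m) * dt
        times.append((i + 1) * dt)
        levels.append(m)
    return times, levels


def peak_time_and_fold(times, levels):
    """Time of maximal mRNA and fold change of the peak over the starting level."""
    peak = max(range(len(levels)), key=levels.__getitem__)
    return times[peak], levels[peak] / levels[0]


# Hypothetical parameters: a 30-min transcriptional burst starting at 10 min.
for label, decay in (("unstable", 0.1), ("stable", 0.01)):
    t_peak, fold = peak_time_and_fold(*simulate_mrna(1.0, 10.0, decay, 10, 40, 120))
    print(f"{label}: peak at {t_peak:.1f} min, {fold:.1f}-fold over basal")
```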

These studies demonstrate the power of cellular transcriptional profiling. The instantaneous transcriptional activity of genes in single cells allows observation of causes and effects of expression. Eventually, the physiological state of cells within tissues will become synonymous with a pattern of gene expression. This will provide a quantitative approach to factors influencing gene expression patterns, such as occur in cytopathology, development and cell differentiation, infectious disease, or response to drug treatment. Investigation of functional genomics may now be approached at the cellular level. We expect that the enormous information inherent in the expression of many genes in large cell populations will aid the understanding of relationships among genes in single nuclei and their cooperative and cumulative roles in physiology and disease.

Supporting Online Material

Materials and Methods

SOM Text

References and Notes

Tables S1 and S2

* To whom correspondence should be addressed. E-mail: rhsinger{at}


