Gene Expression Profiles in Normal and Cancer Cells


Science  23 May 1997:
Vol. 276, Issue 5316, pp. 1268-1272
DOI: 10.1126/science.276.5316.1268


As a step toward understanding the complex differences between normal and cancer cells in humans, gene expression patterns were examined in gastrointestinal tumors. More than 300,000 transcripts derived from at least 45,000 different genes were analyzed. Although extensive similarity was noted between the expression profiles, more than 500 transcripts that were expressed at significantly different levels in normal and neoplastic cells were identified. These data provide insight into the extent of expression differences underlying malignancy and reveal genes that may prove useful as diagnostic or prognostic markers.

Much of cancer research over the past 50 years has been devoted to analyses of genes that are expressed differently in tumor cells as compared with their normal counterparts. Although hundreds of studies have pointed out differences in the expression of one or a few genes, no comprehensive study of gene expression in cancer cells has been reported. It is therefore not known how many genes are expressed differentially in tumor versus normal cells, whether the bulk of these differences are cell-autonomous rather than dependent on the tumor microenvironment, and whether most differences are cell type–specific or tumor-specific. Technological advances have made it possible to answer such questions through simultaneous analysis of the expression patterns of thousands of genes (1, 2). In this study, using normal and neoplastic gastrointestinal tissue as a prototype, we analyzed global profiles of gene expression in human cancer cells.

We used the recently developed method called serial analysis of gene expression (SAGE) (2) to identify and quantify a total of 303,706 transcripts derived from human colorectal (CR) epithelium, CR cancers, or pancreatic cancers (Table 1) (3). These transcripts represented about 49,000 different genes (4) that ranged in average expression from 1 copy per cell to as many as 5300 copies per cell (5). The number of different transcripts observed in each cell population varied from 14,247 to 20,471. The bulk of the mRNA mass (75%) consisted of transcripts expressed at more than five copies per cell on average (Table 2). In contrast, most transcripts (86%) were expressed at fewer than five copies per cell, but in aggregate this low-abundance class represented only 25% of the mRNA mass. This distribution was observed in all of the samples analyzed and was consistent with previous studies of RNA abundance classes based on RNA-DNA reassociation kinetics (Rot curves) (6). Monte Carlo simulations revealed that our analyses had a 92% probability of detecting a transcript expressed at an average of three copies per cell (7).
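The kind of detection probability quoted above can be illustrated with a small simulation. The sketch below is not the authors' procedure, which is described in note 7 and is not reproduced here. It assumes that a cell contains about 300,000 mRNA molecules, that tags are sampled independently, and that a transcript counts as detected when at least one of its tags is observed. The function names and default values are illustrative only.

```python
import math
import random


def detection_probability(copies_per_cell, n_tags, transcripts_per_cell=300_000,
                          trials=20_000, seed=0):
    """Estimate by Monte Carlo the chance of seeing a transcript at least once.

    Assumptions (not from the article): each sampled tag independently comes
    from the transcript with probability copies_per_cell / transcripts_per_cell,
    and one observed tag is enough to call the transcript detected.
    """
    rng = random.Random(seed)
    p = copies_per_cell / transcripts_per_cell
    log_miss = math.log1p(-p)  # log of the per-tag miss probability
    detected = 0
    for _ in range(trials):
        # The position of the first matching tag is geometrically distributed;
        # draw it by inverse transform instead of simulating every tag.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        first_hit = math.floor(math.log(u) / log_miss) + 1
        if first_hit <= n_tags:
            detected += 1
    return detected / trials


def exact_detection_probability(copies_per_cell, n_tags, transcripts_per_cell=300_000):
    """Closed-form counterpart under the same assumptions: 1 - (1 - p) ** n_tags."""
    p = copies_per_cell / transcripts_per_cell
    return 1.0 - (1.0 - p) ** n_tags
```

Under these assumptions the simulated estimate agrees with the closed form to within sampling error, and both depend strongly on the number of tags sequenced. The result need not match the 92% reported in the text, because the authors' model and detection criterion may differ from the ones assumed here.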

Table 1

Overall summary of SAGE analysis.

Table 2

Summary of SAGE analysis by abundance classes.


Many of the SAGE tags appeared to represent previously undescribed transcripts, as only 54% of the tags matched GenBank entries (Tables 1 and 2). Twenty percent of these matching transcripts corresponded to characterized mRNA sequence entries, whereas 80% matched uncharacterized expressed sequence tag (EST) entries. As expected, the likelihood of a tag being present in the databases was related to abundance; GenBank matches were identified for 98% of the transcripts expressed at >500 copies per cell but for only 51% of the transcripts expressed at ≤5 copies per cell. Because the SAGE data provide a quantitative assay of transcript abundance, unaffected by differences in cloning or polymerase chain reaction efficiency, these data provide an independent and relatively unbiased estimate of the current completeness of publicly available EST databases.

Comparison of expression patterns between normal colon epithelium and primary colon cancers revealed that most transcripts were expressed at similar levels (Fig. 1). However, the expression profiles also revealed 289 transcripts that were expressed at significantly different levels [P < 0.01 (8)]; 181 of these 289 were decreased in colon tumors as compared with normal colon tissue (average decrease, 10-fold; examples in Fig. 2A). Conversely, 108 transcripts were expressed at higher levels in the colon cancers than in normal colon tissue (average increase, 13-fold; examples in Fig. 2A). Monte Carlo simulations indicated that the analysis would have detected >95% of transcripts expressed at a sixfold or greater level in normal versus tumor cells or vice versa (9). Because relatively stringent criteria were used for defining differences [P < 0.01 (8)], the number of differences reported above is likely to be an underestimate.

Figure 1

Comparison of expression patterns in CR cancers and normal colon epithelium. A semilogarithmic plot reveals 51 tags that were decreased more than 10-fold in primary CR cancer cells (green), whereas 32 tags were increased more than 10-fold (red); 62,168 and 60,878 tags derived from normal colon epithelium and primary CR cancers, respectively, were used for this analysis. The relative expression of each transcript was determined by dividing the number of tags observed in tumor and normal tissue as indicated. To avoid division by 0, we used a tag value of 1 for any tag that was not detectable in one of the samples. We then rounded these ratios to the nearest integer; their distribution is plotted on the abscissa. The number of genes displaying each ratio is plotted on the ordinate. TU, CR tumors; NC, normal colon.
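The ratio calculation described in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the direction labels, and the tie-breaking rule for rounding are assumptions, and no correction for library size is applied because the legend describes none.

```python
import math
from collections import Counter


def fold_change(tumor_tags, normal_tags):
    """Return (direction, ratio) for one tag, following the Fig. 1 description."""
    # A tag that was not observed is given a count of 1 to avoid division by 0.
    tu = max(tumor_tags, 1)
    nc = max(normal_tags, 1)
    if tu == nc:
        return ("unchanged", 1)
    if tu > nc:
        direction, ratio = "increased", tu / nc  # TU/NC
    else:
        direction, ratio = "decreased", nc / tu  # NC/TU
    # Round to the nearest integer; ties round up (an assumption, since
    # Python's built-in round() would send 2.5 to 2).
    return (direction, math.floor(ratio + 0.5))


def ratio_histogram(tag_counts):
    """Count how many tags fall at each (direction, ratio) value.

    tag_counts maps a tag identifier to a (tumor_tags, normal_tags) pair.
    """
    return Counter(fold_change(tu, nc) for tu, nc in tag_counts.values())
```

Calling `ratio_histogram` on a dictionary of per-tag counts gives the number of tags at each rounded ratio, which corresponds to the quantity plotted on the ordinate.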

Figure 2

Northern blot analysis of genes differentially expressed in gastrointestinal neoplasia. Northern blot analysis was performed on total RNA (5 μg) isolated from primary CR carcinomas (T) and matching normal colon epithelium (N) or pancreatic carcinomas. The top line of gels in each panel shows ethidium bromide–stained gels before transfer. The number of SAGE tags observed in the original analysis is indicated to the right of each blot. (A) Examples of transcripts that were decreased or increased in CR cancers. (B) Examples of transcripts increased in pancreatic cancers (11). (C) Examples of transcripts increased in cancer that were or were not cancer type–specific. The following probes were used for Northern blot analysis [human SAGE tag identifier, gene product name (GenBank accession number)]: (A) H204104, guanylin (M95714); H259108 (see Tables 3 and 4); H1000193 (see Tables 3 and 4); H998030 (see Tables 3 and 4). (B) H294155, RIG-E (U42376); H560056, TIMP-1 (S68252). (C) H802810, EST338411 (W52120); H85882, 1-8D (X57351); H618841, GA733-1 (X13425). An additional 19 examples of Northern blots are available on the Internet at;molgen-g/home.htm.

To determine how many of the 289 differences were independent of the cellular microenvironment of cancers in vivo, we compared SAGE data from CR cancer cell lines with those from primary CR cancer tissues (10). Perhaps surprisingly, 130 of 181 transcripts that were expressed at reduced levels in cancer cells in vivo were also expressed at significantly lower levels in the cell lines (Table 3). Likewise, a significant fraction (47 of 108) of the transcripts expressed at increased levels in primary cancers were also expressed at higher levels in the CR cancer cell lines (Table 4). Thus, many of the gene expression differences that distinguish normal from tumor cells in vivo persist during in vitro growth. However, despite these similarities, there were also many differences. For example, only 47 of 228 genes expressed at higher levels in CR cancer cell lines were also expressed at high levels in the primary CR cancers.

Table 3

Transcripts decreased in CR cancer. The 20 transcripts displaying the largest decrease in expression in CR cancers (in vivo and in vitro) are listed by fold reduction. The tag sequence represents the 10–base pair SAGE tag, and SAGE UID is the human SAGE tag identifier. Probable GenBank matches are listed and those in boldface were confirmed by Northern blot analysis or by cloning and sequence analysis. Fold changes in expression were calculated as described in Fig. 1. TU, colon tumors; CL, colon cell lines; NC, normal colon. Tables of all 548 differentially expressed genes are available on the Internet at;molgen-g/home.htm.

Table 4

Transcripts increased in CR cancer. The 20 transcripts displaying the greatest increase in CR cancers (in vivo and in vitro) are listed by fold induction. Conditions are as described in Table 3.


In combination, comparison of the expression pattern of CR cancer cells (in vivo or in vitro) with that of normal colon cells revealed 548 differentially expressed transcripts (Tables 3 and 4). The average difference in expression for these transcripts was 15-fold. The power to detect a difference falls as its magnitude shrinks relative to the variance; even so, 92 of the 548 transcripts differed by less than threefold. However, the genes exhibiting the greatest differences in expression are likely to be the most biologically important.

To determine whether the changes noted in CR cancers were neoplasia- or cell type–specific, we performed SAGE on mRNA derived from pancreatic cancers. A total of 404 transcripts were expressed at higher levels in pancreatic cancers as compared with normal colon epithelium (examples in Fig. 2B). Most (268) of these transcripts were pancreas-specific (11) (see example in Fig. 2C), although 136 were also expressed at high levels in CR cancers. These 136 transcripts constituted 47% of the 289 transcripts that were increased in CR cancers relative to normal colon tissue and are likely to be related to the neoplastic process rather than to the specific cell type of origin.
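The counts in this paragraph fit together with the earlier comparisons if the 289 transcripts "increased in CR cancers" are read as the union of those increased in primary tumors (108) and those increased in cell lines (228), which share 47 members. That reading is an inference from the reported numbers, not something the text states explicitly. A quick arithmetic check, with variable names chosen for illustration:

```python
# Counts reported in the text.
increased_primary = 108      # increased in primary CR cancers
increased_cell_lines = 228   # increased in CR cancer cell lines
increased_both = 47          # increased in both
pancreatic_increased = 404   # increased in pancreatic cancers vs. normal colon
pancreas_specific = 268      # increased in pancreatic cancers only
shared_with_cr = 136         # also expressed at high levels in CR cancers

# Inclusion-exclusion: increased in vivo or in vitro (inferred reading).
increased_cr_any = increased_primary + increased_cell_lines - increased_both

# Share of CR-increased transcripts that are also increased in pancreatic cancers.
shared_percent = round(100 * shared_with_cr / increased_cr_any)

print(increased_cr_any)                    # 289
print(pancreas_specific + shared_with_cr)  # 404
print(shared_percent)                      # 47
```

On this reading, the 289 here is a different set from the 289 differences found between primary tumors and normal colon (181 decreased plus 108 increased); the two totals happen to coincide.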

One question that arose from these data is the potential heterogeneity of expression between individual tumors. The SAGE data were acquired from two samples of each tissue type (normal colon, primary CR cancer, CR cancer cell line, and so on). To examine the generality of these expression profiles, we arbitrarily selected 27 differentially expressed transcripts and evaluated them in 6 to 12 samples of normal colon and primary cancers by Northern (RNA) blot analysis (12). In general, expression patterns were very reproducible among different samples. Of 10 genes with elevated expression in normal colon relative to CR cancers as determined by SAGE, each was detected in the normal colon samples and was expressed at considerably lower levels in tumors (Fig. 2A). Similarly, most of the genes identified by SAGE as increased in CR or pancreatic cancers were confirmed to be reproducibly expressed in most primary cancers examined by Northern blot analysis (Fig. 2, A and B). It is important to note, however, that there were differences among the cancers, with a few cancers exhibiting particularly large or small amounts of individual transcripts. Such differences in gene expression undoubtedly contribute to the observed heterogeneity in the biological properties of cancers derived from the same organ (13).

What are the identities of the differentially expressed genes? Of the 548 differentially expressed transcripts, 337 were tentatively identified through database comparisons. When tested, most (93%) of these identifications proved to be legitimate (14), as was expected from previous SAGE analyses (2). Although a large number of differentially expressed genes were identified, some simple patterns did emerge. For example, genes that were expressed at higher levels in normal colon epithelium than in CR tumors were often related to differentiation. These genes included fatty acid–binding protein (15), cytokeratin 20 (16), carbonic anhydrase (17), guanylin (18), and uroguanylin (19), which are known to be important for the normal physiology or architecture of colon epithelium (Tables 3 and 4). On the other hand, genes that were increased in CR cancers were often related to the robust growth characteristics that these cells exhibit. For example, gene products associated with protein synthesis, including 48 ribosomal proteins and five elongation factors, as well as five genes involved in glycolysis, were elevated in both CR and pancreatic cancers as compared with normal colon cells. Although most of the transcripts could not have been predicted to be differentially expressed in cancers, several have previously been shown to be dysregulated in neoplastic cells. The latter included IGFII (20), B23 nucleophosmin (21), the Pi form of glutathione-S-transferase (22), and several ribosomal proteins (23), all of which were increased in cancer cells, as previously reported. Likewise, Dra (24) and gelsolin (25) were decreased in cancer cells, as previously reported. Surprisingly, two widely studied oncogenes, c-fos and c-erbb3, were expressed at much higher levels in normal colon epithelium than in CR cancers, in contrast to their up-regulation in transformed cells (26).

These data provide basic information necessary for understanding the gene expression differences that underlie cancer phenotypes. They also provide a necessary framework for interpreting the significance of individual differentially expressed genes. Although this study demonstrated that a large number of such differences exist (about 500 at the depth of analysis used), it was equally remarkable that the fraction of transcripts exhibiting significant differences was relatively small, representing 1.5% of the transcripts detected in any given cell type (27). The fact that many, but not all, of the differences were preserved during in vitro culture demonstrates the utility of cultured lines for examination of some aspects of gene expression but also provides a note of caution about relying on such lines to perfectly mimic tumors in their natural environment. Finally, the finding that hundreds of specific genes are expressed at different levels in CR cancers, and that some of these are also expressed differentially in pancreatic cancers, provides a wealth of reagents for future biologic and diagnostic experimentation.

* These authors contributed equally to this work.

