Genomic Analysis of Gene Expression in C. elegans


Science  27 Oct 2000:
Vol. 290, Issue 5492, pp. 809-812
DOI: 10.1126/science.290.5492.809


Until now, genome-wide transcriptional profiling has been limited to single-cell organisms. The nematode Caenorhabditis elegans is a well-characterized metazoan in which the expression of all genes can be monitored by oligonucleotide arrays. We used such arrays to quantitate the expression of C. elegans genes throughout the development of this organism. The results provide an estimate of the number of expressed genes in the nematode, reveal relations between gene function and gene expression that can guide analysis of uncharacterized worm genes, and demonstrate a shift in expression from evolutionarily conserved genes to worm-specific genes over the course of development.

The nematode C. elegans is a genetically accessible model organism that is widely used to study genetics, development, and other biological processes (1, 2). In 1998, the genome of this organism was completely sequenced, and the presence of 19,099 genes, or open reading frames (ORFs), was predicted. This made it possible to use oligonucleotide arrays for genome-wide gene expression monitoring in this metazoan (3, 4). We designed three oligonucleotide arrays (denoted A, B, and C) to monitor the mRNA expression levels of 18,791 (98%) of the predicted worm ORFs; the remaining ORFs were not included because they were almost identical to one or more of the selected ORFs (5). To maximize the number of detected transcripts, we quantitated nematode gene expression in six developmentally staged populations from eggs to adult worms, in isolated oocytes, and in aged worms near the end of their ∼2-week life-span (6). We anticipated that the resulting data set would provide insight into gene expression in the nematode and also serve as a baseline for further experiments in C. elegans.

The number of ORFs called “present” by the Affymetrix GeneChip software (7) in any readout across all RNA samples is summarized in Table 1. In total, 10,747 ORFs (56%) were detected in at least one hybridization. This number of detected genes is comparable to the ∼10,000 independent genes represented by the current set of C. elegans expressed sequence tags (ESTs) (8). We detected most ORFs with ESTs, and most of the ORFs that we did not detect are likely transcripts in very low abundance. For example, we detected 78% of the ORFs on the A array, which had sequence homology to C. elegans cDNAs. In contrast, 90% of the ORFs that we failed to detect are represented by at most a single cDNA in the C. elegans database ACEDB (version WS6).

Table 1

Summary of the number of ORFs detected by array experiments. An ORF was scored as detected if it was called “present” (7) one or more times across all samples. The number of distinct ORFs detected in each worm sample for each array design (A, B, or C) is tabulated. A total of 10,747 distinct genes was detected. The sensitivity of arrays is indicated in ppm (e.g., 10 ppm = 1:100,000).


Our ability to detect genes as expressed depends on many factors, but the two most important are the sensitivity of the oligonucleotide chip and the relative abundance of the transcripts. To estimate the sensitivity of the oligonucleotide arrays, we included in vitro synthesized transcripts in each hybridization (Table 1) (9). As determined by the signal response from these control transcripts, the sensitivity of detection of the arrays ranged between ∼1:300,000 and 1:50,000. By way of example, we estimate that embryos contain ∼10⁷ transcripts (10); thus, a sensitivity level of 1:300,000 corresponds to detecting a gene expressed at an average of 30 transcripts per embryo. To be reliably detected, transiently expressed genes would need to be expressed at higher levels. Furthermore, as the animal grows, rare transcripts, particularly those expressed in only a few or single cells, would be further diluted by ubiquitous and abundant transcripts. Indeed, preliminary data from isolated gonads indicate the presence of many gonad- and germ line–specific messages that were at low or undetectable levels in whole worms (11). Thus, many of the ∼8400 ORFs that were not detected in any experiment are probably genes that are expressed at levels below our limits of detection. Consistent with this, in a separate series of experiments that monitored expression in worms grown under stressful conditions (which should induce gene expression), we detected 611 additional transcripts that were not detected during the unperturbed life cycle (11).
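The sensitivity arithmetic in this paragraph can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the transcripts-per-embryo figure is the estimate quoted in the text. Dividing ten million transcripts by 300,000 gives roughly 33 copies, which the text rounds to 30.

```python
def min_detectable_copies(total_transcripts: float, sensitivity_ratio: float) -> float:
    """Copies of one transcript needed to reach a 1:sensitivity_ratio detection limit."""
    return total_transcripts / sensitivity_ratio


def ratio_to_ppm(sensitivity_ratio: float) -> float:
    """Express a 1:sensitivity_ratio detection limit in parts per million."""
    return 1e6 / sensitivity_ratio


# Estimate quoted in the text (10); treated here as an assumption.
TRANSCRIPTS_PER_EMBRYO = 1e7

# The two ends of the sensitivity range reported for the arrays.
for ratio in (300_000, 50_000):
    copies = min_detectable_copies(TRANSCRIPTS_PER_EMBRYO, ratio)
    ppm = ratio_to_ppm(ratio)
    print(f"1:{ratio:,} = {ppm:.1f} ppm, about {copies:.0f} copies per embryo")
```

The same conversion reproduces the Table 1 convention that 10 ppm corresponds to 1:100,000.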

By applying a one-way analysis of variance (ANOVA) to our replicated measurements, we isolated the subset of transcripts that had a significant (P < 1 × 10⁻³) increase or decrease in frequency (12) at some point during the life cycle and that were called “present” at least once; 4221 (22%) of the ORFs met these criteria. The expression profile of each of these developmentally modulated genes was normalized to have a mean value of zero and a variance of one, and the normalized profiles were clustered by means of a self-organizing map (SOM) (13). Examples of selected clusters are shown in Fig. 1 [see supplemental information for a complete listing of all cluster and frequency data (14)]. A number of previously characterized genes had life-cycle expression profiles that matched our expectations. For example, K07H8.6 (vit-6) (cluster A) is a vitellogenin, known to be expressed abundantly in the intestine of the late larval/adult hermaphrodite C. elegans (2). F30B5.1 (dpy-13) (cluster B) is a cuticular collagen, expressed in waves during the four larval stages as cuticle is synthesized and in lower levels before and after the larval stages (15).
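The two preprocessing steps named here, a one-way ANOVA across time points and normalization of each profile to zero mean and unit variance, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the example numbers are hypothetical, the use of the population variance is an assumption (the text does not say which variance was used), and only the F statistic is computed. Converting F to a P value requires the F distribution, for example via `scipy.stats.f_oneway`. The SOM clustering step is not shown.

```python
from statistics import mean, pstdev


def normalize_profile(profile):
    """Rescale a profile to mean 0 and variance 1 (population variance, an assumption)."""
    mu = mean(profile)
    sigma = pstdev(profile)
    if sigma == 0:
        raise ValueError("a flat profile cannot be scaled to unit variance")
    return [(x - mu) / sigma for x in profile]


def anova_f(groups):
    """One-way ANOVA F statistic; each group holds the replicates for one time point."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical replicated frequencies (ppm) for one gene at three time points.
replicates = [[10.0, 12.0, 11.0], [30.0, 28.0, 33.0], [55.0, 60.0, 58.0]]
print("F statistic:", anova_f(replicates))
print("normalized profile:", normalize_profile([mean(g) for g in replicates]))
```

Normalizing before clustering means that genes are grouped by the shape of their profiles, not by their absolute abundance.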

Figure 1

Selected SOM clusters of expression profiles for developmentally modulated genes. A SOM was used to partition a total of 4221 genes that were significantly modulated (ANOVA, P < 1 × 10⁻³) over the worm life cycle into 36 clusters with similar normalized expression profiles. Four representative clusters are shown. The heading of each panel indicates the index of the cluster in the full SOM (e.g., 2, 1), followed by the number of genes in the cluster. Because all profiles have been normalized to the same amplitude, there are no units indicated on the y axes. Complete gene listings for each cluster are available as supplemental information (14). (A) This cluster of 55 genes, with peak expression in the egg-laying young adult, included the vitellogenin vit-6. (B) This cluster of 104 genes included the cuticular collagen dpy-13. The cluster was significantly enriched in several WormPD classes, as described in the text. (C) This cluster of 80 genes was enriched in genes related to embryogenesis, germ line development or function, and cell-cycle progression, as described in the text. (D) This cluster of 215 genes was enriched in known or putative transcription factors, as described in the text.

To interpret the expression profile clusters, we matched the gene classifications in the Proteome WormPD database (16) to the genes in each cluster. Examination of the gene clusters indicated several expression profiles that were readily interpretable in terms of worm biology. Cluster B contained 104 genes whose expression increased through development to a peak at 60 hours, then declined in 2-week-old worms. This cluster was enriched in markers of metabolic activity, including oxidoreductases (7-fold enrichment as compared to the genome as a whole), amino acid metabolism genes (17-fold enrichment), carbohydrate metabolism genes (15-fold enrichment), and protein synthesis genes (7-fold enrichment). Cluster C contained 80 genes that were up-regulated in eggs and then again later in the reproductively active, egg-laying adult worm. This cluster is enriched in a small set of genes that we classified as being linked to embryogenesis, germ line development or function, and cell-cycle progression. Examples of these genes include F38E1.7 [mom-2, required for polarization of the EMS cell (17)], C08B11.1 [zyg-11, required for zygote formation (18)], and Y39A1A.12 (putative origin recognition complex subunit).
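The fold-enrichment figures quoted for cluster B compare the fraction of a functional class inside the cluster with its fraction in the background. A minimal sketch of that ratio is given below. The function name is hypothetical, the class counts are invented for illustration, and the choice of the 18,791 monitored ORFs as the background is an assumption; only the cluster size (104) and the number of monitored ORFs come from the text.

```python
def fold_enrichment(hits_in_cluster, cluster_size, hits_in_background, background_size):
    """Ratio of a class's frequency in a cluster to its frequency in the background."""
    cluster_fraction = hits_in_cluster / cluster_size
    background_fraction = hits_in_background / background_size
    return cluster_fraction / background_fraction


# Hypothetical counts: 8 of the 104 cluster genes fall in a class that has
# 200 members among the 18,791 monitored ORFs.
enrichment = fold_enrichment(8, 104, 200, 18_791)
print(f"fold enrichment: {enrichment:.1f}")
```

With these invented counts the class is about sevenfold enriched; a value of 1 would mean the class is no more common in the cluster than in the background.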

Other patterns of gene distribution among the clusters were observed, at least one of which emphasizes the influence of assay limitations on the clusters and the need to examine the raw data (19) when interpreting clustered profiles. For example, cluster D is enriched three to four times in known or putative transcription factors and in rare messages [212 out of 263 transcripts in this cluster had frequencies that never exceeded 30 parts per million (ppm); P < 0.001 by hypergeometric statistics]. This cluster exhibited the highest expression in the egg and exhibited low or undetectable expression at later times. However, many of these genes were expressed in the egg just slightly above our limit of detection and sank below our sensitivity at later developmental stages. Thus, the prevalence of transcription factors in this cluster is to some extent driven by the limits of our ability to detect rare messages in older animals, as opposed to the biology of specific transcription factors. This finding appears to have a parallel in reports that many messages that are found in embryonic sea urchin cDNA libraries are no longer detectable in adults (20).
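A hypergeometric test of the kind mentioned here asks how likely it is to draw at least the observed number of rare messages when a cluster of a given size is sampled without replacement from the background. The sketch below computes that upper-tail probability exactly with integer binomial coefficients. The function name and the small example counts are hypothetical; the text does not report the background counts needed to reproduce the published P value.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the class of interest."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical toy example: 10 genes in total, 4 of them rare; a cluster of
# 3 genes contains 2 rare ones.
p_value = hypergeom_upper_tail(k=2, n=3, K=4, N=10)
print(f"P(X >= 2) = {p_value:.4f}")
```

Because Python integers have arbitrary precision, the same function can be applied to genome-scale counts without overflow, although it becomes slower as the tail grows longer.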

Directed searches for genes that were significantly down-regulated in 2-week-old worms also revealed functionally related transcripts. These include the muscle-related genes T22E5.5 (gene, mup-2; protein, troponin-T; 15-fold down-regulated at 2 weeks as compared to its mean level between 0 and 60 hours), M03F4.2 (act-4; actin; 17-fold down-regulated), and F07A5.7 (unc-15; paramyosin; 13-fold down-regulated), as well as Y57G11C.12 (similar to ubiquinone oxidoreductase subunit; fivefold down-regulated) and C44B12.2 (ost-1; osteonectin; 22-fold down-regulated). Together, these results suggest impaired muscle function, reduced metabolic activity, and extracellular matrix defects in aged worms, consistent with the aged worm phenotype.

Recent full-genome sequence comparisons (21, 22) have revealed widespread similarities and important differences between the yeast, fly, and worm genomes. Such sequence comparisons provide us with a glimpse of the evolutionary process and are a useful source of information about the function of uncharacterized genes. However, these comparisons do not address the issue of how yeast, fly, and worm compare at the level of expressed transcripts. We formulated a simple “phylogenetic” model for gene expression in the developing worm. In this model, we partitioned nematode genes into three classes on the basis of sequence similarity: “core” genes (shared among yeast, worm, and fly), “animal” genes (shared between worm and fly), and “worm” genes (unique to the worm) (23). We hypothesized that, during development, the expression of core genes would remain relatively high and constant, reflecting a primary role of these genes in common cellular processes. In contrast, we theorized that animal and worm genes would make up a smaller fraction of the transcriptome and be highly developmentally modulated, reflecting their probable role in defining multicellular processes and worm-specific development.
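The three-way partition described in this paragraph can be expressed as a small decision rule. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the two boolean inputs stand in for the outcome of a sequence-similarity search against yeast and fly, whose actual criteria are given in the cited note (23) and are not reproduced here. Genes that match yeast but not fly fit none of the three definitions and are left unclassified.

```python
def classify_gene(has_yeast_match: bool, has_fly_match: bool) -> str:
    """Assign a worm gene to a class from (hypothetical) similarity-search outcomes."""
    if has_yeast_match and has_fly_match:
        return "core"    # shared among yeast, worm, and fly
    if has_fly_match:
        return "animal"  # shared between worm and fly only
    if not has_yeast_match:
        return "worm"    # no match in either yeast or fly
    return "unclassified"  # yeast match without a fly match


for yeast, fly in [(True, True), (False, True), (False, False), (True, False)]:
    print(yeast, fly, "->", classify_gene(yeast, fly))
```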

In agreement with this model, we found that core genes were more likely to be detected and more likely to be highly expressed than animal or worm genes (Table 2). Approximately 80% of core genes were detected, as opposed to only 67% of animal genes and 47% of worm genes. Similar observations have been made from an analysis of EST databases (5). However, the quantitative nature of our data coupled with the temporal information allows a deeper analysis. For example, 4% of core genes were detected at frequencies greater than 30 ppm at each developmental time point, but less than 1% of animal or worm genes were detected (P < 1 × 10⁻³, χ² test). Similarly, in terms of the fraction of total transcripts, core genes accounted for most of the transcripts among nondevelopmentally modulated, highly expressed genes (Fig. 2A). In contrast, worm genes accounted for a larger proportion of transcripts among the developmentally modulated genes, and this proportion rose during the course of development (0 to 60 hours), whereas the fraction of expression due to core genes concurrently dropped (Fig. 2B). To determine if the increase in worm-specific expression was driven by a small number of abundant, structural transcripts, we examined the likelihood of worm, animal, and core genes being called “present” at each developmental time and found the same trend, with slightly reduced amplitude. This suggests that, although abundant genes contribute importantly to the trends in Fig. 2B, lower abundance messages also contribute substantially to the increase in worm-specific expression. Thus, the trends in expression during nematode development are consistent with a simple model that envisages the multicellular organism in terms of an ancient cellular core that is organized and regulated by newer genes that evolved from that core. Although this model is too simplistic to be predictive of individual gene functions, it is striking that groups of genes with similar histories are expressed in accord with such a framework.
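The quantity plotted in Fig. 2, the fraction of total expression attributable to each gene class at each time point, can be computed as sketched below. This is an illustrative sketch, not the authors' analysis code: the function name, the gene identifiers, and the frequencies are all hypothetical. Genes without a class label are summed separately, which is why the core, animal, and worm fractions need not add to one.

```python
from collections import defaultdict


def class_fractions(frequencies, gene_class):
    """frequencies: gene -> list of frequencies (ppm), one per time point.
    gene_class: gene -> 'core', 'animal', or 'worm' (missing genes are unclassified).
    Returns class -> list of fractions of total expression, one per time point."""
    n_points = len(next(iter(frequencies.values())))
    totals = [0.0] * n_points
    by_class = defaultdict(lambda: [0.0] * n_points)
    for gene, profile in frequencies.items():
        label = gene_class.get(gene, "unclassified")
        for t, freq in enumerate(profile):
            totals[t] += freq
            by_class[label][t] += freq
    return {
        label: [s / tot if tot else 0.0 for s, tot in zip(sums, totals)]
        for label, sums in by_class.items()
    }


# Hypothetical genes and frequencies at an early and a late time point.
freqs = {"geneA": [60.0, 20.0], "geneB": [20.0, 60.0], "geneC": [20.0, 20.0]}
classes = {"geneA": "core", "geneB": "worm"}
for label, fractions in class_fractions(freqs, classes).items():
    print(label, [round(f, 2) for f in fractions])
```

In this toy input the core share falls and the worm share rises between the two time points, which is the direction of the trend the text reports for the modulated genes.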

Figure 2

(A) The fraction of total gene expression among 728 highly expressed (>30 ppm) nonmodulated genes related to the inferred ancestries of the genes. These genes were defined as those that were called “present” 12 or more times and did not change significantly across the worm life cycle (ANOVA, P > 1 × 10⁻²). Lines indicate the fraction of total gene expression due to core, animal, and worm genes. Error bars are ±1 SD, based on the variation of subsets of replicated data. Fractions do not add to one because not all genes were classified as core, animal, or worm. (B) Same plot for 4221 modulated genes, defined as those that were detected and changed significantly during the worm life cycle (ANOVA, P < 1 × 10⁻³).

Table 2

The number of core, animal, and worm genes among detected genes and genes with either high or low expression levels at all developmental time points. Core genes were more likely than animal or worm genes to be detected and were also more likely to be expressed at levels above 30 ppm. Percentages in parentheses are fractions of the total number of genes in each class (core, animal, or worm). f, frequency.

  • * To whom correspondence should be addressed. E-mail: ahill{at} (A.A.H.) and hunter{at}

  • Present address: Millennium Predictive Medicine, 700 One Kendall Square, Third Floor, Cambridge, MA 02139, USA.

