Report

Gene Expression During the Life Cycle of Drosophila melanogaster


Science  27 Sep 2002:
Vol. 297, Issue 5590, pp. 2270-2275
DOI: 10.1126/science.1072152

This article has a correction.

Abstract

Molecular genetic studies of Drosophila melanogaster have led to profound advances in understanding the regulation of development. Here we report gene expression patterns for nearly one-third of all Drosophila genes during a complete time course of development. Mutations that eliminate eye or germline tissue were used to further analyze tissue-specific gene expression programs. These studies define major characteristics of the transcriptional programs that underlie the life cycle, compare development in males and females, and show that large-scale gene expression data collected from whole animals can be used to identify genes expressed in particular tissues and organs or genes involved in specific biological and biochemical processes.

Molecular studies of development in multicellular organisms have gone through two major phases during the past three decades. Initially, solution hybridization studies quantitated transcript abundance and showed that large-scale changes in gene expression accompany development (1). In Drosophila, such studies suggested that 5000 to 7000 different polyadenylated RNA species are produced at each stage of the life cycle and that the composition of this set of RNAs shifted during development (1). These analyses gave an overview of genome activity during development, but they could not follow the expression of individual genes or reveal their identities. Later, when it became possible to clone individual genes (2, 3), RNA blots and in situ hybridization revealed when and where individual genes were active. This second phase of analysis allowed an initial determination of the links between molecules and developmental functions. This gene-by-gene approach has dominated developmental biology for the past two decades.

DNA microarrays extend the single-gene approach to the genome level by measuring the transcript levels of thousands of genes simultaneously (4–6). Here we present the transcriptional profiles for about one-third of all predicted Drosophila genes (7) throughout the life cycle, from fertilization to aging adults. cDNA microarrays were used to analyze the RNA expression levels of 4028 genes in wild-type flies examined during 66 sequential time periods beginning at fertilization and spanning the embryonic, larval, and pupal periods and the first 30 days of adulthood, when males and females were sampled separately (Fig. 1A). Early embryos change rapidly, so overlapping 1-hour periods were sampled; adults were sampled at multiday intervals (Fig. 1A) (8). We compared each experimental sample to a common reference sample made from pooled mRNA representing all stages of the life cycle, allowing us to measure each transcript's relative abundance (8). We refer to this relative abundance at each time as a gene's transcript or expression level, and to each gene's overall pattern of expression during development as its transcript or expression profile.

Figure 1

Patterns of gene expression during development. (A) Whole-animal collections were made for embryos (E), larvae (L), pupae (P), and adults (A). Black bars indicate the periods of development that were sampled (8); for all stages, independent samples were collected in duplicate. (B) Gene expression profiles ordered by onset of their first increase in transcript abundance (8). Data for 3219 genes that change expression by more than fourfold during development (P < 0.001, ANOVA) are shown. Each row represents data for one gene, and each column is a developmental time point, as indicated in (A). Expression level relative to the reference sample is indicated with color; blue indicates low levels and yellow indicates high levels. (C) Cumulative fraction of genes that exhibited a strong increase in transcript level over time. (D) Examples of common gene expression patterns. CG5958 (top left) shows induction in early embryogenesis and is maintained. CG1733 (bottom left) has a short peak of intense expression and is not expressed at other points in development. Amalgam (top right) is expressed in early embryogenesis and at the larval/pupal transition, whereas the late reinduced gene CG17814 (bottom right) shows a bimodal pattern in late embryo and late pupa. (E) Postembryonic reinduction of genes initially expressed in early and late embryos. Only the second, postembryonic onset of expression is shown. Genes with initial onset of expression in the first 3 hours after fertilization (0 to 3 hours, blue) are often reexpressed in early pupae (blue bracket), and genes with expression onset in the late embryo (9 to 19 hours after fertilization, purple) are often reexpressed in late pupae (purple bracket). (F) Hierarchical clustering of developmental time points on the basis of their pattern of somatic gene expression. Time points with highly correlated gene expression patterns are grouped adjacently. Embryonic expression patterns group with those of pupae, and larval expression patterns group with those of adults. Adult tudor (At), adult males (Am), adult females (Af), embryonic/larval transition (E/L), larval/pupal transition (L/P).

Expression of most genes assayed (3483 out of 4028, 86%) changed significantly [P < 0.001, analysis of variance (ANOVA)] during the 40-day period surveyed (8). Of these, 3219 genes exhibited at least a fourfold difference between their highest and lowest levels of expression (Fig. 1B and table S1). The vast majority of these developmentally modulated genes (>88%) are expressed during the first 20 hours of development, before the end of embryogenesis (Fig. 1, B and C). To identify patterns of gene reexpression during development, we applied a peak-finding algorithm (8) to each gene's expression profile. We found that 36.3% of the genes (1169 genes) showed a single major peak of expression (Fig. 1D, left panels), whereas 40.3% (1298) showed two peaks (Fig. 1D, right panels) and 23.4% (752) showed three or more peaks (fig. S1 and tables S2 to S6).
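The fourfold-range filter and the peak classification described above can be illustrated with a minimal Python sketch. The peak-finding algorithm actually used is described in the supporting material (8) and is not reproduced here; the rule below (a peak is a contiguous run of time points at or above half of the gene's maximum) is a hypothetical stand-in, and the function names and toy profiles are assumptions made for illustration only.

```python
def fold_range(levels):
    """Ratio of the highest to the lowest level (levels must be positive)."""
    return max(levels) / min(levels)


def count_peaks(levels, fraction_of_max=0.5):
    """Count contiguous runs of time points at or above a fraction of the maximum.

    This threshold rule is a hypothetical stand-in, not the published algorithm.
    """
    threshold = fraction_of_max * max(levels)
    peaks = 0
    in_peak = False
    for value in levels:
        if value >= threshold and not in_peak:
            peaks += 1  # a new run above the threshold starts here
        in_peak = value >= threshold
    return peaks


# Toy profiles: expression relative to the reference at six time points.
profiles = {
    "single_peak": [0.5, 4.0, 0.5, 0.5, 0.5, 0.5],
    "two_peaks": [0.5, 4.0, 0.5, 0.5, 3.0, 0.5],
    "flat": [1.0, 1.2, 0.9, 1.1, 1.0, 1.0],
}

for name, levels in profiles.items():
    if fold_range(levels) >= 4.0:
        print(name, "peaks:", count_peaks(levels))
    else:
        print(name, "changes less than fourfold")
```

A real analysis would also need the ANOVA significance filter (P < 0.001) applied to replicate measurements, which this sketch omits.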

Many genes are expressed in two waves during development, with embryonic expression patterns recapitulated in pupae and larval patterns recapitulated in adults. Genes with a first peak in their transcript level at the beginning (0 to 2.5 hours) of embryogenesis commonly have their second peaks during the larva-pupa transition, whereas genes with a first peak of expression at the end of embryogenesis (10 to 21 hours) commonly have their second peak during the late pupal period (Fig. 1E). When overall similarities in somatic gene expression between different developmental stages were compared by hierarchical clustering (8, 9), expression patterns during embryonic stages were most similar to those of pupal time periods, and expression patterns during larval time periods were most similar to those of adult (Fig. 1F). Thus, despite morphological differences between developmental stages, disparate life stages share molecular commonalities.

We analyzed changes in gene expression during each major stage of development. The transcript levels of 2103 genes changed significantly (P < 0.001, ANOVA) during embryogenesis (table S7). A total of 445 genes changed during larval life (table S8), 646 during the pupal stage (table S9), and 118 during adult life (table S10) (8). The transcript levels of only 16 genes changed significantly (P < 0.001, ANOVA) between 5- and 30-day-old adults (table S11) (8). The transcript levels of hundreds of genes changed at least fourfold during five developmental periods that correspond to major morphological changes (the beginning, middle, and end of embryogenesis; the larval-pupal transition; and the end of the pupal period) (Fig. 2, A and D). Transcript levels changed much less during “morphologically quiescent” periods of early larval and adult life.

Figure 2

Stage-specific changes in gene expression. (A) Patterns of stage-specific transcript level decline. Each bar represents the number of genes decreasing by more than fourfold within the four following time points when compared to their average in the previous two time points. Red bars correspond to the developmental interval shown in (B). Dark gray bars indicate intervals spanning major developmental transitions between stages. (B) Early expression profiles of the 322 maternal genes that decrease expression by more than threefold during the first 0 to 6.5 hours of embryonic development, arranged by one-dimensional SOM analysis (table S16). (C) Full expression profiles of the 27 strictly maternal genes identified using criteria optimized on a training set of known maternal genes and with a SOM analysis (8) (table S12). Selected genes are highlighted: swallow (blue), fs(1)Ya (pink), cyclinJ (green), and CG18543 (black), which has the most dramatic reduction in expression. (D) Patterns of stage-specific transcript level increase. Analysis as in (A), showing the number of genes induced above a fourfold threshold. Red bars correspond to the developmental interval shown in (E). (E) One-dimensional SOM analysis of 534 genes induced over 0 to 6.5 hours of embryonic development (table S18). (F) Early expression profiles of 21 transiently expressed zygotic genes identified using criteria optimized on a training set of known maternal and zygotic genes and by a SOM analysis (8) (table S20). Previously identified genes included blastoderm-specific gene 25D (red), CG9506 [slam, a gene required for polarized membrane growth during cellularization (31); blue], and Sep5, which encodes a septin-like protein (green). Among the 18 newly identified genes in this class is CG15634 (black), which displayed the most rapid induction and the highest levels of blastoderm-specific expression.
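The counting rule stated for panel (A) can be sketched in Python as follows. This is one reading of the legend, not the authors' code: "within the four following time points" is taken to mean that a drop at any of the four points qualifies, and the function names, parameters, and toy profiles are assumptions.

```python
def declines_at(levels, t, fold=4.0, lookback=2, lookahead=4):
    """True if a gene falls by more than `fold` starting at time index t.

    The baseline is the mean of the `lookback` points before t; the gene
    qualifies if any of the `lookahead` points from t onward is more than
    `fold` times lower than that baseline. Levels must be positive.
    """
    if t < lookback:
        return False  # not enough earlier points to form a baseline
    baseline = sum(levels[t - lookback:t]) / lookback
    window = levels[t:t + lookahead]
    return any(baseline / value > fold for value in window)


def count_declining(profiles, t, **kwargs):
    """Number of genes whose transcript level declines at time index t."""
    return sum(declines_at(levels, t, **kwargs) for levels in profiles.values())


# Toy profiles: a maternal-like transcript that decays and a late-induced gene.
profiles = {
    "geneA": [8.0, 8.0, 4.0, 1.0, 1.0, 1.0, 1.0],
    "geneB": [1.0, 1.0, 1.0, 1.0, 2.0, 4.0, 8.0],
}

for t in range(len(profiles["geneA"])):
    print("time index", t, "declining genes:", count_declining(profiles, t))
```

The induction counts of panel (D) would follow from the same function with the ratio inverted (value divided by baseline).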

In the first hours of embryonic development between fertilization and gastrulation, gene expression is highly dynamic. Two broad categories of transcripts are present at this time: those deposited into the egg during oogenesis (produced by maternal genes) and those that are expressed only after fertilization (produced by zygotic genes). The expression profiles of 1212 genes were similar to those of known maternal genes (8), indicating that at least 30% of the transcripts analyzed (1212 of 4028) are maternally deposited (tables S12 to S17). Although many maternal transcripts persisted during embryogenesis, 322 (27%) of the 1212 maternal gene transcripts decreased by at least threefold (Fig. 2B), and 36 (3%) decreased by 10-fold or more during the 6.5 hours after egg deposition (fig. S2) (8). A self-organizing map (SOM) algorithm (10), applied to the data from all 1212 maternally deposited genes, identified a cluster of 27 “strictly” maternal genes. Transcripts from almost all 27 of these genes were degraded after fertilization and were not subsequently expressed at high levels until they appeared in the female germ line during oogenesis (Fig. 2C). Of these, 5 were previously known “strictly” maternal genes and 22 were new (table S12).
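To illustrate how a one-dimensional self-organizing map groups expression profiles, the following is a minimal Python sketch of the general technique. It is not the implementation or parameter set used in the study (see 8 and 10); the learning-rate and radius schedules, the deterministic initialization, the function names, and the toy profiles are all assumptions.

```python
import math


def best_node(nodes, vec):
    """Index of the node whose weights are closest (squared Euclidean) to vec."""
    return min(
        range(len(nodes)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], vec)),
    )


def train_som_1d(profiles, n_nodes=2, epochs=50, lr0=0.5, radius0=1.0):
    """Train a one-dimensional SOM and return the node weight vectors."""
    dim = len(profiles[0])
    # Deterministic start (an assumption): seed nodes with evenly spaced inputs.
    step = max(1, len(profiles) // n_nodes)
    nodes = [list(profiles[min(i * step, len(profiles) - 1)]) for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays linearly toward 0
        lr = lr0 * frac                      # learning rate for this epoch
        radius = max(radius0 * frac, 1e-3)   # neighborhood width for this epoch
        for vec in profiles:
            winner = best_node(nodes, vec)
            for j, node in enumerate(nodes):
                # Gaussian neighborhood along the one-dimensional node chain.
                influence = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    node[k] += lr * influence * (vec[k] - node[k])
    return nodes


# Toy early-embryo profiles: two declining (maternal-like), two rising (zygotic-like).
profiles = [
    [1.0, 0.5, 0.1],
    [0.9, 0.4, 0.1],
    [0.1, 0.5, 1.0],
    [0.1, 0.6, 0.9],
]

nodes = train_som_1d(profiles, n_nodes=2)
assignments = [best_node(nodes, vec) for vec in profiles]
print(assignments)
```

In this toy case the declining and rising profiles end up on different nodes; with real data, the number of nodes and the training schedule would have to be chosen and validated against known maternal genes, as the text describes.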

Early zygotic genes were identified in a similar manner. A total of 534 genes have expression profiles similar to those of known early zygotic genes (Fig. 2E; tables S18 to S22 for zygotic gene lists) (8). Among these genes, 53 increased expression by at least 10-fold in the first 6.5 hours of development, 26 of which were previously characterized (fig. S2). Sixteen of these 26 genes are known to play critical roles in embryonic development and patterning. These include eight transcription factor genes (invected, oddpaired, Antennapedia, tailless, bagpipe, prospero, ribbon, and grainyhead), and genes encoding two signaling molecules (wingless and decapentaplegic), a signal transduction protein (stumps), a cell adhesion molecule (neurotactin), and a channel protein (big brain). The early developmental gene-regulatory hierarchy, including gap, pair-rule, segment polarity, and homeotic gene induction (11), was recapitulated in the microarray data. Sequence similarities suggest that the 27 uncharacterized, rapidly induced zygotic genes encode cell adhesion molecules (6 genes), channels and transporters (6 genes), metabolic and biosynthetic enzymes (5 genes), or kinases and phosphatases (4 genes). None of these newly identified genes have sequence similarity to transcription factors. Transient early zygotic (“blastoderm-specific”) genes are expressed at high levels only during the critical period of development when cellularization of the syncytial blastoderm embryo occurs. SOM analysis of the expression patterns of early zygotic genes identified 21 such genes, including 3 previously known genes and 18 previously unknown ones (Fig. 2F, table S20).

We investigated whether genes with related biochemical functions are coordinately expressed during development. Genes encoding functionally related proteins were identified by gene ontology (GO) annotations, which classify genes according to the functions of their encoded proteins (8, 12). Genes within a functional group tend to be expressed at similar times (Fig. 3A). For example, most cell cycle genes are expressed at high levels during the first 12 hours of development, when cell division is rapid, and few are expressed at high levels thereafter. In contrast, most metabolic genes are expressed at their highest levels only immediately before and during larval and adult life.

Figure 3

Coordinate expression of genes encoding components of macromolecular complexes or involved in specific physiological processes. (A) For each GO class of protein, open bars below each line indicate the percentage of genes with low expression (bottom 25% of a gene's expression range during development), and filled bars above each line indicate the percentage of genes with high expression (top 25% of a gene's expression range). Colored GO classes correspond to clusters shown in (B). The scale (100% equals all genes in the GO class) is indicated for the endothelial class. (B) Three selected clusters of genes with similar expression profiles and related biological functions: components of mitochondria (Mito), ribosome (Ribo), and cytoskeletal/neural genes (Cyt/Neur). Genes within each cluster that are known to share a common biological function are indicated by a colored bar. Developmental stages as indicated in Fig. 1.
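The summary plotted in panel (A) can be sketched in Python as follows. The sketch assumes that each gene's expression range is taken on the values as given (here, toy log2 ratios) and that "top 25%" and "bottom 25%" mean the upper and lower quarter of that gene's own minimum-to-maximum range; the function names, gene names, and numbers are hypothetical.

```python
def position_in_range(levels):
    """Rescale one gene's profile to 0..1 across its own expression range."""
    lo, hi = min(levels), max(levels)
    if hi == lo:
        return [0.5] * len(levels)  # flat profile: neither high nor low
    return [(value - lo) / (hi - lo) for value in levels]


def class_summary(profiles, gene_class):
    """Per time point, percentage of genes in the class that are high or low.

    "High" is the top quarter of a gene's own range, "low" the bottom quarter.
    """
    scaled = [position_in_range(profiles[gene]) for gene in gene_class]
    n_genes = len(scaled)
    n_times = len(scaled[0])
    high = [100.0 * sum(s[t] >= 0.75 for s in scaled) / n_genes for t in range(n_times)]
    low = [100.0 * sum(s[t] <= 0.25 for s in scaled) / n_genes for t in range(n_times)]
    return high, low


# Toy log2 expression ratios at four time points (hypothetical genes).
profiles = {
    "cycA": [3.0, 2.0, 0.0, -1.0],
    "cycB": [4.0, 3.0, 1.0, 0.0],
    "metX": [-2.0, -1.0, 2.0, 3.0],
}
cell_cycle = ["cycA", "cycB"]

high, low = class_summary(profiles, cell_cycle)
print("percent high:", high)
print("percent low:", low)
```

In this toy class, both genes are high at the first two time points and low at the last two, which is the kind of pattern the text reports for cell cycle genes early in development.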

All 4028 genes were grouped by similarity of expression profile with a hierarchical clustering algorithm (9), and clusters of genes with similar expression profiles were examined for genes with related biochemical and cellular functions. Many examples of coexpressed genes that encode components of biochemical pathways or subunits of protein complexes were apparent, including genes not previously known to be developmentally regulated. Distinct clusters were enriched for genes encoding mitochondrial proteins, ribosomal proteins, cytoskeletal/neuronal factors, components of the 26S proteasome complex, the TCP-1 ring chaperonin complex, coatomer complex, vacuolar adenosine triphosphatases, and antimicrobial peptides (Fig. 3B and fig. S3). These results suggest that new components of biochemical complexes and cellular pathways in Drosophila can be identified by virtue of their similar expression profiles.
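Grouping genes by similarity of expression profile can be illustrated with a small agglomerative clustering sketch in Python. The choice of Pearson correlation with average linkage and a stopping threshold of 0.8 is an assumption made for illustration (the method actually used is the one cited as reference 9), and the gene names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cluster_genes(profiles, min_corr=0.8):
    """Merge the most correlated clusters until none correlate above min_corr."""
    clusters = [[gene] for gene in profiles]

    def linkage(c1, c2):
        # Average linkage: mean correlation over all cross-cluster gene pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        corr, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if corr < min_corr:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy profiles: two co-rising genes and two co-declining genes.
profiles = {
    "ribo1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ribo2": [2.0, 4.0, 6.0, 8.0, 11.0],
    "mito1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "mito2": [9.0, 8.0, 6.0, 4.0, 3.0],
}

clusters = cluster_genes(profiles)
print(clusters)
```

This quadratic-time loop is only suitable for a handful of genes; a data set of thousands of genes would call for an optimized library implementation.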

Clusters of coexpressed genes enriched for tissue-specific genes were also identified. One such cluster includes 23 genes, 8 of which were known to be expressed in terminally differentiated muscle (Fig. 4A). The genes in this group have a two-peak expression pattern that coincides with larval and adult muscle development (13). Larval muscle development is initiated in the embryo by the basic helix-loop-helix (bHLH) transcription factor Twist (13), which triggers transcription of dMef2, a gene encoding a MADS box transcription factor. dMef2 regulates the expression of muscle differentiation genes (14). This muscle regulatory hierarchy was recapitulated in the microarray data: The embryonic peak of twist transcript preceded that of dMef2, which preceded expression of the genes in a muscle differentiation cluster (Fig. 4A). The same sequence was repeated in the pupal period, indicating that the same regulatory hierarchy controls formation of adult muscle.

Figure 4

Muscle differentiation. (A) A cluster enriched for genes expressed in terminally differentiated muscle (correlation coefficient of 0.862). Pink shading indicates genes that were either previously shown or shown here to be expressed in muscle (*confirmed by whole-mount in situ hybridization; **CG11914 was not tested but is predicted to be expressed in muscle on the basis of homology to muscle LIM proteins). Green shading indicates that in situ hybridization showed neuronal expression. The number of dMEF2 consensus binding site pairs in the vicinity of each gene is shown (8). Red bars highlight the sequential expression of the muscle gene regulatory hierarchy (twist > dMef2 > terminal differentiation genes) during the embryonic development of larval muscles and again during the pupal development of adult muscle. Several known muscle differentiation genes on the array did not group with this cluster, but showed an expression pattern consistent with higher expression during the development of larval (e.g., flap wing) or adult (e.g., flightin) muscle (see also fig. S4). (Note: male and female adult data were averaged after clustering for display purposes.) (B) In situ hybridization showing expression of CG8154 in ventral and lateral muscle fibers. Developmental stages as indicated in Fig. 1. Lateral transverse muscles are labeled 1, 2, and 3.

Fifteen of the 23 genes in this cluster (65%) contained pairs of predicted dMEF2-binding sites (8) (Fig. 4A). Only 5% of other genes on the array contain such pairs (8), so many of the genes in the cluster are likely to be direct targets of dMef2. Six of the seven previously uncharacterized genes in the cluster, all with dMef2-binding sites, were expressed in differentiated muscle (Fig. 4B). The seventh gene, and the two genes without dMef2-binding sites that we tested, were expressed in the central nervous system (table S23). These three neural genes together with one previously known neural gene, Down Syndrome Cell Adhesion Molecule (DSCAM) (15), were activated synchronously with muscle genes and may be involved in neural events that are coordinated with muscle development, such as neuromuscular junction formation.
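One way to gauge how unusual 15 of 23 genes is against a 5% background is a binomial tail probability, sketched below in Python. The text does not say that such a test was performed; this is an illustrative calculation that assumes each gene independently carries a site pair with probability 0.05, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of at least k successes in n independent trials with rate p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


cluster_size = 23      # genes in the muscle cluster
with_site_pairs = 15   # cluster genes with paired dMEF2-binding sites
background_rate = 0.05 # fraction of other array genes with such pairs

observed_fraction = with_site_pairs / cluster_size
p_value = binomial_tail(with_site_pairs, cluster_size, background_rate)

print(f"observed fraction: {observed_fraction:.0%}")
print(f"binomial tail probability: {p_value:.2e}")
```

A hypergeometric test against the exact array counts would be the more precise choice if those counts were available; the binomial form is used here only because the text gives the background as a percentage.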

Hierarchical clustering analysis also revealed two large groups of coexpressed genes that encode either female- or male-enriched transcripts. These genes appear to be sex-specifically expressed in the germ line. When RNA from mutants lacking germline tissue [the adult progeny of tudor mothers, referred to as tudor mutants (16)] was analyzed, expression of nearly all genes in the putative male and female germline clusters was substantially reduced (Fig. 5A), demonstrating that these genes are expressed in the germ line or are dependent on the germ line for their expression (8). Indeed, nearly all of the male germline genes identified in the tudor mutant experiment were highly expressed in isolated testes (Fig. 5A). Increased expression of genes in the male cluster (249 genes) (Fig. 5A and table S25) began at the larva-pupa transition and remained high thereafter (Fig. 5A), coincident with meiosis and spermatogenesis in the male germ line (17, 8). Increased expression of genes in the female cluster (1245 genes) (Fig. 5A and table S24) began in 0- to 24-hour adults and continued thereafter (Fig. 5A), coincident with oogenesis (18). Transcripts of most (77%) of the genes in this cluster were present at high levels in early embryos before zygotic transcription began (Fig. 5A), implying that they are maternally provided. RNA blot analysis confirmed sex-specific germline expression of two selected genes in each class (fig. S5).

Figure 5

Sex-enriched germline and somatic genes, and eye differentiation genes. (A) Expression profiles of clusters of genes enriched in the female or male germ line, or both (8). Female (144) and male (215) germline genes were identified in the hierarchical cluster of the full data set (fig. S7); those with a threefold or greater difference in expression between adult males and females are shown. Developmental stages are as indicated in Fig. 1. M, adult male; F, adult female; Mtud, adult male tudor (0- to 24-hour and 5-day adult time points); Ftud, adult female tudor (0- to 24-hour and 5-day adult time points); testes were dissected from adults. (B) Clusters of genes enriched in female and male somatic tissue (8). (C) Eye differentiation genes. Hierarchical cluster of the 33 adult-enriched genes whose expression diminished in eya mutants (8).

Analysis of the tudor data also led to the identification of 111 genes that were expressed in both male and female germ lines, because they were expressed in wild-type adults of both sexes but markedly reduced in tudor mutants (Fig. 5A and tables S26 to S28) (8). Among these 111 genes are known germline factors common to both sexes such as exu (19) and benign gonial neoplasm (20), whereas dozens of others remain to be characterized. Together, these analyses increase the number of male and female germline genes by an order of magnitude or more and demonstrate a previously unrecognized temporal coordination of germline genes in both sexes.

We identified sex-specific somatic genes by comparing transcript levels in female and male tudor adults. We found that 31 genes had significantly higher expression in the soma of adult females and 37 genes had significantly higher expression in the soma of adult males (8). The male and female somatic gene sets (Fig. 5B) include the previously identified sex-specific Yolk protein 1 gene [female (21)] and an accessory gland protein gene Acp 36DE [male (22)]. The rest of the genes in these sets are likely also to be involved in sex-specific adult physiology or function (tables S29 and S30).

Hierarchical clustering identified a small adult-specific set of genes, some of which encode known eye-specific proteins. Using RNA from eyes absent mutants, we refined this set to 33 genes that included 11 known eye differentiation genes, many of which function in phototransduction (Fig. 5C) (8). Some of the newly identified eye genes may also function in phototransduction, based on the inferred biochemical functions of the encoded proteins. For example, CG10233 and CG3573 encode a putative phosphatidylinositol-4 phosphate 5-kinase and a putative inositol 1,4,5-trisphosphate 5-phosphatase, respectively, and thus may regulate the level of PtdIns(4,5)P2, a key second messenger in invertebrate phototransduction (23).

Hierarchical clusters were examined for biases in the proportion of genes with highly conserved human homologs or for fly-specific genes (8). Sixteen of the 20 largest clusters had no significant bias (P > 0.01) in the relative proportions of conserved or fly-specific genes (fig. S6). Two clusters were significantly enriched (P < 0.001) for fly-specific genes: a cluster of male germline genes and a cluster of genes expressed in larvae that encode peptide hormones, peptidases, and peritrophins. Two other clusters were significantly enriched (P < 0.001) for conserved genes. One of these contained many ribosomal genes (Fig. 3B) and the other included a group of 35 zygotically activated genes, 24 of which are highly conserved. This latter cluster includes Hox genes, wingless, dpp, and several other factors involved in developmental processes shared among metazoans.

Genes that encode homologs of human disease proteins were analyzed to determine whether any disease gene homologs were co-expressed with other genes of related function. More than three-quarters of human disease genes have Drosophila homologs (25, 26); 240 were present in this data set (27). These homologs were dispersed throughout many clusters. One example is a cluster containing 21 co-expressed genes, including dPresenilin and dNicastrin, homologs of two subunits of a proteolytic processing complex implicated in Alzheimer's disease (Fig. 3B, cytoskeletal/neuronal cluster). Most of the other known genes in this cluster are implicated in neuronal pathfinding and cell adhesion, including E-cadherin, which encodes a protein associated with the presenilin complex (28), and Notch, which encodes a substrate of the presenilin complex (29, 30). The cluster of 21 genes is enriched for components and substrates of the presenilin complex.

These data (24) provide an overview of gene expression profiles during Drosophila development. An unusually high proportion of the genes are developmentally regulated, but of 4028 genes analyzed, only 903 are previously named Drosophila genes with a known mutant phenotype, biochemical function, or protein homology. Fifty-one percent of the genes fall into 50 clusters with correlation coefficients greater than 0.80 (for an annotated hierarchical cluster, see fig. S7, green bars). Virtually all the clusters contain genes with known or predicted roles in development or physiology, and genes to which a biochemical or cellular function has been assigned by the GO project (12) [all genes in these clusters are listed in the online database (24)]. A large number of the clusters contain genes that are used together in specific developmental or biochemical processes. On the basis of their developmental expression patterns, we have tentatively assigned 53% of the genes to a developmental or biological functional category (for example, male germ line, female germ line, eye, muscle, early zygotic, biochemical complex, or cell biology function).

In addition to providing functional annotation of the Drosophila genome, these studies are a step toward a complete description of the genetic networks that control development.

Supporting Online Material

www.sciencemag.org/cgi/content/full/297/5590/2270/DC1

Materials and Methods

Figs. S1 to S7

Tables S1 to S30

  • * Co-first authors

  • To whom correspondence should be addressed. E-mail: kevin.white{at}yale.edu

