The Drosophila Genome Sequence: Implications for Biology and Medicine


Science  24 Mar 2000:
Vol. 287, Issue 5461, pp. 2218-2220
DOI: 10.1126/science.287.5461.2218


The 120-megabase euchromatic portion of the Drosophila melanogaster genome has been sequenced. Because the genome is compact and many genetic tools are available, and because fly cell biology and development have much in common with mammals, this sequence may be the Rosetta stone for deciphering the human genome.

The genome sequence of the fruit fly Drosophila melanogaster reported in this issue is a landmark achievement that marks the end of a century of gene hunting and heralds a new era of exploration and analysis. It is the second and largest animal genome sequenced (1), containing ∼180 million base pairs (Mbp), of which most of the 120-Mbp euchromatic, gene-rich portion has now been determined (2). The importance of this accomplishment stems in part from the monumental technical feat it represents and the swiftness with which it was completed as a combined academic and industry effort. The foundation was laid by the Berkeley, European, and Canadian Drosophila Genome Projects, which contributed a detailed chromosomal map and 28 Mbp of sequence. The remaining 75% of the sequence was obtained this past year in a collaboration between Celera Genomics Group and the Berkeley Drosophila Genome Project.

Three million short (∼500 bp) sequence reads were made from the ends of random genomic fragments, and overlaps between the obtained sequences were used to assemble the nearly complete sequence of the four Drosophila chromosomes. This random (“shotgun”) strategy had not previously been attempted for genomes so large and complex, because repeated sequences hundreds to thousands of base pairs long scattered throughout the genome cause ambiguities in assembly. The solution was to obtain sequences from both ends of fragments that were ∼2, 10, and 150 kb in length (3). These oriented bits of sequence were assembled into increasingly dense and interlinked scaffolds that ultimately generated long continuous stretches of chromosome sequence with few gaps or ambiguities. An estimated 2% of euchromatin remains unfinished; it is thought to be mostly repeat-dense regions that border heterochromatin and are difficult to assemble. The success of this strategy with Drosophila is encouraging for a similar combination of directed and shotgun sequencing to elucidate larger and more complex genomes, including the human genome, which is nearly 30 times larger than that of Drosophila.
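The overlap-based assembly described above can be illustrated with a toy sketch. It is not the assembler used for the Drosophila project: it is a minimal greedy merge on invented reads, and the function names (`overlap`, `greedy_assemble`, `mate_pair_span`) are hypothetical. The mate-pair check assumes, for simplicity, that both end reads lie on the same strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


def mate_pair_span(contig: str, left_read: str, right_read: str):
    """Distance from the start of `left_read` to the end of `right_read`.

    Returns None if either read is absent. Comparing this span with the known
    fragment size is the kind of check that exposes a misassembled repeat.
    """
    i = contig.find(left_read)
    if i == -1:
        return None
    j = contig.find(right_read, i + 1)
    if j == -1:
        return None
    return j + len(right_read) - i


# Invented reads for illustration only.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
print(mate_pair_span(contigs[0], "ATGG", "CTGA"))
```

Greedy merging alone is easily misled when a repeat is longer than a read, because several reads then overlap equally well. End reads from fragments of known length add a distance constraint that overlaps alone cannot supply.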

Beyond the technical achievement, the importance of the Drosophila sequence rests partly on the role this fly has played in the history of experimental biology. Even more significant is the accelerated rate of discovery it will catalyze in new areas of Drosophila biology important for human biology and medicine.

Drosophila as a Model Animal

Throughout the last century, the fly has been the workhorse for genetic studies in eukaryotes. These studies provide the basis of much of our conceptual understanding of fundamental aspects of eukaryotic genetics, including the chromosomal basis of sex determination, genetic linkage, and chromosomal mechanics and behavior (4). Drosophila now has a wealth of mutants, and many special chromosomes that have been endowed with visible and molecular markers and other properties that facilitate genetic manipulations. These tools enable saturating genome screens directed to the isolation of a broad spectrum of visible and lethal phenotypes, even ones that are manifested in the F2 or F3 generations of mutagenized individuals. Transposon-based methods for manipulating genes have also been developed, all made possible because the P transposon can be modified and stably integrated into the chromosomes. These allow creation of genetically defined, stable lines with regulated transgenes and efficient production of genetic mosaics, techniques not available in other animals, including Caenorhabditis elegans (5). These tools are invaluable for genetic and developmental studies. Transposon-based technologies have also been used to screen for lethal mutants with tissue- and cell-specific phenotypes and to screen for gene expression patterns [“enhancer trap” screens (6)] in live animals at all stages of development. With this technical arsenal for manipulating gene content, many different ways to identify and analyze genes and genetic interactions in a developmentally and behaviorally complex animal can be exploited.

The molecular cloning and functional analysis of Drosophila genes has made it possible to assemble a molecular outline of many cellular and developmental processes. Moreover, these advances have provided entries into studies of the corresponding processes in mammals. Cloned Drosophila genes have led to the identification of mammalian cognates, and to an extent no one predicted, many of these cognates have closely related functions in mammals. This includes transcription factors and their regulatory targets, structural proteins, chromosomal proteins, ion channels, and signaling proteins. Because so many basic cellular functions are conserved, attention is drawn to the ones that differ significantly, such as the absence of cytoplasmic intermediate filaments (7). Conservation also extends to higher order processes, such as development, behavior, sleep, and physiological response to drugs such as alcohol (8), as well as to neurodegeneration (9). Homeobox genes were one of the earliest examples of genetic conservation between the fly and mammals. More recently, the pathways controlling development of limbs, nervous system, eyes, and the heart, as well as complex systemic interactions such as circadian rhythms and innate immunity, have also been found to be conserved (10). These discoveries of conservation in the underlying control pathways cast doubt on long-held views that eyes, body segmentation, and circadian rhythms arose independently in flies and mammals and are products of convergent evolution. Instead, there appears to be a fundamental unity that makes the fly an important model for many aspects of mammalian biology.

Implications of Drosophila Genome Sequence for Drosophila Workers

Fly genetics started in 1910 with the discovery in T. H. Morgan's laboratory of a spontaneous mutant with white eye color. Progress accelerated following the discovery of radiation, chemical, and insertional mutagenesis. The current era has witnessed a broad array of systematic screens for phenotypes in development, fertility, behavior, longevity, learning, and drug susceptibility. With the advent of molecular cloning techniques, roughly 2500 Drosophila genes have been molecularly defined, and an even larger number have been genetically characterized. With the full complement of ∼14,000 Drosophila genes now revealed, practitioners can begin to develop methods to ascertain the functions of the genes whose phenotypes are unknown or have not yet been linked to a particular complementation group. Remarkably, an organism as complex as the fly has only about twice as many genes as the unicellular yeast Saccharomyces cerevisiae. The relatively small gene complement in Drosophila, with few duplicated genes, makes it a streamlined animal genome ideal for genetic analysis.

The analysis of Drosophila genes currently underway in many labs will be facilitated by the availability of the full genome sequence. Transposon-induced mutations can now be mapped simply by obtaining a short stretch of genomic sequence flanking the insertion site; sequence polymorphisms between strains can be readily identified for recombinational mapping of chemically induced mutations. Whereas a complete set of loss-of-function mutants like the one being constructed for S. cerevisiae is clearly needed, efficient methods for targeted mutagenesis in Drosophila have yet to be found. The current strategy is to generate and map 10^5 P-element insertions (11). This is the best approach available, and it is expected to generate a collection of mutants representing 80% of all genes within 3 years. Nevertheless, transposon mutagenesis lacks the efficiency and precision of targeted gene replacement, and the full benefit of the Drosophila genome sequence will not be realized until efficient methods for directed mutagenesis are found. In addition, an easy way to freeze mutant and transgenic strains is needed in order to preserve the genetic bounty and to relieve the current burden of maintaining thousands of fly stocks in continuous culture.
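Mapping an insertion from its flanking sequence amounts to locating a short string in the reference genome on either strand. The sketch below shows the idea with exact matching on an invented two-sequence "genome". The function names and sequences are hypothetical, and real mapping would use an alignment tool that tolerates sequencing errors.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def map_flank(genome: dict[str, str], flank: str) -> list[tuple[str, int, str]]:
    """Find every exact occurrence of `flank` on either strand.

    Returns (sequence name, 0-based start on the forward strand, strand).
    A single hit places the insertion; several hits suggest the flank lies
    in repeated sequence and cannot be placed unambiguously.
    """
    hits = []
    for name, seq in genome.items():
        for strand, query in (("+", flank), ("-", reverse_complement(flank))):
            start = seq.find(query)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(query, start + 1)
    return hits


# Invented sequences for illustration; the names are arbitrary labels.
toy_genome = {"armA": "AACCGGTTACGATCGA", "armB": "TTTTGGGGCCCC"}
print(map_flank(toy_genome, "ACGATC"))    # forward-strand hit
print(map_flank(toy_genome, "TCGATCGT"))  # hit found via the reverse complement
```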

It is critical to begin linking the new Drosophila genes with biochemical pathways. The availability of the full sequence will speed this effort by adding genomic approaches to the time-honored genetic and molecular ones. In Drosophila, mRNAs can be readily localized in whole mounts by in situ hybridization, as can proteins by immunohistochemical methods. With the full sequence, the temporal and spatial expression patterns of all genes can now be defined: in situ hybridizations can be undertaken with a full set of gene probes, and DNA microarrays containing all predicted Drosophila genes can be constructed to allow massive parallel analysis of gene expression (12). Genes with related patterns of expression will be revealed by such studies, and the interdependence of their expression can be determined. The sequence can also be used to make reagents to localize any Drosophila protein and to define protein-protein interactions. Novel approaches to epistasis studies, interaction networks, and protein functions will likely be developed as well.
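One simple way to find genes with related expression patterns in microarray-style data is to correlate their expression profiles across samples. The sketch below uses the Pearson correlation as an assumed similarity measure; the article does not specify a method. The gene labels and numbers are invented for illustration.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (a constant profile has zero
    variance and would cause a division by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(profiles: dict[str, list[float]], threshold: float = 0.9):
    """Return (gene, gene, r) for every pair correlated at or above `threshold`."""
    names = sorted(profiles)
    pairs = []
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            r = pearson(profiles[g], profiles[h])
            if r >= threshold:
                pairs.append((g, h, round(r, 3)))
    return pairs


# Invented expression levels at four developmental time points.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises together with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}
print(coexpressed_pairs(toy_profiles))
```

Correlated expression only suggests a relationship. Establishing that one gene's expression depends on another still requires genetic tests such as examining expression in mutant backgrounds.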

Implications of Drosophila Genome Sequence for Biology and Medicine

The conservation of biological processes from flies to mammals extends the influence of Drosophila to human health. When a Drosophila homolog of an important but poorly understood mammalian gene is isolated, the arsenal of genetic techniques in the Drosophila system can be applied to its characterization. The Drosophila gene's developmental expression pattern, loss-of-function phenotype, and overexpression phenotype can be analyzed to elucidate gene function. Other genes that function in the same pathway can be identified among genes with similar mutant phenotypes or expression patterns, or among mutant genes that enhance or suppress its phenotype. Pathways can be proposed on the basis of genetic epistasis studies. Isolating mammalian homologs of newly identified genes in the Drosophila pathway can elucidate the corresponding mammalian pathway. In this way, the power of Drosophila genetics has been leveraged to elucidate mammalian pathways involved in cancer biology, the cell cycle, and receptor tyrosine kinase (RTK/RAS) signaling.

The full genome sequence identifies many new candidates for such approaches. Chief among these are Drosophila cognates of human disease genes, especially those whose functions are not understood. Finding the Drosophila homolog will no longer take months or years of uncertain searching by molecular methods. Homologs of tau and Parkin involved in Parkinsonism, the p53 tumor suppressor gene, the menin gene in multiple endocrine neoplasia type 1, and many other disease genes are now revealed by the genome sequence. In this vein, the long-sought insulin homologs have finally been found, as have those for receptors for somatostatin, vasopressin, leutotropin, thyroid-stimulating hormone, and other hormones.

Looking to the Future

There are many aspects of Drosophila biology and physiology waiting to be explored by genetics and the new genomic approaches (13). We expect the future of Drosophila research to turn increasingly to questions beyond the cellular level, to questions of physiology, maintenance, and regeneration of whole organs, and to systemic processes. For example, some human renal disorders are associated with defects in genes involved in fluid and electrolyte transport. Drosophila orthologs of these genes found in the genomic sequence should spur studies of the physiology and function of Malpighian tubules, which serve as the Drosophila “kidney.” We anticipate that new collaborations between vertebrate and fly researchers will come about to study behavior, neurodegeneration, aging, and drugs and that important new biological principles and pathways will emerge.

Research is subject to strong selective pressures. Experimental systems that offer the most efficient and direct access to important questions attract the most attention and effort, and as techniques evolve and interests mature, the landscape will change rapidly. The Drosophila sequence is a critical resource that ensures that this tiny dew-lover will continue to lead the way to new biological pathways and principles. If Drosophila has been difficult for workers in other fields because of an arcane nomenclature and idiosyncratic husbandry, the sequence now provides access through a universal language: the DNA sequence.

