Insights from Genomic Data

Science  16 Feb 2001:
Vol. 291, Issue 5507, pp. 1155
DOI: 10.1126/science.291.5507.1155c

The complete assembly of the human genome sequence by Venter et al. confirms recent estimates that the total number of human protein-coding genes might be fewer than 30,000, only one-third more than in the nematode Caenorhabditis elegans. Claverie (p. 1255) points out that such a low number of genes could drastically modify our understanding of organism complexity and evolution, as well as our current interpretation of transcriptome analyses. He suggests that there may be severe consequences for the long-term sustainability of the biomedical industry in the postgenomic era.

Courseaux and Nahon (p. 1293) analyze the structural organization, pattern of expression, and origin of two genes that have emerged during primate evolution by a combination of retrotransposition of an RNA sequence, sequence mutations, and de novo creation of splice sites in adjoining sequences. These findings shed light on the first steps in the origins of new genes, and offer clues to the process by which humans and their close primate relatives diverged genetically from other mammals.

Cells are continually exposed to environmental and endogenous insults that damage DNA. Left unrepaired, this damage will eventually lead to genome instability, with devastating consequences for both the cell and the organism. Wood et al. (p. 1284) have surveyed the human genome sequence and compiled a comprehensive list of genes that help the cell recognize and repair DNA damage. Ongoing studies of how the products of these repair genes interact with one another promise to shed new light on fundamental cellular control mechanisms that go awry in cancer as well as in normal aging.

Comparison of the proteins encoded in the human genome with those from the fruit fly and worms (nematodes) confirms that, in the course of evolution, the process of programmed cell death, or apoptosis, has become more complex. Aravind et al. (p. 1279) found that nematode cells function with just one protein in the NACHT family of nucleoside triphosphatases in their apoptotic arsenal. The human genome, however, shows no fewer than 18 proteins that belong to this family and that are related to NAIP (neuronal apoptosis inhibitory protein), a protein defective in spinal muscular atrophy. Oddly, homologs of proteins in the human apoptotic machinery are found in some bacteria, suggesting that there has been relatively recent gene transfer.

Once gene sequences are determined, the next question is often how these data relate to expression. Caron et al. (p. 1289) describe the integration of existing serial analysis of gene expression (SAGE) data, which show the level of messenger RNA expression, with the human gene map to reveal the pattern of genome-wide expression. This transcriptome map, created from both normal and diseased tissue types, indicates that highly expressed genes tend to be clustered in specific chromosomal regions, or RIDGEs. This organization is unlike that of yeast and suggests that the human genome exhibits a higher-order structure.
