Human DNA Repair Genes


Science  16 Feb 2001:
Vol. 291, Issue 5507, pp. 1284-1289
DOI: 10.1126/science.1056154


Cellular DNA is subjected to continual attack, both by reactive species inside cells and by environmental agents. Toxic and mutagenic consequences are minimized by distinct pathways of repair, and 130 known human DNA repair genes are described here. Notable features presently include four enzymes that can remove uracil from DNA, seven recombination genes related to RAD51, and many recently discovered DNA polymerases that bypass damage, but only one system to remove the main DNA lesions induced by ultraviolet light. More human DNA repair genes will be found by comparison with model organisms and as common folds in three-dimensional protein structures are determined. Modulation of DNA repair should lead to clinical applications including improvement of radiotherapy and treatment with anticancer drugs and an advanced understanding of the cellular aging process.

The human genome, like other genomes, encodes information to protect its own integrity (1). DNA repair enzymes continuously monitor chromosomes to correct damaged nucleotide residues generated by exposure to carcinogens and cytotoxic compounds. The damage is partly a consequence of environmental agents such as ultraviolet (UV) light from the sun, inhaled cigarette smoke, or incompletely defined dietary factors. However, a large proportion of DNA alterations are caused unavoidably by endogenous weak mutagens including water, reactive oxygen species, and metabolites that can act as alkylating agents. Very slow turnover of DNA consequently occurs even in cells that do not proliferate. Genome instability caused by the great variety of DNA-damaging agents would be an overwhelming problem for cells and organisms if it were not for DNA repair.

On the basis of searches of the current draft of the human genome sequence (2), we compiled a comprehensive list of DNA repair genes (Table 1). This inventory focuses on genes whose products have been functionally linked to the recognition and repair of damaged DNA as well as those showing strong sequence homology to repair genes in other organisms. Readers desiring further information on specific genes should consult the primary references and links available through the accession numbers. Recent review articles on the evolutionary relationships of DNA repair genes (3) and common sequence motifs in DNA repair genes (4) may also be helpful.

Table 1

Human DNA repair genes. A version of this table with active links to Gene Cards and to the National Center for Biotechnology Information is available (24) on Science Online. A version with updates is also available online. XP, xeroderma pigmentosum.


The functions required for the three distinct forms of excision repair are described separately. These are base excision repair (BER), nucleotide excision repair (NER), and mismatch repair (MMR). Additional sections discuss direct reversal of DNA damage, recombination and rejoining pathways for repair of DNA strand breaks, and DNA polymerases that can bypass DNA damage.

The BER proteins excise and replace damaged DNA bases, mainly those arising from endogenous oxidative and hydrolytic decay of DNA (1). DNA glycosylases initiate this process by releasing the modified base. This is followed by cleavage of the sugar-phosphate chain, excision of the abasic residue, and local DNA synthesis and ligation. Cell nuclei and mitochondria contain several related but nonidentical DNA glycosylases obtained through alternative splicing of transcripts. Three different nuclear DNA glycosylases counteract oxidative damage, and a fourth mainly excises alkylated purines. Remarkably, four of the eight identified DNA glycosylases can remove uracil from DNA. Each of them has a specialized function, however. UNG, which is homologous to the Escherichia coli Ung enzyme, is associated with DNA replication forks and corrects uracil misincorporated opposite adenine. SMUG1, which is unique to higher eukaryotes, probably removes the uracil that arises in DNA by deamination of cytosine. MBD4 excises uracil and thymine specifically at deaminated CpG and 5-methyl-CpG sequences, and TDG removes ethenoC, a product of lipid peroxidation, and also slowly removes uracil and thymine at G·U and G·T base pairs. The existence of multiple proteins with similar activities is a recurring theme in human DNA repair (1). Another illustration of this is the set of at least four adenosine triphosphate (ATP)–dependent DNA ligases encoded by three genes, with LIG3-XRCC1 providing the main nick-joining function for BER.

Until recently, only one endonuclease for abasic sites had been found encoded in the human genome, although there are two each in E. coli and the yeast Saccharomyces cerevisiae and three genes are predicted in the genome of the plant Arabidopsis thaliana. A second human gene, APE2, has recently appeared. Apparently this encodes a minor activity, as deletion of the major gene APE1 causes early embryonic lethality in mice. Repair of the DNA replication–blocking lesion 3-methyladenine is another case where the human genome is frugal. In other organisms, several DNA glycosylases, unrelated at the primary sequence level, can remove 3-meA. Among them are Tag1 of E. coli, AlkA of E. coli (similar to MAG of S. cerevisiae), and MPG in higher eukaryotes. Only the MPG enzyme has been characterized so far in the human genome. This is in contrast to the at least two alkA and six tag1 homologs found in Arabidopsis (5). However, like the genomes of other multicellular animals, the current human genome draft contains no obvious tag1 and alkA homologs (6).

A few unusual enzymes reverse rather than excise DNA damage. The human MGMT removes methyl groups and other small alkyl groups from the O6 position of guanine. There are two such proteins (Ada and Ogt) in E. coli, but no additional homologs have been detected in the human genome sequence. MGMT resembles the COOH-terminal half of Ada. The NH2-terminal half of E. coli Ada can remove a methyl group from a DNA phosphate residue. We found no homologs of this region of Ada, and it remains unclear whether such backbone methylations are repaired in human cells.

Many organisms contain photolyases that can monomerize lesions induced by UV light such as cyclobutane pyrimidine dimers and (6-4) photoproducts. The human genome has two CRY genes with similarity to photolyase sequences. These encode blue light photoreceptors involved in setting of circadian rhythms but not in photoreactivation of DNA damage. We have not detected additional homologs of DNA repair photolyases in the human genome, confirming previous reports that photolyase activity is present in many vertebrates including fish, reptiles, and marsupials, but not in placental mammals.

NER mainly removes bulky adducts caused by environmental agents. In E. coli, the three polypeptides UvrA, UvrB, and UvrC can locate a lesion and incise on either side of it to remove a segment of nucleotides containing the damage. Eukaryotes, including yeast and human cells, do not have direct UvrABC homologs but use a more elaborate assembly of gene products to carry out NER (1). For example, E. coli UvrA can bind to sites of DNA damage, whereas at least four different human NER factors have this property (the XPC complex, DDB complex, XPA, and RPA). The formation of an unwound preincision intermediate in human cells requires two DNA helicases, XPB and XPD, instead of the single UvrB in E. coli, and there are dedicated human nucleases (XPG and ERCC1-XPF) for each of the two incisions, instead of the single UvrC in bacteria. S. cerevisiae encodes two additional gene products, Rad7 and Rad16, which are important for NER. No convincing homologs to these can be identified in the human genome, although Rad16 is a difficult case because it is a member of the amply represented Swi/Snf family of DNA-stimulated adenosine triphosphatases (ATPases).

Some organisms such as the fission yeast Schizosaccharomyces pombe have a second system for excision of pyrimidine dimers, initiated by a UVDE nuclease. The human genome apparently lacks a homolog of this nuclease and has no such backup system, consistent with the fact that cells from NER-defective xeroderma pigmentosum patients totally lack the ability to remove pyrimidine dimers from DNA.

The transcribed strand of active human genes is repaired faster than the nontranscribed strand in a transcription-coupled repair process known to involve the products of CSA, CSB, and XAB2. The mechanism of such transcription-coupled repair is not known, and future investigation is expected to reveal additional participants.

MMR corrects occasional errors of DNA replication as well as heterologies formed during recombination. The bacterial mutS and mutL genes encode proteins responsible for identifying mismatches, and there are numerous homologs of these genes in the human genome, of greater variety than those found in yeast, Drosophila melanogaster, or Caenorhabditis elegans. Some of these proteins are specialized for locating distinct types of mismatches in DNA, some are specialized for meiotic recombination, and some have functions yet to be determined. In E. coli, the newly synthesized DNA strand is identified with the aid of the MutH endonuclease, which has no human ortholog. Strand discrimination in human cells may be signaled instead by the orientation of components of the DNA replication complex such as PCNA or by other factors not yet identified.

DNA double-strand breaks may be rectified by either homologous or nonhomologous recombination pathways. Particularly notable in the human sequence is the presence of at least seven genes encoding proteins distantly related to the single Rad51 of S. cerevisiae and the single RecA of E. coli. The latter proteins function in strand pairing and exchange during recombination. By comparison, four members of the Rad51 family have been found in the Drosophila genome (7) and four in Arabidopsis (5). Homologous recombination in human cells is likely to involve branch migration enzymes and resolvases that are functionally analogous to the bacterial RuvABC system. Recent biochemical experiments have revealed human activities for such concerted branch migration/resolution reactions, but the responsible gene products have not yet been identified (8).

The nonhomologous end-joining pathway (NHEJ) involves the factors listed in Table 1, and additional components will most likely be discovered. For example, the DNA-dependent protein kinase is believed to phosphorylate key molecules involved in the repair process. These substrates have yet to be fully defined.

Single-strand interruptions in DNA can be rectified by enzymes from the BER pathway. Enzymes of the PARP family, as well as XRCC1, temporarily bind to single-strand interruptions in DNA and may act to recruit repair proteins. We have not listed the telomere-binding proteins protecting the ends of chromosomes, but one member of the PARP family, tankyrase, is present in this complex.

During the past year, the human genome sequence has revealed many previously unrecognized DNA polymerases (1). There are currently at least 15 DNA polymerases in humans, exceeding the number found in any other organism. For repair of nuclear DNA, the main form of BER uses Pol β, whereas Pol δ or Pol ɛ are the main enzymes employed for NER and MMR. Genetic and biochemical evidence has implicated many of the newly discovered polymerases in the DNA damage response, but others may have specialized roles such as sister chromatid cohesion. Table 1 includes the catalytic subunits of these DNA polymerases, but not other subunits and DNA polymerase cofactors.

REV3L, the catalytic subunit of DNA polymerase ζ, illustrates how DNA sequence homology searches can yield unexpected results. The DNA polymerase domain at the COOH-terminus of the human protein resembles S. cerevisiae Rev3, but most of the first 2000 amino acids are not present in the yeast protein. A second human gene highly homologous to 1200 residues in this region (outside the polymerase domains) is encoded on the X chromosome (accession number AL139395). It is premature to classify this as a DNA repair gene, but study of it is expected to shed light on the function of REV3L.

The human genome sequence has already markedly influenced the field of DNA repair. Many of the genes listed were discovered as investigators searched the expanding database for sequence similarity to genes discovered in model organisms. This approach will no doubt continue, and new human genes will be identified as additional repair functions are identified in other systems. One source that is likely to be fruitful is the genome of Deinococcus radiodurans (9). This bacterium has an exceptionally high resistance to DNA-damaging agents, especially ionizing radiation, in comparison to other microorganisms. Some of the currently uncharacterized genes in D. radiodurans are expected to contribute to DNA repair, and it remains to be seen if there will be homologs of such functions in the human genome.

The sequence database also makes it increasingly straightforward to use mass spectrometry fingerprinting to identify new subunits of repair protein complexes (10). In this sensitive technique, isolated proteins are digested with an enzyme such as trypsin, and the exact molecular masses of the resulting fragments are measured. Comparison of these fragments with a computer-simulated tryptic digest of each human gene product can unambiguously identify the protein.
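As a rough illustration of the in silico comparison described above, the following Python sketch digests candidate sequences with a simplified trypsin rule (cleavage after K or R unless the next residue is P, with no missed cleavages) and counts how many observed masses fall within a tolerance of the theoretical peptide masses. The function names, the toy sequences, and the 0.01 Da tolerance are assumptions made for illustration only; they are not taken from this article or from any particular software package, and real search tools use considerably more elaborate scoring.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(seq):
    """Split after K or R unless followed by P (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(observed, seq, tol=0.01):
    """Number of observed masses within tol (Da) of any theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(seq)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database, tol=0.01):
    """Name of the database entry explaining the most observed masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))


if __name__ == "__main__":
    # Hypothetical toy database; real searches would use every predicted gene product.
    database = {"protA": "AKGGR", "protB": "MSTWK"}
    observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]
    print(best_match(observed, database))
```

In this toy setting the observed masses are generated from one of the database entries, so the match is trivially unambiguous; with measured spectra, the tolerance and the handling of missed cleavages and modifications would need to be chosen to suit the instrument.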

In addition, new genes will be found as novel biochemical assays are developed for various aspects of repair. For example, human cells can repair cross-links between the two DNA strands. Interstrand cross-links are generated by natural psoralen compounds and their chemotherapeutic derivatives, by other drugs used for cancer treatment such as nitrogen mustards, and to some extent by ionizing and ultraviolet radiation. Repair of such cross-links involves the NER genes and the XRCC2 and XRCC3 recombination genes and is predicted to involve the DNA polymerase POLQ. In addition, the sensitivity of cells from individuals with Fanconi anemia (FA) points to a role for the FANC group of genes in cross-link repair. However, the mechanism of interstrand DNA cross-link repair remains obscure, and further investigation may implicate even more gene products.

Several other classes of DNA damage exist for which repair has been relatively unexplored. New genes may be identified, for instance, involved in the repair of damage caused by lipid peroxidation (1). Other uncharacterized forms of DNA damage caused by reactive metabolites and catabolites may be found. For example, the genome is dynamic, and single-stranded regions are temporarily exposed during DNA replication and gene transcription. Positions that are normally protected by base-pairing within the double helical structure are then vulnerable to group-specific reagents, creating new classes of lesions. Alkylating agents can form the cytotoxic lesions 1-methyladenine and 3-methylcytosine in single-stranded DNA, and new repair strategies may be needed to remove such lesions.

DNA is assembled into several levels of ordered chromatin structure, and so DNA metabolic processes need a close connection with proteins that allow chromatin remodeling or disassembly. Several human chromatin remodeling complexes are known, for instance, that allow and control access to DNA during gene transcription (11). The great majority of enzymological DNA repair studies to date have worked with naked DNA, but chromatin presents a substantial barrier to recognition of DNA damage. It is expected that human protein complexes will be found that are dedicated to DNA repair and recombination, facilitating access of DNA repair enzymes to the genome.

The three-dimensional structures of DNA repair proteins are being determined at an ever-increasing pace (12). Structural biologists will soon turn their attention to open reading frames of unknown function, and new repair genes will become apparent in the process. As an example, the functionally related SMUG1, TDG, and UNG enzymes show little or no primary sequence homology yet have common structural folds and belong to a single protein superfamily (13). As the structures of new protein folds are documented, more members of DNA repair enzyme families are likely to be found with the aid of three-dimensional structure prediction models. In this way, the new field of structural genomics will help guide functional studies of presently uncharacterized open reading frames in the human genome.

For an impressive number of genes involved in human DNA repair, disruptions of the corresponding murine genes have been reported (14), are in progress, or have recently been constructed. The results are beginning to guide searches for additional DNA repair enzymes. Knockouts of DNA glycosylases in mice have unexpectedly mild consequences by comparison with budding yeast and E. coli models. This implies that more backup systems exist, probably because endogenous damage presents a more frequent problem for larger genomes.

As the genes from the human genome sequence continue to be cataloged, studying the activity of the protein products will become increasingly important. More effective methods for rapid expression of active proteins will be required to test for possible functions. An alternative approach is to selectively inactivate individual proteins in vivo. An efficient method for selective proteolytic destruction has been successful in budding yeast (15) and should be extendable to mammalian cells. Alternatively, systematic interference with gene expression with the use of inhibitory RNA molecules, as employed successfully in C. elegans (16), is proving to be a powerful way to dissect gene functions.

Intense activity is being devoted to understanding how DNA damage transmits signals to the cell-cycle checkpoint machinery and to the monitoring systems that control cellular apoptosis. There is recent progress on this complex extended network, which involves damage recognition factors, protein kinases, and transcription factors such as p53 (17). Attempts are already being made to obtain an integrated picture of DNA repair with regard to signaling (18). The subject is of great interest as some inherited human syndromes associated with sensitivity to DNA-damaging agents result from loss of functions such as ATM, which is involved in damage sensing.

New clinical applications relating to human DNA repair genes are certain to emerge. Tumor cells often acquire resistance to therapeutic drugs or radiation. Genomics approaches such as array technology will be used to define any DNA repair genes that may be overexpressed in this context. Furthermore, it will be important to find ways to specifically inhibit DNA repair in these resistant cells by targeting the key enzymes. Genetic polymorphisms in relevant repair genes will be identified and efforts made to correlate them with effects on activity of the respective proteins, with response to particular therapies and with clinical outcomes. Although a number of polymorphisms in DNA repair genes are being reported, there is presently little functional information on the consequences of the attendant amino acid changes. It will be important to find out which polymorphisms actually affect protein function and then concentrate on these in epidemiological and clinical studies. For example, homozygosity for a particular polymorphism in the DNA ligase subunit XRCC1 is associated with higher sister chromatid exchange frequencies in smokers, suggesting an association of this allele with a higher risk for tobacco- and age-related DNA damage (19). Larger studies and comparison with other polymorphisms having known biochemical effects will be needed to further validate and extend these findings.

Furthermore, with the use of gene and protein array techniques, it should be possible to compare expression profiles of DNA repair genes in normal and tumor cells—information that could eventually lead to individually tailored therapies with chemicals and radiation. For example, tumors with low levels of NER should be more susceptible to treatment with cisplatin (20). In experimental systems, MMR-deficient cells are highly tolerant to alkylating chemotherapeutic drugs. MMR-defective tumors such as those found in hereditary nonpolyposis colon cancer may be resistant to treatment with such agents (21).

Some variation in DNA repair gene expression is epigenetic in origin and has been found for instance with MGMT and MSH6 (22). The MGMT gene promoter is often methylated in gliomas, resulting in suppressed expression that can be associated with an improved response after tumor treatment with an alkylating agent (23). The complete human genome sequence now allows the definition of promoter regions so that the DNA methylation status of relevant CpG islands can be investigated readily. Finally, DNA repair, especially repair of oxidative damage, has often been suggested as a relevant factor in counteracting aging. An examination of polymorphisms and gene expression levels in human DNA repair genes and a comparison with the equivalent genes in shorter lived mammalian species should help determine the importance of DNA repair in normal aging processes.
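Before the methylation status of a promoter can be investigated, candidate CpG islands have to be located in the sequence. The Python sketch below shows one way this might be screened, using the observed/expected CpG ratio (number of CpG dinucleotides × length, divided by the product of the C and G counts) together with GC content. The thresholds (length of at least 200 bp, GC fraction above 0.5, observed/expected ratio above 0.6) are commonly used heuristics assumed here for illustration and are not specified in this article; the function names are likewise hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides cannot overlap themselves
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed heuristic thresholds to a single candidate window."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    # Artificial sequences, used only to exercise the functions.
    print(looks_like_cpg_island("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

A window that passes such a screen is only a candidate; whether its CpG sites are actually methylated in a given tumor still has to be determined experimentally.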

* Present address: University of Pittsburgh Cancer Institute, S867 Scaife Hall, 3550 Terrace Street, Pittsburgh, PA 15261, USA.

