Somatic mutation in cancer and normal cells


Science  25 Sep 2015:
Vol. 349, Issue 6255, pp. 1483-1489
DOI: 10.1126/science.aab4082


Spontaneously occurring mutations accumulate in somatic cells throughout a person’s lifetime. The majority of these mutations do not have a noticeable effect, but some can alter key cellular functions. Early somatic mutations can cause developmental disorders, whereas the progressive accumulation of mutations throughout life can lead to cancer and contribute to aging. Genome sequencing has revolutionized our understanding of somatic mutation in cancer, providing a detailed view of the mutational processes and genes that drive cancer. Yet, fundamental gaps remain in our knowledge of how normal cells evolve into cancer cells. We briefly summarize a number of the lessons learned over 5 years of cancer genome sequencing and discuss their implications for our understanding of cancer progression and aging.

Although most of the somatic mutations that steadily accumulate in our cells are harmless, occasionally a mutation affects a gene or regulatory element and leads to a phenotypic consequence. A fraction of these mutations can confer a selective advantage to the cell, leading to preferential growth or survival of a clone. We use the term “driver mutation” to denote mutations under positive selection within a population of cells, and we use “passenger mutation” for variants that have either no phenotypic consequences or biological effects that are not selectively advantageous to the clone (1). One end product of somatic cell evolution is cancer, a disease in which an autonomous clone of cells escapes from both the in-built programs of normal somatic cell behavior and the exogenous restraints on cell proliferation.

A very brief history of somatic mutation and cancer

Cancer results from the clonal expansion of a single abnormal cell. In 1914, the observation of chromosomal abnormalities in cancer cells was one of the first links between mutation and cancer (1). The causal role of somatic mutations in cancer was later supported by the discovery that many carcinogenic chemicals are also mutagenic (2). Conclusive evidence came from studies showing that the introduction of DNA fragments from cancer cells into normal cells led to malignant transformation and also from the identification of the responsible mutations in the transforming DNA (1). This work led to the discovery of the first oncogenes, whose mutation can bring about a gain of function that drives transformation into cancer. In parallel, studies on hereditary cancers led to the discovery of tumor suppressor genes (3), which are typically inactivated by mutations, either germline or somatic.

As the link between somatic mutation and cancer was established, cancer was described as an example of Darwinian evolution, in which cells acquire the hallmarks of cancer through somatic mutation and selection (4, 5). This remains a widely accepted framework for understanding the progression of cancer, but we still lack quantitative information about the role of different factors in the evolution of normal cells into cancer cells.

In the past decade, high-throughput DNA sequencing has enabled the systematic sequencing of more than 10,000 cancer exomes and 2500 whole cancer genomes. This has revolutionized our understanding of the genetics of cancer, leading to the discovery of previously unrecognized cancer genes, new mutational signatures, and fresh insights into cancer evolution.

Mutational processes in cancer

Mutations arise from replication errors or from DNA damage that is either repaired incorrectly or left unrepaired. DNA damage can be caused by exogenous factors, including chemicals, ultraviolet (UV) light, and ionizing radiation; by endogenous factors, such as reactive oxygen species, aldehydes, or mitotic errors; or by enzymes involved in DNA repair or genome editing, among others (6). Additionally, viruses and endogenous retrotransposons can cause insertions of DNA sequence.

The rates of different mutational processes vary among tumors and cancer types (Fig. 1A). Though numbers vary widely, most cancers carry 1000 to 20,000 somatic point mutations and a few to hundreds of insertions, deletions, and rearrangements (7–10). Pediatric brain tumors and leukemias typically have the lowest numbers of mutations, whereas tumors induced by exposure to mutagens, such as lung cancers (tobacco) or skin cancers (UV light), present the highest rates (8–10). Beyond these typical figures, some cancers acquire dramatically increased mutation rates due to the loss of DNA repair pathways or chromosome integrity checkpoints (6, 8). Depending on which process is affected, this can manifest as a very high rate of point mutations, microsatellite instability, or chromosome instability.

Fig. 1 Spectrum of somatic mutations in cancer genomes.

(A) Mutation burden in 20 tumor types and relative contribution of different mutational processes. For each tumor type, samples were divided into deciles on the basis of their mutation burden. (Top) The median mutation burden is shown as a dot plot (substitutions and small indels); orange bars denote the median burden of all samples. AML, acute myeloid leukemia. (Bottom) The mean percentage contribution of different mutation signatures is depicted by stacked bars. Data are from The Cancer Genome Atlas (TCGA) (top) and L. B. Alexandrov [updated from (8)] (bottom) and are visualized as in (8, 9). (B) Context-dependent mutation spectrum of several mutational signatures found in cancer genomes [COSMIC (Catalogue of Somatic Mutations in Cancer) database (26)]. Heat maps show relative rates per trinucleotide.

Thus, both exogenous and endogenous mutational processes contribute to different cancers to various extents. Although in some tissues the mutations and incidence of cancer are dominated by exposure to external mutagens (8), intrinsic factors—such as the number of cell divisions in a tissue—seemingly dominate other cancer types (8, 11) (Fig. 1A).

Signatures of mutational processes in cancer genomes

Different mutational processes leave idiosyncratic patterns of mutations, termed “mutational signatures.” These patterns allow us to identify known and novel mutational processes and quantify their action on a cancer genome (8). Features that can characterize the action of a given mutational process are: (i) the type of mutations observed, (ii) local sequence context, (iii) distribution across the genome, (iv) evidence of repair, and (v) timing during cancer evolution.
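As a minimal sketch of how the contribution of known signatures to one tumor might be quantified, the Python function below estimates non-negative exposures for a fixed set of signatures using multiplicative updates that minimize the Kullback-Leibler divergence (the Lee-Seung update from non-negative matrix factorization, here with the signature matrix held fixed). This is an illustrative choice and not necessarily the procedure behind Fig. 1A; the function name, the toy four-channel signatures, and the iteration count are assumptions (real catalogs use 96 trinucleotide channels).

```python
from typing import List


def fit_exposures(catalog: List[float], signatures: List[List[float]],
                  n_iter: int = 2000) -> List[float]:
    """Estimate exposures h >= 0 such that catalog[i] ~ sum_k h[k] * signatures[k][i].

    Hypothetical helper: Lee-Seung multiplicative updates for the
    Kullback-Leibler divergence, with the signatures held fixed.
    Each signature is assumed to be a probability vector over channels.
    """
    n_sig, n_chan = len(signatures), len(catalog)
    h = [sum(catalog) / n_sig] * n_sig  # start from equal exposures
    for _ in range(n_iter):
        # Catalog reconstructed from the current exposures
        recon = [sum(h[k] * signatures[k][i] for k in range(n_sig))
                 for i in range(n_chan)]
        new_h = []
        for k in range(n_sig):
            num = sum(signatures[k][i] * catalog[i] / recon[i]
                      for i in range(n_chan) if recon[i] > 0)
            den = sum(signatures[k])
            new_h.append(h[k] * num / den)
        h = new_h
    return h


# Toy example: two made-up signatures over four channels
sig_a = [0.7, 0.2, 0.1, 0.0]
sig_b = [0.0, 0.1, 0.3, 0.6]
catalog = [30 * a + 70 * b for a, b in zip(sig_a, sig_b)]
exposures = fit_exposures(catalog, [sig_a, sig_b])
print([round(x, 1) for x in exposures])
```

Because the toy catalog was built as 30 copies of one signature plus 70 of the other, the fitted exposures should come out close to 30 and 70. With real data, the choice of which signatures to include strongly affects the result.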

Often, a mutational process will cause only one type of somatic mutation; for example, the carcinogen aristolochic acid causes A>T base substitutions almost exclusively (12). In contrast, the loss of the homologous recombination genes BRCA1 or BRCA2 in breast, ovarian, and pancreatic cancers is associated with a distinctive pattern of base substitutions, medium-sized indels, and larger chromosomal duplications and deletions (13).

Mutations are often enriched in specific local sequence contexts. For example, UV light induces pyrimidine dimers, whose erroneous repair leads to C>T mutations at CpC or TpC dinucleotides (Fig. 1B). For mutations induced by enzymatic damage, the local sequence context derives from properties of the enzyme. One of the most widespread mutational signatures in human cancers is due to off-target modification of DNA by the APOBEC family of proteins (8, 13), which predominantly causes C>T or C>G substitutions at sites preceded by a thymine nucleobase (Fig. 1B).
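The local sequence context described above is conventionally summarized by expressing each substitution relative to the pyrimidine (C or T) of the mutated base pair, together with its 5' and 3' neighbors. This gives 6 substitution classes × 16 contexts = 96 channels, the kind of per-trinucleotide layout shown in the heat maps of Fig. 1B. A minimal Python sketch follows; the function names are hypothetical, and the APOBEC check uses the commonly cited TpCpW motif (W = A or T), a slightly stricter assumption than the "preceded by a thymine" description in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Map a substitution to its pyrimidine-centred trinucleotide channel.

    context: reference trinucleotide (5'->3') centred on the mutated base.
    alt: the mutant base at the centre position.
    Returns a label such as 'T[C>T]A'.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        # Purine-centred event: report it on the opposite strand instead
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_apobec_like(channel: str) -> bool:
    """True for C>T or C>G at a TpCpW site (W = A or T)."""
    return channel[0] == "T" and channel[2:5] in ("C>T", "C>G") and channel[6] in "AT"


print(sbs96_channel("TCA", "T"))  # T[C>T]A
print(sbs96_channel("TGA", "A"))  # same event seen on the other strand: T[C>T]A
print(sbs96_channel("CCG", "T"))  # C[C>T]G, a C>T at a CpC (UV-like) context
```

Collapsing each event onto the pyrimidine strand is what makes the 192 possible strand-specific substitutions fold into 96 channels.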

Some mutational processes show considerable variation in genomic distribution. Rates of point mutations vary along the genome and are typically higher in regions with low expression levels, repressed chromatin, and late replication times (14, 15). Such patterns are also observed in human evolution (16) and in somatic mutations in normal cells (17, 18). Some of this variation may be driven by reduced access of mismatch repair machinery to closed chromatin regions (19). In contrast, other mutational processes show enrichment in regions of open chromatin (20, 21).

We can occasionally infer the action of repair processes in the distribution of mutations. Transcription-coupled repair leads to a reduction in mutations on the transcribed strand compared with the nontranscribed strand (14) and is evident in some sources of exogenous DNA damage, such as UV light or tobacco exposure (8). The medium-sized indels associated with BRCA1/2 deficiency show a signature of repair by microhomology-mediated end-joining, as this pathway repairs double-stranded DNA breaks in the absence of homologous recombination (13).

Finally, information about timing of mutational processes can sometimes be inferred from genomics data. Surprisingly, some mutational events occur as one-off catastrophes during cancer evolution. Chromothripsis is characterized by clusters of tens to hundreds of rearrangements that occur as a single event (22), which has now been experimentally recreated in vitro in a single cell division (23). Other clusters of simultaneous mutations include breakage-fusion-bridge cycles (24), chromoplexy (25), and clusters of point mutations (kataegis) (13). In contrast, other mutational processes seemingly show steady rates of accumulation over time.
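Clusters such as kataegis can be flagged from the positions of point mutations alone. The Python sketch below uses a threshold often applied in the literature (at least six consecutive mutations with a mean inter-mutation distance of at most 1 kb); treat these thresholds, the function name, and the greedy scan as assumptions rather than a standard implementation. It handles one chromosome at a time and runs in quadratic time, which is adequate only for illustration.

```python
from typing import List, Tuple


def find_kataegis(positions: List[int], min_mutations: int = 6,
                  max_mean_distance: float = 1000.0) -> List[Tuple[int, int]]:
    """Return (start, end) of runs of >= min_mutations consecutive mutations
    whose mean inter-mutation distance is <= max_mean_distance bp.

    Hypothetical helper; positions are assumed to lie on a single chromosome.
    """
    pos = sorted(positions)
    n = len(pos)
    clusters = []
    i = 0
    while i < n:
        # Furthest j such that pos[i..j] still meets the mean-distance threshold
        end = i
        for j in range(i + 1, n):
            if (pos[j] - pos[i]) / (j - i) <= max_mean_distance:
                end = j
        if end - i + 1 >= min_mutations:
            clusters.append((pos[i], pos[end]))
            i = end + 1  # continue scanning after the reported cluster
        else:
            i += 1
    return clusters


# Eight mutations 200 bp apart plus two scattered ones (made-up coordinates)
muts = list(range(10_000, 11_600, 200)) + [1_000_000, 5_000_000]
print(find_kataegis(muts))
```

Because the criterion is a mean, a sufficiently dense cluster can absorb an isolated neighboring mutation; dedicated tools typically use more careful segmentation of inter-mutation distances.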

By studying the patterns of mutations in cancer genomes, we are beginning to unravel the action of many known and novel mutational processes in cancer. This pursuit also helps us to identify preventable sources of mutations behind certain cancers, such as the role of particular herbal medicines in some bladder cancers in Asia (12).

Positive selection on somatic mutations

Of the many thousands of mutations in a typical cancer genome, usually only a handful (the driver mutations) have been positively selected. The vast majority are effectively neutral or mildly deleterious mutations that occurred before or soon after a driver mutation and “hitchhiked” as the cell containing the driver clonally expanded (1). Across multiple patients, positive selection of driver mutations manifests as a higher rate of mutations in a gene or a region than that expected by neutral mutation accumulation. Since the discovery of the Philadelphia chromosome (1), mutation recurrence has proven to be a powerful tool for the identification of new cancer genes. As data sets have grown larger, it has become increasingly important to develop accurate models of the background mutation rate for discovering more rarely mutated cancer genes, because overly simplistic models yield large numbers of false positives (9).
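The logic of recurrence-based driver discovery can be illustrated with a deliberately simple background model: assume every patient mutates each base of a gene at the same rate, so the number of mutations observed across patients is Poisson distributed, and ask how surprising the observed count is. This uniform-rate model is exactly the kind of oversimplification that inflates false positives; real methods account for variation in mutation rate across genes, sequence contexts, and patients. The function names and example numbers below are hypothetical.

```python
import math
from typing import Tuple


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Terms beyond this bound contribute negligibly (sketch-level accuracy)
    upper = int(max(k, lam) + 20 * math.sqrt(lam) + 50)
    total = 0.0
    for i in range(k, upper + 1):
        total += math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
    return min(1.0, total)


def recurrence_test(observed: int, gene_length_bp: int, n_patients: int,
                    rate_per_bp: float) -> Tuple[float, float]:
    """Return (expected count, one-sided p-value) under a uniform background rate."""
    expected = gene_length_bp * n_patients * rate_per_bp
    return expected, poisson_upper_tail(observed, expected)


# Hypothetical gene: 1,500 coding bp, 500 tumours, 2 mutations per Mb per tumour
expected, p = recurrence_test(observed=12, gene_length_bp=1500,
                              n_patients=500, rate_per_bp=2e-6)
print(f"expected {expected:.2f}, p = {p:.2e}")
```

If the true background rate of this gene were several-fold higher than assumed (for example, because it is late replicating), the same 12 mutations would look far less surprising, which is why the background model matters so much.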

The Cancer Gene Census, a database of genes recurrently mutated in cancer, presently contains 572 genes (26). Of these, ~90% are altered by somatic mutation and ~20% by germline mutations that predispose to cancer (familial cancers). Figure 2 summarizes the frequency of somatic mutations (substitutions and indels) in 198 cancer genes across 20 tissues. Only three genes are mutated in at least 10% of patients across the range of tumor types shown here: TP53 (36.1%), PIK3CA (14.3%), and BRAF (10%).

Fig. 2 Frequency of mutations in recurrently mutated cancer genes across tumor types.

This reference table summarizes the percentage of patients carrying a nonsynonymous coding substitution or small indel in each gene. Data are from TCGA and are visualized as in (60). Different colors emphasize the frequency of mutations. The list of 198 cancer genes corresponds to all genes in the curated list of 174 mutated genes from the COSMIC database (version 73) (26) and any Cancer Gene Census database gene found recurrently mutated in (59).

Most large sequencing studies have used exome sequencing, and thus most known driver mutations affect protein-coding regions of the genome. Though an increasing number of studies are sequencing whole genomes, the yield of driver mutations in noncoding regions appears much less dramatic (27, 28). Nonetheless, there are examples of driver mutations affecting regulatory regions, such as mutations in the promoter of the telomerase gene (TERT) in up to 71% of melanomas (29) and more than half of bladder cancers and glioblastomas (30). These mutations create a new transcription factor–binding motif, leading to overexpression of the TERT gene. Structural variation in noncoding regions can also contribute to cancer development. For example, lymphomas frequently overexpress BCL2 and BCL6 through translocations that place these genes adjacent to an immunoglobulin locus (31). More recently, overexpression of cancer genes by juxtaposing them with active enhancers has been described in some cancers (32, 33).

Progression of normal cells to cancer

The study of established cancers has provided many clues about the temporal evolution of cancers, but many gaps in our understanding remain. It seems that a clone must acquire a handful of driver mutations to transform into a cancer (1); yet, the mutation rate of normal cells is believed to be insufficient to generate enough driver mutations in one cell to explain the incidence of cancer. Two explanations, not mutually exclusive, have been proposed: cells can acquire hypermutation (the “mutator hypothesis”) (34) and/or early driver mutations trigger clonal expansions, increasing the pool of cells at risk for further driver mutations (35). Studies of normal tissues will probably settle this debate by quantifying the mutation rate and the extent of clonal expansions across normal tissues.

Mutation rates and signatures in normal cells

Estimates of the somatic mutation rate in human B and T lymphocytes and in fibroblasts are on the order of 2 to 10 mutations per diploid genome per cell division. Similar rates have been estimated in the retina and the intestinal epithelium (36). Thus, the substitution rate per cell division in normal somatic cells may be an order of magnitude higher than in germ cells (36). Although estimates of the rate of stem cell divisions in adult tissues are controversial and vary widely (11), normal cells of different tissues are predicted to accumulate hundreds to a few thousand substitutions, not too far from the numbers in cancers, without the need for acquired hypermutation. Nonetheless, there is marked heterogeneity in the rates of various mutational signatures across cancers. If these rates and signatures are not mirrored in normal cells, this would suggest that acquisition of an increased mutation rate is important in cancer development.
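The estimate above is essentially a product of a per-division mutation rate and a number of divisions. As a back-of-the-envelope illustration, take the quoted rate range and an assumed range of 100 to 500 divisions (the division counts are hypothetical, since the number of stem cell divisions in adult tissues is itself uncertain):

```latex
N_{\mathrm{sub}} \approx \mu_{\mathrm{div}}\, d,
\qquad \mu_{\mathrm{div}} \in [2, 10],\; d \in [100, 500]
\;\Rightarrow\; N_{\mathrm{sub}} \in [2 \times 100,\; 10 \times 500] = [200,\; 5000]
```

This treats the rate as constant per division; mutational processes that act independently of division (such as UV exposure) would add to the total.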

Systematic sequencing studies of normal tissues are needed to resolve this debate. Unfortunately, such studies are still technically challenging: the error rate of single-cell sequencing remains too high for accurate detection of de novo mutations, and only clonally expanded mutations can be reliably detected with current technologies. Despite these limitations, sequencing studies of normal blood and skin have recently been conducted, revealing burdens and signatures of somatic mutations broadly similar to those of cancers from the same cell types (18, 37–40).

Beyond point mutations, little is known about the rate of rearrangements or large structural variations in normal tissues. Multiple studies have reported structural mutations—including indels (18), copy-number aberrations (18, 41, 42), retrotransposition (43), and even chromothripsis (44)—in normal cells from individuals without cancer. Nevertheless, it is often argued that acquired chromosomal instability is restricted to cancer cells (10).

Selection and clonal expansions in normal cells

Sequencing studies in normal blood and skin have provided insights into patterns of clonal expansion associated with driver mutations. In blood, driver mutations can be found in ~10% of individuals older than 65 years of age, showing the typical patterns seen in leukemias (37–39). Occasionally, these mutations drive expansions such that most blood cells derive from the mutant clone but still provide the essential biological functions of blood. Individuals carrying these driver mutations have an elevated future risk of blood cancers (37, 38), suggesting that these are genuine precancerous clones.

By middle age, sun-exposed skin cells carry thousands of point mutations, and about 25 to 30% of these cells have already acquired at least one driver mutation (18). Positive selection is evident in most known driver genes of squamous skin cancer, but clone sizes are relatively limited and similar across individuals, suggesting that the growth of clones with driver mutations slows relatively early in their expansion. The mechanisms constraining the expansion of driver clones are not known but are likely to represent an essential protection against cancer.

Selection in different tissues can act in different ways. These include increasing the relative rate of cell proliferation over differentiation; eluding quiescence, senescence, or cell death; or colonizing nearby areas. Typically, driver mutations must occur in stem or proliferating cells to lead to clonal expansions. For example, stem cells in the epithelia of the skin (45, 46), esophagus (47), and lung (48) have three types of divisions: an asymmetric division into one stem cell and one differentiated cell, or symmetric divisions producing either two stem cells (proliferation) or two differentiated cells (differentiation) (Fig. 3A). Homeostasis is maintained by having identical rates of symmetrical divisions, whereas increasing the ratio of proliferation to differentiation leads to exponential clonal expansions. Driver mutations in TP53 and NOTCH1 can cause this imbalance (45, 47). Alternatively, in the intestinal epithelium, a stem cell can take over an intestinal crypt, even in the absence of selection, but expansion beyond a crypt is physically constrained. In this tissue, selection seems to manifest as an increase in the ability of stem cells to extend beyond a crypt (49). Mutations in the colorectal cancer genes APC and KRAS induce large increases in the rate of crypt fission, colonizing neighboring areas (49, 50) (Fig. 3B).
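The balance argument can be made explicit with a discrete-generation branching process, a simplification of the continuous-time models used in the cited studies. If a stem cell divides symmetrically into two stem cells with probability p_ss, into two differentiated cells with probability p_dd, and asymmetrically otherwise, the expected number of stem cells it leaves after one division is 2 p_ss + (1 - p_ss - p_dd) = 1 + p_ss - p_dd. The mean clone size therefore stays constant when p_ss = p_dd and grows as (1 + p_ss - p_dd)^n after n rounds otherwise. The Python sketch below checks this by simulation; the synchronous rounds, function names, and parameter values are assumptions chosen for illustration.

```python
import random


def expected_stem_cells(p_ss: float, p_dd: float, n_rounds: int) -> float:
    """Mean stem cells descended from one stem cell after n synchronous rounds."""
    return (1.0 + p_ss - p_dd) ** n_rounds


def simulate_clone(p_ss: float, p_dd: float, n_rounds: int,
                   rng: random.Random) -> int:
    """One realisation of the branching process; returns surviving stem cells."""
    stem = 1
    for _ in range(n_rounds):
        offspring = 0
        for _ in range(stem):
            u = rng.random()
            if u < p_ss:
                offspring += 2      # symmetric division: two stem cells
            elif u < p_ss + p_dd:
                offspring += 0      # symmetric division: two differentiated cells
            else:
                offspring += 1      # asymmetric: one stem, one differentiated cell
        stem = offspring
        if stem == 0:               # clone lost from the stem cell compartment
            break
    return stem


rng = random.Random(1)
for p_ss, p_dd in [(0.08, 0.08), (0.10, 0.06)]:   # balanced vs. hypothetical mutant
    sims = [simulate_clone(p_ss, p_dd, 20, rng) for _ in range(5000)]
    print(p_ss, p_dd, sum(sims) / len(sims), expected_stem_cells(p_ss, p_dd, 20))
```

Even in the balanced case, most simulated clones are lost while the survivors grow, so a constant mean clone size coexists with neutral drift of individual clones; a small imbalance instead produces sustained exponential growth of the mean.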

Fig. 3 Selection and clonal expansion of early driver mutations.

(A) Cell fate model proposed for epithelial stem cells in skin, esophagus, and lung (45–48). Probabilities of the three possible outcomes shown here are representative of normal murine epidermis (46). (B) Schema of two steps in the clonal expansion of a mutation in intestinal epithelium. (C) Four different models of successive clonal expansions induced by driver mutations accumulated throughout life. The first example (top left) corresponds to Armitage and Doll’s model (54), in which only the first five mutations are rate limiting. The dynamics of intermediate clonal expansions are unknown, but they are critical for the correct modeling of age-incidence statistics and for our understanding of cancer evolution.

Precancerous conditions

The terms “precancer” or “precancerous lesion” are often used to refer to areas of a tissue that show certain histological changes associated with an increased risk of cancer (51). Examples of early histological changes known to precede some epithelial cancers are hyperplasia, dysplasia, and metaplasia. These early changes can evolve into carcinoma in situ, where cells present the morphological features of cancer but do not yet invade the underlying tissue. Some examples of precancerous lesions are adenomatous polyps of the colon, Barrett’s esophagus, breast ductal carcinoma in situ (DCIS), and cervical intraepithelial neoplasia. The risk of these lesions evolving into cancer is variable.

Genomic studies of precancerous conditions can illuminate the dynamics of tumor progression. In the transition from Barrett’s esophagus to esophageal adenocarcinoma, the majority of driver genes are mutated at similar frequency in the precursor lesion, with the exception of TP53 and SMAD4, which are more frequently mutated in the invasive tumors (52). This suggests that Barrett’s esophagus is, genetically speaking, an advanced precancerous lesion. Similar findings have been described for patients with DCIS and invasive breast cancer (53).
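Comparisons like these amount to asking whether a gene's mutation frequency differs between precursor lesions and invasive tumors. One standard way to test this for a single gene is Fisher's exact test on a 2 × 2 table of mutated and wild-type cases; whether the cited studies used this particular test is not stated here. The pure-Python sketch below computes a two-sided p-value by summing the probabilities of all tables, with the same margins, that are no more probable than the observed one. The counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: e.g. precursor lesions vs. invasive tumours.
    Columns: mutated vs. wild type for one gene.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x mutated cases in row 1, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    p_obs = table_prob(a)
    # Small relative tolerance so tied tables are not lost to rounding error
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Hypothetical counts: mutated / wild type in 60 precursors and 80 invasive tumours
print(fisher_exact_two_sided(10, 50, 40, 40))
```

When many genes are compared this way, the resulting p-values need correction for multiple testing before any single difference is called significant.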

These observations seem consistent with the stepwise model of progression to cancer, in which a series of events, mutational or otherwise, drives successive clonal expansions with progressively more disordered phenotypes. However, the majority of cancers arise without a histologically discernible premalignant phase, and it may be that different modes of tumor evolution are operative here. If strong cooperation between driver mutations occurs, major histological changes may not take place until the full repertoire of variants is acquired (Fig. 3C). Furthermore, catastrophic mutational processes such as chromothripsis or telomere crises may fuel a rapid accumulation of driver mutations, resulting in a transformation from normal to malignant cells without easily detectable intermediate stages.

Cancer, aging, and the evolution of protection mechanisms

Cancer is virtually inevitable in complex, long-lived, multicellular organisms. Virtually every cell in an organism has the information and the potential to propagate rapidly, but the success of multicellularity relies on the evolution of mechanisms to suppress this ability. Yet, somatic mutations inevitably accumulate with time and, aided by selection at the tissue level, can erode these suppressive mechanisms.

Age-incidence statistics and the number of driver events per tumor

Time plays a major role in the occurrence of cancer. In fact, the risk of suffering any cancer before the age of 40 is ~2%, but by age 80 this risk increases to 50% (Fig. 4A). Moreover, the incidence of some common cancers rises roughly as the fourth to sixth power of age (Fig. 4B). As early as 1954, this observation was used to propose that cancer could be the result of five to seven rate-limiting steps accumulating randomly at a constant rate throughout life (54), because a model with k such steps predicts incidence rising as the (k − 1)th power of age. The exact numbers are disputed, because fits are imperfect and models with clonal expansions (Fig. 3C) can predict fewer rate-limiting steps, but this observation remains very influential (1, 10). With the discovery of the role of somatic mutations in cancer, it is tempting to propose that at least some of these steps are driver mutations, as was exemplified by Knudson for retinoblastoma in 1971 (3).
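The link between the number of rate-limiting steps and the age slope follows from a short calculation. If k events must occur in a fixed order in one cell lineage, at small constant rates u_1, ..., u_k per year, the probability that all have occurred by age t is approximately (u_1 ... u_k) t^k / k!, so the incidence (its derivative) is approximately (u_1 ... u_k) t^(k-1) / (k-1)!. On a log-log plot this is a straight line of slope k - 1. The Python sketch below generates incidences from this approximation and recovers the slope by least squares; the rates and ages are hypothetical, and the approximation ignores clonal expansions.

```python
import math
from typing import Sequence


def armitage_doll_incidence(age: float, rates: Sequence[float]) -> float:
    """Approximate incidence at `age` when len(rates) ordered steps are required.

    Valid only when rate * age << 1 for every step (hypothetical per-year rates).
    """
    k = len(rates)
    return math.prod(rates) * age ** (k - 1) / math.factorial(k - 1)


def loglog_slope(ages: Sequence[float], incidence: Sequence[float]) -> float:
    """Least-squares slope of log(incidence) against log(age)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(v) for v in incidence]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


ages = [40, 50, 60, 70, 80]
for k in (5, 6, 7):
    rates = [1e-3] * k          # hypothetical per-year rate for each step
    inc = [armitage_doll_incidence(a, rates) for a in ages]
    print(k, round(loglog_slope(ages, inc), 3))   # slope equals k - 1
```

Scaling the incidence (for example, to cases per 100,000 people) shifts the line vertically but leaves the slope unchanged, which is why the slope alone is used to infer the number of steps.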

Fig. 4 Age incidence of cancer.

(A) Cumulative risk of cancer versus age. This plot shows the risk of suffering a given cancer before a particular age. (B) Log-log representation of the incidence of different cancers (cases per year per 100,000 people) versus age. The regression lines highlight the approximately geometrical increase of cancer incidence with age, although the association is imperfect and only correlative for some cancer types (54). k denotes line slope. U.S. cancer-incidence data are from the SEER (Surveillance, Epidemiology, and End Results Program) Cancer Statistics Review (data are from 2008 to 2012 and include any race and both genders, unless otherwise specified; M, men; W, women).

Although driver mutations and rate-limiting steps from age-incidence curves are likely to be related, they are not equivalent. For example, when tumors are sufficiently large or acquire hypermutation, analogous or even identical mutations can occur frequently enough to appear independently in multiple subclones, which suggests that mutations may not be rate limiting in later stages of tumor development (53). Also, some rate-limiting steps are likely to be nonmutational, such as epigenetic changes and changes in the microenvironment of a tumor.

Despite 5 years of systematic sequencing of thousands of cancer genomes, we have not yet determined the number of driver mutations required to make a tumor. It is easier to identify regions with an overall excess recurrence of mutations than it is to attribute driver or passenger status to a given mutation in a given patient. This is especially true in tumors with high mutation rates and for noncoding mutations or complex structural changes, for which detection of driver events remains a major challenge.

Evolution of protection mechanisms against cancer

Over millions of years, species have evolved protective mechanisms to keep the incidence of cancer low. These include high-fidelity replication, DNA repair pathways, cellular senescence, stem cell hierarchies, tumor suppressor genes, immune surveillance, and microenvironmental control of cellular behavior.

Given the availability of protective mechanisms against cancer, why does cancer still exist? Some answers can be found in evolutionary theory, which predicts fundamental limits to how much evolution can reduce the incidence of cancer. Most importantly, selection is virtually powerless to fight causes of death after reproductive age, so mechanisms will mainly evolve to reduce cancer in the young. Yet, even these mechanisms will be limited by genetic drift, which makes cancer incidence in the young unlikely to be reduced below ~1/10,000 in humans (16). Finally, as argued to explain the evolution of aging, in maximizing reproductive success, selection can favor traits that benefit the young at the cost of an added burden later in life (55).

Further, at least two additional factors can increase the incidence of cancer above the lower limit set by evolution. Exposure to unfamiliar mutagens (such as tobacco smoke) to which a species has not yet adapted can strongly increase the incidence of certain cancers. Also, recent rapid changes in human evolution might explain higher rates of certain cancers. For instance, the increase in brain size and changes in the development of long bones have been proposed to explain the relatively high frequency of childhood brain and bone cancers, respectively, in humans (56).

Together with the roughly geometric rise in cancer incidence predicted by the stepwise model of cancer, these evolutionary considerations can help to explain why cancer incidence is low, but not zero, in the young, as well as why incidence rises rapidly later in life (Fig. 4).

Cancer as an example of aging

Owing to their association with age and contribution to morbidity and mortality late in life, many cancers can be considered a natural part of aging. In fact, cancer offers a particularly well-understood example of an aging process, exemplifying how a linear accumulation of errors (somatic mutations) can cause a rapid (geometric) rise in morbidity and mortality after reproductive age.

Multiple theories—including the progressive accumulation of DNA damage, somatic mutations, oxidation of mitochondrial DNA, progressive loss of epigenetic regulation, chromatin disorganization, and/or expression deregulation (55, 57)—have been proposed to explain the molecular basis of aging. All of these and other forms of progressive molecular degradation are believed to place a burden on the organism and may, together, explain the varied manifestations of aging (55).

Somatic mutations are thought to play an important role in aging, beyond causing cancer (57, 58). Interestingly, several premature aging disorders, such as Werner syndrome, are caused by DNA repair deficiency. Although the link between somatic mutation and aging is not fully understood, studies in which the rate of DNA damage is increased or reduced have shown acceleration or deceleration, respectively, of aging in cell and animal models (57, 58). In general, the accumulation of somatic mutations, occasionally amplified by clonal expansions, over time may lead to alterations in the ability of tissues to function normally. Mutations can alter key genes for the cell or tissue, affect DNA repair, activate cell senescence pathways, or alter gene regulation in the cells, all of which can contribute to the hallmarks of aging (57, 58). Sequencing studies of aging tissues should further clarify the extent and the role of somatic mutation in aging.


In just a few years, cancer genome sequencing has revolutionized our understanding of the genetics of cancer. The sequencing of larger numbers of tumors and poorly explored tumor types will continue to yield previously unidentified cancer genes and mutational signatures. Studies exploiting detailed clinical information will correlate these findings with treatment responses and clinical outcomes. The extent of driver mutations in noncoding elements and structural variation remains to be determined. Whole-genome sequencing and novel statistical methods, aided by large sequencing efforts, will help us answer these questions.

We believe that the next decade will see systematic analyses of somatic mutation in normal tissues and its role in cancer progression and aging. Direct studies of mutation burden, mutation signatures, clonal dynamics, and cellular phenotypes will provide a bridge from epidemiological findings to mechanistic insights into the earliest steps of cancer.

References and Notes

Acknowledgments: We thank L. B. Alexandrov (Los Alamos National Laboratory, USA) for sharing data before publication (Fig. 1A) and the TCGA team for their invaluable public resource. Figures 1 and 2 are largely based on data generated by TCGA. P.J.C. is a Wellcome Trust Senior Clinical Fellow, and I.M. is a fellow of Queens’ College, Cambridge. P.J.C. holds equity in and is a paid consultant for 14M Genomics Ltd.