Essays on Science and Society
Science & SciLifeLab Prize

Understanding the origins of human cancer


Science  04 Dec 2015:
Vol. 350, Issue 6265, pp. 1175-1177
DOI: 10.1126/science.aad7363

All cancers originate from a single cell that starts to behave abnormally, to divide uncontrollably, and, eventually, to invade adjacent tissues (1). The aberrant behavior of this single cell is due to somatic mutations—changes in the genomic DNA produced by the activity of different mutational processes (1). These mutational processes include exposure to exogenous or endogenous mutagens, abnormal DNA editing, the incomplete fidelity of DNA polymerases, and failure of DNA repair mechanisms (2). Early studies that sequenced TP53, the most commonly mutated gene in human cancer, provided evidence that mutational processes leave distinct imprints of somatic mutations on the genome of a cancer cell (3). For example, C:G>A:T transversions predominate in smoking-associated lung cancer, whereas C:G>T:A transitions occurring mainly at dipyrimidines and CC:GG>TT:AA double-nucleotide substitutions are common in ultraviolet light–associated skin cancers. These patterns of mutations matched the ones induced experimentally by tobacco mutagens and ultraviolet light, respectively, the major, known, exogenous carcinogenic influences in these cancer types, and demonstrated that examining patterns of mutations in cancer genomes can yield information about the mutational processes that cause human cancer (4).
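The notation used above (for example, C:G>A:T) names a base substitution by the reference base pair and the base pair that replaces it, so a change seen on either DNA strand falls into the same class. The short Python sketch below illustrates that convention. The function names `substitution_class` and `is_transition` are hypothetical and chosen only for this illustration, and reporting each class relative to the pyrimidine of the reference pair is a common convention assumed here rather than something stated in the essay.

```python
# Illustrative sketch only: names and the pyrimidine-first convention are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def _validate(ref: str, alt: str) -> tuple[str, str]:
    """Upper-case both bases and check that they describe a real substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    return ref, alt


def substitution_class(ref: str, alt: str) -> str:
    """Return the strand-collapsed class of a substitution, e.g. 'C:G>A:T'."""
    ref, alt = _validate(ref, alt)
    # If the reference base is a purine, describe the change on the other strand
    # so that the class is always written starting from C or T.
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes, else False."""
    ref, alt = _validate(ref, alt)
    return (ref in PURINES) == (alt in PURINES)


# A G>T change and its reverse-strand equivalent C>A share one class.
for ref, alt in [("G", "T"), ("C", "A"), ("C", "T")]:
    kind = "transition" if is_transition(ref, alt) else "transversion"
    print(ref, alt, substitution_class(ref, alt), kind)
```

Collapsing the two strands in this way reduces the twelve possible single-base changes to six classes, which is why the smoking-associated and ultraviolet-associated patterns can each be described with a single label.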

When I started my Ph.D. in Mike Stratton's laboratory at the Wellcome Trust Sanger Institute, large-scale global initiatives, such as the International Cancer Genome Consortium, had begun the molecular characterization of cancers from thousands of patients around the world (5). At that time, however, the patterns of mutations imprinted by mutational processes had been characterized only to a limited extent. During my Ph.D. studies, I explored the possibility of leveraging the available cancer genomics data to elucidate the mutational processes operative in human cancer. I started by conceptualizing the problem and developing a mathematical model that describes the relationship between the activity of mutational processes in cancer cells and the mutational catalogs generated by next-generation sequencing of cancer genomes (6). The mathematical model was subsequently used to develop a computational approach (6), which I later applied to thousands of sequenced human cancers (7).

Biologically, the somatic mutations in a cancer genome are the cumulative result of the mutational processes that have been active since the very first division of the fertilized egg from which the cancer cell was derived (2). Different mutational processes often generate unique combinations of mutation types, and we termed these patterns “mutational signatures.” Multiple distinct mutational signatures may be recorded on the genome of a single cancer cell and, as such, an individual cancer genome is insufficient for identifying all imprinted mutational signatures. However, the availability of thousands of samples in which mutational signatures are present with different frequencies makes it possible to decipher their patterns. Mathematically, a set of mutational catalogs of cancer genomes can be modeled as a linear mixture of an unknown number of mutational signatures. The mutational catalogs of these cancer genomes are known from DNA sequencing, and the aim is to identify the patterns of the mutational signatures as well as the number of mutations attributed to each signature in each sample. This problem belongs to the well-known class of blind source separation (BSS) problems, in which mixtures of recordings need to be separated with very little information about the underlying mixing process. To solve this cancer-specific BSS problem in a practical way, I developed a computational framework that uses the previously established multiplicative update algorithm for non-negative matrix factorization (8). The framework was extensively evaluated with simulated and real data, demonstrating that it accurately identifies mutational signatures from both whole-genome and whole-exome sequenced samples (6).
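The linear-mixture idea can be made concrete with a small sketch. If the mutational catalogs are collected in a non-negative matrix M (mutation types by samples), the goal is to find non-negative matrices P (mutation types by signatures) and E (signatures by samples) such that M is approximately P times E. The Python code below is a minimal illustration of multiplicative updates for non-negative matrix factorization, not the framework described in this essay. It assumes a squared-error objective, a number of signatures k that is known in advance, and a toy catalog with six mutation types. The function name `nmf_multiplicative`, the variable names, and the simulated data are all hypothetical.

```python
import numpy as np


def nmf_multiplicative(M, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix M (types x samples) as P @ E.

    P has one column per signature and E one row per signature. This sketch
    uses multiplicative updates for a squared-error objective (an assumption).
    """
    rng = np.random.default_rng(seed)
    n_types, n_samples = M.shape
    # Strictly positive random starting point.
    P = rng.random((n_types, k)) + eps
    E = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so P and E stay non-negative.
        E *= (P.T @ M) / (P.T @ P @ E + eps)
        P *= (M @ E.T) / (P @ E @ E.T + eps)
    # Rescale so each signature sums to 1 and E carries the mutation counts;
    # the product P @ E is unchanged by this step.
    scale = P.sum(axis=0)
    return P / scale, E * scale[:, None]


# Toy data (simulated, not real cancer catalogs): 2 signatures, 6 types, 40 samples.
rng = np.random.default_rng(1)
P_true = rng.dirichlet(np.ones(6), size=2).T      # 6 x 2, columns sum to 1
E_true = rng.integers(50, 500, size=(2, 40))      # mutations per signature per sample
M = rng.poisson(P_true @ E_true).astype(float)    # observed counts with sampling noise

P_hat, E_hat = nmf_multiplicative(M, k=2)
rel_err = np.linalg.norm(M - P_hat @ E_hat) / np.linalg.norm(M)
print("relative reconstruction error:", rel_err)
```

In this sketch the columns of `P_hat` play the role of signature patterns and the entries of `E_hat` the number of mutations attributed to each signature in each sample. Choosing the number of signatures and assessing the stability of the solution, which a practical analysis requires, are deliberately left out.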

Initially, I applied the computational framework to the somatic mutations found in 21 whole-genome sequenced breast cancers (9, 10). This analysis revealed the existence of multiple distinct mutational signatures (9), and we were able to explore the activity of these signatures over time (10). This initial application was followed by a comprehensive global analysis of mutational signatures across the spectrum of human neoplasia (7). I curated the majority of publicly available data and compiled a data set encompassing ~5 million somatic mutations from the mutational catalogs of 7042 primary cancers of 30 different classes. These data revealed the existence of 21 distinct mutational signatures in human cancer. Some were present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases (7, 11); others were confined to a single cancer class. For some of these signatures, the underlying biological mechanism is still unknown. However, some of the identified mutational signatures were associated with age of cancer diagnosis, tobacco smoking, exposure to ultraviolet light, treatment with anticancer drugs, presence of BRCA1 or BRCA2 mutations, activity of polymerase η, activity of polymerase ε, and inactivation of mismatch repair genes.

This comprehensive pan-cancer analysis was complemented by a number of studies focusing on individual cancer types. In the last year of my Ph.D. studies, I contributed to further elaborating the understanding of mutational signatures in breast cancer (12), prostate cancer (13–15), liver cancer (16), renal cancer (17), B cell lymphoma (18), a diverse set of childhood cancers (19), multiple myeloma (20), and acute lymphoblastic leukemia (21). Additionally, I participated in mapping the signatures of the somatic mutational processes in human mitochondria (22) as well as in understanding the mutational processes operative in normal somatic cells (23, 24). Overall, the pan-cancer analysis and the aforementioned research resulted in identifying 30 distinct signatures of somatic mutational processes, most of which were previously unknown.

These 30 mutational signatures are briefly summarized in the table.

In summary, my Ph.D. thesis provided a basis for deciphering mutational signatures from cancer genomics data and developed the first comprehensive census of mutational signatures in human cancer. The results reveal the diversity of mutational processes underlying the development of cancer and have far-reaching implications for understanding cancer etiology, as well as for developing cancer prevention strategies and novel targeted cancer therapies.



Ludmil Alexandrov

Ludmil Alexandrov for his essay “Understanding the origins of human cancer.” Dr. Alexandrov is an Oppenheimer Fellow in the Theoretical Biology and Biophysics Group at Los Alamos National Laboratory. He earned his Bachelor of Science degree in Computer Science from Neumont University and received his Master of Philosophy in Computational Biology as well as his Ph.D. in Cancer Genetics from the University of Cambridge. He is a recipient of the 2015 Weintraub Award for Graduate Research and, in 2013, he was listed by Forbes magazine as one of the “30 brightest stars under the age of 30” in the field of Science and Healthcare. His work is focused on understanding the mutational processes responsible for human cancer and human ageing. In 2015, his research was highlighted by the American Society of Clinical Oncology as an important step forward in the fight against cancer.

For the full text of all winning essays and further information, see

References and Notes

  1. Mutational Signatures, The Cancer Genome Project;