Research Article

Somatic evolution and global expansion of an ancient transmissible cancer lineage


Science  02 Aug 2019:
Vol. 365, Issue 6452, eaau9923
DOI: 10.1126/science.aau9923

It's a dog's life

Canine transmissible venereal tumor is one of the few cancer lineages that are transferred among individuals through contact. It arose millennia ago and has been evolving independently from its hosts ever since. Baez-Ortega et al. looked at the phylogenetic history of the cancer and describe several distinctive mutational patterns (see the Perspective by Maley and Shibata). Most notably, both positive and negative selection show only weak or distant signals. This suggests that the main driver of the lineage's evolution is neutral genetic drift. Understanding the influence of drift may reshape how we think about long-term cancer evolution.

Science, this issue p. eaau9923; see also p. 440

Structured Abstract


The canine transmissible venereal tumor (CTVT) is a sexually transmitted cancer that manifests as genital tumors in dogs. This cancer first arose in an individual “founder dog” several thousand years ago and has since survived by transfer of living cancer cells to new hosts during coitus. Today, CTVT affects dogs around the world and is the oldest and most prolific known cancer lineage. CTVT thus provides an opportunity to explore the evolution of cancer over the long term and to track the unusual biological transition from multicellular organism to obligate conspecific asexual parasite. Furthermore, the CTVT genome, acting as a living biomarker, has recorded the changing mutagenic environments experienced by this cancer throughout millennia and across continents.


To capture the genetic diversity of the CTVT lineage, we analyzed somatic mutations extracted from the protein-coding genomes (exomes) of 546 globally distributed CTVT tumors. We inferred a time-resolved phylogenetic tree for the clone and used this to trace the worldwide spread of the disease and to select subsets of mutations acquired at known geographical locations and time periods. Computational methods were applied to extract mutational signatures and to measure their exposures across time and space. In addition, we assessed the activity of selection using ratios of nonsynonymous and synonymous variants.


The CTVT phylogeny reveals that the lineage first arose from its founder dog 4000 to 8500 years ago, likely in Asia, with the most recent common ancestor of modern globally distributed tumors occurring ~1900 years ago. CTVT underwent a rapid global expansion within the past 500 years, likely aided by intensification of human maritime travel. We identify a highly specific mutational signature dominated by C>T mutations at GTCCA pentanucleotide contexts, which operated in CTVT up until ~1000 years ago. The number of mutations caused by ultraviolet light exposure is correlated with latitude of tumor collection, and we identify CTVTs with heritable hyperactivity of an endogenous mutational process. Several “driver” mutation candidates are identified in the basal trunk of the CTVT tree, but there is little evidence for ongoing positive selection. Although negative selection is detectable, its effect is largely confined to genes with known essential functions, thus implying that CTVT predominantly evolves through neutral processes.


We have traced the evolution of a transmissible cancer over several thousand years, tracking its spread across continents and contrasting the mutational processes and selective forces that molded its genome with those described in human cancers. The identification of a highly context-specific mutational process that operated in the past but subsequently vanished, as well as correlation of ultraviolet light–induced DNA damage with latitude, highlight the potential for long-lived, widespread clonal organisms to act as biomarkers for mutagenic exposures. Our results suggest that neutral genetic drift is the dominant evolutionary force operating on cancer over the long term, in contrast to the ongoing positive selection that is often observed in short-lived human cancers. The weakness of negative selection in this asexual lineage may be expected to lead to the progressive accumulation of deleterious mutations, invoking Muller’s ratchet and raising the possibility that CTVT may be declining in fitness despite its global success.

Cancer evolution over thousands of years.

The canine transmissible venereal tumor (CTVT) is an ancient contagious cancer with a global distribution. We sequenced the exomes of 546 CTVT tumors and identified somatic single-nucleotide variants (SNVs). These were used to construct a time-resolved phylogenetic tree, yielding insights into the cancer’s phylogeography, mutational processes, and signatures of selection across thousands of years. Notably, a highly context-specific mutational pattern named signature A was identified, which was active in the past but ceased to operate about 1000 years ago. BP, years before present.


The canine transmissible venereal tumor (CTVT) is a cancer lineage that arose several millennia ago and survives by “metastasizing” between hosts through cell transfer. The somatic mutations in this cancer record its phylogeography and evolutionary history. We constructed a time-resolved phylogeny from 546 CTVT exomes and describe the lineage’s worldwide expansion. Examining variation in mutational exposure, we identify a highly context-specific mutational process that operated early in the cancer’s evolution but subsequently vanished, correlate ultraviolet-light mutagenesis with tumor latitude, and describe tumors with heritable hyperactivity of an endogenous mutational process. CTVT displays little evidence of ongoing positive selection, and negative selection is detectable only in essential genes. We illustrate how long-lived clonal organisms capture changing mutagenic environments, and reveal that neutral genetic drift is the dominant feature of long-term cancer evolution.

Transmissible cancers are malignant somatic cell clones that spread between individuals by direct transfer of living cancer cells. Analogous to the metastasis of cancer to distant tissues within a single body, transmissible cancers “metastasize” as allogeneic grafts between individuals within a population (1). Such clones have been observed only eight times in nature, suggesting that they arise rarely; however, once established, transmissible cancers can spread rapidly and widely and persist through time (1, 2). Such cancers provide an opportunity to explore the evolution of cancer over the long term and to track the unusual biological transition from multicellular organism to obligate conspecific asexual parasite.

The canine transmissible venereal tumor (CTVT) is the oldest and most prolific known contagious cancer (2, 3). It is a sexually transmitted clone that manifests as genital tumors in dogs. This cancer first arose from the somatic cells of an individual “founder dog” that lived several thousand years ago (2). The cancer survived beyond the death of this original host by transfer of cancer cells to new hosts. Subsequently, this cancer has spread around the world and is a common disease in dog populations globally, although it declined and largely disappeared from many Western countries during the 20th century owing to the management and removal of free-roaming dogs (4).

Similar to cancers that remain in a single individual, CTVT accumulates somatic mutations. These result from the activities of endogenous and exogenous mutational processes, and genetically imprint a cancer’s history of mutagenic exposures (5). Thus, the CTVT genome can be considered a living biomarker that records the changing mutagenic environments experienced by this cancer throughout millennia and across continents. Although most somatic mutations in cancer have no functional effect and are considered neutral “passenger” mutations, a subset of mutations are positively selected “driver” mutations that confer the proliferation and survival advantages that spur cancer growth (6). Ordinary cancers, which remain in a single host, often acquire additional driver mutations during tumor progression (7); however, it is unknown whether transmissible cancers that survive for hundreds or thousands of years similarly continue to adapt. It seems possible that the evolution of long-lived cancers such as CTVT may instead be dominated by negative selection acting to remove deleterious mutations. Finally, in addition to recording a history of exposures and signatures of selection, somatic mutations provide a tool for tracing CTVT phylogeography, potentially revealing how dogs, together with humans, moved around the world over the past centuries. Here, we use somatic mutations extracted from the protein-coding genomes (exomes) of 546 globally distributed CTVT tumors to trace the history, spread, diversity, mutational exposures, and evolution of the CTVT clone.

CTVT phylogeny

We sequenced the exomes (43.6 megabases, Mb; mean sequencing depth ~132×) of 546 CTVT tumors collected between 2003 and 2016 from 43 countries across all inhabited continents (datasets S1 and S2). Candidate somatic mutations were defined as single-nucleotide variants (SNVs) or short insertions and deletions (indels) identified in one or more CTVT tumors, but not found in 495 normal dog exomes from the CTVT tumors’ matched hosts. This approach yielded 160,207 variants (148,030 SNVs, 3392 per Mb; 12,177 indels, 279 per Mb; table S1). The features of this set, including its variant allele fraction distribution, phylogenetic structure, comparison with the distribution of private germline variants in the dog population, mutational signature composition, and nonsynonymous-to-synonymous mutation ratio [details in (8)], suggest that it is very highly enriched for somatic mutations. However, some minimal germline variation may remain, possibly including rare germline variants from the founder dog and residual contaminating alleles from matched hosts.

We identified the subset of the candidate somatic mutations belonging to a clocklike mutational process [specifically, cytosine-to-thymine (C>T) substitutions at CpG sites (8, 9)] and used these to construct a time-resolved phylogenetic tree for the CTVT lineage (Fig. 1A). The mutation rate was inferred by applying a Bayesian Poisson model to previously ascertained empirical observations (10) and was estimated as 6.87 × 10⁻⁷ C>T mutations per CpG site per year (8). The topology of the CTVT phylogenetic tree reveals a long basal trunk (Fig. 1A), representing the chain of CTVT transmissions from its origin ~6220 years ago [95% highest posterior density interval (HPDI) 4148 to 8508 years ago] to the earliest detected node ~1938 years ago (95% HPDI 993 to 3055 years ago). This node splits a set of five tumors collected in India from the remaining population (groups labeled 57 and 58; Fig. 1A). The second and third most basal nodes (~1004 years ago, 95% HPDI 497 to 1570 years ago, and ~829 years ago, 95% HPDI 424 to 1310 years ago, respectively) separate 16 tumors from Eastern Europe and the Black Sea region, and three tumors from Northern India, from the remaining set (groups labeled 54 to 56 and 1; Fig. 1A). Together with evidence that the founder dog shared ancestry with ancient dog remains recovered in northeast Siberia and North America (10), the CTVT phylogeny supports a model whereby CTVT originated ~4000 to 8500 years ago in Central or Northern Asia and remained within the area for the subsequent 2000 to 6000 years. Starting less than ~2000 years ago, CTVT escaped from its founding population, perhaps due to contact between previously isolated dog groups, and spread to several locations in Asia and Europe (Fig. 1B).
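As a rough illustration of how a clocklike rate converts mutation counts into elapsed time, the sketch below divides a count of C>T mutations at CpG sites by the expected number of such mutations per year. Only the rate (6.87 × 10⁻⁷ per CpG site per year) comes from the text; the mutation count and the number of CpG sites are hypothetical, and the simple Poisson point estimate is a stand-in for, not a reproduction of, the Bayesian model described in (8).

```python
import math

# From the text: C>T mutations per CpG site per year.
RATE_PER_CPG_PER_YEAR = 6.87e-7


def clock_age_years(n_mutations, n_cpg_sites, rate=RATE_PER_CPG_PER_YEAR):
    """Return (age, standard error) in years under a simple Poisson clock.

    The point estimate is count / (rate * sites); the standard error uses
    the Poisson property that the variance of a count equals its mean.
    """
    if n_mutations < 0 or n_cpg_sites <= 0 or rate <= 0:
        raise ValueError("counts must be non-negative; sites and rate positive")
    expected_per_year = rate * n_cpg_sites
    age = n_mutations / expected_per_year
    se = math.sqrt(n_mutations) / expected_per_year
    return age, se


# Hypothetical inputs, chosen only to show the arithmetic.
age, se = clock_age_years(n_mutations=2000, n_cpg_sites=1_000_000)
print(f"estimated age: {age:.0f} years (approx. SE {se:.0f} years)")
```

Because the estimate is linear in the mutation count, a branch carrying twice as many clocklike mutations is inferred to be twice as long, which is what makes this class of mutation useful for dating.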

Fig. 1 Phylogeny and geographical expansion of CTVT.

(A) Time-resolved phylogenetic tree inferred from clocklike exonic somatic variation in CTVT. Each tip indicates a tumor, and sampling locations are labeled. Numbers refer to phylogenetic groups displayed on maps in (B) to (D). Sublineages 1 and 2, referred to in (C) and (D), are marked. Three groups of ancestral somatic variation (A1, A2, A3) and their respective numbers of SNVs are indicated. The age of the CTVT founder tumor and the earliest detected node are indicated in years before present (BP), with gray bars depicting Bayesian 95% HPDIs. (B to D) Maps presenting likely routes of early and late expansion of CTVT. Numbered circles indicate the locations of phylogenetic groups labeled in (A); arrows represent inferred geographical movements. Circle and arrow colors indicate different sets of movements, as labeled in (A). Thin arrows indicate expansion routes for which phylogenetic evidence is limited; dots without numbers indicate tumors that are not represented in the tree. C.V., Cape Verde; Gr., Greece; Guat., Guatemala; Hond., Honduras; Ken., Kenya; Rom., Romania; Tan., Tanzania; Tur., Turkey.

The more recent history of CTVT is marked by rapid global expansion (11) (Fig. 1C and fig. S1). CTVT was introduced to the Americas with early colonial contact (~500 years ago, 95% HPDI 284 to 888 years ago), probably initially to Central America, and further into North and South America (red sublineage 1; Fig. 1, A and C). About 300 years ago, this sublineage spread out of the Americas in an almost polytomous global sweep that brought CTVT into Africa at least five times and reintroduced the disease to Europe and Asia (black sublineage 1; Fig. 1, A and C). In parallel, a second tumor sublineage spread out of Asia or Europe into Australia and the Pacific (sublineage 2; Fig. 1, A and D). This second sublineage is also detected in North America, and its tumors were introduced to Africa on at least two occasions. By ~100 years ago, CTVT was present in dog populations worldwide, establishing local lineages that have since remained largely in situ. The CTVT phylogeny thus suggests that dogs, together with their neoplastic parasites, were extensively transported around the world in the 15th to early 20th centuries, probably by sea travel.

Mutational processes in CTVT

The CTVT mutational spectrum, a representation of the six substitution types together with their immediate 5′ and 3′ base contexts, is dominated by C>T mutations, as previously described (12, 13) (Fig. 2A). Applying Markov chain Monte Carlo sampling on a Bayesian model of mutational signatures (8, 14), we extracted signatures of five mutational processes from the CTVT mutation load. These include three signatures that closely resemble COSMIC (15) signatures 1, 5, and 7 (Fig. 2B). These signatures, which have previously been described in CTVT (12), reflect endogenous mutational processes (signatures 1 and 5) and exposure to ultraviolet (UV) light (signature 7) (5). A fourth signature displaying some similarity (cosine similarity 0.81) to COSMIC signature 2, which is associated with activity of APOBEC enzymes (5), was also detected (labeled signature 2*, Fig. 2B).
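The two quantities used in this paragraph, the 96-channel spectrum and the cosine similarity between signatures, can be sketched as follows. The channel labels and function names are illustrative choices, not the notation of the sigfit package, and the toy vectors are hypothetical.

```python
import math

BASES = "ACGT"
# The six substitution types, written from the pyrimidine base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_channels():
    """Enumerate substitution types with their 5' and 3' flanking bases."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five in BASES
        for three in BASES
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra given as equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must contain at least one non-zero value")
    return dot / (norm_a * norm_b)


channels = trinucleotide_channels()
print(len(channels))  # 6 substitution types x 4 x 4 flanking bases
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same shape, different scale
```

Cosine similarity ignores overall scale, so two spectra with the same relative channel proportions score close to 1 regardless of how many mutations each contains.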

Fig. 2 Mutational processes in CTVT.

(A) Trinucleotide-context mutational spectrum of somatic SNVs in a single CTVT tumor. Horizontal axis presents 96 mutation types displayed in pyrimidine context. Relevant mutation contexts are indicated. (B) Mutational spectra of extracted mutational signatures with relevant mutation contexts indicated. (C) Pentanucleotide-context mutational spectra of signature A (top) and signature 7 (bottom). Horizontal axis presents 256 C>T mutation types with relevant mutation contexts indicated. The inset tree shows the phylogenetic branches exposed to signature A. (D) Bayesian logarithmic regression and Spearman’s correlation between absolute mean latitude and normalized CC>TT mutations in phylogenetic groups shown in Fig. 1A. Normalized CC>TT mutations represent the ratio between group-specific CC>TT mutations and group-specific C>T changes at CpG dinucleotides. The black line and shadowed area indicate the regression curve and associated 95% HPDI. The orange dot and bars represent predicted absolute mean latitude and associated 90% prediction interval for the basal trunk ancestral variation (group A1). Posterior median and 95% HPDI of the correlation coefficient are shown. (E) Map showing the latitude range corresponding to the 90% prediction interval for group A1, presented in (D), in the Northern Hemisphere. (F) Mutational spectra of a phylogenetic group showing evidence of signature 5 hyperactivity (top) and a closely related unaffected group (bottom). (G) Diagram indicating the phylogenetic situation of the tumor groups displaying signature 5 hyperactivity.

The fifth signature extracted from CTVT does not resemble any previously described mutational pattern. This signature, which we designate signature A, is characterized by C>T mutations at NCC contexts (the central cytosine is the mutated base) and shows substantial pentanucleotide sequence preference for GTCCA (TGGAC on the complementary strand; Fig. 2, B and C, and fig. S2). This extended sequence preference is markedly more pronounced than previously reported pentanucleotide context biases, such as those associated with UV light or DNA polymerase epsilon deficiency (Fig. 2C) (16–18), and is not explained by the sequence composition of the canine exome (fig. S3). It is possible that signature A’s causative mutagen is highly context-specific, or, alternatively, that this signature’s associated repair processes are ineffective at certain sequence contexts (“repair shielding”) (19). In addition, signature A displays strong transcriptional strand bias, with more mutations of guanine on the untranscribed compared to the transcribed strand of genes, indicating that its causative lesion is likely a guanine adduct subject to transcription-coupled repair (TCR). Notably, the guanine-directed transcriptional strand bias of signature A at TCC contexts counteracts the cytosine-directed transcriptional strand bias of signature 7 at TCC, such that no overall transcriptional strand bias is observed at this context in the CTVT mutational spectrum (Fig. 2A).
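The relationship between GTCCA and TGGAC, and the convention of reporting contexts from the pyrimidine strand, can be illustrated with a short sketch. It assumes the usual convention that the mutated base sits at the centre of an odd-length context; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pyrimidine_context(context):
    """Express a context so that its central (mutated) base is C or T."""
    context = context.upper()
    if len(context) % 2 == 0:
        raise ValueError("context must have odd length so it has a central base")
    centre = context[len(context) // 2]
    return context if centre in "CT" else reverse_complement(context)


print(reverse_complement("GTCCA"))
print(pyrimidine_context("TGGAC"))
```

With four possible bases at each of the four flanking positions, a single substitution type such as C>T has 4⁴ = 256 pentanucleotide contexts, which is the number of channels on the horizontal axis of Fig. 2C.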

Using the CTVT phylogenetic tree to isolate subsets of mutations, we explored variation in mutational signature exposure across time and space (figs. S4 and S5 and dataset S3). Notably, this revealed that signature A was highly active prior to ~2000 years ago (causing ~35% of mutations in the basal trunk of the tree, branch A1) and persisted in parallel at lower levels in the two basal branches after the first node (~12 and ~9% of mutations in branches A2 and A3, respectively), but then abruptly vanished (Fig. 2C and fig. S5). Moreover, signature A is not detectable within the germ line of a global population of 495 dogs (fig. S6). It is possible that signature A reflects the activity of an exogenous mutagen that was exclusively present in the environment that CTVT inhabited before its escape from its founding population. Alternatively, it is plausible that signature A may result from an endogenous DNA-damaging agent that occurred in CTVT cells early during the lineage’s history, but which ceased to accumulate from ~1000 years ago, perhaps as a result of a cellular metabolic change. Although the nature of such a change is unknown, the replacement of possibly defective mitochondrial DNA by horizontal transfer, which likely occurred in parallel in branches A2 and A3 within the past ~1690 years (11), may have altered the metabolic environment within CTVT cells.

Although CTVT usually occurs within the internal genital tract, it may sometimes protrude from the genital orifice or spread to perineal skin, resulting in sporadic exposure to solar UV radiation (12, 13). The amount of UV radiation reaching Earth, however, varies substantially across global environments (20). We investigated whether latitude influenced the degree of UV exposure in CTVT tumors by estimating the contribution of signature 7 within subsets of mutations acquired at known latitudes. Indeed, qualitative assessment of mutational spectra of location-specific CTVT mutation subsets suggests extensive variation in UV exposure; for example, the mutational spectra of tumors collected in Mauritius show considerably more evidence of signature 7 compared with those of tumors collected in Russia (fig. S4). Using CC>TT dinucleotide mutations (21) as a proxy for signature 7 (fig. S7), we identified a nonlinear association between latitude and UV exposure (Spearman’s correlation –0.40, 95% HPDI –0.65 to –0.14; Fig. 2D). By fitting CC>TT mutations observed in the basal trunk of the CTVT tree to this curve, we estimated the latitude of the CTVT founder population (Fig. 2, D and E) (8).
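A minimal sketch of the two calculations behind this analysis follows: normalizing CC>TT counts by clocklike C>T counts at CpG sites (as defined in the Fig. 2 caption), and computing Spearman's rank correlation against absolute latitude. The data values are hypothetical, and the plain rank correlation shown here does not reproduce the Bayesian regression or the HPDI reported above.

```python
import math


def normalized_cc_tt(cc_tt_count, cpg_ct_count):
    """Ratio of CC>TT mutations to C>T mutations at CpG sites in one group."""
    if cpg_ct_count <= 0:
        raise ValueError("need at least one clocklike CpG mutation")
    return cc_tt_count / cpg_ct_count


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical phylogenetic groups: absolute mean latitude and UV proxy.
abs_latitude = [5, 20, 35, 50, 60]
uv_proxy = [0.030, 0.022, 0.015, 0.016, 0.008]
rho = spearman(abs_latitude, uv_proxy)
print(f"Spearman's rho = {rho:.2f}")
```

A rank-based measure suits this comparison because the relationship between latitude and UV exposure is described as nonlinear; Spearman's rho only requires that it be monotonic.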

Examining the contribution of signature 5 across the CTVT lineage, we observed three independent phylogenetic groups of tumors that appear to have acquired signature 5 hyperactivity phenotypes (groups labeled 12 to 16, 20, and 40; Fig. 2, F and G, and figs. S4 and S5). In one case, involving tumors collected in several South and Central American countries (groups 12 to 16), the phenotype has been maintained for ~150 years. This phenotype is likely to result from signature 5 and not from the double-strand DNA repair deficiency–mediated COSMIC signature 3, which presents a similar mutational profile (5, 22), as we failed to observe the enrichment for indels that co-occurs with signature 3 (22, 23). It is possible, however, that these tumors were exposed to another, as yet undescribed, mutational process. Signature 5 is widespread in cancer and normal tissues and has unknown etiology, although it may be partly associated with endogenously generated adducts subject to nucleotide excision repair (5, 9, 18). We annotated nonsynonymous mutations occurring in the three groups’ respective clonal ancestors, providing a catalog of genes that may play a role in generation or suppression of signature 5 (dataset S4).

CTVT mutations and gene expression

The prevalence of substitution mutations in CTVT decreases with increasing gene expression, likely reflecting the activity of TCR operating on DNA damage associated with signatures 7 and A, as well as a signature 1 preference for genes with lower expression (16, 24, 25) (fig. S8, A and B). We observed that exons have a higher substitution prevalence than introns, possibly as a result of sequence context (figs. S8A and S9). The prevalence of indels is positively correlated with gene expression, as has been observed in human cancers, and may reflect transcription-associated damage (26) (fig. S8A).

We assessed the contribution of TCR in two temporally distinct subsets of mutations: those acquired before the earliest detectable node in the phylogenetic tree (~8500 to 2000 years ago; branch A1 in Fig. 1A) and those acquired subsequent to this node (~2000 years ago to the present). Although C>T mutations acquired at TCC contexts in highly expressed genes in branch A1 have little strand bias, likely because of the opposing transcriptional strand preferences of signatures 7 and A at this context, those genes with very low expression predominantly show the transcriptional strand bias associated with signature A (fig. S8C). Assuming that the transcriptional strand bias observed in these low-expressed genes reflects earlier expression and subsequent silencing of genes, this suggests that there may have been an early period in CTVT evolution when the lineage was exposed to signature A more intensely than it was to signature 7. This may reflect variation in the climate or environment to which CTVT was exposed early in its history.

Selection in CTVT

CTVT has a massive mutation burden, which exceeds that observed in even the most highly mutated human cancer types (Fig. 3A). Each CTVT tumor carries on average 37,800 SNVs across its predominantly diploid (12) exome (~2 million SNVs genome-wide; table S2). Indeed, the tally of somatic mutations that have accumulated in CTVT since it departed its original host is comparable with the number of germline variants that distinguish some pairs of outbred dogs (fig. S10). Within the set of 546 tumors, 14,412 (~73%) protein-coding genes carry at least one nonsynonymous mutation, and 5704 (~29%) have mutations predicted to cause protein truncation (Fig. 3B).
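The scale of these figures can be checked with simple arithmetic. In the sketch below, the exome size (43.6 Mb) and mean SNV count (37,800) are taken from the text; the genome size of roughly 2400 Mb is an assumption introduced here for illustration, and a uniform-density extrapolation is only a rough consistency check, since substitution prevalence differs between exons and introns.

```python
EXOME_SIZE_MB = 43.6            # from the text
MEAN_SNVS_PER_TUMOR = 37_800    # from the text
ASSUMED_GENOME_SIZE_MB = 2_400  # assumption: approximate dog genome size


def density_per_mb(n_variants, region_mb):
    """Variants per megabase in a sequenced region."""
    if region_mb <= 0:
        raise ValueError("region size must be positive")
    return n_variants / region_mb


def extrapolate(n_variants, region_mb, target_mb):
    """Scale a count to a larger region, assuming uniform mutation density."""
    return density_per_mb(n_variants, region_mb) * target_mb


exome_density = density_per_mb(MEAN_SNVS_PER_TUMOR, EXOME_SIZE_MB)
genome_estimate = extrapolate(
    MEAN_SNVS_PER_TUMOR, EXOME_SIZE_MB, ASSUMED_GENOME_SIZE_MB
)
print(f"{exome_density:.0f} SNVs per Mb of exome")
print(f"about {genome_estimate / 1e6:.1f} million SNVs genome-wide")
```

Under these assumptions the extrapolation lands near the ~2 million genome-wide SNVs quoted above.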

Fig. 3 Selection in CTVT.

(A) Somatic SNV prevalence across six human cancer types and CTVT. Dots represent individual tumors; red lines indicate median SNV prevalence. ALL, acute lymphoblastic leukemia. (B) Bars showing the percentage of protein-coding genes in the CTVT genome harboring ≥1 nonsynonymous somatic mutation (SNV or indel; 14,412 genes) and ≥1 protein-truncating somatic mutation (5704 genes). (C) Diagram presenting the putative driver events found in the set of basal trunk ancestral variants (group A1, Fig. 1A). A description of each somatic alteration is shown next to the corresponding gene symbol. (D) Exome-wide dN/dS ratios estimated for somatic SNVs in all protein-coding genes (left) and in sets of genes defined according to gene essentiality, copy number state, and expression level. Estimates of dN/dS are presented for missense (blue) and nonsense (orange) mutations in each gene group. The dashed line indicates dN/dS = 1 (neutrality); error bars indicate 95% confidence intervals.

We searched for evidence of positive selection in CTVT. The driver mutations that initially caused CTVT, and promoted its transmissible phenotype, will have occurred in the basal trunk of the CTVT tree. SETD2, CDKN2A, MYC [previously described (12, 27)], PTEN, and RB1, known cancer genes that frequently harbor driver mutations in human cancers (15), carry biallelic loss-of-function or potential activating mutations in the trunk and may be early drivers of CTVT (Fig. 3C and table S3). To search for late drivers, which may have been acquired in more recent parallel CTVT lineages, we identified independent mutations that occurred repeatedly across the tree, and measured the normalized ratio of nonsynonymous to synonymous mutations (dN/dS) per gene after correcting for mutational biases and context effects (8). This approach only yielded two uncharacterized genes with dN/dS > 1 (q-value < 0.05), predicted to encode a neuroligin precursor and a roundabout homolog (dataset S5). The potential for these genes to act as late drivers in CTVT cannot be assessed, and it is possible that local sequence structures may result in higher-than-expected recurrent mutation rates at these loci (28). Overall, we find little evidence that CTVT is continuing to adapt to its environment.
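The idea behind dN/dS can be sketched in its simplest form: the rate of nonsynonymous mutations per available nonsynonymous site, divided by the equivalent rate for synonymous mutations. This naive version is an illustration only. It omits the corrections for mutational biases and sequence context that the analysis above applies, and all counts below are hypothetical.

```python
def naive_dnds(n_nonsyn, n_syn, nonsyn_sites, syn_sites):
    """Per-site nonsynonymous rate divided by per-site synonymous rate.

    Values near 1 suggest neutrality, values below 1 suggest negative
    selection, and values above 1 suggest positive selection.
    """
    if n_nonsyn < 0:
        raise ValueError("mutation counts cannot be negative")
    if n_syn <= 0 or nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("synonymous count and site totals must be positive")
    return (n_nonsyn / nonsyn_sites) / (n_syn / syn_sites)


# Hypothetical gene set with three nonsynonymous sites per synonymous site.
neutral = naive_dnds(n_nonsyn=300, n_syn=100, nonsyn_sites=3000, syn_sites=1000)
depleted = naive_dnds(n_nonsyn=100, n_syn=100, nonsyn_sites=3000, syn_sites=1000)
print(neutral, depleted)
```

Normalizing by the number of available sites matters because most random changes to a coding sequence are nonsynonymous, so raw counts alone would overstate the nonsynonymous excess.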

Negative selection, which acts to remove deleterious mutations, is very weak in human cancers (17, 29, 30). Human cancers have short life spans, and their evolution is dominated by sweeps of strong positive selection, thus reducing the potential for negative selection to act (17). Given its long life span, high mutation burden, and lack of ongoing positive selection, it is possible that negative selection may be a more dominant force in CTVT evolution. Further, unlike in ordinary cancers, intertumor competition may offer more opportunities for negative selection to manifest in CTVT, purging lineages less able to infect new hosts and spread through the host population. Indeed, negative selection has been detected operating on CTVT mitochondrial genomes (11). Our analysis of dN/dS in CTVT across all genes, however, yielded dN/dS ≈ 1 for both missense and nonsense mutations, indicating near-neutral evolution (Fig. 3D and dataset S5). Similarly, dN/dS did not differ from neutrality in genes categorized by expression level (Fig. 3D). Negative selection, acting both on missense and nonsense mutations, could be detected, however, in sets of genes with known essential functions (Fig. 3D) and was particularly pronounced for nonsense mutations in essential genes occurring in haploid regions (dN/dS = 0.33, p < 10⁻⁴). A slight signal of negative selection acting on nonsense mutations in haploid regions (dN/dS = 0.88, p = 0.027) is explained by 269 essential genes, as negative selection was not detected after removal of these genes (Fig. 3D and dataset S5). These results imply that CTVT largely evolves by neutral genetic drift. This may partly reflect functional obsolescence of many mammalian genes in this relatively simple parasitic cancer, as well as the buffering effect of CTVT’s largely diploid genome (12). However, it is also likely that transmission bottlenecks between hosts render weak selection inefficient. This may be expected to lead to the progressive accumulation of deleterious mutations in the population (Muller’s ratchet) (31), raising the possibility that CTVT may be declining in fitness despite its global success.


Studies of cancer evolution typically focus on how malignant clones alter during the first years, or perhaps decades, of their existence. We have tracked the evolution of a cancer over several thousand years, and compared the mutational processes and selective forces that molded its genome with those described in short-lived human cancers.

Our results suggest that neutral genetic drift may be the dominant evolutionary force operating on cancer over the long term, in contrast to the ongoing positive selection that is often observed in human cancers (7, 17). Thus, our results suggest that CTVT may have optimized its adaptation to the transmissible cancer niche early in its history. Subsequently acquired advantageous mutations may have offered incremental change of minimal benefit, such that they were insufficient to overcome the neutral effects of drift. Notably, since the 1980s, CTVT has been routinely treated with vincristine, a cytotoxic microtubule inhibitor (32). Despite the strong selection pressure imposed by vincristine treatment, we find no evidence of convergent evolution of vincristine resistance mechanisms in CTVT at the level of point mutations or indels.

The mechanisms whereby CTVT is tolerated by the host immune system, despite its status as an allogeneic graft, are poorly understood (33, 34). The weakness of negative selection beyond genes essential for cell viability implies that there are negligible selective pressures imposed by immunoediting of somatic neoepitopes at a genome-wide level. This is perhaps unsurprising, given the massive antigenic burden already presented by allogeneic epitopes. These findings support evidence that CTVT largely circumvents the adaptive immune system, at least during its initial stages of progressive tumor growth, perhaps in part through down-regulation of major histocompatibility complex molecules (13, 34–36).

Our analyses reveal a mutational signature, signature A, that occurred in the past but ceased to be active from about 1000 years ago. A recent study (37) detected evidence for an excess of C>T mutations at TCC contexts, the mutation type most prevalent in signature A, accumulating in the human germ line between 15,000 and 2000 years ago. If this human mutation pulse is due to signature A, it could indicate a shared environmental exposure that was once widespread but has now disappeared. However, we find no evidence of an excess of C>T mutations at GTCCA pentanucleotides in the dog germ line, suggesting that dogs were not systemically exposed to signature A in their past. Further research will be required to elucidate the biological origin of signature A and the mechanism of its pronounced pentanucleotide sequence bias; however, this study highlights the potential for long-lived, widespread clonal organisms to act as biomarkers for the activity of mutational processes.

Genomic instability and ongoing positive selection are often considered key hallmarks of carcinogenesis (38). CTVT does not have an intrinsically high mutation rate (“genomic instability”), at least at the level of SNVs, and its vast mutation burden simply reflects the lineage’s age. We find no clear evidence for continued positive selection beyond initial truncal events. Thus, CTVT illustrates that, once spawned and sufficiently well-adapted to its niche, neither hallmark is necessary to sustain cancer over the long term.

CTVT is a singular biological entity. It is the oldest, most prolific, and most divergent cancer lineage known in nature; it has spread throughout the globe and has seeded its tumors in many thousands of dogs. Here, we have traced this cancer’s route through the steppes of Asia and Europe and as an unwelcome stowaway on global voyages. We have observed the patterns in its mutational profiles reflecting the dynamics of its exogenous and endogenous environment. Further, we have shown that CTVT largely evolves by neutral processes, and that the mutations that it continues to acquire may pose a threat, rather than an advantage, to its long-term fitness.

Materials and methods summary

The protein-coding genomes of 1051 CTVT and matched host samples were captured and sequenced on an Illumina HiSeq2000 instrument. Germline and somatic variants were identified using a bespoke computational pipeline based on Platypus (39) and annotated with the Ensembl Variant Effect Predictor (40). A phylogenetic tree was inferred using RAxML (41), and time of origin was estimated with a Poisson Bayesian model using information about clonal somatic variation from a case of direct CTVT transmission (10). Probabilistic estimates of tree branch lengths and divergence times were obtained using BEAST (42). Mutational signatures and exposures were inferred from subsets of somatic variants using the sigfit R package (14). Selection was assessed by global and genewise estimates of somatic dN/dS, obtained using the dNdScv R package (17).
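To make the selection measure concrete, the following is a deliberately simplified sketch of a global dN/dS ratio: the rate of nonsynonymous mutations per nonsynonymous site divided by the rate of synonymous mutations per synonymous site. It is not the dNdScv method, which fits a context-dependent substitution model; the class, field names, and site counts here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MutationCounts:
    """Observed mutation counts and mutational opportunities (hypothetical inputs)."""
    nonsyn_observed: int   # nonsynonymous somatic mutations observed
    syn_observed: int      # synonymous somatic mutations observed
    nonsyn_sites: float    # number of possible nonsynonymous changes in the region
    syn_sites: float       # number of possible synonymous changes in the region


def naive_dnds(counts):
    """Return a naive dN/dS ratio that ignores sequence-context mutation biases.

    Values near 1 are consistent with neutral evolution, values below 1
    with negative selection, and values above 1 with positive selection.
    """
    if counts.nonsyn_sites <= 0 or counts.syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if counts.syn_observed == 0:
        raise ValueError("dN/dS is undefined when no synonymous mutations are observed")
    rate_nonsyn = counts.nonsyn_observed / counts.nonsyn_sites
    rate_syn = counts.syn_observed / counts.syn_sites
    return rate_nonsyn / rate_syn
```

For example, `naive_dnds(MutationCounts(300, 100, 3000.0, 1000.0))` gives a ratio of 1, the value expected under neutrality. In real data the site counts must be weighted by the mutation spectrum, which is why a model-based tool is needed for reliable estimates.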

Supplementary Materials

Materials and Methods

Figs. S1 to S16

Tables S1 to S4

Datasets S1 to S6

References (44–70)

References and Notes

  1. Materials and methods are available as supplementary materials.
Acknowledgments: We acknowledge the Core Sequencing Facility, IT groups, and members of the Cancer Genome Project at the Wellcome Sanger Institute. We thank the following individuals for useful information and for their help obtaining samples for this project: I. Airikkala-Otter, J. Alzate-Ocampo, D. Argüello, J. I. Arias, C. L. Arnold, S. Barrass, E. Batrakova, R. Bortolotti Viéra, N. Brown, F. C. Casas, J. Cooper, A. C. Cotacachi, S. M. Cutter, J. de Vos, L. Dmytro, P. Farnham, A. Fassati, A. Fernandez-Riomalo, R. Gaitan, D. Hanzlíček, R. R. Huppes, J. M. Igundu, M. Jimenez-Coello, D. Kamstock, P. Kelly, T. Korytina, A. Kuznetsova, G. E. Lavalle, O. A. Lawal, T. Lerotholi, M. Lima-Maigua, J. Loayza-Feijoo, M. López-Bucheli, M. Maina, M. Mancero-Albuja, C. Marchiori Bueno, L. Martínez-López, A. Martínez-Meza, B. M. Masuruli, T. M. Morata Raposo, J. Mulholland, C. Murgia, A. Murison Swartz, F. Nargi, M. M. Onsare, E. Ortiz-Rodríguez, E. Peach, L. Pellegrini, G. Polton, F. Proaño-Pérez, J. C. Ramirez-Ante, C. Raw, C. Semedo, I. Stoikov, I. Swarisch, M. Tinucci Costa, E. Turitto, M. R. Vural, D. Walker, R. Weiss, K. Xie, M. Zandvliet, staff at Animal Medical Centre Belize City (Belize), veterinary surgeons and staff at Help in Suffering (Jaipur, India), staff at Hopkins Belize Humane Society (Belize), veterinary workers at Pet Centre (UVAS, Lahore, Pakistan), students from St. George's University (True Blue, Grenada, West Indies) who assisted with sample collection, staff at Veterinary Clinic “El Roble” (Chile), staff and volunteers at World Vets (Gig Harbor, USA), and staff at the WVS International Training Centre in Ooty (India). 
We are grateful to the following organizations for helpful information: American College of Veterinary Internal Medicine (ACVIM), Animal Balance, Animal Care Association (The Gambia), Animal Management in Rural and Remote Indigenous Communities (AMRRIC), Associação Bons Amigos de Cabo Verde, Humane Society of Cozumel, Humane Society Veterinary Medical Association–Rural Area Veterinary Services (HSVMA–RAVS), Israel Veterinary Medical Association, Italian Veterinary Oncology Society, Rural Vets South Africa, Veterinary Cancer Society, Veterinary Society of Surgical Oncology (VSSO), VetPharma, Vets Beyond Borders, ViDAS and Coco’s Animal Welfare, The Spanky Project, VWB/VSF Canada, West Arnhem Land Dog Health Program (WALDHeP), World Small Animal Veterinary Association (WSAVA), МИР ВЕТЕРИНАРИИ (World Veterinary Medicine).
Funding: This work was supported by Wellcome (102942/Z/13/A) and by a Philip Leverhulme Prize from the Leverhulme Trust. A.St. was supported by a Postgraduate Student Award from the Kennel Club Charitable Trust.
Author contributions: E.P.M. designed and directed the project. A.B.-O. developed methods and led computational data analysis. K.G. developed methods and assisted with computational analysis. A.St. collected samples, performed laboratory work, designed exome probes, oversaw sequencing and provided conceptual advice. J.L.A., K.M.A., L.B.-I., T.N.B., J.L.B., C.B., A.C.D., A.M.C., H.R.C., J.T.C., E.D., K.F.d.C., A.B.d.N., A.P.d.V., L.D.K., E.M.D., A.R.E.H., I.A.F., M.F., E.F., S.N.F., F.G.-A., O.G., P.G.G., R.F.H.M., J.J.G.P.H., R.S.H., N.I., Y.K., C.K., D.K., A.K., S.J.K., M.L.-P., M.L., A.M.L.Q., T.L., G.M., S.M.C., M.F.M.-L., M.M., E.J.M., B.N., K.B.N., W.N., S.J.N., A.O.-P., F.P.-O., M.C.P., K.P., R.J.P., J.F.R., J.R.G., H.S., S.K.S., O.S., A.G.S., A.E.S.-S., A.Sv., L.J.T.M., I.T.N., C.G.T., E.M.T., M.G.v.d.W., B.A.V., S.A.V., O.W., A.S.W.-M. and S.A.E.W. provided clinical samples. Y.-M.K., M.N.L., and M.S. assisted with analysis and contributed to interpretation of results. J.W. contributed to sample management and curation. M.R.S., L.B.A., and I.M. provided technical advice and assisted with interpretation of results. A.B.-O. and E.P.M. wrote the manuscript and designed the figures. All authors commented on the manuscript.
Competing interests: The authors declare no competing interests.
Data and materials availability: Whole-exome sequence data have been deposited in the European Nucleotide Archive (ENA) under overarching accession number ERP109580. Variant calling data and other data supporting analyses have been deposited in the University of Cambridge Repository (43). Custom algorithms employed for data processing and analysis are available in GitHub (
