Research Article

Rearrangement bursts generate canonical gene fusions in bone and soft tissue tumors

See allHide authors and affiliations

Science  31 Aug 2018:
Vol. 361, Issue 6405, eaam8419
DOI: 10.1126/science.aam8419

Looping together genes in cancer

A subset of human cancers are characterized by aberrant fusion of two specific genes. In some cases, the activity of the resultant fusion protein drives tumor growth. Most fusion genes in cancer appear to arise from simple reciprocal chromosomal translocations. Anderson et al. found that the characteristic fusion gene in a bone and soft tissue tumor called Ewing sarcoma is produced by a far more complicated mechanism (see the Perspective by Imielinski and Ladanyi). In nearly half of the tumors examined, the fusion gene was created by the formation of dramatic genomic loops that disrupt multiple genes. These complex rearrangements occur in early replicating and transcriptionally active regions of the genome and are associated with poor prognosis.

Science, this issue p. eaam8419; see also p. 848

Structured Abstract

INTRODUCTION

Gene fusions are often disease-defining events in cancer. The mutational processes that give rise to fusions, their timing relative to initial diagnosis, and whether they change at relapse are largely unknown. Mutational processes leave distinct marks in the tumor genome, meaning that DNA sequencing can be used to reconstruct how fusions are generated. A prototypical fusion-driven tumor is Ewing sarcoma (ES), a bone cancer predominantly affecting children and young adults. ES is defined by fusions involving EWSR1, a gene encoding an RNA binding protein, and genes encoding E26 transformation-specific (ETS) transcription factors such as FLI1. We sought to reconstruct the genomic events that give rise to EWSR1-ETS fusions in ES and chart their evolution from diagnosis to relapse.

RATIONALE

We studied the processes underpinning gene fusions in ES using the whole-genome sequences of 124 primary tumors. We determined the timing of the emergence of EWSR1 fusions relative to other mutations. To measure ongoing mutation rates and evolutionary trajectories of ES, we studied the genomes of primary tumors, tumors at relapse, and metastatic tumors.

RESULTS

We found that EWSR1-ETS, the key ES fusion, arises in 42% of cases via complex, loop-like rearrangements called chromoplexy, rather than by simple reciprocal translocations. Similar loops forming canonical fusions were found in three other sarcoma types. Timing the emergence of loops revealed that they occur as bursts in early replicating DNA, as a primary event in ES development. Additional gene disruptions are generated concurrently with the fusions within the loops. Chromoplexy-generated EWSR1 fusions appear to be associated with an aggressive form of the disease and a higher chance of relapse. Numerous mutations present in every cell of the primary were absent at relapse, demonstrating that the primary and relapsed diseases evolved independently. This divergence occurs after formation of an ancestral clone harboring EWSR1 fusions. Importantly, we determined that divergence of the primary tumor and the future relapsed tumor occurs 1 to 2 years before initial diagnosis, as estimated from the number of cell division–associated mutations.

CONCLUSION

Our findings provide insights into the pathogenesis and natural history of human sarcomas. They reveal complex DNA rearrangements to be a mutational process underpinning gene fusions in a large proportion of ES. Similar observations in other fusion-defined sarcoma types indicate that this process operates more generally. Such complex rearrangements occur preferentially in early replicating and transcriptionally active genomic regions, as evidenced by the additional genes disrupted. EWSR1 fusions arising from chromoplexy correlated with worse clinical outcomes. Formation of the EWSR1 fusion genes is a primary event in the life history of ES. We found evidence of a latency period between this seeding event and diagnosis. This is in keeping with the often-indolent nature of symptoms before clinical disease presentation.

Timing of mutations in a patient with ES.

The schematic shows genetic alterations in tumors at prediagnosis, diagnosis, and relapse. In many cases, the fusion gene that drives tumorigenesis (EWSR1-FLI1 or EWSR1-ERG) emerges via a sudden burst of genomic rearrangements involving multiple chromosomes and genes. This event, called chromoplexy (indicated by the starburst), happens early in the evolution of the disease in a prediagnostic lesion. After this event, the diagnostic and relapsed tumors evolve in parallel. In this model, the clone that would ultimately become the relapsed tumor was already present at the time of initial diagnosis, although it was undetectable.

Abstract

Sarcomas are cancers of the bone and soft tissue often defined by gene fusions. Ewing sarcoma involves fusions between EWSR1, a gene encoding an RNA binding protein, and E26 transformation-specific (ETS) transcription factors. We explored how and when EWSR1-ETS fusions arise by studying the whole genomes of Ewing sarcomas. In 52 of 124 (42%) of tumors, the fusion gene arises by a sudden burst of complex, loop-like rearrangements, a process called chromoplexy, rather than by simple reciprocal translocations. These loops always contained the disease-defining fusion at the center, but they disrupted multiple additional genes. The loops occurred preferentially in early replicating and transcriptionally active genomic regions. Similar loops forming canonical fusions were found in three other sarcoma types. Chromoplexy-generated fusions appear to be associated with an aggressive form of Ewing sarcoma. These loops arise early, giving rise to both primary and relapse Ewing sarcoma tumors, which can continue to evolve in parallel.

Genomic rearrangements (structural variants) are a ubiquitous source of somatic mutation in human cancer. They arise from breaks in chromosomes, which are then aberrantly rejoined. Rearrangements may occur in isolation or in the context of complex genomic catastrophes that shatter chromosomes [chromothripsis (1)] or join chromosomes in chains or loop structures [chromoplexy (2)]. Rearrangements can generate cancer-driving mutations through several mechanisms, including the formation of gene fusions. Typically, fusions are fashioned by translocations that are often reciprocal. An exception is the prostate cancer fusion gene, TMPRSS2-ERG, which can occur through chromoplexy (2).

Oncogenic gene fusions are particularly common in leukemia and bone and soft tissue tumors (3), often acting as the sole driver mutation and delineating clinically relevant tumor entities and subgroups. In leukemia, recombination-activating gene (RAG)–mediated structural variation has been identified as the leading mutational process that creates canonical gene fusions and drives oncogenesis through translocations and deletions (4). Here we sought to investigate processes and timing of oncogenic fusions in human bone and soft tissue tumors.

The starting point of our investigation was Ewing sarcoma (ES), a bone and soft tissue cancer predominantly diagnosed in adolescents and young adults. It represents the prototypical fusion-driven sarcoma, defined by fusions between EWSR1, a gene encoding an RNA binding protein, and genes encoding E26 transformation-specific (ETS) transcription factors, including FLI1 and ERG (5). Although the downstream consequences of EWSR1-ETS fusion genes are well established (6), the timing and mechanism by which they arise are unknown.

Burden and signatures of substitution mutations in ES

We sequenced either the gene-containing portions or the whole genomes of 50 ES tumors and their matched normal DNA (complete sequencing details in table S1). We used a conventional analysis pipeline to call somatic substitutions and rearrangements, with additional custom software to remove recurrent artifacts and sources of false positives (see methods and fig. S1). Overall, and consistent with previous reports (710), the ES genome is genetically quiet, with few somatic substitutions identified (median of <1 mutation/Mb; Fig. 1A).

Fig. 1 Mutation landscape of ES.

(A to C) The initial cohort consisted of 50 individuals with primary ES tumors, from which 23 tumors were whole-genome sequenced (Toronto cohort, left). One rearrangement screen sample (sample 4462) is included in this figure. The validation cohort consisted of individuals from which 100 ES whole genomes were sampled, from Tirode et al. (10) (right). (A) Somatic mutation burden for ES. The mutation burdens of all genome samples are shown. Three outlier samples with >2 mutations/Mb are clipped, as indicated by the hatch marks. (B) ES mutation signatures. Mutation signature analysis, defined by the proportion of 96 possible trinucleotides, identified common mutation patterns in most samples (age-associated, clock-like signature 1). Other signatures included 2, 5, 8, 13, 18, and 31. Signatures 2 and 13 are associated with the activity of the AID/APOBEC family of cytidine deaminase, whereas signature 5 is also clock-like in some cancers but less so in ES (11, 13). Signatures 8 and 18 have an unknown molecular etiology; however, it has been suggested that signature 18 is caused by reactive oxygen species (43). Signature 31 is believed to be the result of exposure to platinum-based antineoplastic therapy (24). (C) Rearrangement profiles for ES. Shown are the burden of structural variants (SVs)—including deletions (blue), duplications (red), inversions (green), and translocations (purple)—in individual ES genomes. Samples with chained complex rearrangements (looped rearrangements) are highlighted by red arrows (14/24 for the Toronto cohort and 38/100 for the validation cohort; aggregated prevalence, 52/124). (D) Rearrangement breakpoint clusters. The aggregated density distributions of the genomic distance between consecutive rearrangement breakpoints are shown. Reciprocal breakpoints are close together (~102 bp) because there is an equal exchange of genetic material arising from a single break on each chromosome. Chromoplectic rearrangements (red) overlap this range owing to the proximity of breakpoints involved in looped rearrangements. Deletion-bridge (DB) chromoplexy (purple) is a looped rearrangement cluster in which a deletion spans two breakpoints, resulting in breakpoint distances that are farther apart (illustrated in fig. S11). Noncomplex breakpoints (simple SVs) are far apart (~108 bp). (E) Schematic diagram of chromoplexy fusion loops. Illustrative example of chromoplexy in ES shows three chromosomes undergoing double-strand breakage, shuffling, and religation in an aberrant configuration. This phenomenon generates the canonical fusion, EWSR1-FLI1 (ERG or ETV1), and disrupts a third locus, X, in a one-off burst of rearrangements. In reality, up to eight chromosomes may be disrupted in this looping pattern. A representative genome-wide Circos plot depicting genomic rearrangements in an ES tumor (from the discovery cohort), which are organized in a loop. (F) Genomic correlates and clinical impact of looped rearrangements. In genomes without rearrangement loops, only simple structural variants (SSVs) exist with an average rearrangement burden of seven rearrangements per sample. This rate is similar to the background SSV rate (determined by removing rearrangements involved in a loop) in genomes with rearrangement bursts (compare the two red lines). The additional complexity of looped rearrangements results in higher genomic instability in these tumors. The most common genomic alterations include somatic TP53 mutations, which are rare but enriched in patients with complex genomes (top pie chart, orange fraction; P < 0.05, Fisher’s exact test). EWS-ERG fusions are also rare, as they represent 10% of all ES diagnoses; however, all EWS-ERG fusion ES tumors are either chomothriptic or chromoplectic (middle pie chart, orange fraction). Lastly, patients with complex genomes tend to relapse (bottom pie chart, orange fraction; P < 0.05, Fisher’s exact test). All the markers of aggressive disease (high genomic instability, somatic TP53 mutations, and relapse) are present in tumors with complex genomes.

We next asked if the collection of all mutations, when considered together, could help highlight consistent mutagenic processes in ES. We extracted mutational signatures using an established method that allows for the discovery of new signatures. Despite the young age of patients with ES, and overall low number of mutations, we found that the tumors contained at least seven distinct signatures, all of which matched patterns seen in adult cancers [Catalogue of Somatic Mutations in Cancer (COSMIC) signatures 1, 2, 5, 8, 13, 18, and 31; Fig. 1B and fig. S2A] (11, 12). Two of these (signatures 1 and 5) were nearly universal and associated with patient age. Signature 1 generated a steady rate of seven mutations per gigabase per year, which is similar to that of adult ovarian and breast cancers (fig. S2B) (13). An overview of the somatic architecture and mutational signatures of each tumor in our discovery cohort is shown in Fig. 1, A to C (left panels, Toronto cohort).

Chromoplexy rearrangement loops are common in aggressive ES

Having observed few small mutations, we then focused our attention on structural rearrangements. We applied a bespoke analysis tool to detect clustered rearrangements from whole-genome data, defined as having an inter-rearrangement distance of <10 kilobase pairs (kbp) (see methods). Using a computational data structure that modeled adjacent breakpoints as vertices and interconnected rearrangements as edges in a graph, we uncovered several distinct configurations of rearrangement clusters (Fig. 1D). As expected, one configuration of rearrangement clusters was a result of reciprocal rearrangements, where there is an equal exchange of genetic material and overlapping breakpoints. These were isolated rearrangements that occurred without additional breakpoints nearby. A second configuration, seen in 14/24 tumor genomes, was a distinctive pattern of focal clustered events with nearly overlapping junctions, organized as closed loops (distance of <30 bp; Fig. 1D, red distribution). That is, if one follows these complex rearrangements across their multiple constituent chromosomes, one is ultimately brought back to the point of departure. Importantly, the loops were nearly always centered on EWSR1-ETS (Fig. 1E). These abutting rearrangements that occur in a loop resemble a pattern of chromoplexy, akin to the loops of the prostate cancer fusion gene, TMPRSS2-ERG. Of note, the EWSR1-ERG fusion was always generated by a complex mechanism, whereas EWSR1-FLI1 arose with or without this mechanism (fig. S3A). This is likely due to the opposite gene orientation of EWSR1 relative to ERG on their respective chromosome arms. A simple two-chromosome break rearrangement cannot place the genes in the correct transcriptional orientation, necessitating more complex chromosomal rearrangements for fusion formation. Other than this difference, chromoplexy in ERG- and FLI1-driven tumors was very similar (fig. S3B).

In all cases, we resolved the breakpoints and found positions largely consistent with type I or type II ES (14). In the most complex case of chromoplexy, up to 18 genes were brought together with the canonical fusion on the same derivative chromosome (fig. S3C; the full list of genes affecting all samples is shown in fig. S3D). We validated chromoplectic looped rearrangements by deep sequencing or by cytogenetic analysis using standard G-banding and spectral karyotyping (methods and fig. S4). Using RNA sequencing, we found that chromoplectic loops universally disrupted the reciprocal fusion (FLI1-EWSR1); 52% of the cancers with simple rearrangements expressed the reciprocal fusion, but none of the chromoplectic tumors expressed it (n = 27, fig. S5). For further validation of chromoplexy in ES, we reanalyzed a published, independent cohort of 100 ES patients, whose genomes had been sequenced, using our informatics pipeline (10). The somatic architecture and mutational signature of the validation cohort is shown in Fig. 1, A to C (right panels, validation cohort). Both cohorts harbored copy number profiles consistent with previous reports (fig. S6) (10). With this series, the aggregated prevalence of chromoplectic EWSR1-ETS gene fusions was 42% (52/124).

Patients with relapsed ES have a poor survival rate, and new prognostic markers are needed. We evaluated the association between chromoplexy, patient outcomes, and known markers of worse prognosis. We found that higher overall genomic complexity, a marker of aggressive ES (10, 15), was almost completely explained by chromoplectic rearrangements (Fig. 1F). By contrast, there was no difference in the burden of nonchromoplectic rearrangements. Similarly, TP53 mutations, another established marker of poor prognosis (10, 16), were enriched in chromoplexy ES (16 versus 3%, P < 0.05, Fisher’s exact test). There was no enrichment for CDKN2A or STAG2 mutations (fig. S7). Finally, and consistent with the above, patients with chromoplexy ES were more likely to relapse (54 versus 30%, P < 0.05, Fisher’s exact test), strongly suggesting that it marks a more aggressive variant of ES.

Chromoplexy generates the key fusion in other bone and soft tissue tumors

We next widened our search across four different benign and malignant bone and soft tissue tumor types for which canonical gene fusions have been identified (table S2). We subjected 13 tumors to high- or low-coverage whole-genome sequencing, and RNA sequencing where feasible. In three tumor types—chondromyxoid fibroma, synovial sarcoma, and phosphaturic mesenchymal tumors—we found that chromoplectic rearrangements (occurring in a similar looped formation) did indeed generate canonical gene fusions (Fig. 2). Furthermore, in one of the chondromyxoid fibroma cases, the fusion emerged from chromothripsis across seven different chromosomes (fig. S8, CMF 2). Chromothripsis was seen in five ES cases, of which four involved the canonical fusion. Taken together, these findings in human bone and soft tissue tumors show that canonical gene fusions are frequently caused by complex rearrangement processes, predominantly chromoplexy but also chromothripsis.

Fig. 2 Genomic catastrophes are common in sarcomas.

Copy number profile for fusion-driven sarcomas with chromoplexy are shown. Rearrangements are colored red, and the loci with the canonical fusion are highlighted (blue box) and enlarged on the right. (A) Chondromyxoid fibroma (CMF) with chromoplexy. The genomic breakpoint lies in the upstream SHPRH gene, and the BCLAF1-GRM1 fusion was detected by RNA sequencing. Additional complex CMFs, which also show a SHPRH genomic breakpoint but with the GRM1 fusion found in the RNA, can be found in fig. S8. (B) Synovial sarcoma with chromoplexy. Chromoplexy generating the SS18-SSX1 pathognomonic canonical fusion is shown. (C) Phosphaturic mesenchymal tumor (PMT) with chromoplexy. Genome sequencing of PMTs revealed deletion bridges occurring across the genome at chromoplectic loci, generating the canonical FN1-FGFR1 fusion.

We examined the microanatomy of chromoplexy fusion loops at base-pair resolution, comparing ES to a published series of prostate cancers (2). EWSR1-ETS ES loops were less complex than TMPRSS2-ERG prostate cancer loops, with fewer rearrangements and individual loops involved in their generation (2 to 10 rearrangements in 1 to 2 loops compared with up to 130 rearrangements in up to 25 loops in prostate cancer). This may be a consequence of the ES genome having a shorter time frame to mutate compared to the prostate cancer genome. Consistent with this proposition, multiple independent chromoplexy loops can exist in older prostate cancers, compared to the one simple loop seen in ES (17). In contrast to ES, where chromoplexy is virtually synonymous with the disease-defining fusion, several chromoplexy fusion loops occur in prostate cancer without necessarily forming the TMPRSS2-ERG fusion. When a loop was present in ES, it almost always generated the EWSR1-ETS fusion (47/52 cases, 90%) (Fig. 3, A and B, and figs. S9 and S10).

Fig. 3 Characterizing chromoplexy loops that generate EWSR1-ETS in ES.

(A) Patterns of looped rearrangements. Chromoplexy circos webs demonstrate that patterns of looped rearrangements are conserved across samples, whereas different genes or loci are affected in each cancer (black panels). In each web, individual samples are indicated by different colors (and named in the gray panel). In all cases, central to chromoplexy fusion loops were the key driver genes: EWSR1 (blue), FLI1 (green), and ERG (purple). The most frequent patterns of chromoplexy in ES are those with a three-way looping structure as well as the presence of deletion bridges. An enlarged Circos web can be found in fig. S9 for readability. Three samples have structures only involving EWSR1, FLI1, and adjacent loci. Sample 4004 has deletion-bridge chromoplexy and is described in fig. S3C. (B) Summary of chromoplexy types. A bar chart showing the number of rearrangements in a loop (x axis) and the number of samples with that rearrangement pattern is shown. Other chromoplexy web structures can be found in fig. S10. (C) Transcriptional consequences of chromoplexy: Gene disruptions and fusions. There are three mechanisms of gene dysregulation via RNA fusion when chromoplexy occurs. The first involves two genes (blue and purple boxes) brought together by chromoplectic rearrangements (black lines with arrowheads), leading to gene disruptions (first scenario, shown at the top) and in-frame fusions (second scenario). This was detected in the 3/10 cases where there was genome (with chromoplexy) and transcriptome sequencing available. When RNA sequencing was not available, these were predicted to cause fusions (n = 47, excluding the EWSR1-ETS driver) and gene disruptions by fusing genes in opposite transcriptional orientation or fusing a gene to an intergenic sequence (n = 168). The second mechanism involves two chromoplexy genes brought together by a rearrangement at the genomic level, but one of the partner’s neighboring genes (green box) is transcriptionally fused to the other chromoplexy partner in its place (third scenario). This is also the predominant mechanism of GRM1 fusion generation in chondromyxoid fibromas (fig. S8). The final mechanism of gene dysregulation occurs when chromoplexy facilitates the production of a fusion by acting as a molecular scaffold (fourth scenario; also illustrated in fig. S12). Two genes are both rearranged to a third locus (orange) and are then transcriptionally fused together. No direct genomic link exists between these two genes. These phenomena can only be detected if both whole-genome and RNA sequencing are available. (D) Transcriptional consequences of chromoplexy: Gene expression. Volcano plot illustrating the differential gene expression in chromoplexy versus nonchromoplexy ES, revealing 504 differentially expressed genes. Points greater than 1 or less than −1 and above 1.3 (as indicated by the red lines) are genes that are significantly differentially expressed (blue dots). Red dots highlight genes that are differentially expressed and involved in a cancer hallmark pathway.

Transcriptional disruptions are associated with chromoplexy

These loops also led to targeted disruptions or fusions between genes brought together directly through chromoplexy (n = 168 gene disruptions, and n = 47 fusions; Fig. 3C). Given that chromoplexy appeared to mark an aggressive form of ES, we wondered if its gene expression program was globally different—above and beyond the immediate, focal structural consequences listed above. We identified 504 differentially expressed genes in chromopletic ES compared to simple ES (P < 0.001, Student’s t test; Fig. 3D). Gene set enrichment analysis of well-curated pathways (18) uncovered a significant enrichment of dysregulated genes in established cancer hallmark pathways (table S3).

Both prostate cancer and ES loops were characterized by focal intrachromosomal rearrangements, so-called deletion bridges (2), that acted as local mediators of large-scale loops (illustrated in fig. S11). We found deletion bridges in ~60% (30/52) of chromoplectic ES. In contrast to the bridges observed in prostate cancer, more than a third of bridges are utilized in ES in a highly consistent manner. That is, if a deletion bridge was found in one component of the loop, it would occur on all chromosomes. For example, in sample 2226, we observed 13 rearrangements spanning three chromosomes, all of which involved deletion bridges. These bridged chromoplectic rearrangements fused EWSR1-FLI1 and disrupted the neighboring gene, AP1B1, as well as the known cancer gene ARID1B. Thus, deletion bridges can create further oncogenic disruptions.

We also observed a remarkable pattern of splicing, whereby the transcriptional machinery further refined the looped rearrangements found in the genome. In chondromyxoid fibromas with chromoplectic GRM1 fusions (3/4 cases), the rearrangement breakpoint did not actually reside within the GRM1 gene body. Rather, the breakpoint was found in the upstream gene SHPRH within a narrow window (fig. S8). Thus, chromoplexy together with conventional splicing leads to the promoter swap that is characteristic of this tumor [see (19)]. Interestingly, we also observed the transcriptional generation of gene fusions in ES. Examination of the transcriptomic consequences of loops showed that genes that were unconnected at the DNA level were brought together, in cis, at the mRNA level. This included examples of the EWSR1-ETS fusion itself (Fig. 3D and fig. S12). In the cases reported here, no direct rearrangement links EWSR1 and FLI1; however, they are linked via two rearrangements to a third locus. In this way, chromoplexy generates the canonical driver gene through a chromoplexy scaffolding event.

Chromoplexy occurs early in tumor evolution through a replication-associated mechanism

We next studied the timing of chromoplexy rearrangements in tumor evolution. Chromoplexy may arise from a one-off sudden event, generating many breakpoints simultaneously, or through stepwise progressive bursts of mutations in succession (2). To differentiate between these two modes of evolution, we used DNA copy number profiling associated with the breakpoints of chromoplexy rearrangements to assess the copy number of neochromosomes. A low number of copy number states (three or fewer) is associated with a one-off mutational event, because breakage and ligation can only involve a small number of chromosomes inside a cell at any given time (20, 21). By contrast, stepwise progression would result in multiple copy number states owing to the possibility of copy number alterations arising within older copy number alterations. Chromoplectic breakpoints involve many chromosomes and are not associated with any copy number alterations (fig. S13). That is, these looped rearrangements across the genome are balanced. In addition, using a bespoke algorithm, we found that the allele frequency of chromoplectic breakpoints was higher than that of simple structural rearrangements, providing further evidence that these breakpoints occurred together and early in tumor development (methods and fig. S14). Given their extremely tight clustering, low number of copy number–state transitions, and consistent clonal variant-allele frequency, EWSR1-ETS loops are likely to have arisen from singular bursts of rearrangements.

We then examined whether genomic regions of loop breakpoints share properties that might predispose these regions to simultaneous rearrangement. We performed a comprehensive analysis of 38 genomic properties, including adjacency to histone marks, association with replication timing, and proximity to genes and repetitive or transposable elements (table S4). Of these properties, early replicating DNA and features consistent with this were the most strongly associated with chromoplexy loops (P < 1.0 × 10−36, Mann-Whitney U rank sum test and Benjamini-Hochberg correction; Fig. 4, A and B). In notable contrast, neither nonlooped simple breakpoints of ES nor simulated simple breakpoints were significantly associated with replication timing or, indeed, any other feature (see methods). Replication timing is known to be strongly correlated with gene activity, chromatin accessibility, and nuclear position (22). Accordingly, chromoplectic breakpoint positions were also strongly associated with high gene density and high GC content (fig. S15A). Conversely, lamina-associated domains, which are enriched in late-replication regions and repressive chromatin environments, were found to be negatively associated with chromoplectic rearrangements. These significant associations were upheld when breakpoints directly residing in EWSR1, FLI1, or ERG were removed from the analyses. Of note, the same associations were found for looped rearrangements of ETS-positive (ETS+) prostate cancers but not for simple prostate cancer rearrangements (fig. S15B). Of further interest, we noted that the genes affected by chromoplexy were among the most highly expressed in ES across all patients (top 20%; fig. S16). Most expressed genes are found in early replicating DNA (23). These data are consistent with the proposed model of chromoplexy in which DNA is colocalized in transcription hubs, allowing for multiple genes from many chromosomes to be broken, shuffled, and aberrantly ligated (2).

Fig. 4 Early replicating DNA and chromoplexy.

(A) Heatmap of genomic property associations. The genomic properties listed in table S4 were calculated for all rearrangements in both cohorts. Complex rearrangements (chromoplexy and chromothripsis), exclusively, are strongly associated with early replication timing and other genomic features consistent with this feature (gene density, CpG density, Alu density, etc.). Table values are indicative of false discovery rate–corrected P values compared to a million random points in the genome. Blue highlights in the figure are indicative of a Cohen’s d greater than or equal to 0.3. Bold boxes indicate a positive (red, enrichment) or negative (blue, depletion) association with the feature. All features were evaluated in 1-kb bins across the genome. For feature density metrics, associations were calculated in 1-Mb sliding windows centered in 1-kb bins. (B) Density distribution of the average wavelet-smoothed signal (WSS) and single-nucleotide variations (SNVs) on a representative chromosome. The average WSS, of replication timing, is plotted for a subset of chromosome 6 to illustrate changes between early and late replication timing and the co-association with mutations in ES. The positional variation of replication timing across the chromosome is depicted as changes in density and color. Point mutations peak in late-replicating regions (dip in WSS, light purple), whereas complex rearrangements peak in regions of early replication timing (peak in WSS, dark purple).

Mutation patterns of relapsed and metastatic ES

As noted above, chromoplexy arises early in the evolutionary history of ES and portends a worse prognosis and possible relapse. The genetic makeup of relapsed ES is largely unknown because standard of care for ES does not typically involve rebiopsy of the cancer when the disease returns or has metastasized. Therefore, whether further mutations—chromoplectic or otherwise—emerge at relapse is unclear, because few samples have been available. We obtained samples from six relapse, metastatic, or secondary tumors and performed whole-genome sequence analysis as well as full mutation and signature analyses (Fig. 5A). This included two primary-metastasis pairs, two primary-relapse pairs, one unpaired metastasis, and one tumor in which ES was secondary to a different cancer. Notably, every relapse or metastatic tumor contained the chromoplexy-associated fusion, whether it was from a metastasis at the time of diagnosis or a relapse arising later (Fig. 5B). The pattern of point mutations was also distinct. There was an enrichment of signatures 8 and/or 18, in addition to the clocklike signature seen at diagnosis, suggesting that new processes drive relapse and metastatic ES (Fig. 5B). For example, in one patient’s tumor, we found a pronounced increase of COSMIC signature 31, which has been recently associated with exposure to platinum therapy in chronic myelomonocytic leukemia (24). Notably, our patient had been treated with carboplatin for retinoblastoma 3 years before diagnosis of ES. At least three other patients in the validation cohort had a similar signature in their ES, which may also have been treatment induced (Fig. 1B).

Fig. 5 Mutation signatures of relapse and metastatic ES tumors.

(A) Prevalence of mutation signatures in relapse and metastatic (Met) tumors. Shared and private mutations for four primary-metastatic or relapse pairs are shown (first four columns). Signatures 1 and 5 are common throughout, with signature 5 contributing considerably to the mutations that arise at relapse. Signature 8 was also common throughout the cohort. One metastatic tumor (no paired primary) is also shown to have similar mutation signature patterns as other metastatic or relapse tumors. Lastly, a secondary ES tumor to a primary retinoblastoma (germline RB1 mutation identified) was also sequenced in this cohort. This patient harbored the rare signature 31, which likely resulted from the patient’s prior exposure to carboplatin for their primary retinoblastoma (the only patient to receive this treatment in the Toronto cohort). (B) Phylogenetic trees of primary-relapse and primary-metastatic ES. Using the shared and private mutations, we identified the mutational order in ES. Known cancer-driver mutations (IDH1, TP53, etc.) arise early (shared branches). LOH, loss of heterozygosity; del, deletion; inv, inversion. (C) A clonal PTEN inversion. A PTEN inversion was found in the primary but not in the relapse tissue, suggesting that the inversion arose after early divergence of a common clonal ancestor. However, a pathogenic PTEN SNV can be found in the relapse tissue. Together, these point toward parallel, convergent evolution on this gene. (D) Proposed model of ES tumor evolution. After birth, signature 1 is operative in all somatic tissues throughout life. ES patients’ cells experience a replication-associated burst of rearrangements that generates the canonical fusion driver. Early somatic cancer gene mutations occur before clonal bifurcation. This occurs 1 to 2 years before an ES diagnosis; thus, the cells that would give rise to the relapse existed years before diagnosis. Signature 5 contributes substantially to the number of mutations seen at relapse.

Early divergence and parallel evolution of ES tumors

A predominant model for tumor progression posits that metastases originate directly from the primary tumor. The metastatic lesions may have acquired new mutations, but, because they are thought to be derived by linear clonal evolution, most of the genomic properties of the primary tumor will be found in the metastasis (25). A different model was suggested for ES, on the basis of mutation data from two primary-metastasis pairs whose exomes were sequenced (8). Specifically, it was proposed that metastases diverged from the primary tumor early, although the timing of this divergence was not established. We compared coding, noncoding, and structural rearrangements across the genome within four ES pairs. As is the case in most tumor types, relapse and metastatic ES tumors acquired many new mutations (on average, 50% were unique, or “private,” mutations). A notably high number of clonal mutations from the primary ES were lost in the relapse (average 20%), confirming that the latter diverged early, evolving in parallel. For example, a disruptive clonal PTEN inversion was found in all tumor cells of one primary ES but was absent from the relapse (Fig. 5C). We also confirmed the same model of parallel evolution in one additional primary-metastatic pair, profiled using microarrays (fig. S17). The clinical implication of this model is that one should be searching for therapeutically targetable mutations arising in parallel with those in the primary ES, using biomarkers like circulating tumor DNA, because these mutations would not necessarily be present in the primary tumor.

To determine when the divergence of the lethal clone occurred, we used the number of COSMIC signature 1 mutations, which emerge at a steady rate in ES (see methods and fig. S18). We first confirmed our approach by comparing the number of signature 1 mutations between established time intervals, such as the dates of diagnosis and recurrence. In all cases, the observed number of mutations was extremely close (75 to 90%) to what would be expected (fig. S18). Using the established rate, we calculated the amount of time between the divergence of the primary and relapse or metastatic tumors. Notably, the common ancestor in ES clonally diverges 1 to 2 years before diagnosis. Therefore, the cells that give rise to the primary and relapse tumor can exist in the patient years before diagnosis, providing a window for early cancer detection and surveillance. ES is often difficult to diagnose, and the time to diagnosis is notoriously long (26). These findings provide a plausible biological mechanism for this latency.

Discussion

Our analyses reveal rearrangement bursts (chromoplectic loops) as a source of gene fusion in human bone and soft tissue tumors. ESs with complex karyotypes are associated with a poorer prognosis than those with simpler karyotypes (27), and here we show chromoplexy as the mechanism in 42% of tumors. It is possible that the chromoplectic tumor’s additional gene disruptions and fusions contribute to the difference in patient survival. Our whole-genome sequence data support a model in which there is an early clone of ES, containing EWSR1-ETS and chromoplexy, arising at least 1 year before diagnosis, which gives rise to both the primary and metastatic or relapse tumors (Fig. 5D). Whether the bursts described here are chance events or driven by specific mutational processes, akin to the RAG machinery operative in leukemia, remains to be established. As an increasing and diverse number of tumor genome sequences become available, we may be able to define further rearrangement processes that underlie fusion genes and thus unravel the causes of fusion-driven human cancers.

Materials and methods

Patient and sample collection

ES tumor and matched blood samples were collected from the Hospital for Sick Children (SickKids) and Mount Sinai Hospital in Toronto, Canada, in accordance with each institution’s Research Ethical Board (REB) guidelines. Detailed clinical information (age at presentation, gender, tumor site, stage, etc.) were obtained from the corresponding institutional tumor banks (table S5). Overall, the patients’ clinical features and demographics were typical of ES: the average age of diagnosis was 14.8 years (2.8 to 36.6 years); the male to female ratio was 1.38:1; and 14 patients had relapsed, with 13 having died from their disease. Additional samples (n = 3) were also obtained from Universitätsklinikum Erlangen, Erlangen, Germany. All metastatic or relapse ES tumors were collected from the SickKids tumor bank or the SickKids cancer sequencing program (KiCS). Detailed information on KiCS is available at https://www.kicsprogram.com.

Of the 25 high-coverage genomes sequenced, EWSR1-ETS fusions were detected in all patients except for a 37-year-old individual who was instead found to have a FUS-ERG translocation. This patient’s gene expression profile (by RNA-seq) was also discrepant, so they were removed from subsequent analyses (fig. S19). One additional genome was removed due to poor sequencing quality. We also performed low pass (~10X) rearrangement screens on 19 ES samples. However, as we required breakpoint resolution, all but one of the rearrangement screens were excluded from this study due to insufficient coverage (see table S1, orange row). Taken together, our discovery cohort consisted of individuals from which 23 standard genomes (30X to 60X) and one rearrangement screen genome (20X) were sequenced. The validation cohort consisted of individuals from which 119 tumor-normal samples were sequenced by Tirode et al. (10), which we downloaded from the European Genome-phenome Archive (accessions: EGAS00001000855 and EGAS00001000839). Of these, 19 patient samples were omitted either because the EWSR1-ETS fusion was not detected by our pipeline and manual inspection of the aligned reads, or because they harbored an excess of artefactual small inversions or deletions.

Code availability

Custom code described here is available at github.com/shlienlab.

High-throughput sequencing and alignment

Exome, genome, and transcriptome (RNA-seq) sequencing were performed using established protocols on Illumina instruments. For exome and genomes, paired-end FASTQ files were aligned to the human genome (hg19/GRCh37) using BWA-MEM (v.0.7.8); Picard MarkDuplicates (v.1.108) was used to mark PCR duplicates. Indel realignment and base quality scores were recalibrated using the Genome Analysis Toolkit (v.2.8.1).

Detection of high-quality somatic substitutions and rearrangements

We detected somatic mutations using established tools [MuTect2 (part of GATK v.3.5) (28) and Delly v.0.7.1 (29)]. To evaluate and validate our WGS substitution pipeline, we used a “gold standard” cancer genome tumor-normal dataset, COLO829 (30). Using this somatic reference standard, we determined our precision to be 0.885 and our sensitivity to be 0.971. Copy number was detected for genomes and rearrangement screens using BIC-seq v.1.2.1 (31). When no matched normal was available (in the case of rearrangement screens), an ES normal was used. We then developed custom code to increase specificity of putative substitution and rearrangement detection, as follows:

1. Somatic and depth filter. No mutation should exist in the matched-normal sequence. For substitutions, we removed common single-nucleotide polymorphisms (SNPs) as previously described (32) and a required >10X coverage at the mutated locus (10-kb window), in both tumor and normal. For rearrangements, this filter required ≥4 discordant read-pairs in the tumor. We then directly interrogated the normal BAM file, at each putative somatic rearrangement to ensure no germline variants existed near the breakpoint, on either side of the rearrangement.

2. Panel of normals filtering. To remove common germline variants, we created a panel of normal, nonneoplastic, samples that had been sequenced using the same technology and to a similar depth of coverage (n = 133). We removed any putative rearrangements if present in ≥2 normals. For rearrangements, breakpoints must exist on both sides of the junction within a 1-kb window. We found that as we increased the number of normals in our panel, our specificity increased (fig. S1C).

3. Quality-control filtering. Putative rearrangements were removed if supported by reads with MAPQ < 30. Putative rearrangements were removed if they met any two of the following criteria:

(i) Non-unique mapping. <70% of the reads at the locus map uniquely.

(ii) Multimapping clusters. At the same locus (200 bp up and downstream), a pattern of multiple overlapping groups of discordant reads whose paired-ends align to different chromosomes (>3 reads in each group, mapping to >4 chromosomes). Seen in both the tumor and paired normal.

(iii) High depth. Excessively high depth alignments in difficult-to-align regions of the genome, as described (33). We apply a maximum depth threshold of d + 4*sqrt (d), where d is the average normal mean read depth of the chromosome in the corresponding normal.

(iv) Low-complexity regions. Overlap with a highly repetitive sequence (using DUST (34) with score >60).

Mutation signature extractions and analysis

First, a de novo extraction was performed on the catalog of ES point mutations to produce novel consensus mutational signatures. These signatures were deciphered using a previously described computational framework that optimally explains the proportion of each mutation type found in the catalog and then estimates the contribution of each signature to the mutation catalog (11). Overall, we identified 11 consensus mutational signatures. Four of these signatures were previously found to be attributed to sequencing artifacts. We then compared our true consensus mutational signatures to the previously curated COSMIC list and quantified their similarity using a cosine similarity as previously done (13). We report >0.9 cosine similarity between the Ewing signatures and the COSMIC list.

Validation by targeted custom-capture sequencing

A custom targeted-capture enrichment system was designed to capture 1 Mb of DNA (Nextera, Illumina) with custom probes for the whole of EWSR1, FLI1, and ERG genes as well as the exons of TP53, STAG2, and ATRX. We also targeted known complex breakpoints from the discovery cohort, achieving between 900- to 1000-fold coverage. We reasoned that paired-end sequencing would capture any locus joined to the three core genes, even if the panel did not specifically target it. In this way, we validated rearrangements in samples where chromoplexy was already known from the whole genome and uncovered new instances in samples that had not been whole-genome sequenced (n = 7 and 4, respectively). Each tumor had three or four rearrangements validated using the panel. All had the same breakpoint (as found by the whole-genome sequence) and were found to harbor looped rearrangements are on the same derivative chromosomes.

Validation by FISH, G-banding, or spectral karyotyping

We further validated these looped rearrangements by karyotyping ESs using standard G-banding as well as spectral karyotyping (n = 17 and 3; fig. S4). By cytogenetics we found additional complexity—beyond the canonical chr22-chr11 translocation—in eight cases. Of these, six tumors had been sequenced and found to be complex. Additionally, there were five cases for which chromoplexy was detected by genome sequencing yet not found by cytogenetics techniques, indicating that routine cytogenetics may miss chromosomal complexity present in these genomes due to the nature of these submicroscopic complexities (fig. S20).

Timing of rearrangements using breakpoint allele fraction

To determine the timing of the chromoplectic loops, we developed a tool to accurately measure the breakpoint allele fraction (BAF) of each rearrangement. The BAF is the proportion of reads containing a rearrangement breakpoint divided by the total number of reads, analogous to the variant allele fraction (VAF) for point mutations (illustrated in fig. S14A). This is analogous to the variant-allele frequency of substitution mutations and, similarly, can be used to infer the relative order of rearrangement mutations. The tool accurately counts all reads supporting each rearrangement, even if these had not been used to nominate the rearrangement in the first place. From the raw aligned reads, we first collected all split reads near the breakpoint (within 20 bp) from one side of the rearrangement. Next, we extracted the clipped sequence (i.e., the nonaligned portion) from these reads and attempted to map it to the other side of the rearrangement (within 70 bp of the breakpoint) using a Smith-Waterman algorithm (35). Clipped sequences shorter than 5 bp were discarded, as were those that failed to map to the other side of the rearrangement (≤80% similarity). Since the retained sequences can map at slightly different position, due to microhomology near the breakpoint, we considered all those close to one another as supportive of the same rearrangement. Overall, we found that most rearrangements are supported by remapped reads that are less than 10 bp apart. Finally, the total number of split and realigned reads were divided by the average coverage between the two breakpoints per side of each rearrangement. This allowed us to arrive at an accurate measure of the breakpoint allele fraction. To validate our tool, we applied it to a curated list of known polymorphic copy number variants (CNVs) (36). As expected, the BAF of germline CNV deletions followed a bimodal distribution with peaks at 0.5 and 1.0, for heterozygous and homozygous rearrangements, respectively (fig. S14B, green line). We then compared the BAF of somatic rearrangements in chains to those without. Chained rearrangements had higher BAFs than simple structural variants (fig. S14B, red versus blue line), confirming that chromoplectic rearrangements are in fact earlier.

Detection of breakpoint clusters of chained rearrangements

Using their interbreakpoint distance, we identified rearrangements within 10 kbp of one another. Using these, we created an undirected graph in which two rearrangement breakpoints within 10 kbp of one another (a breakpoint cluster) were represented as a vertex and connected to other breakpoint clusters (rearrangements are edges in the graph). We selected connected components of the graph and identified components with greater than one vertex as interconnected rearrangements. In most of our cases, these interconnected rearrangements formed chains or loops, where one could follow the edges around the graph and return to the initial vertex of departure. These were further filtered for reciprocal rearrangements or overlapping intrachromosomal rearrangements. Chromoplexy rearrangements were validated by manual inspection and using the ChainFinder algorithm (2).

Association of rearrangements with genomic features

We formally evaluated the association of rearrangement position with 38 properties of the human genomes (table S4). We separately evaluated each of these associations in 1-kb bins across the genome. Feature density properties were calculated as densities in various sliding windows (1 kb, 10 kb, 100 kb, and 1 MB) centered on each 1-kb bin or as the log2 distance, as indicated in table S4. The positions of ES rearrangements were compared to a million random positions that had been uniformly sampled from regions of the genome where confident genotypes could be determined (i.e., the “callable” genome). We limited our analysis to chromosomes 1 to 22 and X. To test for significant associations between our rearrangements and these genomic properties, we performed a Mann Whitney U rank sum test and Benjamin and Hochberg FDR correction to raw P values. We used the Cohen’s d metric to determine the effect size between the two groups to account for differences in sample size. We applied an absolute Cohen’s d cut-off of 0.3, a medium effect size (37, 38). Genomic properties were considered significantly different between rearrangements and random positions if absolute (d) ≥ 0.3 and the corrected P < 0.05.

Detection of gene fusions

We detected gene fusions in regions of genomic complexity using an approach that integrates multiple independent fusion algorithms, and then removed those found in normal tissue. Putative fusions were validated by de novo assembly. A total of 1277 normal (nonneoplastic) samples from 43 different tissues were obtained from the NHGRI GTEx consortium (database version 4) and used to remove artifacts. All fusions were visually inspected if one or both genes involved chromoplexy or were adjacent (up to 1 Mbp). Fusions were further filtered by quality of the realigned transcript, breakpoint coverage, and gene expression.

Detection of gene expression

Gene expression for fusions, differential gene expression analysis, and principal component analysis was performed using HT-seq (39) to count the reads aligning to every gene. PCR duplicates and reads mapping to ribosomal RNA, miRNA, and small nucleolar RNA were removed. We used the trimmed mean of M-value (TMM) method in the EdgeR package to perform normalization on genes with at least 1 read per million bases in at least three samples (40, 41). Differential expression analysis in chromoplexy versus nonchromoplexy samples was performed using a generalized linear model (GLM) likelihood ratio test, taking in consideration different sources of variation like batch, gender and age. P values for the GLM test were adjusted for multiple testing using the Benjamini and Hochberg method for controlling the false discovery rate (FDR). Differentially expressed genes in chromoplexy versus nonchromoplexy were considered statistically significant if FDR ≤ 0.05 and absolute value of log(fold change) ≥ 1. Pathway analysis was performed on genes differentially expressed in samples with and without chromoplexy using Gene Set Enrichment Analysis (GSEA) software (javaGSEA v2.2.4). Cancer gene signatures were selected from the hallmark collection from the Molecular Signature DataBase (MsigDB) (18). Enrichment scores for the hallmark pathways were considered statistically significant if FDR < 0.01.

Evaluation of replication timing in prostate cancer rearrangements

We obtained prostate cancer rearrangements, including chained and others, from the Baca et al. publication [supplemental tables S3C and S5 from (2)]. Samples were annotated as “ETS+” or “ETS−” using supplemental table S1 from (2). ETS+ fusions include any ETS fusion detected by sequencing (including ERG and ETV1). Using this list, we performed the same test for genomic property enrichment as we did in ESs.

Molecular inversion probe (MIP) microarray

Raw MIP data from three additional primary-metastatic ES pairs were obtained from the Huntsman Cancer Institute, Salt Lake City, Utah (42). The original source material was clinically archived, formalin-fixed paraffin-embedded (FFPE) scrolls that were retrieved from three individual patients diagnosed with ES. Primary tumor samples were from diagnostic biopsies taken before chemotherapy. The raw MIP data from the completed assay was loaded into Nexus Copy Number (BioDiscovery, Inc., El Segundo, California) for copy number detection using default settings.

Supplementary Materials

www.sciencemag.org/content/361/6405/eaam8419/suppl/DC1

Figs. S1 to S20

Tables S1 to S5

References

References and Notes

Acknowledgments: We dedicate this manuscript to Simon Hajjar, a brilliant and fearless friend, colleague, and student in our lab, and to Ana Novokmet, a friend and colleague who held all of us to high standards and was committed to helping children with cancer. Simon and Ana both died from cancer as this work neared completion. We thank The Centre for Applied Genomics (TCAG) NGS and Biobanking facility for sequencing services. We thank J. Lees, N. Alon, G. Collord, and R. Arnold for their contributions toward this work. We thank N. Tan for her work on the illustration used in the manuscript’s summary page. Funding: This research project was conducted with support from C17 and partially funded by Ewings Cancer Foundation of Canada and Childhood Cancer Canada Foundation. A.S. and D.M. received financial support from the SickKids Foundation through the Garron Family Cancer Centre. N.D.A is personally supported by an NSERC Canada Graduate Scholarship (Master’s) and a SickKids Restracomp Award. S.B. receives personal Fellowships from Wellcome and the St. Baldrick’s Foundation. Author contributions: A.S., S.B., and D.M. designed the study. N.D.A., R.D., M.D.Y., M.L., A.R., N.D.R., F.F., S.H., P.E.K., B.I., L.C.S., R.M., and J.S. performed experiments. N.D.A., R.D., M.D.Y., A.R., F.F., M.A., M.S., and L.B.A. collected and analyzed data. L.B., S.D., N.G., A.N., A.F., N.P., A.Y., T.S., M.M., G.R.S., S.W.S., A.M.F., M.S., J.S.W., I.L.A., P.J.C., J.D.S., D.M., S.B., and A.S. contributed reagents, tissue, and clinical data. N.D.A., A.S., and S.B. wrote the manuscript. M.Z., P.E.K., G.T.G., J.A.T., M.S., I.L.A., D.M., S.B., and A.S. provided technical support and conceptual advice. A.S. oversaw the study. All authors have approved the manuscript. Competing interests: The authors declare no competing interests. Data and materials availability: Raw sequencing data has been deposited at the European Genome-phenome Archive (EGA) under accession number EGAS00001003062.
View Abstract

Stay Connected to Science

Navigate This Article