Research Article

RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues

Science  07 Jun 2019:
Vol. 364, Issue 6444, eaaw0726
DOI: 10.1126/science.aaw0726
  • Somatic clonal expansions in normal human tissues.

    RNA sequences from 29 normal human tissues collected as part of the Genotype–Tissue Expression (GTEx) project are analyzed using RNA-MuTect, a method developed for detecting somatic mutations in RNA-seq data. Macroscopic clonal expansions, characterized by shared somatic mutations, are detected in all tissues; skin, esophagus, and lung have the largest number of somatic mutations.

  • Fig. 1 Validation of RNA-MuTect in TCGA samples.

    (A) Total number of mutations detected before filtering in DNA (red) and RNA (blue) across samples in each TCGA cohort. (B) Sensitivity and precision of sufficiently covered sites across training and validation samples. Box plots show median, 25th, and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are represented as dots. (C) Co-mutation plot with mutations across the 243 TCGA samples, overall frequencies, allele fractions, and significance levels of candidate cancer genes (Q < 0.05) identified by applying MutSig2CV (24) on the mutations detected in the RNA. Genes marked with a red arrow were also identified as significantly mutated in the DNA. (D) Mutational signatures identified by SignatureAnalyzer (25) on the basis of mutations detected in the RNA. The mutational signatures identified are (i) a mixture of smoking and nucleotide-excision repair signatures (W1, combination of COSMIC signatures 4 and 5, cosine similarities of 0.7 and 0.75, respectively); (ii) UV (W3, COSMIC signature 7, cosine similarity = 0.95); (iii) APOBEC (W4, COSMIC signature 13, cosine similarity = 0.9); (iv) aging (W5, COSMIC signature 1, cosine similarity = 0.9); (v) POLE (W6, COSMIC signature 10, cosine similarity = 0.88); (vi) MSI (W7, COSMIC signature 15, cosine similarity = 0.8); and (vii) W2, a signature found only in the RNA.
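The cosine similarities quoted in panel (D) compare an extracted signature with a reference signature, each treated as a vector of per-context mutation weights. A minimal sketch of that comparison follows; the function name and the 4-channel toy vectors are assumptions for illustration only (mutational signatures are typically defined over 96 trinucleotide contexts), and this is not the SignatureAnalyzer implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy profiles, NOT values from the paper.
extracted_signature = [0.40, 0.30, 0.20, 0.10]
reference_signature = [0.35, 0.35, 0.20, 0.10]

similarity = cosine_similarity(extracted_signature, reference_signature)
print(f"cosine similarity = {similarity:.3f}")
```

A value of 1 means the two profiles have identical shape regardless of scale; values near 0 mean they share almost no weight across contexts.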

  • Fig. 2 Somatic clonal expansion in normal tissues.

    (A) An illustration of the composition of bulk RNA extracted from a normal human tissue. The biopsy consists of three different cell types that express different transcripts (marked in blue, green, and yellow) at different levels. Blue cells represent cells with a higher probability of forming clones. Two clones, small and large, are denoted by purple- and red-dashed outlines, respectively. Mutated reads are marked with an X. The allele fractions of the mutations in the blue and green genes are the same (0.25; 2/8 and 4/16 reads, respectively), despite the different clone sizes. Additionally, the allele fraction of the mutation in the yellow gene is higher than the allele fractions of the mutations in the blue and green genes (0.33; 2/6 reads), even though the yellow mutation is supported by the same (or smaller) number of reads. These scenarios illustrate the challenge of identifying somatic mutations in bulk normal tissue due to a mixture of cell types and the relatively small clones. Moreover, inferences about clone size are limited because different cell types exist in different proportions and express transcripts at different levels. (B) Numbers of mutations detected in RNA-seq of 28 of the 29 studied tissues (we did not detect mutations in six fallopian tube samples). Each sample is represented by a circle. Black horizontal bars represent mean numbers of mutations in each tissue type. A confidence level from our estimation of false positives in the validation data is indicated in the right y axis. Specifically, this confidence level is computed as the xth percentile of the number of false positive calls (RNA-only mutations in DNA-powered sites) found in the validation set. “n” values represent the total number of samples analyzed in each tissue; “n_z” values represent the number of samples in which no mutations were detected; and “n_80” values represent the number of samples in which more than 13 mutations were found (equivalent to a confidence level of 80%). (C) Left: Distribution of allele fraction across all samples in which somatic mutations were detected. Inset: Mutations with allele fraction ≤ 0.2. Right: Allele fraction as a function of log10(coverage) for all detected mutations.
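The arithmetic in panel (A) and the percentile-based confidence level in panel (B) can be sketched as follows. The read counts are the ones given in the caption; the function names, the strict "fewer than" convention, and the list of false positive counts are assumptions made for illustration and are not data or code from the study.

```python
def allele_fraction(mutated_reads, total_reads):
    """Fraction of reads covering a site that carry the mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutated_reads <= total_reads:
        raise ValueError("mutated_reads must be between 0 and total_reads")
    return mutated_reads / total_reads


def confidence_level(n_mutations, false_positive_counts):
    """Percent of validation samples with fewer false positive calls than n_mutations.

    Assumed convention: a sample's mutation count is compared against the
    empirical distribution of false positive counts in a validation set.
    """
    if not false_positive_counts:
        raise ValueError("false_positive_counts must not be empty")
    below = sum(1 for count in false_positive_counts if count < n_mutations)
    return 100.0 * below / len(false_positive_counts)


# Read counts from the Fig. 2A caption.
blue = allele_fraction(2, 8)     # small clone, 2 of 8 reads
green = allele_fraction(4, 16)   # large clone, 4 of 16 reads
yellow = allele_fraction(2, 6)   # 2 of 6 reads
print(f"blue={blue:.2f} green={green:.2f} yellow={yellow:.2f}")

# Hypothetical per-sample false positive counts, invented for this example.
toy_false_positive_counts = [0, 1, 2, 3, 5, 8, 10, 13, 15, 20]
print(confidence_level(14, toy_false_positive_counts))
```

The blue and green mutations end up with the same allele fraction even though their clones differ in size, which is why allele fraction alone does not determine clone size in bulk RNA.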

  • Fig. 3 Mutation load is associated with age and tissue-specific proliferation rate.

    (A) Top: Differences in the average number of aging-related mutations and total number of mutations before and after the age of 45 (left and right, respectively). Bottom: Differences in mutation number in esophagus and skin samples before and after the age of 45 (left and right, respectively). Box plots show median, 25th, and 75th percentiles in each group. Black crosses represent the outliers; asterisks represent significance levels. (B) Mean expression of the proliferation marker MKI67 versus the average number of mutations found in each tissue. (C) Left: Number of mutations associated with the UV signature in sun-exposed and non–sun-exposed skin samples. Center: Number of mutations found in sun-exposed and non–sun-exposed skin samples taken from individuals of European ancestry. Right: Number of mutations found in sun-exposed and non–sun-exposed skin samples taken from individuals of African American ancestry. Boxes and whiskers are box plots with dots reflecting outliers.

  • Fig. 4 Mutations in cancer genes across normal tissues.

    (A) Genes in which hotspot mutations were detected. Left: Number of hotspot mutations detected in each gene, and numbers of silent and nonsilent mutations that are not in hotspots. Right: Normal tissues in which the hotspot mutations were detected. All hotspot mutations except two (FAT1 p.E4454K; FGFR3 p.K650E) were annotated as pathogenic. (B) Occurrences of each hotspot mutation found in different TCGA cohorts. (C) Co-mutation plot for genes significantly mutated in a pan-normal analysis, ordered by their significance level (by MutSig2CV); data show 93 of 6707 samples with at least one mutation in these genes and the overall frequency among samples with at least one mutation. The distribution of allele fraction of mutations appears at the bottom. (D) Allelic imbalance in chromosome 9q of a normal esophagus sample. Top: Allele fraction of 233 heterozygous sites based on DNA from a matched-blood sample. Bottom: Allele fraction of heterozygous sites based on RNA from the esophagus sample. The black horizontal lines indicate the mean allele fraction per chromosomal arm of sites with allele fraction smaller or greater than 0.5.
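The horizontal lines described in panel (D) summarize heterozygous sites on a chromosomal arm by two means, one over sites with allele fraction below 0.5 and one over sites above 0.5. A rough sketch of that summary is given below; the function name and the toy allele fractions are hypothetical, and the handling of sites at exactly 0.5 (excluded here) is an assumption.

```python
from statistics import mean


def split_means(allele_fractions):
    """Mean allele fraction of sites below 0.5 and of sites above 0.5.

    Sites at exactly 0.5 are left out of both groups (an assumption).
    Returns None for a group that has no sites.
    """
    lower = [af for af in allele_fractions if af < 0.5]
    upper = [af for af in allele_fractions if af > 0.5]
    lower_mean = mean(lower) if lower else None
    upper_mean = mean(upper) if upper else None
    return lower_mean, upper_mean


# Hypothetical allele fractions at heterozygous sites on one arm.
balanced_sites = [0.48, 0.52, 0.47, 0.53]     # e.g. DNA from matched blood
imbalanced_sites = [0.30, 0.70, 0.28, 0.72]   # e.g. RNA from the tissue

print(split_means(balanced_sites))
print(split_means(imbalanced_sites))
```

When the two means sit close to 0.5 the alleles are balanced; a wide gap between them, as in the second toy list, is the pattern the caption describes as allelic imbalance.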

Supplementary Materials

  • RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues

    Keren Yizhak, François Aguet, Jaegil Kim, Julian M. Hess, Kirsten Kübler, Jonna Grimsby, Ruslana Frazer, Hailei Zhang, Nicholas J. Haradhvala, Daniel Rosebrock, Dimitri Livitz, Xiao Li, Eila Arich-Landkof, Noam Shoresh, Chip Stewart, Ayellet V. Segrè, Philip A. Branton, Paz Polak, Kristin G. Ardlie, Gad Getz

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    • Materials and Methods
    • Figs. S1 to S18
    • Captions for tables S1 to S15
    • References
    Tables S1 to S15
    Table S1. List of TCGA samples analyzed in this study
    Table S2. Summary of the validation in normal (tumor-adjacent) samples
    Table S3. List of somatic mutations detected in the GTEx dataset
    Table S4. Summary of the Fluidigm assay results
    Table S5. List of mutations found in normal tissues and their corresponding cancerous tissue
    Table S6. List of somatic variants detected in the blood of healthy individuals
    Table S7. Correlation and P value for the association between age or MKI67 expression levels and total number of mutations, while controlling for the number of tissues tested
    Table S8. MKI67 expression levels across tissues
    Table S9. List of cancer hotspot mutations
    Table S10. List of cancer hotspot mutations detected in normal tissues
    Table S11. List of cancer genes used in the analysis of identifying significantly mutated genes
    Table S12. STAR and HiSat parameters used for alignment
    Table S13. Hotspot mutations removed from the RNA Panel of Normals
    Table S14. List of GTEx samples analyzed in this study
    Table S15. List of genes and samples showing an allele-specific expression pattern
