Research Article

A pathology atlas of the human cancer transcriptome


Science  18 Aug 2017:
Vol. 357, Issue 6352, eaan2507
DOI: 10.1126/science.aan2507
  • Schematic overview of the Human Pathology Atlas.

    A systems-level approach enables analysis of the protein-coding genes of 17 different cancer types from ~8000 patients. Results are available in an interactive open-access database.

  • Fig. 1 Analysis of the global expression patterns of protein-coding genes in human cancers.

    (A) Schematic drawing of the Human Pathology Atlas effort described herein. (B) Principal components analysis (PCA) showing the similarities in expression of 19,571 protein-coding genes among 17 cancer types. See fig. S4 for additional PCA analysis with more stratified patient cohorts. (C) PCA plot showing the individual differences in the genome-wide global expression profiles among the 17 cancer types in 9666 individual patients.

  • Fig. 2 Identification of prognostic genes based on expression coupled with clinical survival for 17 different cancer types.

    (A) Examples of Kaplan-Meier plots for patients with five major cancers, stratified by the expression of an unfavorable prognostic gene (first row), a favorable prognostic gene (second row), and a combination of 10 prognostic genes (third row). The selected unfavorable and favorable genes had the best log-rank P value based on the Kaplan-Meier analysis, with average RNA expression levels above the median average expression of all protein-coding genes; the 10 marker genes were a combination of the top five favorable and top five unfavorable genes with expression higher than the median average expression. Black and red lines show high and low (or, in the third row, favorable and unfavorable) expression, respectively. (B) Examples of two prognostic genes in liver cancer. Left: Distribution of log-rank P values against the RNA expression with different RNA-level (FPKM) cutoffs. Right: Patient-centric scatterplot showing the relationships between living years and RNA expression of the prognostic genes. (C) Numbers of genes showing favorable and unfavorable prognostic effects in the 17 Human Pathology Atlas cancer types. Patient numbers for each cancer are shown in parentheses.

  • Fig. 3 Network analysis of prognostic genes.

    (A) Heat map showing the hypergeometric P value for the pairwise overlap of prognostic genes between the cancer types. (B) Bubble plot showing the common enriched Gene Ontology (GO) functions among the 17 Human Pathology Atlas cancer types. Bubble sizes represent numbers of genes in GO function; the x and y axes indicate the directionalities and generalities of the GO terms. Generality is defined by the number of cancers with their prognostic genes overrepresenting the GO function; directionality is defined by the number of cancers with their favorable genes overrepresenting the GO function minus the number of cancers with unfavorable genes overrepresenting the GO function. Note that only functions with more than five generalities are labeled. All GO terms for each cancer are provided in table S9. Results based on optional P value or hazard ratio cutoff–defined prognostic genes are provided in fig. S7 and table S9. (C) Network plot showing the number of cancer-specific and shared unfavorable cell cycle genes in all cancer types. Note that all groups with only one gene were removed from the plot. (D) Network plot showing the number of liver cancer–specific favorable genes and the favorable genes shared among liver and other cancers in the Human Pathology Atlas. Inset: Pie chart showing the fraction of elevated normal liver genes among the liver cancer–specific favorable genes.

  • Fig. 4 Correlation between tumor differentiation and expression of liver-enriched genes.

    (A) Scatterplots showing the relative (fold) change between the transcript expression level in liver cancer and normal liver tissue (x axis) and the HepG2 cell line and normal tissue (y axis) for all protein-coding genes. Individual genes are colored according to their expression-based category in liver. All FPKM values less than 1 were set to 1 for the fold change calculation. (B) IHC staining of CYP2C9 proteins in four normal tissues and different hepatocellular carcinoma samples. For full IHC protein profiles, view the gene at (C) Box plots showing the expression levels of liver tumor samples of different neoplasm grades for three representative liver-enriched genes for CYP2C9. (D) Box plot showing the distribution of correlation coefficients (Spearman’s rho) between the neoplasm grade and expression for a random set of genes and all liver-enriched genes in liver tumors. (E) Scatterplots for all protein-coding genes showing the fold change in testis-specific antigen in liver cancer and normal liver tissue (x axis) and in the HepG2 cell line and normal liver tissue (y axis). Individual genes are colored according to their expression-based category in the testis.

  • Fig. 5 Coexpression analysis reveals the relationship with the Hallmarks of Cancer and clues for drivers among prognostic genes.

    Gene coexpression of 17 cancers was investigated on the basis of established cancer coexpression networks. (A) Network plot showing the number of cancer-specific and shared prognostic cancer hallmark genes in all cancer types. Note that all groups with fewer than four genes were removed from the plot. (B) A gene coexpression cluster from the coexpression network of lung cancer enriched with both hallmark and prognostic genes. (C) Network plot showing coexpression clusters of lung cancer. All nodes indicate gene coexpression clusters; edges indicate significant coexpression links between clusters. The gray, yellow, and red color of the nodes indicates that the cluster was significantly enriched with hallmark genes, prognostic genes, and both cases, respectively. (D) Bar plot showing the fraction of prognostic genes that are mere hallmark genes (red), coexpressed in hallmark gene clusters (pink), or not coexpressed with hallmark genes (gold).

  • Fig. 6 Genome-scale metabolic models (GSMMs) of cancers.

    (A) Concept of personalized GSMMs, which are comprehensive compilations of all the metabolic reactions within a particular cell, tissue, organ, or organism. By mapping the transcriptomic data from cancer patients, personalized GSMMs could be reconstructed for investigation of the specific metabolic viabilities for each individual. (B) Heat map showing the essential enzymes in the TCA cycle for all glioma patients to exemplify the heterogeneity within the same cancer patient group. Only enzymes that were key in at least one patient are shown. (C) Bar plot showing the fraction of genes that were common in key genes in different proportions of patients for 17 Human Pathology Atlas cancers. (D) Circos plot showing the top 10 common metabolic pathways that were overrepresented by key genes in 17 Human Pathology Atlas cancers. Abbreviated names are provided in Fig. 1A and table S17.

  • Fig. 7 Validation of selected genes with a prognostic effect in lung cancer.

    Kaplan-Meier plots for RNA level separation from the TCGA cohort, RNA level separation from the HPA cohort, and protein-level separation are shown in the first, second, and third columns, respectively. The log-rank P values are shown in the lower left corner of each Kaplan-Meier plot. IHC stained tissues representing high and low protein expression are shown in the fourth and fifth columns, respectively. The protein expression levels across 17 cancer types analyzed by IHC in the Human Pathology Atlas are shown at the right.
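
The Kaplan-Meier panels in Figs. 2 and 7 report a log-rank P value for patients split into high- and low-expression groups. The following is a minimal sketch of that kind of comparison, not the authors' pipeline: the function names `split_by_cutoff` and `logrank_test`, the toy expression and survival values, and the cutoff of 5.0 are all assumptions made for illustration. It implements the standard two-group log-rank statistic and converts it to a P value using the chi-square distribution with one degree of freedom.

```python
import math


def split_by_cutoff(expression, times, events, cutoff):
    """Split patients into high and low groups at an expression cutoff.

    Each group is a list of (time, event) pairs; event is 1 for death
    and 0 for a censored observation. (Hypothetical helper.)
    """
    rows = list(zip(expression, times, events))
    high = [(t, e) for x, t, e in rows if x >= cutoff]
    low = [(t, e) for x, t, e in rows if x < cutoff]
    return high, low


def logrank_test(group_a, group_b):
    """Two-group log-rank test. Returns (chi-square statistic, P value)."""
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (invented): ten patients, expression in FPKM, survival in years.
expression = [0.5, 1.2, 0.8, 0.3, 1.0, 9.0, 12.5, 8.1, 15.0, 11.2]
times = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
events = [1] * 10

high, low = split_by_cutoff(expression, times, events, cutoff=5.0)
chi2, p_value = logrank_test(low, high)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

Repeating `logrank_test` over a range of cutoffs would give a curve of P values against expression level, similar in spirit to the left-hand panels of Fig. 2B.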

Supplementary Materials

  • A pathology atlas of the human cancer transcriptome

    Mathias Uhlen, Cheng Zhang, Sunjae Lee, Evelina Sjöstedt, Linn Fagerberg, Gholamreza Bidkhori, Rui Benfeitas, Muhammad Arif, Zhengtao Liu, Fredrik Edfors, Kemal Sanli, Kalle von Feilitzen, Per Oksvold, Emma Lundberg, Sophia Hober, Peter Nilsson, Johanna Mattsson, Jochen M. Schwenk, Hans Brunnström, Bengt Glimelius, Tobias Sjöblom, Per-Henrik Edqvist, Dijana Djureinovic, Patrick Micke, Cecilia Lindskog, Adil Mardinoglu, Fredrik Ponten

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    • Materials and Methods 
    • Figs. S1 to S14 
    • Captions for tables S1 to S21 
    • References

    Additional Data

    Tables S1-21
    • Table S1. Summary of 33 TCGA cancer types.
    • Table S2. Categories of protein-coding genes in normal tissues and cancers.
    • Table S3. GO term enrichment analysis for cancer-specific house-keeping genes from DAVID.
    • Table S4. Summary of the 17 major cancer types examined in this study.
    • Table S5. The number of prognostic genes for 17 major cancer types.
    • Table S6. Expression cut-off for the best stratification and results of the survival analysis for all protein-coding genes in 17 major cancer types.
    • Table S7. Prognostic genes and their log-rank P values involved in prognostic panels of the Big 5 cancers shown in Figure 2A.
    • Table S8. Summary of all prognostic genes and the respective cancer types for which they are prognostic markers.
    • Table S9. Enriched GO terms for each cancer type with prognostic genes defined by two different log rank P value cutoffs and HR cutoff.
    • Table S10. Summary of unfavorable prognostic cell cycle genes and the respective cancer types for which they are prognostic markers.
    • Table S11. Hypergeometric P values of the overlap between favorable prognostic genes for each cancer and genes with elevated expression in their supposed tissues of origin.
    • Table S12. Statistical features of cancer-specific co-expression networks for 17 cancer types. All the networks are normalized and of the same size with 14,293 genes and 1,021,378 co-expressed gene pairs for fair comparison.
    • Table S13. Summary of genes involved in the co-expression cluster of lung cancer in Figure 5B.
    • Table S14. Statistical summary of cancer-specific co-expression networks in cancer.
    • Table S15. Statistical summary of genome-scale metabolic models for all patients.
    • Table S16. Summary of metabolic pathways associated with the essential genes in 17 cancers.
    • Table S17. Short names for all 17 cancer types.
    • Table S18. Antibodies used for protein profiling of the selected genes.
    • Table S19. Terms and full gene list for hallmark of cancer.
    • Table S20. Reference GSMM for reconstruction of personalized GSMMs.
    • Table S21. Complete list of patient IDs and corresponding cancer types for reconstructed GSMMs.
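
Fig. 3A and table S11 report hypergeometric P values for the overlap between gene sets. As a rough illustration of what such a value measures, the sketch below computes the upper-tail probability of seeing at least the observed overlap when two sets are drawn from a common gene universe. The function name `hypergeom_overlap_p`, the placeholder gene names, and the choice of universe are assumptions for this example; the source does not state which background gene set was used.

```python
from math import comb


def hypergeom_overlap_p(universe_size, set_a, set_b):
    """Upper-tail hypergeometric P value for the overlap of two gene sets.

    Returns P(X >= k), where k is the observed overlap and X counts how
    many of the |set_b| genes drawn from the universe fall in set_a.
    """
    k = len(set_a & set_b)
    size_a, size_b = len(set_a), len(set_b)
    total = comb(universe_size, size_b)
    # Sum exact integer counts first, then divide once to limit rounding.
    tail = sum(
        comb(size_a, i) * comb(universe_size - size_a, size_b - i)
        for i in range(k, min(size_a, size_b) + 1)
    )
    return tail / total


# Placeholder gene names (invented) in a toy universe of 10 genes.
favorable_in_cancer_x = {"GENE_1", "GENE_2", "GENE_3"}
favorable_in_cancer_y = {"GENE_1", "GENE_2", "GENE_4"}
p_overlap = hypergeom_overlap_p(10, favorable_in_cancer_x, favorable_in_cancer_y)
print(f"overlap P = {p_overlap:.4f}")
```

For a real analysis the universe would be much larger, for example the 19,571 protein-coding genes mentioned in the Fig. 1 caption, although whether the authors used that background is not stated here.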
