Research Article

Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder

See allHide authors and affiliations

Science  14 Dec 2018:
Vol. 362, Issue 6420, eaat8127
DOI: 10.1126/science.aat8127

Structured Abstract

INTRODUCTION

Our understanding of the pathophysiology of psychiatric disorders, including autism spectrum disorder (ASD), schizophrenia (SCZ), and bipolar disorder (BD), lags behind other fields of medicine. The diagnosis and study of these disorders currently depend on behavioral, symptomatic characterization. Defining genetic contributions to disease risk allows for biological, mechanistic understanding but is challenged by genetic complexity, polygenicity, and the lack of a cohesive neurobiological model to interpret findings.

RATIONALE

The transcriptome represents a quantitative phenotype that provides biological context for understanding the molecular pathways disrupted in major psychiatric disorders. RNA sequencing (RNA-seq) in a large cohort of cases and controls can advance our knowledge of the biology disrupted in each disorder and provide a foundational resource for integration with genomic and genetic data.

RESULTS

Analysis across multiple levels of transcriptomic organization—gene expression, local splicing, transcript isoform expression, and coexpression networks for both protein-coding and noncoding genes—provides an in-depth view of ASD, SCZ, and BD molecular pathology. More than 25% of the transcriptome exhibits differential splicing or expression in at least one disorder, including hundreds of noncoding RNAs (ncRNAs), most of which have unexplored functions but collectively exhibit patterns of selective constraint. Changes at the isoform level, as opposed to the gene level, show the largest effect sizes and genetic enrichment and the greatest disease specificity. We identified coexpression modules associated with each disorder, many with enrichment for cell type–specific markers, and several modules significantly dysregulated across all three disorders. These enabled parsing of down-regulated neuronal and synaptic components into a variety of cell type– and disease-specific signals, including multiple excitatory neuron and distinct interneuron modules with differential patterns of disease association, as well as common and rare genetic risk variant enrichment. The glial-immune signal demonstrates shared disruption of the blood-brain barrier and up-regulation of NFkB-associated genes, as well as disease-specific alterations in microglial-, astrocyte-, and interferon-response modules. A coexpression module associated with psychiatric medication exposure in SCZ and BD was enriched for activity-dependent immediate early gene pathways. To identify causal drivers, we integrated polygenic risk scores and performed a transcriptome-wide association study and summary-data–based Mendelian randomization. Candidate risk genes—5 in ASD, 11 in BD, and 64 in SCZ, including shared genes between SCZ and BD—are supported by multiple methods. These analyses begin to define a mechanistic basis for the composite activity of genetic risk variants.

CONCLUSION

Integration of RNA-seq and genetic data from ASD, SCZ, and BD provides a quantitative, genome-wide resource for mechanistic insight and therapeutic development at Resource.PsychENCODE.org. These data inform the molecular pathways and cell types involved, emphasizing the importance of splicing and isoform-level gene regulatory mechanisms in defining cell type and disease specificity, and, when integrated with genome-wide association studies, permit the discovery of candidate risk genes.

The PsychENCODE cross-disorder transcriptomic resource.

Human brain RNA-seq was integrated with genotypes across individuals with ASD, SCZ, BD, and controls, identifying pervasive dysregulation, including protein-coding, noncoding, splicing, and isoform-level changes. Systems-level and integrative genomic analyses prioritize previously unknown neurogenetic mechanisms and provide insight into the molecular neuropathology of these disorders.

Abstract

Most genetic risk for psychiatric disease lies in regulatory regions, implicating pathogenic dysregulation of gene expression and splicing. However, comprehensive assessments of transcriptomic organization in diseased brains are limited. In this work, we integrated genotypes and RNA sequencing in brain samples from 1695 individuals with autism spectrum disorder (ASD), schizophrenia, and bipolar disorder, as well as controls. More than 25% of the transcriptome exhibits differential splicing or expression, with isoform-level changes capturing the largest disease effects and genetic enrichments. Coexpression networks isolate disease-specific neuronal alterations, as well as microglial, astrocyte, and interferon-response modules defining previously unidentified neural-immune mechanisms. We integrated genetic and genomic data to perform a transcriptome-wide association study, prioritizing disease loci likely mediated by cis effects on brain expression. This transcriptome-wide characterization of the molecular pathology across three major psychiatric disorders provides a comprehensive resource for mechanistic insight and therapeutic development.

Developing more-effective treatments for autism spectrum disorder (ASD), schizophrenia (SCZ), and bipolar disorder (BD), three common psychiatric disorders that confer lifelong disability, is a major international public health priority (1). Studies have identified hundreds of causal genetic variants robustly associated with these disorders and thousands more that likely contribute to their pathogenesis (2). However, the neurobiological mechanisms through which genetic variation imparts risk, both individually and in aggregate, are still largely unknown (24).

The majority of disease-associated genetic variation lies in noncoding regions (5) enriched for noncoding RNAs (ncRNAs) and cis-regulatory elements that regulate gene expression and splicing of their cognate coding gene targets (6, 7). Such regulatory relationships show substantial heterogeneity across human cell types, tissues, and developmental stages (8) and are often highly species specific (9). Recognizing the importance of understanding transcriptional regulation and noncoding genome function, several consortia (8, 1012) have undertaken large-scale efforts to provide maps of the transcriptome and its genetic and epigenetic regulation across human tissues. Although some have included central nervous system (CNS) tissues, a more comprehensive analysis focusing on the brain in both healthy and disease states is necessary to accelerate our understanding of the molecular mechanisms of these disorders (1316).

We present results of the analysis of RNA sequencing (RNA-seq) data from the PsychENCODE Consortium (16), integrating genetic and genomic data from more than 2000 well-curated, high-quality postmortem brain samples from individuals with SCZ, BD, and ASD, as well as controls (17). We provide a comprehensive resource of disease-relevant gene expression changes and transcriptional networks in the postnatal human brain (see Resource.PsychENCODE.org for data and annotations). Data were generated across eight studies (18, 19, 20), uniformly processed, and combined through a consolidated genomic data processing pipeline (21) (fig. S1), yielding a total of 2188 samples passing quality control (QC) for this analysis, representing frontal and temporal cerebral cortices from 1695 individuals across the human life span, including 279 technical replicates (fig. S2). Extensive QC steps were taken within and across individual studies, resulting in the detection of 16,541 protein-coding and 9233 noncoding genes based on Gencode v19 annotations (21) (fig. S3). There was substantial heterogeneity in RNA-seq methodologies across cohorts, which was accounted for by including 28 surrogate variables and aggregate sequencing metrics as covariates in downstream analyses of differential expression (DE) at gene, isoform, and local splicing levels (21). DE did not overlap with experimentally defined brain RNA degradation metrics indicating that results were not driven by RNA-quality confounds (fig. S4) (22).

To provide a comprehensive view of the genomic architecture of these disorders, we characterized several levels of transcriptomic organization—gene-level, transcript isoform, local splicing, and coexpression networks—for protein-coding and noncoding gene biotypes. We integrated results with common genetic variation and disease genome-wide association study (GWAS) results to identify putative regulatory targets of genetic risk variants. Although each level provides important disease-specific and shared molecular pathology, we find that isoform-level changes show the largest effects in diseased brains, are most reflective of genetic risk, and provide the greatest disease specificity when assembled into coexpression networks.

We recognize that these analyses involve a variety of steps and data types and are necessarily multifaceted and complex. We therefore organize results into two major sections. The first is at the level of individual genes and gene products, starting with gene-level transcriptomic analyses, as well as isoform and splicing analyses, followed by identification of potential genetic drivers. The second section is anchored in gene network analysis, where we identify coexpression modules at both gene and isoform levels and assess their relationship to genetic risk. As these networks reveal many layers of biology, we provide an interactive website to permit their in-depth exploration (Resource.PsychENCODE.org).

Gene and isoform expression alterations

RNA-seq–based quantifications enabled assessment of coding and noncoding genes and transcript isoforms, imputed using the RSEM software package guided by Gencode v19 annotations (21, 23). In accordance with previous results (13), we observed pervasive differential gene expression (DGE) in ASD, SCZ, and BD [n = 1611, 4821, and 1119 genes at false discovery rate (FDR) < 0.05, respectively; Fig. 1A and table S1]. There was substantial cross-disorder sharing of this DE signal and a gradient of transcriptomic severity with the largest changes in ASD compared with SCZ or BD (ASD versus SCZ, mean |log2FC| 0.26 versus 0.10, P < 2 × 10−16, Kolmogorov-Smirnov (K-S) test; ASD versus BD, mean |log2FC| 0.26 versus 0.15, P < 2 × 10−16, K-S test), as observed previously (13). Altogether, more than one-quarter of the brain transcriptome was affected in at least one disorder (Fig. 1, A to C; complete gene list, table S1).

Fig. 1 Gene and isoform expression dysregulation in brain samples from individuals with psychiatric disorders.

(A) DE effect size (|log2FC|) histograms are shown for protein-coding, lncRNA, and pseudogene biotypes up- or down-regulated (FDR < 0.05) in disease. Isoform-level changes (DTE; blue) show larger effect sizes than at the gene level (DGE; red), particularly for protein-coding biotypes in ASD and SCZ. (B) A literature-based comparison shows that the number of DE genes detected is dependent on study sample size for each disorder. (C) Venn diagrams depict overlap among up- or down-regulated genes and isoforms across disorders. (D) Gene ontology enrichments are shown for differentially expressed genes or isoforms. The top five pathways are shown for each disorder. (E) Heatmap depicting cell type specificity of enrichment signals. Differentially expressed features show substantial enrichment for known CNS cell type markers, defined at the gene level from single-cell RNA-seq. (F) Annotation of 944 ncRNAs DE in at least one disorder. From left to right: Sequence-based characterization of ncRNAs for measures of human selective constraint; brain developmental expression trajectories are similar across each disorder (colored lines represent mean trajectory across disorders); tissue specificity; and CNS cell type expression patterns.

DGE results were concordant with previously published datasets for all three disorders (fig. S4), although some had overlapping samples. We observed significant concordance of DGE effect sizes with those from a microarray meta-analysis of each disorder [ASD: ⍴ = 0.8, SCZ: ⍴ = 0.78, BD: ⍴ = 0.64, Spearman ⍴ of log2FC, all P values < 10−16 (13)] and with previous RNA-seq studies of individual disorders [ASD: ⍴ = 0.96 (19); SCZ ⍴ = 0.78 (18); SCZ ⍴ = 0.80 (24); BD ⍴ = 0.85 (13); Spearman ⍴ of log2FC, all P values < 10−16]. These DE genes exhibited substantial enrichment for known pathways and cell type–specific markers derived from single-nucleus RNA-seq in the human brain (Fig. 1, D and E) (21), consistent with previously observed patterns (13, 19).

Expanding these analyses to the transcript isoform level, we observed widespread differential transcript expression (DTE) across ASD, SCZ, and BD (n = 767, 3803, and 248 isoforms at FDR < 0.05, respectively; table S1). Notably, at the DTE level, the cross-disorder overlap was significantly attenuated (Fig. 1C), suggesting that alternative transcript usage and/or splicing confers a substantial portion of disease specificity. In addition, isoform-level alterations in disease exhibited substantially larger effect sizes compared with gene-level changes (mean |log2FC| 0.25 versus 0.14, P < 2 × 10−16, K-S test), particularly for protein-coding biotypes (Fig. 1A), consistent with recent work demonstrating the importance of splicing dysregulation in disease pathogenesis (25). Furthermore, although isoform and gene-level changes exhibited similar pathway and cell type enrichments (e.g., Fig. 1, D and E), isoform-level analysis identified DE transcripts that did not show DGE (isoform-only DE), including 811 in SCZ, 294 in ASD, and 60 in BD. These isoform-only DE genes were more likely to be down-regulated than up-regulated in disease (one-sample t test, P < 10−16), exhibited greatest overlap with excitatory neuron clusters [odds ratios (ORs) > 4, Fisher’s exact test, FDRs < 10−10], and showed significant enrichment for neuron projection development, mRNA metabolism, and synaptic pathways (FDR < 3 × 10−3; table S1). To validate DTE results, we performed polymerase chain reaction (PCR) on several selected transcripts in a subset of ASD, SCZ, and control samples (21) and found significant concordance in fold-changes compared with those from RNA-seq data (fig. S5, A and B). Together, these results suggest that isoform-level changes are most reflective of neuronal and synaptic dysfunction characteristic of each disorder.

Differential expression of the noncoding transcriptome

ncRNAs represent the largest class of transcripts in the human genome and have increasingly been associated with complex phenotypes (26). However, most have limited functional annotation, particularly in the human brain, and have been only minimally characterized in the context of psychiatric disease. On the basis of Gencode annotations, we identified 944 ncRNAs exhibiting gene- or isoform-level DE in at least one disorder [hereafter referred to as neuropsychiatric (NP) ncRNAs (21)], 693 of which were differentially expressed in SCZ, 178 in ASD, and 174 in BD. Of these, 208, 60, and 52 are annotated as intergenic long ncRNAs (lincRNAs) in each disorder, respectively. To place these NPncRNAs within a functional context, we examined expression patterns across human tissues, cell types, and developmental time periods, as well as sequence characteristics including evolutionary conservation, selection, and constraint. We highlight several noncoding genes exhibiting DE across multiple disorders (fig. S6) and provide comprehensive annotations for each NPncRNA (table S2), including cell type specificity, developmental trajectory, and constraint, to begin to elucidate a functional context in the human brain.

As a class, NPncRNAs were under greater selective constraint compared with all Gencode annotated ncRNAs (Fig. 1F), consistent with the observed increased purifying selection in brain-expressed genes (27). We identified 74 NPncRNAs (~8%) under purifying selection in humans, with average exon-level context-dependent tolerance scores (CDTS) below the 10th percentile (21). More than 200 NPncRNAs exhibited broad and nonspecific expression patterns across cell types, whereas 66 were expressed within a specific cell type class (table S2). Notable examples are: LINC00996, which is down-regulated in SCZ (log2FC −0.71, FDR < 5 × 10−11) and BD (log2FC −0.45, FDR = 0.02) and restricted to microglia in the brain (fig. S6); LINC00343, which is expressed in excitatory neurons and down-regulated in BD (log2FC −0.33, FDR = 0.012) with a trend in SCZ (log2FC −0.15, FDR 0.065); and LINC00634, an unstudied brain-enriched lincRNA down-regulated in SCZ (log2FC −0.06, FDR 0.027) with a genome-wide significant SCZ TWAS association as described below.

Local splicing dysregulation in disease

Isoform-level diversity is achieved by combinatorial use of alternative transcription start sites, polyadenylation, and splicing (28). We used LeafCutter (29) to assess local differential splicing (DS) in ASD, SCZ, and BD compared with controls using de novo aligned RNA-seq reads, controlling for the same covariates as DGE and DTE (fig. S7). This approach complements DTE by considering aggregate changes in intron usage affecting exons that may be shared by multiple transcripts and is consequently not restricted to the specified genome annotation (21). Previous studies have identified alterations in local splicing events in ASD (19, 30) and in smaller cohorts in SCZ (18, 24) and BD (31).

We identified 515 DS intron clusters in 472 genes across all disorders (FDR < 0.1), 117 of which (25%) contained one or more previously unidentified exons (table S3 and Fig. 2A). Validation of DS changes for 9 genes in a subset of cases and controls (n = 5 to 10 in each group) by semiquantitative reverse transcription (RT)–PCR showed percent spliced-in (PSI) changes consistent with those reported by LeafCutter (fig. S5, C to E). The most commonly observed local splicing change was exon skipping (41 to 60%), followed by alternative 5′ exon inclusion (e.g., due to alternative promoter usage; 11 to 21%) and alternative 3′ splice site usage (5 to 18%) (table S3 and fig. S8A). DS genes overlapped significantly with DTE results for ASD and SCZ (fig. S8B), but not BD, which likely still remains underpowered. There was significant cross-disorder correlation in PSI changes (Spearman’s ⍴ = 0.59 SCZ-BD, ⍴ = 0.52 SCZ-ASD, all P < 10−4) and, subsequently, overlap among DS genes (Fig. 2, A and B), although the majority of splicing changes still are disorder specific. Only two genes, DTNA and AHCYL1, were significantly differentially spliced in all three disorders (fig. S9). Differentially spliced genes showed significant (FDR < 0.05) enrichment for signaling, cell communication, actin cytoskeleton, synapse, and neuronal development pathways across disorders (Fig. 2C and fig. S8C) and were relatively broadly expressed across cell types (Fig. 2D). Disorder-specific pathways implicated by splicing dysfunction include plasma membrane receptor complex, endocytic vesicle, regulation of cell growth and cytoskeletal protein binding in ASD; angiotensin receptor signaling in BD; and guanosine triphosphatase receptor activity, neuron development, and actin cytoskeleton in SCZ. We also found significant enrichment of splicing changes in targets of two RNA binding proteins that regulate synaptic transmission and whose targets are implicated in both ASD and SCZ, the neuronal splicing regulator RBFOX1 (FDR = 5.16 × 10−11) (32) and the fragile X mental retardation protein (FMRP) (FDR = 3.10 × 10−21) (33). Notably, 48 DS genes (10%; FDR = 8.8 × 10−4) encode RNA binding proteins or splicing factors (34), with at least six splicing factors also showing DTE in ASD (MATR3), SCZ (QKI, RBM3, SRRM2, U2AF1), or both (SRSF11).

Fig. 2 Aberrant local splicing and isoform usage in ASD, SCZ, and BD.

(A) Venn diagram showing cross-disorder overlap for 472 genes with significant differentially spliced (DS) intron clusters (FDR < 10%) identified by LeafCutter. P values for hypergeometric tests of pairwise overlaps between each disorder are shown at the bottom. (B) Scatter plots comparing PSI changes for all 1287 introns in 515 significant DS clusters in at least one disorder, for significant disease pairs SCZ versus ASD and SCZ versus BD (Spearman’s ⍴ = 0.52 and 0.59, respectively). Principal component regression lines are shown in red, with regression slopes for ASD and BD ΔPSI compared to SCZ in the top-left corner. (C) Top 10 gene ontology (GO) enrichments for DS genes in each disorder (see also fig. S8C). (D) Significant enrichment for neuronal and astrocyte markers (ASD and SCZ), as well as oligodendrocyte and microglia (SCZ) cell type markers in DS genes. The odds ratio (*OR) is given only for FDR < 5% and OR > 1. Oligo, oligodendrocytes; OPC, oligodendrocyte progenitor cells. (E) A significant DS intron cluster in GRIN1 (clu_35560; chr9:140,040,354-140,043,461) showing increased exon 4 (E4) skipping in both ASD and SCZ. Increased or decreased intron usage in ASD and SCZ cases compared to controls is highlighted in red and blue, respectively. Protein domains are annotated as ANF_receptor, extracellular receptor family ligand binding domain; Lig_chan, ionotropic glutamate receptor; Lig_chan-Glu_bd, ligated ion channel l-glutamate- and glycine-binding site; CaM_bdg_C0, calmodulin-binding domain C0 of NMDA receptor NR1 subunit. Visualization of splicing events in cluster clu_35560 with the change in PSI (ΔPSI) for ASD (left) and SCZ (right) group comparisons. FDR-corrected P values (q) are indicated for each comparison. Covariate-adjusted average PSI levels in ASD or SCZ (red) versus CTL (blue) are indicated at each intron. (F) Violin plots with the distribution of covariate-adjusted PSI per sample for the intron skipping E4 are shown for each disease group comparison. (G) DGE for GRIN1 in each disorder (*FDR < 5%). (H) Whole-gene view of NRXN1 highlighting (dashed lines) the intron cluster with significant DS in ASD (clu_28264; chr2:50,847,321-50,850,452), as well as transcripts NRXN1-004 and NRXN1-012 that show significant DTU in SCZ and/or BD. Protein domain mappings are shown in purple. DM, protein domains; Tx, transcripts; ConA-like_dom_sf, concanavalin A–like lectin/glucanase domain; EGF-like, epidermal growth factor-like domain; laminin_G, laminin G domain; neurexin-like, neurexin/syndecan/glycophorin C domain. (I) (Left) Close-up of exons and protein domains mapped onto the DS cluster and FDR-corrected P value (q). (Right) Visualization of introns in cluster clu_28264 with their change in percent spliced in (ΔPSI). Covariate-adjusted average PSI levels in ASD (red) versus CTL (blue) are indicated for each intron. (J) Violin plots with the distribution of covariate-adjusted PSI per sample for the largest intron skipping exon 8 (E8). (K) Bar plots for changes in gene expression and transcript usage for NRXN1-004 and NRXN1-012 (*FDR < 5%).

Many differential splicing events show predictable functional consequences on protein isoforms. Notable examples include GRIN1 and NRXN1, which are known risk loci for neurodevelopmental disorders (35, 36). GRIN1 encodes the obligatory subunit of the N-methyl-d-aspartate (NMDA)–type glutamate ionotropic receptors, is up-regulated in SCZ and BD, and shows increased skipping of exon 4 in both ASD and SCZ that affects its extracellular ligand-binding domain (Fig. 2, E to G). NRXN1 is a heterotypic, presynaptic cell adhesion molecule that undergoes extensive alternative splicing and plays a key role in the maturation and function of synapses (35, 37). We observed various DS and/or differential transcript usage (DTU) changes in NRXN1 in ASD, SCZ, and/or BD (Fig. 2, H to K). An exon skipping event in ASD disrupts a laminin domain in NRXN1 (Fig. 2, I and J), changes that are predicted to have major effects on its function (Fig. 2H). Another example is CADPS, which is located within an ASD GWAS risk locus and supported by high-resolution chromosome conformation capture (Hi-C)–defined chromatin interactions as a putative target gene (38) and manifests multiple isoform and splice alterations in ASD (fig. S9 and tables S1 and S3).

We found significant overlap (42%, P = 3.42 × 10−27; Fisher’s exact test) of the ASD DS intron clusters and splicing changes identified in a previous study (19) that used a different method and only a subset of the samples in our ASD and control cohorts (table S3). Overall, this examination of local splicing across three major neuropsychiatric disorders, coupled with the analysis of isoform-level regulation, emphasizes the need to understand the regulation and function of transcript isoforms at a cell type–specific level in the human nervous system.

Identifying drivers of transcriptome dysregulation

We next sought to determine whether changes observed across levels of transcriptomic organization are reflective of the same, or distinct, underlying biological processes. Further, transcriptomic changes may represent a causal pathophysiology or may be a consequence of disease. To begin to address this, we assessed the relationships among transcriptomic features and with polygenic risk scores (PRS) for disease, which provide a directional, genetic anchor (Fig. 3A). Across all three disorders, there was strong concordance among differential gene, isoform, and ncRNA signals, as summarized by their first principal component (Fig. 3A). Notably, DS exhibited greatest overlap with the ncRNA signal, suggesting a role for noncoding genes in regulating local splicing events.

Fig. 3 Overlap and genetic enrichment among dysregulated transcriptomic features.

(A) Scatterplots demonstrate overlap among dysregulated transcriptomic features, summarized by their first principal component across subjects (R2 values; *P < 0.05). PRS show greatest association with differential transcript signal in SCZ. (B) SNP heritability in SCZ is enriched among multiple differentially expressed transcriptomic features, with down-regulated isoforms showing the most substantial association via stratified LD-score regression. (C) Several individual genes and isoforms exhibit genome-wide significant associations with disease PRS. Plots are split by direction of association with increasing PRS. In ASD, most associations localize to the 17q21.31 locus, harboring a common inversion polymorphism.

Significant associations with PRS were observed for DGE and DTE signals in SCZ, with greater polygenic association at the isoform level in accordance with the larger transcript isoform effect sizes observed. Transcript-level DE also showed the greatest enrichment for SCZ single-nucleotide polymorphism (SNP) heritability, as measured by stratified LD (linkage disequilibrium) score regression (21, 39) (Fig. 3B). The overall magnitude of genetic enrichment was modest, however, suggesting that most observed transcriptomic alterations are less a proximal effect of genetic variation and more likely the consequence of a downstream cascade of biological events following earlier-acting genetic risk factors.

We were also interested in determining the degree to which genes showed increases in the magnitude of DE over the duration of illness, as a positive relationship would be expected if age-related cumulative exposures (e.g., drugs, smoking) were driving these changes. To assess this, we fit local regression models to case and control sample-level expression measurements as a function of age and computed age-specific DE effect sizes (fig. S10). Of 4821 differentially expressed genes in SCZ, only 143 showed even nominal association between effect size magnitude and age. Similar associations were seen in 29 of 1119 differentially expressed genes in BD and 85 of 1611 differentially expressed genes in ASD. Consequently, this would not support substantial age-related environmental exposures as the mechanism for the vast majority of differentially expressed genes.

Using gene expression data from animal models, we investigated whether exposure to commonly used psychiatric medications could recapitulate observed gene expression changes in disease (fig. S11). Overall, with the exception of lithium, chronic exposure to medications—including antipsychotics (clozapine, haloperidol), mood stabilizers (lamotrigine), and SSRI antidepressants (fluoxetine)—had a small effect on the transcriptome, in many cases with no differentially expressed genes at traditional FDR thresholds (21). Even at more liberal thresholds, the overlap between medication-driven and disease signal remains sparse. One notable exception was a module that reflects major components of a well-described (40) neural activity–dependent gene expression program, whose disease relationships are refined in the network analysis section below. Finally, we note that other unmeasured factors could potentially contribute to gene expression variation in postmortem tissue, including agonal events or smoking (22, 41, 42) in addition to those measured and used as covariates, such as RNA integrity and postmortem interval. We used surrogate variable correction in our analyses to account for such unmeasured confounders (43), which is a standard approach (44).

Transcriptome-wide association

We next sought to leverage this transcriptomic dataset to prioritize candidate disease risk genes with predicted genetically driven effects on expression in brain. We identified 18 genes or isoforms whose expression was significantly associated with PRS [(21); Bonferroni-corrected P < 0.05]: 16 in ASD and 2 in SCZ, with none in BD (Fig. 3C and table S4). In ASD, the majority of associations map to 17q21.31, which harbors a common inversion polymorphism and rare deleterious structural variants associated with intellectual disability (45). Additional associations for ASD included two poorly annotated pseudogenes, FAM86B3P and RP11-481A20.10. In SCZ, PRS was associated with up-regulation of the established risk gene C4A (3). Concordantly, we found a strong positive correlation between C4A expression and genetically imputed C4A copy number (R = 0.36, P = 6 × 10−21) and imputed number of C4-HERV elements (R = 0.35, P = 4 × 10−20) but a slight negative association with C4B copy number [R = −0.087, P = 0.03 (21)]. At less stringent thresholds (FDR-corrected P < 0.05), we identified BD PRS associations with isoforms of the neuronal calcium sensor NCALD and SNF8, an endosomal sorting protein, as well as several additional associations in the major histocompatibility complex (MHC) region in SCZ, which harbors the largest GWAS peak composed of multiple independent signals (3) but is difficult to parse due to complex patterns of LD. These included two lncRNAs, HCG17 and HCG23, as well as the MHC class I heavy-chain receptor HLA-C. However, expression of all three was also significantly (P < 0.05) correlated with imputed C4A copy number, suggesting pleiotropic effects.

Taking an orthogonal approach, we performed a formal transcriptome-wide association study (TWAS) (46) to directly identify genes whose cis-regulated expression is associated with disease (21). TWAS and related methods have the advantage of aggregating the effects of multiple SNPs onto specific genes, reducing multiple comparisons and increasing power for association testing, although results can still be influenced by LD and pleiotropy (46, 47). Further, by imputing the cis-regulated heritable component of brain gene expression into the association cohort, TWAS enables direct prediction of the transcriptomic effects of disease-associated genetic variation, identifying potential mechanisms through which variants may impart risk. However, the limited size of brain eQTL (expression quantitative trait loci) datasets to date has necessitated the use of non-CNS tissues to define TWAS weights (46). Given the enrichment of psychiatric GWAS signal within CNS-expressed regulatory elements (39), we reasoned that our dataset would provide substantial power and specificity. Indeed, we identified 14,750 genes with heritable cis-regulated brain expression in the PsychENCODE cohort, enabling increased transcriptomic coverage for detection of association signal (Fig. 4). In BD, TWAS prioritizes 17 genes across 14 distinct loci (Bonferroni-corrected P < 0.05; Fig. 4 and table S4), none of which exhibited DE. At loci with multiple hits, we applied conditional analyses to further finemap these regions (21). For orthogonal validation, we conducted summary-data–based Mendelian randomization (SMR), a complementary method that tests for pleiotropic associations in the cis window with an accompanying HEIDI test to distinguish linkage from pleiotropy (48). Eleven genes—BMPR1B, DCLK3, HAPLN4, HLF, LMAN2L, MCHR1, UBE2Q2L, SNAP91, TTC39A, TMEM258, and VPS45—showed consistent association (21) across multiple analyses (table S4). The two isoforms with PRS associations in BD (NCALD, SNF8) were nonsignificant in TWAS, perhaps owing to lack of a nearby genome-wide significant locus or isoform-specific regulation, which suggests that those expression changes may be driven by trans-acting factors.

Fig. 4 Transcriptome-wide association.

Results from a TWAS prioritize genes whose cis-regulated expression in brain is associated with disease. Plots show conditionally-independent TWAS prioritized genes, with lighter shades depicting marginal associations. The sign of TWAS z-scores indicates predicted direction of effect. Genes significantly up- or down-regulated in diseased brain are shown with arrows, indicating directionality. (A) In SCZ, 193 genes (164 outside of MHC) are prioritized at Bonferroni-corrected P < 0.05, including 107 genes with conditionally independent signals. Of these, 23 are also differentially expressed in SCZ brains with 11 in the same direction as predicted. (B) Seventeen genes are prioritized in BD, of which 15 are conditionally independent. (C) In ASD, a TWAS prioritizes 12 genes, of which 5 are conditionally independent.

In ASD, TWAS prioritizes 12 genes across three genomic loci (Bonferroni-corrected P < 0.05; Fig. 4). This includes the 17q21.31 region, which showed multiple PRS associations as described above but did not reach genome-wide significance in the largest GWAS to date (38). Of the seven TWAS-significant genes at this locus, conditional analysis prioritizes one—LRRC37A, which is further supported by SMR and Hi-C interaction in fetal brain (38). LRRC37A is intriguing due to its primate-specific evolutionary expansion, loss-of-function intolerance, and expression patterns in the brain and testis (45). However, common variants in GWAS are also likely tagging the common inversion and other recurrent structural variants present at this locus (45). TWAS additionally prioritizes genes on chromosomes 8 and 20 (Fig. 4). Altogether, five genes showed consistent associations with ASD across multiple methods: LRRC37A, FAM86B3P, PINX1, XKR6, and RP11-481A20.10 (table S4) (21).

In SCZ, TWAS identifies 193 genes, of which 107 remain significant after conditional analysis at each gene within multi-hit loci. Excluding the MHC region, there remained 164 significant genes representing 78 genome-wide significant GWAS loci (Fig. 4 and table S4). A previous TWAS study in SCZ primarily based on non-neural tissue prioritized 157 genes, 37 of which are identified here, a significant overlap (OR = 61, P < 10−42, Fisher’s exact test). Moreover, 60 TWAS-prioritized genes overlapped with the list of 321 high-confidence SCZ risk genes in a companion manuscript (17), identified using gene regulatory networks and a deep learning approach (OR = 34.7, P < 10−60, Fisher’s exact test). Of the 107 conditionally significant genes prioritized by TWAS, 62 were further supported by SMR (PSMR < 0.05, PHEIDI > 0.05), and 11 were also concordantly differentially expressed in SCZ brains in the same direction as predicted by TWAS. Altogether, 64 genes were consistently prioritized across multiple methods, including 10 ncRNAs (table S4) (21). These included a number of previously unknown candidates for SCZ: two down-regulated lysine methyltransferases (SETD6, SETD8); RERE, a down-regulated, mutationally intolerant nuclear receptor co-regulator of retinoic acid signaling associated with a rare neurodevelopmental genetic syndrome; LINC00634, a down-regulated poorly annotated brain-enriched lincRNA; and SLC12A5, which encodes a mitochondrial Ca2+ binding aspartate/glutamate carrier protein, associated with a recessive epileptic encephalopathy. Most genes identified in this analysis show disease-specific effects, as only four genes (MCHR1, VPS45, SNAP91, and DCLK3) showed overlap between SCZ and BD TWAS, and none overlapped with ASD. Overall, this analysis provides a core set of strong candidate genes implicated by risk loci and provides a mechanistic basis for the composite activity of disease risk variants.

Networks refine shared cross-disorder signals

To place transcriptomic changes within a systems-level context and more fully investigate the specific molecular neuropathology of these disorders, we performed weighted gene correlation network analysis (WGCNA) to create independent gene- and isoform-level networks (14, 49, 50), which we then assessed for disease association and GWAS enrichment by using stratified LD score regression [(21); see Resource.PsychENCODE.org for interactive visualization]. Although calculated separately, gene- and isoform-level networks generally reflected equivalent biological processes, as demonstrated by hierarchical clustering (Fig. 5A). However, the isoform-level networks captured greater detail, and a larger proportion were associated with disease GWAS than gene-level networks (61% versus 41% with nominal GWAS enrichment, P = 0.07, χ2; Fig. 5A). Consistent with expectations, modules showed enrichment for gene ontology pathways, and we identified modules strongly and selectively enriched for markers of all major CNS cell types (Fig. 5, A and B, and fig. S12), facilitating computational deconvolution of cell type–specific signatures (14, 49, 51). For ease of subsequent presentation, we grouped gene-isoform module pairs that cocluster, have overlapping parent genes, and represent equivalent biological processes.

Fig. 5 Gene and isoform coexpression networks capture shared and disease-specific cellular processes and interactions.

(A) Coexpression networks demonstrate pervasive dysregulation across psychiatric disorders. Hierarchical clustering shows that separate gene- and isoform-based networks are highly overlapping, with greater specificity conferred at the isoform level. Disease associations are shown for each module (linear regression β value, *FDR < 0.05, –P < 0.05). Module enrichments (*FDR < 0.05) are shown for major CNS cell types. Enrichments are shown for GWAS results from SCZ (59), BD (97), and ASD (38), using stratified LD score regression (*FDR < 0.05, –P < 0.05). (B) Coexpression modules capture specific cellular identities and biological pathways. Colored circles represent module DE effect size in disease, with red outlines representing GWAS enrichment in that disorder. Modules are organized and labeled based on CNS cell type and top gene ontology enrichments. (C) Examples of specific modules dysregulated across disorders, with the top 25 hub genes shown. Edges represent coexpression (Pearson correlation > 0.5) and known protein-protein interactions. Nodes are colored to represent disorders in which that gene is differentially expressed (*FDR < 0.05).

The large sample sizes, coupled with the specificity of isoform-level quantifications, enabled refinement of previously identified gene networks related to ASD, BD, and SCZ (1315, 18, 19, 52). Of a combined 90 modules, including 34 gene-level (geneM) and 56 isoform-level (isoM) modules, 61 (68%) showed significant association with at least one disorder, demonstrating the pervasive nature of transcriptome dysregulation in psychiatric disease. Five modules are shared across all three disorders, 3 up-regulated and 2 down-regulated; 22 modules are shared by two of the three disorders, and 36 demonstrate more specific patterns of dysregulation in either ASD, SCZ, or BD (Fig. 5 and table S5). It is notable that of these 61 coexpression modules with a disease-association, 41 demonstrate cell type enrichments, consistent with the strong cell type disease-related signal that was observed via both supervised and unsupervised methods in a companion study (17). This demonstrates the importance of cell type–specific changes in the molecular pathology of these major psychiatric disorders; the cell type relationships defined by the disease modules substantially enhance our knowledge of these processes, as we outline below.

The five modules shared between ASD, BD, and SCZ can be summarized to represent three distinct biological processes. Two of these processes are up-regulated, including an inflammatory NFkB (nuclear factor κB) signaling module pair (geneM5/isoM5; further discussed in the “Distinct neural-immune trajectories” section) and a module (geneM31) enriched primarily for genes with roles in the postsynaptic density, dendritic compartments, and receptor-mediated presynaptic signaling that are expressed in excitatory neurons and, to a lesser extent, inhibitory neurons (Fig. 5C). Notably, DCLK3, one of the hubs of geneM31, is a genome-wide significant TWAS hit in both SCZ and BD. The third biological process, geneM26/isoM22 (Fig. 5C), is down-regulated and enriched for endothelial and pericyte genes, with hubs that represent markers of the blood-brain barrier, including ITIH5, SLC38A5, ABCB1, and GPR124, a critical regulator of brain-specific angiogenesis (53, 54). This highlights specific, shared alterations in neuronal-glial-endothelial interactions across these neuropsychiatric disorders.

In contrast to individual genes or isoforms, no modules were significantly associated with PRS after multiple-testing correction. However, 19 modules were significantly (FDR < 0.05) enriched for SNP heritability on the basis of published GWASs (21) (Fig. 5A and fig. S13). A notable example is geneM2/isoM13, which is enriched for oligodendrocyte markers and neuron projection developmental pathways and is down-regulated in ASD and SCZ, with a trend in BD (Fig. 5C). isoM13 showed the greatest overall significance of enrichment for SCZ and educational attainment GWAS and was also enriched in BD GWAS to a lesser degree. Further, this module is enriched for genes harboring ultrarare variants identified in SCZ (55) (fig. S13). Finally, we also observe pervasive and distinct enrichments for syndromic genes and rare variants identified through whole-exome sequencing in individuals with neurodevelopmental disorders (table S5 and fig. S13).

Neuronal isoform networks capture disease specificity

Multiple neuronal and synaptic signaling pathways have been previously shown to be down-regulated in a diminishing gradient across ASD, SCZ, and BD brains without identification of clear disease-specific signals for these neuronal-synaptic gene sets (13, 15, 18, 19, 56, 57). We do observe neuronal modules broadly dysregulated across multiple disorders, including a neuronal/synaptic module (isoM18) with multiple isoforms of the known ASD risk gene, ANK2, as hubs. However, the large sample size, coupled with the specificity of isoform-level qualifications, enabled us to identify synaptic modules containing isoforms with distinct disease associations and to separate signals from excitatory and inhibitory neurons (Fig. 5B).

A salient example of differential module membership and disease association of transcript isoforms is RBFOX1, a major neuronal splicing regulator implicated across multiple neurodevelopmental and psychiatric disorders (15, 32, 58, 59). Previous work has identified down-regulated neuronal modules in ASD and SCZ containing RBFOX1 as a hub (13, 15). In this study, we identified two neuronal modules with distinct RBFOX1 isoforms as hub genes (Fig. 6A). The module pair geneM1/isoM2, down-regulated only in ASD (Fig. 6B), contains the predominant brain-expressed RBFOX1 isoform and includes several cation channels (e.g., HCN1, SCN8A). The second most abundant RBFOX1 isoform is in another module, isoM17, which is down-regulated in both ASD and SCZ (Fig. 6B). Experiments in mouse indicate that RBFOX1 has distinct nuclear and cytoplasmic isoforms with differing functions, the nuclear isoform primarily regulating pre-mRNA alternative splicing, and the cytoplasmic isoform binding to the 3′ untranslated region to stabilize target transcripts involved in regulation of neuronal excitability (28, 32, 58, 60). isoM17 shows greater enrichment for nuclear RBFOX1 targets (Fig. 6C), whereas isoM2 shows stronger overlap with cytoplasmic targets (32). Consistent with a predicted splicing-regulatory effect, isoM17 shows greater enrichment for genes exhibiting DS in ASD and SCZ (Fig. 6D). In accordance with a predicted role in regulating excitability, isoM2 shows strong enrichment for epilepsy risk genes (Fig. 6E). Moreover, the two modules show differential association with common genetic risk (Fig. 6E), with isoM2 exhibiting GWAS enrichment across SCZ, BD, and major depressive disorder (MDD). This widespread enrichment of neurodevelopmental and psychiatric disease risk factors—from rare variants in epilepsy to common variants in BD, SCZ, and MDD—is consistent with a model in which broad neuropsychiatric liability emanates from myriad forms of dysregulation in neuronal excitability, all linked via RBFOX1. These results highlight the importance of further studies focused on understanding the relationship between human RBFOX1 transcript diversity and functional divergence, as most of what is known is based on mouse, and the human shows far greater transcript diversity (32, 58, 61).

Fig. 6 Two RBFOX1 isoform modules capture distinct biological and disease associations.

(A) Previous studies have identified RBFOX1 as a critical hub of neuronal and synaptic modules down-regulated across multiple psychiatric disorders (13, 15). We identified two pairs of modules with distinct RBFOX1 isoforms as hub genes. Plots show the top 25 hub genes of modules isoM2 and isoM17, following the same coloring scheme as in Fig. 5C. (B) Distinct module-eigengene trait associations are observed for isoM2 (down-regulated in ASD only) compared with isoM17, which is down-regulated in ASD and SCZ. (C) Modules show distinct enrichments for nuclear and cytoplasmic RBFOX1 targets, defined experimentally in mouse (32). (D) Genes harboring DS events observed in ASD and SCZ show greater overlap with isoM17, consistent with its association with nuclear RBFOX1 targets. (E) Modules show distinct patterns of genetic association. isoM2 exhibits broad enrichment for GWAS signal in SCZ, BD, and MDD, as well as for epilepsy risk genes, whereas isoM17 shows no apparent genetic enrichment (21).

Previous transcriptional networks related to ASD, BD, and SCZ did not separate inhibitory and excitatory neuron signals (13). The increased resolution here allowed us to identify several modules enriched in inhibitory interneuron markers (Fig. 5B), including geneM23/isoM19, which is down-regulated in ASD and SCZ, with a trend toward down-regulation observed in BD; downsampling in the SCZ dataset suggests that the lack of significance in BD may be due to a smaller sample size (fig. S14). This module pair contained as hubs the two major γ-aminobutyric acid (GABA) synthesizing enzymes (GAD1, GAD2), multiple GABA transporters (SLC6A1, SLC24A3), many other known interneuron markers (RELN, VIP), as well as DLX1 and the lncRNA DLX6-AS1, both critical known regulators of inhibitory neuron development (62). This inhibitory neuron–related module is not enriched for common or rare genetic disease–associated variation, although other studies have found enrichment for SCZ GWAS signal among interneuron markers defined in other ways (63).

Several neuronal modules that distinguish between the disorders differentiate BD and SCZ from ASD, including the module pair geneM21/isoM30 (Fig. 5C), which captures known elements of activity-dependent neuronal gene regulation, whose hubs include classic early-response (ARC, EGR1, NPAS4, NR4A1) and late-response genes (BDNF, HOMER1) (40). Although these modules were not significantly down-regulated in ASD, subsampling indicates that the differences between disorders could be driven by sample size (fig. S14). These genes play critical roles in regulating synaptic plasticity and the balance of excitatory and inhibitory synapses (40). Of note, a nearly identical module was recently identified as a sex-specific transcriptional signature of major depression and stress susceptibility (64). We further observed that these modules may be affected by medication exposure. Indeed, geneM21/isoM30 was associated with genes down-regulated by chronic high doses of the antipsychotic haloperidol, as well as genes up-regulated by the antidepressant fluoxetine (fig. S11A). Furthermore, geneM21/isoM30 expression was negatively correlated with the degree of lifetime antipsychotic exposure in the subset of patients for whom these data were available (P = 0.001, Pearson correlation; fig. S11B). As such, it will be worthwhile to determine whether this module is a core driver of the therapeutic response, as has been suggested (65). Finally, other neuronal modules distinguished SCZ and BD from ASD (Fig. 5B), including geneM7, enriched for synaptic and metabolic processes with the splicing regulator NOVA2 (Fig. 5C). This neuronal module was significantly enriched for both BD and SCZ GWAS signals, supporting a causal role for this module.

Distinct neural-immune trajectories

Previous work has identified differential activation of glial and neural-immune processes in brains from patients with psychiatric disorders (15, 52, 57, 6669), including up-regulation of astrocytes in SCZ and BD (13, 57) and both microglia and astrocytes in ASD (19, 70). Evidence supports hyperactive complement-mediated synaptic pruning in SCZ pathophysiology, presumably through microglia (3), although postmortem microglial up-regulation was observed only in ASD (13, 19, 70). We examined whether our large cohort of ~1000 control brains, capturing an age range from birth to 90 years, would enable refinement of the nature and timing of this neuroinflammatory signal and potential relationship to disease pathogenesis (Fig. 7A). Four modules were directly related to neural-immune processes (Fig. 7, A to C), two of which are gene/isoform module pairs that correspond clearly to cell type–specific gene expression: one representing microglia (geneM6/isoM15) and the other astrocytes (geneM3/isoM1), as they are strongly and selectively enriched for canonical cell type–specific marker genes (Fig. 7, C to E). Two additional immune-related modules appear to represent more broadly expressed signaling pathways: interferon (IFN) response (geneM32) and NFkB (geneM5/isoM5). The IFN-response module (geneM32) contains critical components of the IFN-stimulated gene factor 3 (ISGF3) complex that activates the transcription of downstream IFN-stimulated genes, which comprise 59 of the 61 genes in this module (71). The NFkB module pair (geneM5/isoM5) includes four out of five NFkB family members (NFkB1, NFkB2, REL, RELA), as well as many downstream transcription factor targets and upstream activators of this pathway.

Fig. 7 Distinct neural-immune trajectories in disease.

(A) Coexpression networks refine the neural-immune/inflammatory processes up-regulated in ASD, SCZ, and BD. Previous work has identified specific contributions to this signal from astrocyte and microglial populations (13, 19). Here, we identify additional contributions from distinct IFN-response and NFkB signaling modules. (B) Eigengene-disease associations are shown for each of four identified neural-immune module pairs. The astrocyte and IFN-response modules are up-regulated in ASD and SCZ. NFkB signaling is elevated across all three disorders. The microglial module is up-regulated in ASD and down-regulated in SCZ and BD. (C) Top hub genes for each module are shown, along with edges supported by coexpression (light gray; Pearson correlation > 0.5) and known protein-protein interactions (dark lines). Nodes follow the same coloring scheme as in Fig. 5C. Hubs in the astrocyte module (geneM3/isoM1) include several canonical, specific astrocyte markers, including SOX9, GJA1, SPON1, and NOTCH2. Microglial module hub genes include canonical, specific microglial markers, including AIF1, CSF1R, TYROBP, and TMEM119. The NFkB module includes many known downstream transcription factor targets (JAK3, STAT3, JUNB, and FOS) and upstream activators (IL1R1, nine TNF receptor superfamily members) of this pathway. (D) The top four GO enrichments are shown for each module. (E) Module enrichment for known cell type–specific marker genes, collated from sequencing studies of neural-immune cell types (98102). (F) Module eigengene expression across age demonstrates distinct and dynamic neural-immune trajectories for each disorder.

The dynamic trajectories of these processes in cases with respect to controls reveal distinct patterns across disorders (Fig. 7F). The IFN-response and microglial modules are most strongly up-regulated in ASD, peaking during early development, coincident with clinical onset. In contrast, in SCZ and BD, the microglial module is actually down-regulated, driven by a later dynamic decrease, dropping below controls after age ~30. The NFkB module, which is up-regulated across all three disorders, maximally diverges from controls during early adulthood, coincident with typical disease onset in SCZ and BD. Accordingly, this NFkB module contained C4A, the top GWAS-supported, and strongly up-regulated, risk gene for SCZ (3). This pattern is distinct from that of ASD, which shows a dynamic trajectory but remains up-regulated throughout (Fig. 7F).

Noncoding modules and lncRNA regulatory relationships

As many lncRNAs are predicted to have transcriptional regulatory roles, we next assessed whether mRNA-based coexpression networks could provide additional functional annotation for ncRNAs. As a subset of lncRNAs are thought to function by repressing mRNA targets (72), we applied csuWGCNA (73) to identify potential regulatory relationships (21). We identified 39 modules (csuM) using csuWGCNA, all preserved in the signed networks with strong cell type and GWAS enrichments, which captured 7186 negatively correlated lncRNA-mRNA pairs within the same module (fig. S15). We provide a table of putative mRNA targets for these brain-expressed lncRNAs, including 209 exhibiting DE in ASD, 122 in BD, and 241 in SCZ (table S6).

A salient example of the power of this approach for functional annotation is LINC00473, a hub of the neuronal activity–dependent gene regulation module (geneM21/isoM30; Fig. 5C). Expressed in excitatory neurons and down-regulated in SCZ (log2FC −0.16, FDR < 0.002), LINC00473 is regulated by synaptic activity and down-regulates immediate early gene expression (74), consistent with its hub status in this module. Similarly, we identify the lncRNA DLX6-AS1, a known developmental regulator of interneuron specification (62), as the most central hub gene in the interneuron module (geneM23/isoM19), which is down-regulated in ASD and SCZ. This interneuron module also contains LINC00643 and LINC01166, two poorly annotated, brain-enriched lncRNAs. LINC00643 is down-regulated in SCZ (log2FC −0.06, FDR = 0.04), whereas LINC01166 is significantly down-regulated in BD (log2FC −0.17, FDR < 0.05) with trends in ASD and SCZ (FDR < 0.1). Our data suggest a role for these lncRNAs in interneuron development, making them intriguing candidates for follow-up studies. Using fluorescence in situ hybridization (FISH), we confirmed that both LINC00643 and LINC01166 are expressed in GAD1+ GABAergic neurons in area 9 of the adult brain, present both in the cell nucleus and the cytoplasm (Fig. 8A and fig. S16), although expression was also detected in other non-GAD1+ neurons as well.

Fig. 8 LncRNA annotation, ANK2 isoform switching, and microexon enrichment.

(A) FISH images demonstrate interneuron expression for two poorly annotated lincRNAs—LINC00643 and LINC01166—in area 9 of adult human prefrontal cortex. Sections were labeled with GAD1 probe (green) to indicate GABAergic neurons and lncRNA (magenta) probes for LINC00643 (left) or for LINC01166 (right). All sections were counterstained with DAPI (blue) to reveal cell nuclei. Lipofuscin autofluorescence is visible in both the green and red channels and appears orange. Scale bar, 10 μm. FISH was repeated at least twice on independent samples (table S9) (21), with similar results (see also fig. S16). (B) ANK2 isoforms ANK2-006 and ANK2-013 show significant DTU in SCZ and ASD, respectively (*FDR < 0.05). (C) Exon structure of ANK2 highlighting (dashed lines) the ANK2-006 and ANK2-013 isoforms. (Inset) These isoforms have different protein domains and carry different microexons. ANK2-006 is affected by multiple ASD DNMs, while ANK2-013 could be entirely eliminated by a de novo CNV deletion in ASD. (D) Disease-specific coexpressed PPI network. Both ANK2-006 and ANK2-013 interact with NRCAM. The ASD-associated isoform ANK2-013 has two additional interacting partners, SCN4B and TAF9. (E) As a class, switch isoforms are significantly enriched for microexon(s). In contrast, exons of average length are not enriched among switch isoforms. The y axis displays odds ratio on a log2 scale. P values are calculated using logistic regression and corrected for multiple comparisons. (F) Enrichment of 64 genes with switch isoforms for: ASD risk loci (81); CHD8 targets (103); FMRP targets (33); mutationally constraint genes (104); syndromic and highly ranked (1 and 2) genes from SFARI Gene database; vulnerable ASD genes (105); genes with probability of loss-of-function intolerance (pLI) > 0.99 as reported by the Exome Aggregation Consortium (106); genes with likely-gene-disruption (LGD) or LGD plus missense de novo mutations (DNMs) found in patients with neurodevelopmental disorders (21).

Multiple ncRNAs including SOX2-OT, MIAT, and MEG3 are enriched in oligodendrocyte modules (geneM2/isoM13/csuM1; Fig. 5C) that are down-regulated in both SCZ and ASD. SOX2-OT is a heavily spliced, evolutionarily conserved lncRNA exhibiting predominant brain expression and a hub of these oligodendrocyte modules, without previous mechanistic links to myelination (75, 76). The lncRNAs MIAT and MEG3 are negatively correlated with most of the hubs in this module, including SOX2-OT (fig. S15). MIAT is also known to interact with QKI, an established regulator of oligodendrocyte-gene splicing also located in this module (77, 78). These analyses predict critical roles for these often overlooked noncoding genes in oligodendrocyte function (77, 78) and potentially in psychiatric conditions.

Isoform network specificity and switching

To more comprehensively assess whether aspects of disease specificity are conferred by alternative transcript usage or splicing, versus DE, we surveyed genes exhibiting DTU across disorders (21). We identified 134 such “switch isoforms,” corresponding to 64 genes displaying different DTU between ASD and SCZ (table S7). As an example, isoforms of SMARCA2, a member of the BAF-complex strongly implicated in several neurodevelopmental disorders including ASD (79), are up- and down-regulated in ASD and SCZ, respectively (fig. S17). Conversely, the isoforms of NIPBL, a gene associated with Cornelia de Lange syndrome (80), are down- and up-regulated in ASD and SCZ, respectively (fig. S17). Such opposing changes in isoform expression of various genes may represent differences in disease progression or symptom manifestation in diseases such as ASD and SCZ, mediated by genetic risk variants that create subtle differences in isoforms within the same gene that exhibit distinct biological effects in each disorder. A noteworthy example is the ASD risk gene ANK2 (81), whose two alternatively spliced isoforms, ANK2-006 and ANK2-013, are differentially regulated in SCZ and ASD (Fig. 8B). These switch isoforms show markedly different expression patterns, belonging to different coexpression modules, geneM3/isoM1 (Fig. 7C) and isoM18, which are enriched in astrocyte and neuronal cell types, respectively (Fig. 5A and fig. S12). The protein domain structure of these transcripts is also nonoverlapping, with ANK2-006 carrying exclusively ZU5 and DEATH domains and ANK2-013 carrying exclusively ankyrin repeat domains (Fig. 8C). Both isoforms are affected by a de novo ASD CNV, and ANK-006 also carries de novo mutations from neurodevelopmental disorders. Both isoforms bind to the neuronal cell adhesion molecule NRCAM, but ANK2-013 has two additional partners: TAF9 and SCN4B (Fig. 8D), likely cell type–specific interactions that suggest distinct functions of the isoforms of this gene in different neural cell types and diseases.

Finally, several studies have demonstrated that genes carrying microexons are preferentially expressed in the brain and their splicing is dysregulated in ASD (30, 82, 83). This PsychENCODE sample provided the opportunity to assess the role of microexons in a far larger cohort and across disorders. Indeed, we found that switch isoforms with microexons (3 to 27 base pairs) are significantly enriched in both ASD (FDR = 0.03) and SCZ (FDR = 0.03, logistic regression) (Fig. 8E) (21). Genes with switch isoforms are also enriched for the regulatory targets of two ASD risk genes, CHD8 and FMRP, as well as highly mutationally constrained genes (pLI > 0.99), syndromic ASD genes, and in genes with de novo exonic mutations in ASD, SCZ, and BD (Fig. 8F and table S7) (21). These data confirm the importance of microexon regulation in neuropsychiatric disorders beyond ASD, and its potential role in distinguishing among biological pathways differentially affected across conditions. This role for microexons further highlights local splicing regulation as a potential mechanism conferring key aspects of disease specificity, extending the larger disease signal observed at the isoform level in coexpression and DE analyses.

Discussion

We present a large-scale RNA-seq analysis of the cerebral cortex across three major psychiatric disorders, including extensive analyses of the noncoding and alternatively spliced transcriptome, as well as gene- and isoform-level coexpression networks. The scope and complexity of these data do not immediately lend themselves to simple mechanistic reduction. Nevertheless, at each level of analysis, we present concrete examples that provide proofs-of-principle and starting points for investigations targeting shared and distinct disease mechanisms to connect causal drivers with brain-level perturbations.

Broadly, we find that isoform-level changes exhibit the largest effect sizes in diseased brain, are most enriched for genetic risk, and provide the greatest disease specificity when assembled into coexpression networks. Notably, disturbances in the expression of distinct isoforms of more than 50 genes are differentially observed in SCZ and ASD, which in the case of the ASD risk gene ANK2 is predicted to affect different cell types in each disorder. Moreover, we observe disease-associated changes in the splicing of dozens of RNA-binding proteins and splicing factors, most of whose targets and functions are unknown. Similarly, nearly 1000 ncRNAs are dysregulated in at least one disorder, many with significant CNS enrichment but, until now, limited functional annotation.

This work highlights isoform-level dysregulation as a critical, and relatively underexplored, proximal mechanism linking genetic risk factors with psychiatric disease pathophysiology. In contrast to local splicing changes, isoform-level quantifications require imputation from short-read RNA-seq data guided by existing genomic annotations. Consequently, the accuracy of these estimates is hindered by incomplete annotations, as well as by limitations of short-read sequencing, coverage, and genomic biases like GC content (84, 85). This may be particularly problematic in the brain, where alternative splicing patterns are more distinct than in other organ systems (82). We present experimental validations for several specific isoforms but try to focus on the class of dysregulated isoforms, and the modules and biological processes they represent, rather than individual cases, which may be more susceptible to bias. Longer-read sequencing, which provides a more precise means for isoform quantification, will be of great utility as it becomes more feasible at scale.

Several broad shared patterns of gene expression dysregulation have been observed in postmortem brain samples in previous studies—most prominently, a gradient of down-regulation of neuronal and synaptic signaling genes and up-regulation of glial-immune or neuroinflammatory signals. In this study, we refine these signals by distinguishing both up and down-regulated neuron-related processes that are differentially altered across these three disorders. Furthermore, we extend previous work that identified broad neuroinflammatory dysregulation in SCZ, ASD, and BD by identifying specific pathways involving IFN-response, NFkB, astrocytes, and microglia that manifest distinct temporal patterns across conditions. A module enriched for microglial-associated genes, for example, shows a clear distinction between disorders, with strong up-regulation observed in ASD and significant down-regulation in SCZ and BD. Overall, these results provide increased specificity to the observations that ASD, BD, and SCZ are associated with elevated neuroinflammatory processes (69, 8688).

By integrating transcriptomic data with genetic variation, we identify multiple disease-associated coexpression modules enriched for causal variation, as well as mechanisms potentially underlying specific disease loci in each of the diseases. In parallel, by performing a well-powered brain-relevant TWAS in SCZ, and to a lesser extent in BD and ASD, we are further able to elucidate candidate molecular mechanisms through which disease-associated variants may act. TWAS prioritizes dozens of previously unidentified candidate disease genes, including many that are dysregulated in diseased brain. Similar to the eQTLs identified in a companion study (17), the majority of these loci do not overlap with disease GWAS association signals. Rather, most are outside of the LD block and distal to the original association signal, highlighting the importance of orthogonal functional data types, such as transcriptome or epigenetic data (16, 47, 82, 89), in deciphering the underlying mechanisms of disease-associated genetic effects.

As with any case-control association study, multiple potential factors, many of which may represent reactive processes, contribute to gene expression changes in postmortem human brain samples. At each step of analysis, we have attempted to mitigate the contribution of these factors through known and hidden covariate correction, assessment of age trajectories, and enrichment for causal genetic variation. Supporting the generalizability of our results, we find significant correlations of the log2FC between randomly split halves of the data (fig. S3). This likely varies by transcript class, and some of the modest correlations are likely due to low-abundance genes, such as ncRNAs, which we prefer to include, though we recognize the inherent tension between expression level and measurement accuracy. We provide access to this extensive resource, both in terms of raw and processed data and as browsable network modules (Resource.PsychENCODE.org).

A large proportion of disease-associated coexpression modules are enriched for cell type–specific markers, as is overall disease DE signal, indicating that transcriptomic alterations in disease are likely driven substantially by (even subtle) shifts in cell type proportions, or cell type–specific pathways, consistent with our previous observations (13) and those in a companion study (17). Functional genomic studies often remove such cell type–specific signals, through the use of large numbers of expression-derived principal components or surrogate variables as covariates, to mitigate unwanted sources of variation and maximize detection of cis eQTLs (44). We retain the cell type–specific signals as much as possible, reasoning that cell type–related alterations may directly inform the molecular pathology of disease in psychiatric disorders, in which there is no known microscopic or macroscopic pathology. This rationale is supported by the consistent observation of the dynamic and disease-specific microglial up-regulation observed in ASD and the shared astrocyte up-regulation in SCZ and ASD. This approach, however, reduces the ability to detect genetic enrichment from GWAS, as current methods predominantly capture cis-acting regulatory effects. The modesty of genetic enrichments among disease-associated transcriptomic alterations may also indicate that gene expression changes reflect an indirect cascade of molecular events triggered by environmental as well as genetic factors or that genetic factors may act earlier, such as during development.

Finally, these data, while providing a unique, large-scale resource for the field, also suggest that profiling additional brains, especially from other implicated brain regions, will continue to be informative. Similarly, these data suggest that although isoform-level analyses, including the identification of isoform-specific protein-protein interactions (PPI) and cell type specificity, pose major challenges for high-throughput studies, they are likely to add substantial value to our understanding of brain function and neuropsychiatric disorders. Finally, as GWAS studies in ASD and BD increase in size and subsequently in power, their continued integration with these transcriptome data will likely prove critical in identifying the functional impact of disease-associated genetic variation.

Materials and methods summary

The data generated for this manuscript represent Freeze 1 and 2 of the PsychENCODE Consortium dataset. Postmortem human brain samples were collected as part of eight studies, detailed in fig. S1. RNA-seq and genotype array data were generated by each site and then processed together through a unified pipeline (fig. S1) by a central data analysis core. Raw data are available at (90), with processed summary-level data available at http://Resource.PsychENCODE.org.

For this study, we restricted analysis to frontal and temporal cortex brain samples from postnatal time points with at least 10 million total reads (fig. S2). RNA-seq reads were aligned to the GRCh37.p13 (hg19) reference genome via STAR 2.4.2a with comprehensive gene annotations from Gencode v19. Gene- and isoform-level quantifications were calculated using RSEM v1.2.29 (25). QC metrics were calculated from PicardTools v1.128, RNA-SeQC v1.1.8, featureCounts v1.5.1, cutadapt, and STAR. This generated a matrix of 187 QC metrics, which was then summarized by its top principal components, which were used as covariates in downstream analyses.

Genes were filtered to include only those on autosomes longer than 250 base pairs with transcripts per million reads (TPM) > 0.1 in at least 25% of samples, removing immunoglobulin biotypes. Outlier samples with discordant sex or low network connectivity z-scores were identified within each individual study and removed (91). Surrogate variable analysis was performed to identify hidden confounding factors (43). Count-level quantifications were corrected for library size by using trimmed mean of M-values (TMM) normalization and were log2 transformed. DE was assessed using a linear mixed effects model, accounting for known biological, technical, and four surrogate variables as fixed effects and subject-level technical replicates as random effects. Analogous assessments of DTE and DTU were performed using isoform-level expression quantifications and isoform ratios, respectively. P values were corrected for multiple testing using the Benjamini-Hochberg method, with significance set at 5%. Stratified LD-score regression (39) was used to investigate GWAS enrichment among DE gene sets. DE ncRNAs were further annotated for tissue-specificity using GTEx v6 data (10, 82), evolutionary conservation using phyloP and phastCons scores (92, 93), and exon-level selective constraint via CDTS (27). DS analysis was performed using LeafCutter (29), controlling for the same covariates as above after randomly selecting a single technical replicate for each distinct subject.

Robust WGCNA was performed to identify signed coexpression modules using gene- and isoform-level quantifications separately, after first regressing out all covariates except for the diagnostic group (94). Modules were summarized by their first principal component (eigengene), and disease associations were evaluated using a linear mixed-effects model as above. Significance values were FDR-corrected to account for multiple comparisons.

Genotype calls from SNP arrays were generated at each data production site separately and centralized for imputation, as detailed in a companion manuscript (17). Parallel haplotype prephasing and imputation were done using Eagle2, Minimac3, with the HRC reference panel for imputation. Calculation of gene-level eQTL and isoform-level expression QTLs (isoQTL) was done using QTLtools, as described in a companion manuscript (17). PRS were calculated for individuals of European ancestry using LDPred (95) with GWAS summary statistics and 1000 Genomes Phase 3 European subset as an LD reference panel.

TWAS was performed using the FUSION package [http://gusevlab.org/projects/fusion/ (46)] with custom SNP-expression weights generated from our adult transcriptome dataset. We used GCTA (96) to estimate cis SNP heritability for each gene in our dataset, and analysis was restricted to those exhibiting significant heritability (cis h2g P < 0.05). Association statistics were Bonferroni corrected (P < 0.05). SMR and the associated HEIDI test were performed as implemented in the SMR software package [http://cnsgenomics.com/software/smr/ (48)]. Experimental validations of selected splicing and isoform-level changes were performed using RT-PCR. See the supplementary materials and methods for full details.

Supplementary Materials

www.sciencemag.org/content/362/6420/eaat8127/suppl/DC1

Materials and Methods

Figs. S1 to S17

Tables S1 to S9

PsychENCODE Consortium Authors and Affiliations

References (107146)

References and Notes

  1. See supplementary materials and methods.