Research Article

Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways


Science  27 Mar 2015:
Vol. 347, Issue 6229, pp. 1436-1441
DOI: 10.1126/science.aaa3650

New players in Lou Gehrig's disease

Amyotrophic lateral sclerosis (ALS), often referred to as “Lou Gehrig's disease,” is a progressive neurodegenerative disease that affects nerve cells in the brain and the spinal cord. Cirulli et al. sequenced the expressed genes of nearly 3000 ALS patients and compared them with those of more than 6000 controls (see the Perspective by Singleton and Traynor). They identified several proteins that were linked to disease in patients. One such protein, TBK1, is implicated in innate immunity and autophagy and may represent a therapeutic target.

Science, this issue p. 1436; see also p. 1422


Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention.

Amyotrophic lateral sclerosis (ALS) is a fatal, progressive neurodegenerative disease characterized by loss of motor neuron function for which there is no effective treatment or definitive diagnostic test (most cases are diagnosed clinically) (1). Approximately 10% of ALS cases are familial and inherited in an autosomal dominant, autosomal recessive, or X-linked mode; the remaining cases are apparently sporadic (2, 3). Approximately 20 genes collectively explain a majority of familial cases, but these genes can explain only a minority (about 10%) of sporadic cases (2, 3) (Table 1).

Table 1 Variants in previously described and currently reported ALS genes.

Entries for reported inheritance model, reported FALS explained, and reported SALS explained are adapted from (3, 4, 51) with additional information from (17–21, 52–54). AD, autosomal dominant; AR, autosomal recessive; XD, X-linked. Best-model data are based on the discovery data set for genes not included in the replication data set; otherwise, D = discovery, R = replication, and C = combined. Potential ALS cases explained are calculated as [(cases with variant in best model) – (controls with variant in best model)]; as case variants are risk factors for disease and may not be causal, this represents the potential percentage of cases for which this gene plays a role in disease.


Protein and protein-RNA aggregates are a common feature of ALS pathology. These aggregates often include proteins encoded by genes that cause ALS when mutated, including those encoding SOD1, TARDBP (TDP-43), and FUS (4). Multiple genes (e.g., C9orf72, GRN, VCP, UBQLN2, OPTN, NIPA1, SQSTM1) in addition to TARDBP harbor variants pathogenic for TARDBP proteinopathy manifesting as ALS. This pathological TARDBP is part of a common pathway linked to neurodegeneration caused by diverse genetic abnormalities (5). Although murine models of ALS are limited, silencing certain ALS genes can cause regression of the disease phenotypes and clearance of the protein aggregates (6).

Identifying ALS genes

To identify genetic variants associated with ALS, we sequenced the exomes of 2869 patients with ALS and 6405 controls. We ran a standard collapsing analysis in which the gene was the unit of analysis, and we coded individuals according to the presence or absence of “qualifying” variants in each sequenced gene, where qualifying was defined according to one of six different genetic models: dominant coding, recessive coding, dominant not benign, recessive not benign, dominant loss of function (LoF), and recessive LoF (7). A total of 17,249 genes had more than one case or control sample with a genetic variant meeting the inclusion criteria for at least one of the genetic models tested (Fig. 1 and figs. S1 and S2). After correcting for multiple tests, the known ALS gene SOD1 (P = 7.05 × 10⁻⁸; dominant coding model) was found to have a study-wide significant enrichment of rare variants in ALS cases relative to controls, with qualifying variants in 0.871% of cases and 0.078% of controls. The genes HLA-B, ZNF729, SIRPA, and TP53 were found to have a significant enrichment of variants in controls; however, these associations appear to be due to sequencing differences and to subsets of the controls having been ascertained on the basis of relevant phenotypes.
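The collapsing step described above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: it assumes a two-sided Fisher's exact test on the carrier-by-status table, and the function names and toy data layout are hypothetical. The SOD1 counts in the usage note are back-calculated from the reported percentages (0.871% of 2869 is about 25 cases; 0.078% of 6405 is about 5 controls), which is also an assumption.

```python
from math import comb


def collapse(qualifying_by_sample, is_case):
    """Code each individual as carrier or non-carrier for one gene.

    qualifying_by_sample: sample ID -> number of qualifying variants (hypothetical layout)
    is_case: sample ID -> True for ALS cases, False for controls
    Returns (case_carriers, control_carriers).
    """
    case_carriers = sum(1 for s, n in qualifying_by_sample.items() if n and is_case[s])
    control_carriers = sum(1 for s, n in qualifying_by_sample.items() if n and not is_case[s])
    return case_carriers, control_carriers


def fisher_exact_two_sided(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided Fisher's exact test on the 2x2 carrier-by-status table."""
    carriers = case_carriers + control_carriers
    denom = comb(n_cases + n_controls, carriers)

    def pmf(x):
        # Probability that x of the carriers are cases, with all margins fixed.
        return comb(n_cases, x) * comb(n_controls, carriers - x) / denom

    observed = pmf(case_carriers)
    lowest = max(0, carriers - n_controls)
    highest = min(carriers, n_cases)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in (pmf(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Usage with counts inferred from the reported SOD1 carrier percentages.
p_sod1 = fisher_exact_two_sided(25, 2869, 5, 6405)
```

The result should not be expected to reproduce the reported P value exactly, because the study's actual test, quality-control filters, and correction procedure are described in its supplementary materials rather than here.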

Fig. 1 Quantile-quantile plot of discovery results for dominant coding model.

Results for the analysis of 2869 case and 6405 control exomes are shown; 16,491 covered genes passed quality control with more than one case or control carrier for this test. The genes with the top 10 associations are labeled. The genomic inflation factor λ is 1.060. The association with SOD1 passed correction for multiple tests.
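For readers unfamiliar with the genomic inflation factor quoted in the caption, the sketch below shows one common definition: the median observed test statistic divided by the median expected under the null, on the 1-degree-of-freedom chi-square scale. Whether the authors computed λ exactly this way is not stated here, so treat it as an assumption; the function names are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def chi2_1df_from_p(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile; its sign is irrelevant once squared
    return z * z


def genomic_inflation(p_values):
    """Median observed chi-square over the median expected under the null."""
    observed = median(chi2_1df_from_p(p) for p in p_values)
    expected = chi2_1df_from_p(0.5)  # about 0.455
    return observed / expected


# Perfectly uniform P values give a lambda of 1; values above 1 indicate inflation.
uniform_p = [(i + 0.5) / 1001 for i in range(1001)]
lam_null = genomic_inflation(uniform_p)
```

A value near 1, such as the 1.060 reported for the dominant coding model, suggests little systematic inflation of the test statistics. Gene-based tests with few carriers produce discrete P values, so this conversion is only approximate for them.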

On the basis of their associations with ALS in a preliminary discovery-phase analysis that used 2843 cases and 4310 controls, we chose 51 genes (table S4) for analysis in an additional 1318 cases and 2371 controls (sequenced using either whole exome or custom capture) (7). This analysis definitively identified TANK-binding kinase 1 (TBK1) as an ALS gene, with a discovery association P = 1.12 × 10⁻⁵, a replication P = 5.78 × 10⁻⁷, and a combined P = 3.60 × 10⁻¹¹ (dominant not benign model). In the combined data set, dominant not benign variants in this gene were found in 1.099% of cases and 0.194% of controls, with LoF variants occurring in 0.382% of cases and 0.034% of controls.

Analysis of clinical features

We also performed gene-based collapsing analyses to identify genes associated with patients’ age of onset, site of onset, and survival time. No genes showed genome-wide significant association with any of these features. When applying multiple-test correction to only known ALS predisposition genes and TBK1, we found that D-amino acid oxidase (DAO) significantly correlated with survival times, with variant carriers showing shorter survival times (P = 5.5 × 10⁻⁷, dominant coding model). In mice, DAO is required for the clearance of D-serine. Indeed, D-serine levels are increased in SOD1 mutant mice and in spinal cords from people with familial ALS (FALS) or sporadic ALS (SALS) (8, 9). Known FALS mutations seem to reduce DAO activity, leading to neurotoxicity (10).

ALS patients with mutations in more than one known ALS gene are reported to have a younger age of onset (11). We did not replicate this finding in our data set. Without sequence data for known C9orf72 carriers (by far the most common ALS variant) and without information about ATXN2 expansions, we cannot adequately assess such an association.

Associations with other ALS genes

Although SOD1 was the only previously known ALS gene to attain a genome-wide significant association in our data, many other known ALS genes showed strong associations. For example, rare coding variants in TARDBP occurred in 0.662% of the ALS cases and 0.094% of controls in our study, ranking this gene second to SOD1 genome-wide under the dominant coding model (discovery data set, P = 2.93 × 10⁻⁶; Fig. 1). Consistent with previous reports and the ALS pathogenic TARDBP “DM” variants in the Human Genome Mutation Database (HGMD) (3, 12), we observed that the implicated nonsynonymous variants were generally predicted to have a benign effect on protein structure and function by PolyPhen-2 (13) and were clearly concentrated in the 3′ protein-coding portion of the gene in the ALS cases relative to controls (Fig. 2).

Fig. 2 Variants in TARDBP and VCP.

Dominant coding variants are shown in TARDBP and VCP (discovery data set). Case variants are enriched at the 3′ end of the gene in TARDBP and near the cell division protein 48 domain 2 region in VCP. LoF variants are filled in red, and nonsynonymous variants are filled in blue. Case variants are shown with red lines, control variants are shown with blue lines, and variants found in both cases and controls are shown with dashed lines.

In the case of OPTN, we observed rare damaging variants in 0.621% of ALS cases and 0.228% of controls (combined dominant not benign model, P = 0.002). The greatest enrichment was for LoF variants, which occur in 0.334% of cases and 0.114% of controls (combined dominant LoF model, P = 0.013). Whereas the initial studies of OPTN in ALS found a role in only a few families with a recessive genetic model, subsequent studies identified dominant mutations (14, 15). Here, dominant variants appeared to make a substantial contribution to sporadic disease.

Finally, we also observed a modest excess of qualifying variants in VCP (discovery dominant coding model, P = 0.022) and of LoF variants in SPG11 (combined dominant LoF model, P = 0.023). The former was driven by variants near the cell division protein 48 domain 2 region, where variants were found in 71% of case variants as compared to 25% of control variants (Fig. 2). Similar to OPTN, SPG11 has previously been reported as a cause of recessive juvenile ALS, but our data indicate that it could play a broader role because these cases did not have early onset (16).

We did not identify even a nominal association with other previously reported ALS genes in our data set, including the recently reported TUBA4A, MATR3, GLE1, SS18L1, and CHCHD10 (Table 1) (17–21). A fraction of our samples were genetically screened for some of the known genes and positive cases had been removed before sequencing, which may partially explain the lack of signal (7). Additionally, a comparison with genes implicated in a recent assessment of the role of 169 previously reported and candidate ALS genes in 242 sporadic ALS cases and 129 controls showed no overlap beyond signals for SOD1 and SPG11 (22). Some of these previously studied genes are mutated so rarely that even the sample size presented here is not sufficient to detect causal variant enrichment, while others simply show comparable proportions of rare variants among cases and controls. Finally, certain genes did not show associations owing to the nature of the causal variation: Most known pathogenic variants in ATXN2 and C9orf72 are repeats that cannot be identified in our sequence data.

TBK1, autophagy, and neuroinflammation

Previous studies have implicated both OPTN (optineurin) (23) and SQSTM1 (p62) (24) in ALS. The current study implicates TBK1 and suggests that OPTN is a more important disease gene than previously recognized. These genes play important and interconnected roles in both autophagy and inflammation, emerging areas of interest in ALS research (Fig. 3) (25–27). Mutations in SOD1, TARDBP, and FUS result in the formation of protein aggregates that stain with antibodies to SQSTM1 and OPTN (28). These aggregates are thought to lead to a cargo-specific subtype of autophagy involved in the degradation of ubiquitinated proteins through the lysosome (29). The SQSTM1 and OPTN proteins function as cargo receptors, recruiting ubiquitinated proteins to the autophagosome via their LC3 interaction region (LIR) motifs. TBK1 binds and phosphorylates both OPTN and SQSTM1 (30–32) and enhances the binding of OPTN to the essential autophagosome protein LC3, thereby facilitating the autophagic turnover of infectious bacteria coated with ubiquitinated proteins, a specific cargo of the OPTN adaptor (33). Considering that TBK1 colocalizes with OPTN and SQSTM1 in autophagosomes, it is possible that all three proteins associate with protein aggregates in ALS (33). Indeed, TBK1 appears to play a role in the degradation of protein aggregates by autophagy (34). Additionally, OPTN also functions in the autophagic turnover of damaged mitochondria via the Parkin ubiquitin ligase pathway (35). Finally, VCP, encoded by another gene with mutations that cause ALS, also binds to ubiquitinated protein aggregates. VCP and autophagy are required for the removal of stress granules (dense cytoplasmic protein-RNA aggregates), which are a common feature of ALS pathology (36). Thus, OPTN, SQSTM1, VCP, and TBK1 may be critical components of the aggresome pathway required for the removal of pathological ribonucleoprotein inclusions (37). It appears that defects in this pathway can be selective for motor neuron death, in some cases apparently sparing other neuronal cell types.

Fig. 3 Genes and pathways implicated in ALS disease progression.

Genes known to have sequence variants that cause or are associated with ALS are indicated in red. These mutations can lead to the formation of protein or protein-RNA aggregates that appear as inclusion bodies in post mortem samples from both familial and sporadic ALS patients. Some of the mutant proteins adopt “prion-like” structures (see text for more detail). The misfolded proteins activate the ubiquitin-proteasome and autophagy pathways to remove the misfolded proteins. Ubiquilin2 (UBQLN2) functions in both the ubiquitin-proteasome and autophagy pathways. TBK1 (boxed) lies at the interface between autophagy and inflammation and associates with and phosphorylates both optineurin and p62, which can in turn enhance inflammation. ISG15 is induced by type I interferons (α and β) and interacts with p62 and HDAC6 in the autophagosome.

In addition to their roles in autophagy, OPTN, SQSTM1, and TBK1 all function in the NF-κB pathway (Fig. 3) (27, 38). For example, IκB kinases (IKKα and IKKβ) phosphorylate the IKK-related kinase TBK1, which in turn phosphorylates the IκB kinases, suppressing their activity in a negative autoregulatory feedback loop (39). TBK1 also phosphorylates and activates the transcription factor IRF3 (40–42) and the critical innate immunity signaling components MAVS and STING (43). The coordinate activation of NF-κB and IRF3 turns on the transcription of many inflammatory genes, including interferon-β (44). The innate immune pathway and neuroinflammation in general are thought to be important aspects of neurodegenerative disease progression (45). Thus, pathogenic variants in OPTN, SQSTM1, or TBK1 would be expected to lead to defects in autophagy and in key innate immunity signaling pathways. Mutations in these genes might therefore interfere with the normal function of these pathways in maintaining normal cellular riboproteostasis (37).

The simple observation of enrichment of qualifying variants in patients shows that some of the variants we have identified influence risk of disease. We cannot determine, however, the extent to which they may interact with any other variants or other risk factors in determining risk. We therefore focus on estimating the proportion of patients in which variants in the relevant genes either cause or contribute to disease by subtracting the proportion of controls with qualifying variants in a gene from the proportion of cases with such variants. Although we saw no enrichment of case variants in SQSTM1, variants in OPTN and TBK1 were estimated to explain or contribute to 1.30% of cases in our data set when taken together (combined data set), suggesting an important subgroup of patients that may have a common biological etiology. No individual ALS cases had qualifying variants in more than one of these three genes.
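The subtraction described above can be checked directly against the carrier percentages reported for the combined data set (OPTN: 0.621% of cases and 0.228% of controls; TBK1: 1.099% of cases and 0.194% of controls; dominant not benign model). The sketch below is only a worked example; the function and variable names are hypothetical.

```python
def potential_cases_explained(case_pct, control_pct):
    """Excess carrier percentage in cases over controls for one gene."""
    return case_pct - control_pct


# (case %, control %) for the combined data set, as quoted in the text.
rates = {"OPTN": (0.621, 0.228), "TBK1": (1.099, 0.194)}

per_gene = {gene: potential_cases_explained(*pcts) for gene, pcts in rates.items()}
total = sum(per_gene.values())

print(f"{total:.2f}%")  # prints "1.30%"
```

Adding the per-gene values is reasonable here only because no individual case carried qualifying variants in more than one of these genes; with overlapping carriers, a simple sum would double count.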

The case variants found in OPTN and TBK1 were largely heterozygous and LoF, which suggests that a reduction in trafficking of cargo through the autophagosomal pathway or disruption of autophagosomal maturation may promote disease. Although the most obvious enrichment of case variants in TBK1 was seen for LoF, there was also a signal for nonsynonymous variants, which were found in 1.027% of cases and 0.365% of controls (combined data set). If any of these nonsynonymous variants are selective LoF for specific TBK1 functions as opposed to complete LoF variants, they may help elucidate which TBK1 function is most relevant to disease. We did not observe any clear concentration of qualifying variants in any part of the TBK1 gene (Fig. 4).

Fig. 4 Variants in TBK1 and OPTN.

Dominant not benign variants are shown in TBK1 and OPTN (combined data sets). LoF variants are filled in red, and nonsynonymous variants are filled in blue, and splice variants are filled in purple and shown below the protein line. Case variants are shown with red lines, control variants are shown with blue lines, and variants found in both cases and controls are shown with dashed lines.

NEK1 associates with ALS2 and VAPB

Although no additional genes showed sufficiently strong evidence to be definitively declared disease genes at this point, some of the strongly associated genes identified here may be securely implicated as sample sizes increase. One gene of particular interest is NEK1 (NIMA-related kinase 1). This gene just reached experiment-wide significance in the combined discovery and replication data sets (discovery P = 1.06 × 10⁻⁶, replication P = 0.001, combined P = 3.15 × 10⁻⁹; dominant LoF model). In the combined data set, dominant LoF variants in this gene were found in 0.836% of cases and 0.091% of controls (fig. S3). Additional studies are needed to confirm this suggestive association. Even if LoF variants in this gene do predispose to ALS, their relatively high prevalence in our controls and in public databases indicates that such variants have quite low penetrance, given that the lifetime prevalence of ALS is approximately 0.2%.
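The low-penetrance argument can be made concrete with a rough Bayes calculation. This is a back-of-the-envelope sketch rather than the authors' analysis: it assumes the control carrier rate approximates the rate in people who never develop ALS, and it treats the 0.2% lifetime prevalence as the prior probability of disease. The function name is hypothetical.

```python
def carrier_risk(case_rate, control_rate, prevalence):
    """Approximate P(disease | carrier) by Bayes' rule.

    case_rate: carrier fraction among cases
    control_rate: carrier fraction among controls (assumed unaffected)
    prevalence: lifetime probability of disease in the population
    """
    affected_carriers = case_rate * prevalence
    unaffected_carriers = control_rate * (1.0 - prevalence)
    return affected_carriers / (affected_carriers + unaffected_carriers)


# NEK1 dominant LoF carrier rates from the combined data set: 0.836% and 0.091%.
nek1_risk = carrier_risk(0.00836, 0.00091, 0.002)
```

Under these assumptions the estimate comes out a little under 2%, so most carriers would never develop ALS even if the variants do raise risk severalfold.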

NEK1 is a widely expressed multifunctional kinase linked to multiple cellular processes, but it has not been linked to ALS. In an unbiased proteomic search for NEK1-interacting proteins in human embryonic kidney (HEK) 293T cells, we discovered an interaction between NEK1 and two widely expressed proteins previously found to be mutated in familial ALS: (i) the RAB guanine nucleotide exchange factor ALS2 (also called Alsin) involved in endosomal trafficking, and (ii) the endoplasmic reticulum protein VAPB involved in lipid trafficking to the plasma membrane (fig. S4, A and B, and table S5) (46). ALS2 reciprocally associated with NEK1 in HEK293T cells, and both ALS2 and VAPB associated with NEK1 reciprocally in mouse neuronal cell line NSC-34 (fig. S4, C to E).

Other top genes showing interesting association patterns but not obtaining genome-wide significance included ENAH, with variants in 0.263% of cases and 0.011% of controls (combined data set) (discovery P = 1.82 × 10⁻⁵, replication P = 0.133, combined P = 9.58 × 10⁻⁶; recessive not benign model); CRLF3, with variants in 0.453% of cases and 0.094% of controls (discovery P = 0.0002; dominant coding model); DNMT3A, with variants in 1.003% of cases and 0.456% of controls (combined data set) (discovery P = 0.0002, replication P = 0.261, combined P = 0.0002; dominant not benign model); and LGALSL, with variants in 0.382% of cases and 0.068% of controls (combined data set) (discovery P = 0.0002, replication P = 0.356, combined P = 0.0002; dominant coding model).


Our results implicate TBK1 as an ALS gene, thereby providing insight into disease biology and suggesting possible directions for drug screening programs. We have also provided evidence that OPTN plays a broader role in ALS than previously recognized. Both TBK1 and OPTN are involved in autophagy, with TBK1 possibly playing a crucial role in autophagosome maturation as well as the clearance of pathological aggregates (31, 34). These observations highlight a critical role of autophagy and/or inflammation in disease predisposition. It is also noteworthy that many drugs have been developed that act on TBK1-mediated pathways owing to their role in tumor cell survival (47) and can therefore be used to investigate the effects of drug-dependent loss of function of the kinase.

We also provide a large genetic data set for ALS, which suggests other possible ALS genes and provides a substantial collection of pathogenic variants across ALS genes (for genotype counts for all genes for all cases from this study, see After removing the number of variants expected to be seen on the basis of frequencies of rare variants in controls, we identify more than 70 distinct pathogenic mutations across SOD1, OPTN, TARDBP, VCP, SPG11, and TBK1 that can be used in future efforts to functionally characterize the role of these ALS genes. The identification of TBK1 and the expanded role for OPTN as ALS genes reinforce the growing recognition of the central role of autophagy and neuroinflammation in the pathophysiology of ALS (Fig. 3). These pathways appear to be activated in response to the formation of various types of cellular inclusions, the most prominent of which appear to be ribonucleoprotein complexes; this has led to the proposal that the control of protein misfolding (proteostasis) or ribonucleoprotein/RNA misfolding (“ribostasis”) plays a key role in neurodegenerative diseases (37). Cellular ribonucleoprotein inclusions can be caused by mutations in low-complexity sequence domains or “prion” domains of RNA binding proteins (37, 48) and can be exacerbated by mutations that diminish the autophagy pathway. Remarkably, a hallmark of motor neuron pathology in >95% of sporadic and familial ALS patients is the formation of TARDBP inclusions, which suggests that defects in ribostasis are a common feature of the disease (5, 49). The prominence of this disease mechanism in ALS has been proposed to be the consequence of the normal function of low-complexity domains in RNA binding proteins in the assembly of functional “RNA granules” such as P-bodies and stress granules [see (37) for detailed discussion].

Our exome sequencing study has identified variants that definitively predispose humans to a sporadic, complex human disease. Larger exome sequencing studies may reveal identifiable roles for genes that have not yet achieved significant associations. There is reason for optimism that such studies will begin to fill in an increasingly complete picture of the key genes implicated in ALS, providing multiple entry points for therapeutic intervention (Fig. 3). It is also likely that whole-genome sequencing (especially with longer reads) will prove of particular value in ALS, given that there are many causal variants refractory to identification by contemporary exome sequencing. Finally, we note that effective studies will depend critically on the control samples available. For example, we used the recently released ExAC data set of >60,000 samples to focus on extremely rare variants in our samples (50). Well-characterized, publicly available control sample sets will be of great importance for further discovery of variants associated with complex traits, in particular for whole-genome sequencing studies.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S4

Tables S1 to S6

References (55–65)

  • The full author list is included at the end of the manuscript.

References and Notes

  1. See supplementary materials on Science Online.
  2. Acknowledgments: J.W.H. is a consultant for Biogen Idec and Millennium: the Takeda Oncology Company. R.B. is a consultant for Biogen Idec and a cofounder of AviTx. F.B. is a founder of Regenesance. D.B.G. and R.M.M. are consultants for Biogen Idec. Some of the human samples were provided under a material transfer agreement from Washington University. The results presented in this study can be found in table S6. The case genotype counts for all variants in all genes can be found at See the supplementary materials for full acknowledgments including funding sources. FALS Sequencing Consortium members: Peter C. Sapp,1 Claire S. Leblond,2 Diane McKenna-Yasek,1 Kevin P. Kenna,3 Bradley N. Smith,4 Simon Topp,4 Jack Miller,4 Athina Gkazi,4 Ammar Al-Chalabi,4 Leonard H. van den Berg,5 Jan Veldink,5 Vincenzo Silani,6 Nicola Ticozzi,6 John Landers,1 Frank Baas,7 Christopher E. Shaw,4 Jonathan D. Glass,8 Guy A. Rouleau,9 Robert Brown1; other consortium members can be found in the supplementary materials. 1Department of Neurology, University of Massachusetts Medical School, Worcester, MA 01655, USA. 2Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montreal, Quebec H3A 2B4, Canada. 3Academic Unit of Neurology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Republic of Ireland. 4Department of Basic and Clinical Neuroscience, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London SE5 8AF, UK. 5Department of Neurology, Brain Center Rudolf Magnus, University Medical Centre Utrecht, 3508 GA Utrecht, Netherlands. 6Department of Neurology and Laboratory of Neuroscience, IRCCS Istituto Auxologico Italiano, Milan 20149, Italy, and Department of Pathophysiology and Transplantation, Dino Ferrari Center, Università degli Studi di Milano, Milan 20122, Italy. 7Department of Genome Analysis, Academic Medical Center. Meibergdreef 9, 1105AZ Amsterdam, Netherlands. 
8Department of Neurology, Emory University, Atlanta, GA 30322, USA. 9Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montreal, Quebec H3A 2B4, Canada.