Report

Nuclear Membrane Proteins with Potential Disease Links Found by Subtractive Proteomics


Science  05 Sep 2003:
Vol. 301, Issue 5638, pp. 1380-1382
DOI: 10.1126/science.1088176

Abstract

To comprehensively identify integral membrane proteins of the nuclear envelope (NE), we separately prepared NEs and the organelles known to cofractionate with them from liver. Proteins detected by multidimensional protein identification technology in the cofractionating organelles were subtracted from the NE data set. In addition to all 13 known NE integral proteins, 67 uncharacterized open reading frames with predicted membrane-spanning regions were identified. All eight proteins tested targeted to the NE, indicating that there are substantially more integral proteins of the NE than previously thought. Furthermore, 23 of these mapped within chromosome regions linked to a variety of dystrophies.

Many diseases have been linked to the nuclear envelope (NE), the membrane structure that forms the boundary of the nuclear compartment (1, 2). The NE contains three distinct functional domains: the outer membrane, a specialized region of the endoplasmic reticulum (ER) that shares properties with rough and smooth ER; the inner membrane, which is lined by the nuclear lamina, a polymer of intermediate filament-type lamin proteins associated with a number of integral membrane proteins; and the nuclear pore complexes (NPCs), which regulate nucleo-cytoplasmic transport of proteins and RNAs. Two integral membrane proteins are localized to the NPC in mammals (3), but the number specific to the inner nuclear membrane is unknown: It includes at least 11 proteins and their splice variants (1). No proteins specific to the outer membrane have yet been described.

To identify integral proteins of the NE, we took advantage of recent advances in high-throughput shotgun proteomics using multidimensional protein identification technology (MudPIT) (4, 5), by which the coupling of tandem mass spectrometry with multiple liquid chromatography steps allows analysis of the enormous number of peptides generated by direct digestion of a complex biochemical fraction. Eluting peptides are first measured in the ion trap mass spectrometer; ions are then isolated and fragmented by collision-induced dissociation (CID) with the helium bath gas, and the resulting product ions are measured. The fragmentation pattern often yields amino acid sequence information, allowing protein identification from a single unique peptide, thus increasing sensitivity. Avoiding prior separation by polyacrylamide gel electrophoresis removes its chemical and physical biases and the need to solubilize membrane proteins for the analysis (6).

To enrich for NE-specific proteins, we employed a “subtractive proteomics” approach (fig. S1). A microsomal membrane (MM) fraction can be prepared devoid of NEs because intact nuclei sediment readily, yet it contains both the membranes that contaminate isolated NEs (e.g., mitochondrial membranes) and those that are shared between peripheral ER and the NE. Thus, NE-specific proteins were determined by subtracting the proteins present in MM fractions from those of the NE fractions after proteomic analysis.
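The subtraction step described above amounts to a set difference. The following is a minimal sketch of that logic only, not the software used in the study; the function name and all protein identifiers are hypothetical placeholders chosen for illustration.

```python
def subtract_fractions(ne_hits, mm_hits):
    """Return proteins detected in the NE fraction but absent from the MM fraction."""
    return set(ne_hits) - set(mm_hits)


# Hypothetical identifiers, for illustration only (not data from the study).
ne_fraction = {"NE_protein_1", "NE_protein_2", "ORF_1", "ER_protein_X", "mito_protein_Y"}
mm_fraction = {"ER_protein_X", "mito_protein_Y", "ER_protein_Z"}

# Proteins shared with the MM fraction (ER and mitochondrial contaminants)
# drop out; proteins seen only in the MM fraction are irrelevant.
ne_specific = subtract_fractions(ne_fraction, mm_fraction)
print(sorted(ne_specific))
```

Because the difference discards anything seen in the MM fraction, a protein with genuine roles in both the ER and the NE would be removed as well, which is a limitation of any such subtraction.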

NEs and MMs isolated from rodent liver (Fig. 1A) (7, 8) were extracted with 0.1 M NaOH to enrich for transmembrane proteins in the pellet (fig. S2). Four times more MMs than NEs were analyzed to increase representation of minor ER proteins. Separately, NEs were extracted with salt and detergent to identify integral proteins more closely associated with the lamin polymer. Although this fraction is expected to contain more intranuclear contaminants, computational sequence analysis should separate those with predicted transmembrane regions. Proteins in all three pellets were proteolytically cleaved, and the complex peptide mixtures were separated by sequential salt steps followed by acetonitrile gradients to slowly release peptides into the mass spectrometer. Over 30,000 peptides were analyzed, yielding 2391 separate protein identifications across the three fractions (table S1).

Fig. 1.

(A) Schematic of method. Mouse NEs and MMs were both extracted with 0.1 M NaOH to enrich for transmembrane proteins. Separately, rat NEs were extracted with salt and detergent (25 mM Hepes, pH = 7.5; 400 mM NaCl; and 1% β-octylglucoside) to enrich for proteins that have tight associations with the lamin polymer. These three fractions were analyzed by MudPIT (22). (B) Proteins identified in the various fractions. (Left) A primary-color-scheme Venn diagram indicates separately identified proteins in each fraction and overlap between fractions. Circle areas are proportional to the number of proteins in each set. Protein identifications were generated by searching spectra from tandem mass spectrometry against a database of 106,360 human, rat, and mouse sequences. The recent addition of ∼25,000 sequences to the rat database suggests that rat proteomic analyses are now viable. (Right) A paradigm for focus on novel transmembrane proteins: blue, total protein hits; green, proteins remaining after subtraction of the MM fraction; yellow, previously uncharacterized proteins (i.e., hypothetical ORFs); red, hypothetical transmembrane proteins. Transmembrane sequences were predicted with the use of TMPred (12, 13), but at a higher stringency: the data set was restricted to proteins with scores greater than 1000 in one direction and 1900 cumulative, on the basis of scores for previously characterized integral NE proteins. The final two protein sets yielded a total of 67 previously unknown putative integral NE proteins.

The logic of the subtractive approach was supported by the presence of all previously identified integral NE proteins in the NaOH-extracted NEs (table S2), and their absence from the NaOH-extracted MMs. Furthermore, no lamins (the most abundant NE-specific proteins) were recovered in the MMs. All but two of the known integral NE proteins also appeared in the salt- and detergent-extracted NEs. The absence of LUMA and nurim may indicate a less stringent association with the lamin polymer. All 31 known core NPC proteins (9) (table S3) were identified in the salt- and detergent-extracted fraction: Thus, we conclude that our identification approach is essentially comprehensive.

The dynamic range of MudPIT enabled identification of 1830 separate proteins in the salt- and detergent-extracted NEs, 566 proteins in the NaOH-extracted NEs, and 652 proteins in the NaOH-extracted MMs (Fig. 1B, left, and tables S4 to S6). Forty-one percent of the proteins in the NaOH-extracted NE fraction also appeared in the MMs (Fig. 1B) and were therefore readily eliminated by the subtractive approach. Many proteins remaining in the two NE fractions were known chromatin proteins and transcription factors, some of which (histones, HP1, and barrier to autointegration factor) have been shown to bind NE proteins (10). Some proteins were eliminated because they were known components of contaminating organelles such as mitochondria. However, certain ER proteins also have specific functions in the NE (11): Thus, some of those that we have dismissed may subsequently prove to be NE proteins. Nonetheless, we restricted our focus to the 337 uncharacterized open reading frames (ORFs) unique to the two NE fractions. The transmembrane prediction algorithm, TMPred (12), predicted that 34% of those in the NaOH-extracted NEs are integral membrane proteins (13). Predicted integral proteins from the salt- and detergent-extracted NEs were also considered, because some would be expected to have very strong interactions with lamins. Together, the two fractions contained 67 previously unknown potential integral NE proteins (Fig. 1B, right, and tables S7 and S8).
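The filtering sequence described above (pool the two NE fractions, subtract the MM fraction, keep uncharacterized ORFs, then apply the transmembrane score cutoffs given in the Fig. 1 legend) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the `Hit` record, its field names, and the example values are hypothetical, and the reading of "one direction" and "cumulative" as the larger and the sum of two orientation scores is an assumption about how the prediction output was aggregated.

```python
from dataclasses import dataclass

# Cutoffs quoted in the Fig. 1 legend.
MIN_DIRECTION_SCORE = 1000
MIN_CUMULATIVE_SCORE = 1900


@dataclass(frozen=True)
class Hit:
    name: str
    characterized: bool  # False for hypothetical ORFs
    score_in_out: int    # assumed: prediction score, inside-to-outside orientation
    score_out_in: int    # assumed: prediction score, outside-to-inside orientation


def passes_tm_filter(hit):
    """Apply both cutoffs: best single orientation and the two orientations summed."""
    best = max(hit.score_in_out, hit.score_out_in)
    total = hit.score_in_out + hit.score_out_in
    return best > MIN_DIRECTION_SCORE and total > MIN_CUMULATIVE_SCORE


def candidate_integral_ne_proteins(ne_naoh, ne_salt_detergent, mm_naoh):
    """Names of uncharacterized, predicted-transmembrane hits unique to the NE fractions."""
    mm_names = {h.name for h in mm_naoh}
    pooled = {h.name: h for h in list(ne_naoh) + list(ne_salt_detergent)}
    return sorted(
        name
        for name, h in pooled.items()
        if name not in mm_names and not h.characterized and passes_tm_filter(h)
    )


# Hypothetical example records (not data from the study).
ne_naoh = [
    Hit("ORF_A", False, 1200, 900),
    Hit("known_NE_protein", True, 2000, 1500),
    Hit("ER_shared", False, 1500, 1400),
]
ne_salt_detergent = [
    Hit("ORF_A", False, 1200, 900),
    Hit("ORF_B", False, 1100, 500),   # fails the cumulative cutoff
    Hit("ORF_C", False, 950, 980),    # fails the single-orientation cutoff
]
mm_naoh = [Hit("ER_shared", False, 1500, 1400)]

candidates = candidate_integral_ne_proteins(ne_naoh, ne_salt_detergent, mm_naoh)
print(candidates)
```

Pooling by name before filtering means a protein found in both NE fractions is counted once, mirroring the way the two fractions are reported together as a single list of candidates.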

To test whether this number is a realistic estimate of previously unknown nuclear integral membrane proteins, we selected a representative sample to characterize their ability to target to the NE in transiently transfected cells. Eight cDNAs were recovered, representing a range of sizes (112 to 674 residues), numbers of predicted transmembrane segments (one to five), and numbers of peptide hits (a crude estimate of abundance).

All eight proteins tested were targeted to the NE (Fig. 2 and fig. S3). Varying amounts of protein also accumulated in the ER and/or in cytoplasmic aggregates, but this is commonly observed for known integral NE proteins when exogenously overexpressed (fig. S3), presumably because binding sites at the NE become saturated. Because all were fused to an N-terminal epitope tag, the retention of the tag suggests that they are type 2 membrane proteins or polytopic with a cytoplasmic N-terminus, as seen for all previously identified integral NE proteins. The eight proteins whose NE-targeting has been confirmed have been assigned the prefix “NET” for nuclear envelope transmembrane protein.

Fig. 2.

Localization of five previously unknown putative nuclear transmembrane proteins. cDNAs recovered from a human liver library were inserted behind a cytomegalovirus promoter and an N-terminal-encoded hemagglutinin-epitope tag and transiently transfected into HeLa or COS7 cells. Cells were first preextracted by three washes with 1% Triton X-100, followed by formaldehyde fixation (22). Asterisks indicate proteins that map to chromosome regions linked to dystrophies. During the course of this study, NET56 was separately identified and named Dullard (23); however, its subcellular localization was not determined. For galleries of micrographs and cells not preextracted, see fig. S3.

Preextraction with detergent before fixation removes most NE proteins that are not tightly associated with the insoluble lamin polymer. After this treatment, five of the eight proteins remained at the nuclear rim, arguing that they are normally concentrated at the NE (Fig. 2). Nonetheless, it remains possible that some have functions in both the NE and ER yet failed to appear in the MM fraction. The three putative transmembrane proteins that were not retained after detergent preextraction may normally be concentrated in the outer nuclear membrane or, alternatively, may be weakly associated with the lamina.

The NE targeting of all eight proteins tested argues that most of the remaining 59 are also integral NE proteins. Thus, the 13 integral proteins identified before this study likely represent only a minor fraction of the total. We postulate three reasons why we identified such a large number of proteins as compared to an earlier comparative proteomic analysis (14) that identified LUMA and Unc-84A: avoidance of losses from gel extractions, the sensitivity of tandem mass spectrometry, and the use of whole tissue instead of a cell line. The last of these would enhance identification of cell-type specific proteins, because liver contains hepatocytes, Kupffer cells, sinusoidal epithelia, perisinusoidal lipocytes, an endothelial vasculature, and muscle cells. Indeed, we identified two muscle-cell integral NE proteins, Syne-1 and Syne-2 (15, 16), that were absent from the earlier study. Among the proteins we identified are two (numbers 25 and 66) that contain the LEM domain, named for its occurrence in the NE proteins LAP2, emerin, and MAN1 (17), and one (number 9) that appears to be related to LAP1 through a gene duplication. Twelve of the 67 proteins contained functional domains associated with enzymatic activities such as phosphatases, acetyltransferases, and glycosyltransferases (table S7). Thus, the subtractive method is effective in identifying components of cellular substructures and can be applied to any well-characterized subcellular fractionation system where contaminating fractions can be prepared free of the fraction of interest.

Thirteen human diseases, mostly dystrophies, have been associated with mutations in NE proteins, including both lamins and lamin-binding integral proteins (1, 2). Yet ∼300 dystrophies remain for which a responsible gene has not been identified, some of which have been partially mapped to large chromosome territories. Five of the 67 proteins in our rodent data set did not have apparent human homologs. Of those remaining, 37% (23 genes) mapped within chromosome regions linked to 14 of these dystrophies (Fig. 3). Although any of these linked regions may contain hundreds of genes, there are several compelling arguments that our proteins make good candidates for disease links: (i) NE proteins have been linked to eight dystrophies; (ii) the genes we identified have an increased frequency in loci linked to disease (37%) as compared to random distributions (25%) (twice the frequency if dystrophies mapped only to very large territories are excluded); and (iii) there is a precedent for multiple interacting NE proteins causing variants of the same disease [lamin A and emerin in Emery-Dreifuss muscular dystrophy (18, 19)]. In this light, nine putative integral NE proteins from our data set are located within chromosome regions linked to three Charcot-Marie-Tooth disease variants and two limb-girdle muscular dystrophy variants; both diseases have variants caused by lamin A mutations (20, 21). Four of the proteins that map to dystrophy-linked chromosome regions target to the NE, and two of these are resistant to preextraction with detergent, suggesting an association with the lamina. Thus, it seems highly probable that some of these 62 human putative NE proteins will be linked to disease. We postulate that so many dystrophies have already been mapped to the NE because of the complex set of functions carried out by the NE and the intricate network of interacting proteins on which NE organization depends.
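The percentages quoted above follow from simple arithmetic: 67 proteins minus 5 without human homologs leaves 62, and 23 of 62 is 37%. The sketch below reproduces that arithmetic and, as an illustration only, computes a one-sided binomial tail probability against the 25% random expectation. The binomial comparison is not an analysis reported in the text; it assumes each gene falls in a disease-linked region independently with probability 0.25, which is a simplifying assumption, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


total_rodent_proteins = 67
without_human_homolog = 5
with_human_homolog = total_rodent_proteins - without_human_homolog

in_linked_regions = 23
observed_fraction = in_linked_regions / with_human_homolog

# Fraction of genes expected in disease-linked regions by chance, as quoted in the text.
background_fraction = 0.25

# Assumption: independent placement of each gene (not established by the text).
tail = binomial_upper_tail(in_linked_regions, with_human_homolog, background_fraction)

print(f"{with_human_homolog} proteins with human homologs")
print(f"{observed_fraction:.0%} observed vs {background_fraction:.0%} expected at random")
print(f"P(X >= {in_linked_regions}) under the independence assumption: {tail:.3f}")
```

Genes are not distributed independently along chromosomes, so the tail probability should be read as a rough indication of the size of the enrichment rather than as a formal significance test.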

Fig. 3.

Possible association of previously unknown integral nuclear membrane proteins with genetic diseases. Chromosome locations of the genes encoding potential integral NE proteins were determined with the use of GenBank's human genome resources. Dystrophies mapped to large chromosome regions were obtained from (24). Putative integral NE proteins encoded within these regions are indicated by connecting lines. The percentage of the total genes in the genome within the disease-linked regions was calculated with the use of GenBank resources to determine the random probability of genes occurring within them. Proteins designated NETs have confirmed NE localization.

Supporting Online Material

www.sciencemag.org/cgi/content/full/301/5638/1380/DC1

Materials and Methods

Figs. S1 to S3

Tables S1 to S8

References and Notes
