A large fraction of HLA class I ligands are proteasome-generated spliced peptides


Science  21 Oct 2016:
Vol. 354, Issue 6310, pp. 354-358
DOI: 10.1126/science.aaf4384

New players in the repertoire

Antigen-presenting cells, such as macrophages and dendritic cells, activate immunological T cells by presenting them with antigens bound by major histocompatibility complexes (MHCs). The proteasome typically processes these antigens, which include peptides derived from both self and microbial origins. Liepe et al. now report that, surprisingly, a large fraction of peptides bound to class I MHC on multiple human cell types are spliced together by the proteasome from two different fragments of the same protein. Such merged peptides might turn out to be useful in vaccine or cancer immunotherapy development.

Science, this issue p. 354


The proteasome generates the epitopes presented on human leukocyte antigen (HLA) class I molecules that elicit CD8+ T cell responses. Reports of proteasome-generated spliced epitopes exist, but they have been regarded as rare events. Here, however, we show that the proteasome-generated spliced peptide pool accounts for one-third of the entire HLA class I immunopeptidome in terms of diversity and one-fourth in terms of abundance. This pool also represents a unique set of antigens, possessing particular and distinguishing features. We validated this observation using a range of complementary experimental and bioinformatics approaches, as well as multiple cell types. The widespread appearance and abundance of proteasome-catalyzed peptide splicing events has implications for immunobiology and autoimmunity theories and may provide a previously untapped source of epitopes for use in vaccines and cancer immunotherapy.

The presentation of epitopes on the cell surface is a key mechanism by which organisms identify the presence of pathogens, metabolic malfunctioning, or tumors. The HLA class I (HLA-I) immunopeptidome—the set of epitopes allocated onto the HLA-I molecules—impinges on the CD8+ T cell repertoire and the cell-mediated immune response (1). HLA-I immunopeptidomes are usually investigated by sequence identification of peptides eluted from HLA-I molecules by means of tandem liquid chromatography–mass spectrometry (LC-MS/MS) (fig. S1). The key step for the transformation of a protein into HLA-I–restricted epitopes is usually processing by the proteasome (1), which cuts proteins into peptides; alternatively, the proteasome can also cut and paste peptide sequences, thereby releasing peptide antigens that do not correspond to the original protein sequence (2) (fig. S2). This proteasome-catalyzed peptide splicing (PCPS) has long been considered to occur only rarely; partly this has been because the screening of the HLA-I immunopeptidome for proteasome-generated spliced peptides was impeded by methodological challenges.
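The cut-and-paste step described above can be illustrated with a minimal sketch that enumerates candidate spliced peptides from a single protein sequence. This is not the search strategy of the study: the function name, the cap on the intervening-sequence length (`max_intervening`), and the toy sequence are assumptions made for illustration only. Two fragments of the same protein are joined either in their original order ("normal") or swapped ("reverse"); candidates that also occur contiguously in the protein are skipped because they would be indistinguishable from nonspliced peptides.

```python
def cis_spliced_peptides(protein, length, max_intervening=25):
    """Return a set of (peptide, kind) pairs built from two fragments of one protein.

    kind is "normal" when the fragments keep their order in the protein and
    "reverse" when the downstream fragment is placed first.
    max_intervening is an illustrative cap on the number of excised residues.
    """
    results = set()
    n = len(protein)
    for up_len in range(1, length):           # length of the upstream fragment
        down_len = length - up_len             # length of the downstream fragment
        for i in range(n - up_len + 1):
            up_end = i + up_len
            upstream = protein[i:up_end]
            for gap in range(0, max_intervening + 1):
                j = up_end + gap
                if j + down_len > n:
                    break                      # downstream fragment would run off the end
                downstream = protein[j:j + down_len]
                candidates = [(downstream + upstream, "reverse")]
                if gap >= 1:                   # gap 0 in normal order is simply a nonspliced peptide
                    candidates.append((upstream + downstream, "normal"))
                for peptide, kind in candidates:
                    if peptide not in protein:  # skip sequences that also exist unspliced
                        results.add((peptide, kind))
    return results


# Toy example on a made-up five-residue sequence.
toy = cis_spliced_peptides("ABCDE", 3, max_intervening=1)
print(sorted(toy))
```

Even for this toy input the number of candidates exceeds the number of contiguous 3-mers, which hints at why a proteome-wide spliced peptide database becomes very large.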

To overcome these problems, we developed an analytical strategy that accounts for recent discoveries underpinning the PCPS mechanism and can handle the vast proteome-wide human spliced peptide database (fig. S3). With this strategy, we initially analyzed the HLA-I–eluted immunopeptidome of the GR lymphoblastoid cell line (GR-LCL); for a deeper coverage of the immunopeptidome, we adopted a two-dimensional (2D) peptide prefractionation strategy followed by a hybrid peptide fragmentation method [electron-transfer higher-energy collision dissociation (EThcD)] for peptide identification (3, 4) (fig. S1), supplemented by an adapted target-decoy approach (fig. S4). Our analysis led to the identification of 6592 nonspliced and 3417 spliced peptides 9 to 12 amino acid residues in length (9- to 12-mer peptides) (table S1). The latter number represents 34% of the total of identified antigenic peptides (Fig. 1A), thereby increasing the number of identified HLA-I ligands by some 50%. We confirmed the authenticity of the identified spliced antigenic peptides by comparing the LC-MS/MS spectra of 98 exemplary spliced peptides with their corresponding synthetic peptides and computing their correlation score (table S2 and fig. S5). In addition, we verified the proteasome-dependent generation of the spliced antigenic peptides in vitro for three examples by digestions of synthetic polypeptides harboring the corresponding antigenic peptides by purified 20S proteasome (fig. S6).
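The two percentages quoted above follow directly from the peptide counts. The short calculation below only restates that arithmetic; the variable names are ours.

```python
# Peptide counts reported for the GR-LCL 2D immunopeptidome (9- to 12-mers).
nonspliced = 6592
spliced = 3417

total = nonspliced + spliced

# Share of spliced peptides among all identified ligands (the "34%" figure).
spliced_share = spliced / total

# Relative gain in identified ligands compared with a nonspliced-only search
# (the "some 50%" figure).
relative_gain = spliced / nonspliced

print(f"spliced share of all ligands: {spliced_share:.1%}")
print(f"increase over nonspliced only: {relative_gain:.1%}")
```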

Fig. 1 Sizes and characteristics of spliced and nonspliced peptide pools.

(A and B) Summary of the 9- to 12-mer peptides presented by HLA-I molecules on the GR-LCL and C1R cell lines and human primary fibroblasts (A), or, as controls, on the GR-LCL and T2 cell lines (B). (C) Summary of the 9- to 12-mer peptides identified among the cell lysates of T2 and GR-LCL cell lines prefiltered for peptides smaller than 3 kDa not trypsin-digested, or of the C1R cell line prefiltered for polypeptides larger than 30 kDa and trypsin-digested. Samples were analyzed by different LC-MS/MS methods, as depicted. Light blue shaded areas are part of the spliced peptide pool.

We queried the HLA-I immunopeptidome mass spectrometry data of GR-LCL against the standard Swissprot human proteome database, which does not account for spliced peptides; this revealed that 655 peptides (i.e., 9% of the total nonspliced antigenic peptides) would be erroneously matched against this incomplete database as nonspliced peptides, because they have much better hits as spliced peptides in our search against the combined spliced and nonspliced peptide database (Fig. 1A). The correct sequences for a set of these antigenic spliced peptides were verified by comparing the LC-MS/MS spectra of the synthetic spliced and nonspliced candidates with the corresponding LC-MS/MS of the GR-LCL HLA-I immunopeptidome (fig. S7, A to I). Our identifications are further supported by the ion score distributions (see fig. S4) of the nonspliced peptides and of those spliced peptides that were wrongly assigned as nonspliced peptides using the standard Swissprot human proteome database, which differ only slightly in their median but not in the overall shape (fig. S7J).

In independent technical replicates of the GR-LCL HLA-I immunopeptidome analyzed without prefractionation (see figs. S1 and S8A) through EThcD or higher-energy collision dissociation (HCD), we identified thousands of peptides; among them, the spliced peptide pools represent 21 to 32% of the HLA-I immunopeptidome diversity (Fig. 1B and table S1A). To confirm that this unexpected finding is not a peculiarity of the GR-LCL cell line, we investigated the HLA-I immunopeptidome of unrelated cell lines. Here, similar results were obtained in the analysis by EThcD of a nonrelated C1R lymphoid cell line (5), where 30% of the HLA-I immunopeptidome variety is represented by spliced peptides (Fig. 1A and table S1B). This large prevalence of spliced peptides in the HLA-I immunopeptidome is not a peculiar characteristic of lymphoblastoid cell lines, because in the HLA-I immunopeptidome of primary human fibroblasts (6), 29% of the identified antigenic peptides were spliced peptides (Fig. 1A and table S1C). Again, if we were to query those data sets only against the standard Swissprot human proteome database, we would wrongly assign 3.7 to 7.2% of the antigenic peptides as nonspliced peptides (Fig. 1, A and B) while missing all spliced peptides.

Spliced peptides were prevalent not only in the HLA-I immunopeptidome but also in the unsorted pool of GR-LCL cell lysate peptides with molecular weight (MW) smaller than 3 kDa, the maximum size of peptides produced in vitro by the proteasome (7) (Fig. 1C). The numbers of both nonspliced and spliced 9- to 12-mer peptides declined when we used an inhibitor of proteasome activity such as epoxomicin (Fig. 1C and fig. S9). Inhibition of proteasome activity led to a longer median length of the nonspliced peptides (fig. S9), which was to be expected because the proteasome generates peptides with an average length of 11 residues (7, 8). It also eliminated almost all spliced 9- to 12-mer peptides (Fig. 1C and fig. S9), thereby confirming that the identified spliced peptides are generated by the proteasome.

We further verified the proteasome dependency of the spliced peptides by querying the HLA-I immunopeptidome of the T2 cell line, which lacks a functioning transporter associated with antigen processing (TAP). As for other TAP-deficient cell lines, the few antigenic peptides identified so far in the HLA-I immunopeptidome of the T2 cell line derive from both proteasome-mediated and signal peptidase–mediated proteolysis in a similar manner (9, 10). As expected, relative to other HLA-I immunopeptidomes, we identified a drastically reduced number of both spliced and nonspliced peptides eluted from the HLA-I molecules of T2 cells. Among them, spliced peptides represented only 13% of the whole T2 HLA-I immunopeptidome (Fig. 1B). By contrast, the analysis of a cytosolic unsorted pool of 9- to 12-mer peptides of the T2 cells (with MW < 3 kDa) showed a normal frequency of spliced peptides (25%; Fig. 1C). Together, these experiments provide further evidence that spliced peptides presented by HLA-I molecules are produced by the proteasome and that about half of the TAP-independent antigenic peptide pool is generated through proteasome activity (9).

As a final (negative) control, we performed LC-MS/MS analysis of a tryptic lysate of the C1R cell line (fig. S8). In this case, the cell lysate was first filtered to include only proteins with MW > 30 kDa to exclude intracellular peptides generated by the proteasome in the mixture. In this tryptic digest, we identified about 1300 9- to 12-mer nonspliced peptides but only a few 9- to 12-mer spliced sequences (Fig. 1C). In fact, only 2.7% of them are (likely incorrectly) annotated as spliced peptides, which provides us with an experimental limit on the overall false discovery rate (FDR) (see fig. S4B). We thus have independent experiments that provide evidence that about one-third of peptides bound to HLA-I are generated by PCPS.

Although spliced peptides represent one-third of the HLA-I immunopeptidome variety, their relevance from an immunological point of view could be undermined if they were not abundantly presented at the cell surface. We thus set out to quantify the abundance of each HLA peptide by label-free quantification based on the intensity of the MS ion peak area. Although this method is not applicable for single peptides (6, 8), it has recently become well accepted when analyzing large proteomics data sets (6, 11, 12). We substantiated this strategy by titrating two pools of synthetic nonspliced and spliced antigenic peptides; we observed no significant differences between nonspliced and spliced peptides (Fig. 2A). However, label-free quantification did reveal significant differences in the abundance distribution between nonspliced and spliced peptides. Indeed, the spliced antigenic peptides were on average 16 to 26% less abundant than nonspliced peptides in the HLA-I immunopeptidomes of three independent cell lines (Fig. 2, B and C). Therefore, on the basis of our results, we conclude that spliced peptides not only represent one-third of the HLA-I immunopeptidome variety but also approximately one-fourth of the total amount of peptide molecules presented on the HLA-I complex. In agreement with this finding, we recently reported that the abundance of peptides of a small group of melanoma-associated spliced epitopes exposed on the HLA-I complexes is comparable to that of nonspliced melanoma-associated epitopes (13). Moreover, we observed a specific response against those spliced epitopes in the peripheral blood of half of the melanoma patients we studied (13), highlighting the potential biological impact of the much larger set of spliced epitopes presented in this study.
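The distinction between diversity (share of distinct peptides) and abundance (share of the summed MS ion peak area) can be made concrete with a small sketch. The function and the peak-area values below are invented for illustration and are not data from this study; the sketch only shows how a pool can make up one-third of the peptide species while contributing a smaller fraction of the total signal.

```python
from statistics import median


def spliced_shares(nonspliced_areas, spliced_areas):
    """Summarize spliced peptides relative to the whole immunopeptidome.

    Inputs are lists of MS ion peak areas, one value per identified peptide.
    """
    n_total = len(nonspliced_areas) + len(spliced_areas)
    area_total = sum(nonspliced_areas) + sum(spliced_areas)
    return {
        # fraction of distinct peptide species that are spliced
        "diversity": len(spliced_areas) / n_total,
        # fraction of the summed peak area contributed by spliced peptides
        "abundance": sum(spliced_areas) / area_total,
        # median spliced peak area relative to the median nonspliced peak area
        "median_ratio": median(spliced_areas) / median(nonspliced_areas),
    }


# Made-up peak areas (arbitrary units), chosen only to make the arithmetic easy to follow.
toy_nonspliced = [10.0, 8.0, 6.0, 6.0]
toy_spliced = [5.0, 5.0]
shares = spliced_shares(toy_nonspliced, toy_spliced)
print(shares)
```

In this toy case the spliced peptides are one-third of the species but one-fourth of the summed peak area, because each spliced peptide is individually less abundant.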

Fig. 2 Semiquantitative comparison of HLA-I–eluted spliced and nonspliced peptides.

(A) Correlation between MS ion peak area and concentration of a pool of 75 nonspliced and 78 spliced peptides from two experimental replicates. The correlation coefficients with confidence intervals are reported in the charts. Linear regression was applied, and the resulting regression lines with their confidence intervals are depicted for spliced peptides (dark blue line and light blue shaded area) and nonspliced peptides (red line and pink shaded area). Neither the correlation coefficients nor the parameters of the regression lines significantly differ between spliced and nonspliced peptides. (B) Distribution of MS ion peak area of spliced and nonspliced peptides eluted from the HLA-I molecules of GR-LCL and C1R cell lines or human fibroblasts. Dashed lines indicate the median of the distribution for spliced peptides (blue) and nonspliced peptides (red). The MS ion peak area distribution of the nonspliced peptides is significantly larger than the distribution of the spliced peptides (Kolmogorov-Smirnov test, P < 10^−16). The relative proportion of spliced peptides estimated from the integral of the MS ion peak areas of spliced peptides relative to the integral of the peak area of all peptides is reported. (C) Medians of MS ion peak area of spliced and nonspliced peptides identified in the HLA-I immunopeptidome of the GR-LCL and C1R cell lines or human primary fibroblasts. Samples were not prefractionated; all samples were analyzed by EThcD (GR-LCL and C1R cell lines and synthetic peptide pools) or by HCD (human fibroblasts).

Spliced epitopes could also represent a distinct pool of antigenic peptides with particular characteristics. The few pioneering studies on PCPS had already provided hints about generative mechanisms. For instance, they suggested that PCPS prefers specific peptide sequences, although the limited number of spliced peptides or epitopes identified has so far precluded an analysis with sufficient statistical power (2, 8, 14–21). With our large pool of spliced antigenic peptides, this became possible. Contrasting the characteristics of the spliced and nonspliced peptides in the GR-LCL HLA-I immunopeptidome, we observed no significant differences in terms of (i) peptide length distribution (fig. S10A), in agreement with our previous observation on in vitro proteasome-catalyzed reactions (8); (ii) frequency of the number of putative parental proteins that can generate each spliced or nonspliced peptide (fig. S10B); and (iii) the frequencies of spliced and nonspliced peptides derived from a given antigen (fig. S10C). Remarkably, one-third of self antigens are represented on the GR-LCL cell surface only by spliced peptides (Fig. 3A and fig. S10D), which shows that PCPS increases the antigen exposure and expands the HLA-I immunopeptidome–mediated surveillance of the immune system.

Fig. 3 Antigens, characteristics, and sequence motifs of spliced and nonspliced peptides of the GR-LCL HLA-I immunopeptidome.

Data in all panels refer to the GR-LCL 2D-EThcD HLA-I immunopeptidome. (A) Number of antigens presented on the HLA-I molecules by only spliced, only nonspliced, or both spliced and nonspliced peptides. (B) Length distribution of the sequence between the two splice reactants (i.e., the intervening sequence). (C) Length distribution of N- and C-terminal splice reactants generating the antigenic spliced peptides. (D) Frequencies of amino acids at each residue of the nonspliced and spliced 9-mer peptides. (E) Distribution of amino acids in the PN, P1, P1′, and PC positions (see fig. S2) of the spliced peptides. In (D) and (E), all frequency values are normalized for the frequency of the amino acids in the human proteome; that is, they indicate the probability of observing a certain amino acid at a certain position given the frequency of this amino acid in the human proteome. Abbreviations for amino acid residues: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; Y, Tyr.

Among the identified spliced antigenic peptides in the GR-LCL 2D HLA-I immunopeptidome, we observed a similar number of spliced peptides generated by the ligation of two splice reactants following their orientation in the parental protein or by inverting their order (i.e., reverse PCPS); 50.1% were normal cis spliced peptides (see fig. S2). We did not observe a clear preference for a specific length of intervening sequences (i.e., the sequences excised between two splice reactants) (Fig. 3B and fig. S2). Also, the length of the N- and C-terminal splice reactants (see fig. S2) was almost equally distributed, with the exception of a seemingly preferred length of two residues in the N-terminal splice reactant, which was most apparent for 9-, 10-, and 11-mer peptides (Fig. 3C and fig. S10E). Because the second residue of the antigenic peptide often corresponds to the anchor site of the specific HLA-I molecules, we speculate that preference for specific amino acids for the ligation could have introduced an evolutionary pressure on the HLA allotype selection, as has previously been hypothesized for the proteasome-dependent peptide hydrolysis and the C terminus of the HLA-I–restricted peptides (22).

In the GR-LCL data set, we also compared spliced and nonspliced antigenic peptide motifs in relation to HLA-I haplotypes by applying a neural network–based algorithm (NetMHC ANN) and an algorithm based on the stabilized matrix method (IEDB SMM) (23, 24) to predict in silico their binding to HLA-I molecules. The two algorithms performed similarly when we considered the nonspliced antigenic peptides (fig. S11, A to D). They did, however, differ significantly in the prediction of how efficiently the spliced peptides bind the specific HLA-A and HLA-B variants (fig. S11, A to D). Often, spliced peptides predicted by NetMHC ANN to be barely compatible with the specific HLA-I cleft bound it with an experimentally determined dissociation constant (IC50) below 5000 nM (table S2 and fig. S11E). Such a phenomenon might be due to intrinsic differences in the motifs of spliced and nonspliced antigenic peptides. Indeed, because the algorithms have been trained exclusively on nonspliced epitopes (or nonrandomized peptide libraries), their predictive power for that peptide type could be limited. This hypothesis is also supported by our previous in vitro PCPS analysis (8), where we observed that several spliced peptides were produced by proteasomal cutting at rarely used substrate cleavage sites. Such differences emerge when considering the sequence motifs of the spliced and nonspliced 9-mers within the GR-LCL HLA-I immunopeptidome (Fig. 3D) and the other immunopeptidome data sets (fig. S12), especially in positions 2 and 9, which correspond to frequently used anchor sites.

To gather information about the sequence preference of PCPS, we exploited the large number of spliced antigenic peptides available here, and we computed the amino acid distribution at the PN, P1, P1′, and PC positions (see fig. S2) of the GR-LCL HLA-I immunopeptidome. This outcome matched the data obtained for the replicate GR-LCL immunopeptidome (1D EThcD) but differed from the immunopeptidomes obtained from the C1R cell lineage and human primary fibroblasts (Fig. 3E and fig. S13). The difference between the patterns of the spliced peptide PN, P1, P1′, and PC positions could be due to the HLA anchor site differences among the three cell lines. Our more general analysis differs markedly from the HLA-A*02:01-restricted P1-P1′ position pattern recently published by Berkers et al. (20), thereby confirming that an HLA-unbiased strategy to identify spliced peptide patterns may be essential for developing PCPS prediction algorithms.
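The positional amino acid distributions discussed here are normalized for the background frequency of each amino acid in the human proteome (see the Fig. 3 legend). The exact normalization is not spelled out in the text; the sketch below assumes the simplest reading, a ratio of observed positional frequency to background frequency. The function name and all numbers are illustrative assumptions, not values from the study.

```python
from collections import Counter


def positional_enrichment(residues, background):
    """Ratio of observed frequency at one position to the background frequency.

    residues:   amino acids (one-letter codes) observed at a given position,
                one entry per peptide.
    background: mapping from amino acid to its overall frequency; every
                observed amino acid must be present in this mapping.
    """
    counts = Counter(residues)
    total = len(residues)
    return {aa: (counts[aa] / total) / background[aa] for aa in counts}


# Made-up observations at one position and a made-up background table.
toy_residues = ["L", "L", "A", "K"]
toy_background = {"L": 0.10, "A": 0.07, "K": 0.06}
enrichment = positional_enrichment(toy_residues, toy_background)
print(enrichment)
```

Values above 1 indicate that an amino acid is seen at that position more often than its background frequency alone would suggest; values below 1 indicate depletion.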

Our study shows that the spliced peptides bound to the HLA-I molecules are very frequent and comparable in their amount to the nonspliced peptides but represent a distinct pool of antigens with particular immunological characteristics. One of the key features that may have maintained PCPS through evolutionary history (8, 25, 26) might be its higher degree of freedom of selecting antigenic peptide sequences. Targeting antigens through nonspliced peptides can be limited by the sequence restrictions that antigens have as a result of their function. PCPS is a solution to this problem, as suggested by the fact that a significant portion of the antigens are represented by spliced peptides only. Of course, the unexpectedly large frequency and amount of HLA-I–restricted spliced peptides may (and, we strongly expect, will) have profound implications for the concept of self/nonself peptide presentation: The large variety of potential spliced antigenic peptides would markedly increase the number of antigenic peptides with overlapping sequences derived from either human or pathogen proteomes, with direct implications for autoimmunity (25, 27, 28). On the other hand, the frequency of antigenic spliced peptides and their features could also in turn have positive implications for therapies involving HLA-I–restricted epitopes, such as antiviral vaccinations or mutation-specific adoptive T cell therapy against tumors (29–33); mutant antigens lacking HLA-I–restricted nonspliced epitopes could finally be targeted through spliced epitopes.

Supplementary Materials

Materials and Methods

Figs. S1 to S13

Tables S1 and S2

References (34–44)

References and Notes

  1. Acknowledgments: Supported by Berlin Institute of Health grant CRG1-TP1 and Einstein Stiftung Berlin grant A2013-174 (P.M.K.); by NC3Rs through David Sainsbury Fellowship NC/K001949/1 (J.L.); by BBSRC grant BB/G007934/1, HFSP grant RGP0061/2011, Leverhulme Trust grant F/07058/BP, and the Royal Society through a Wolfson Research Merit Award (M.P.H.S.); and by Proteins@Work (project number 184.032.201) and the Gravity Program Institute for Chemical Immunology, both funded by the Netherlands Organisation for Scientific Research (F.M., A.J., and A.J.R.H.). We thank P. Henklein, P. Kunert, and B. Brecht-Jachan for the peptide synthesis; C. Keller, M. Bassani-Sternberg, and P. Hitchen for technical support; S. Islam for help in setting up the mass spectrometry analysis pipeline at Imperial College London; and J. Cottrell (Matrix Science Ltd.) for technical support with Mascot. The authors declare no competing financial interests. The RAW files used in this study are available at the archive (doi:10.5061/dryad.r984n). The scripts for the identification of HLA-I spliced peptides are available upon request.