Research Article

A glycine-specific N-degron pathway mediates the quality control of protein N-myristoylation


Science  05 Jul 2019:
Vol. 365, Issue 6448, eaaw4912
DOI: 10.1126/science.aaw4912

Glycine N-degron regulation revealed

For more than 30 years, N-terminal sequences have been known to influence protein stability, but additional features of these N-end rule, or N-degron, pathways continue to be uncovered. Timms et al. used a global protein stability (GPS) technology to take a broader look at these pathways in human cells. Unexpectedly, glycine exposed at the N terminus could act as a potent degron; proteins bearing N-terminal glycine were targeted for proteasomal degradation by two Cullin-RING E3 ubiquitin ligases through the substrate adaptors ZYG11B and ZER1. This pathway may be important, for example, to degrade proteins that fail to localize properly to cellular membranes and to destroy protein fragments generated during cell death.

Science, this issue p. eaaw4912

Structured Abstract

INTRODUCTION

The ubiquitin-proteasome system is the major route through which the cell achieves selective protein degradation. The E3 ubiquitin ligases are the major determinants of specificity in this system, which is thought to be achieved through their selective recognition of specific degron motifs in substrate proteins. However, identifying these degrons and matching them to their cognate E3 ligases remains a major challenge.

RATIONALE

It has long been known that the stability of proteins is influenced by their N-terminal residue, and a large body of work over the past three decades has characterized a collection of N-end rule pathways that target proteins for degradation through N-terminal degron motifs. Recently, we developed Global Protein Stability (GPS)–peptidome technology and used it to delineate a suite of degrons that lie at the extreme C terminus of proteins. We adapted this approach to examine the stability of the human N terminome, allowing us to reevaluate our understanding of N-degron pathways in an unbiased manner.

RESULTS

Stability profiling of the human N terminome yielded two major findings: an expanded substrate repertoire for UBR family E3 ligases, which includes proteins that begin with arginine or lysine following an intact initiator methionine, and, more notably, the discovery that glycine positioned at the extreme N terminus can act as a potent degron. We established human embryonic kidney 293T reporter cell lines in which unstable peptides that bear N-terminal glycine degrons were fused to green fluorescent protein, and we performed CRISPR screens to identify the degradative machinery involved. These screens identified two Cul2 Cullin-RING E3 ligase complexes, defined by the related substrate adaptors ZYG11B and ZER1, that act redundantly to target substrates bearing N-terminal glycine degrons for proteasomal degradation. Moreover, through the saturation mutagenesis of example substrates, we defined the composition of preferred N-terminal glycine degrons specifically recognized by ZYG11B and ZER1.

We found that preferred glycine degrons are depleted from the native N termini of metazoan proteomes, suggesting that proteins have evolved to avoid degradation through this pathway, but are strongly enriched at annotated caspase cleavage sites. Stability profiling of N-terminal peptides lying downstream of all known caspase cleavage sites confirmed that Cul2ZYG11B and Cul2ZER1 could make a substantial contribution to the removal of proteolytic cleavage products during apoptosis. Last, we identified a role for ZYG11B and ZER1 in the quality control of N-myristoylated proteins. N-myristoylation is an important posttranslational modification that occurs exclusively on N-terminal glycine. By profiling the stability of the human N-terminome in the absence of the N-myristoyltransferases NMT1 and NMT2, we found that a failure to undergo N-myristoylation exposes N-terminal glycine degrons that are otherwise obscured. Thus, conditional exposure of glycine degrons to ZYG11B and ZER1 permits the selective proteasomal degradation of aberrant proteins that have escaped N-terminal myristoylation.

CONCLUSION

These data demonstrate that an additional N-degron pathway centered on N-terminal glycine regulates the stability of metazoan proteomes. Cul2ZYG11B- and Cul2ZER1-mediated protein degradation through N-terminal glycine degrons may be particularly important in the clearance of proteolytic fragments generated by caspase cleavage during apoptosis and in the quality control of protein N-myristoylation.

The glycine N-degron pathway.

Stability profiling of the human N-terminome revealed that N-terminal glycine acts as a potent degron. CRISPR screening revealed two Cul2 complexes, defined by the related substrate adaptors ZYG11B and ZER1, that recognize N-terminal glycine degrons. This pathway may be particularly important for the degradation of caspase cleavage products during apoptosis and the removal of proteins that fail to undergo N-myristoylation.

Abstract

The N-terminal residue influences protein stability through N-degron pathways. We used stability profiling of the human N-terminome to uncover multiple additional features of N-degron pathways. In addition to uncovering extended specificities of UBR E3 ligases, we characterized two related Cullin-RING E3 ligase complexes, Cul2ZYG11B and Cul2ZER1, that act redundantly to target N-terminal glycine. N-terminal glycine degrons are depleted at native N-termini but strongly enriched at caspase cleavage sites, suggesting roles for the substrate adaptors ZYG11B and ZER1 in protein degradation during apoptosis. Furthermore, ZYG11B and ZER1 were found to participate in the quality control of N-myristoylated proteins, in which N-terminal glycine degrons are conditionally exposed after a failure of N-myristoylation. Thus, an additional N-degron pathway specific for glycine regulates the stability of metazoan proteomes.

The ubiquitin-proteasome system (UPS) is the major route through which eukaryotic cells achieve selective protein degradation (1). The specificity of this system is provided by E3 ubiquitin ligases, of which more than 600 are encoded in the human genome. E3 ligases recognize specific sequence elements, known as degrons, that are present in substrate proteins (2). However, although a detailed knowledge of the specificity of E3 ligases for degrons will be essential for achieving a systems-level understanding of the UPS, our current knowledge of degron motifs remains remarkably sparse (3).

The first degrons to be discovered were located at the N terminus of proteins (4). N-terminal degrons are targeted by N-degron pathways [formerly known as N-end rule pathways (5)], of which there are two main branches: the Arg/N-degron pathway, through which UBR-family E3 ligases target N-termini typically generated through endoproteolytic cleavage (6, 7), and the Ac/N-degron pathway, through which proteins bearing acetylated N-termini are targeted for degradation by the E3 ligase MARCH6 (also known as TEB4) (8, 9). In addition, a Pro/N-degron pathway was recently described, through which proteins harboring an N-terminal proline residue are degraded by the GID E3 ligase complex (fig. S1) (10). Theoretically, these pathways have the capacity to target the majority of cellular proteins, but the extent to which they affect protein stability in a physiological context remains unclear. For example, loss of N-terminal acetyltransferase (NAT) enzymes has minimal effect on protein stability in yeast (11), which is inconsistent with a widespread role for the Ac/N-degron pathway.

Previously, we modified the Global Protein Stability (GPS) system (12) to develop a high-throughput method to characterize degron motifs in human proteins (13). This approach is based on a lentiviral expression vector that encodes two fluorescent proteins: DsRed, which serves as an internal reference, and green fluorescent protein (GFP) fused to a short peptide of interest, which is translated from an internal ribosome entry site (IRES). Because both DsRed and the GFP-peptide fusion protein are expressed from the same transcript, the GFP/DsRed ratio can be used to quantify the effect of the peptide sequence on the stability of GFP (13). We exploited the ubiquitin-fusion technique (4) to adapt this “GPS-peptidome” approach to search for N-terminal degron motifs. We thereby directly examined the contribution of N-terminal sequences to protein stability in human cells.

Stability profiling of the human N-terminome using GPS-peptidome technology

We synthesized an oligonucleotide library encoding the first 24 amino acids of the primary isoform(s) of all human proteins, both with and without an initiator methionine (~50,000 sequences). These were cloned into the “Ub-GPS” expression vector between the ubiquitin gene and GFP (Fig. 1A). Upon expression of the constructs in human embryonic kidney (HEK) 293T cells, proteolytic cleavage of the ubiquitin moiety by endogenous deubiquitinating enzymes led to the exposure of the peptides at the N terminus of GFP (Fig. 1A). We used fluorescence-activated cell sorting (FACS) to partition the population into six bins of equal size on the basis of the stability of the peptide-GFP fusion. The stability of each fusion was then quantified with Illumina sequencing, with each peptide assigned a protein stability index (PSI) score ranging between 1 (maximally unstable) and 6 (maximally stable) according to the proportion of sequencing reads in each bin (data file S1).
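As a rough illustration of how a PSI score can be derived from per-bin read counts, the sketch below computes a read-weighted mean bin number. The exact formula is not stated in this section, so the weighted-mean form, the function name `protein_stability_index`, and the input layout are assumptions; any normalization of read counts for sequencing depth per bin is omitted.

```python
def protein_stability_index(bin_reads):
    """Return the read-weighted mean bin number for one peptide.

    bin_reads: read counts for bins 1..N, ordered from the least stable
    bin (bin 1) to the most stable bin (bin N). Assumed layout.
    """
    total = sum(bin_reads)
    if total == 0:
        raise ValueError("peptide has no reads in any bin")
    # Bin i contributes i weighted by the fraction of reads found in bin i.
    return sum(i * n for i, n in enumerate(bin_reads, start=1)) / total


# A peptide found only in bin 1 scores 1; one found only in bin 6 scores 6.
print(protein_stability_index([120, 30, 0, 0, 0, 0]))
```

Under this assumed definition, a peptide whose reads are spread evenly over six bins scores 3.5, the midpoint of the 1 to 6 range described above.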

Fig. 1 GPS profiling of the human N-terminome.

(A) Schematic representation of the N-terminome GPS screen, in which the first 24 residues of all human proteins were expressed in the Ub-GPS vector as N-terminal fusions to GFP. (B) Distribution of protein stability scores observed from the screen depicted in (A). (C) Boxplots showing the distribution of stability scores for all peptides that begin with the indicated amino acid, when encoded either with (blue boxes) or without (orange boxes) an upstream methionine residue. (D to F) Heatmaps depicting the mean stability score for all peptides that begin with the indicated two amino acids, when encoded either with (E) or without (D) an upstream methionine residue; (F) illustrates the difference between the two.

We began by assessing the effect of the initiator methionine on protein stability. Overall, peptide-GFP fusions that lacked an initiator methionine were much less stable than their counterparts with an initiator methionine (Fig. 1B). However, this effect was only observed for certain N-terminal residues (Fig. 1C). Reporters that begin with amino acids bearing small side chains (C, V, G, P, T, A, and S) were generally stable and exhibited little or no difference in overall stability, whether or not they were preceded by an upstream methionine residue. (Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.) This is consistent with efficient cleavage of the initiator methionine by methionine aminopeptidases when the following amino acid has a sufficiently small radius of gyration (14). By contrast, peptide-GFP fusions that begin with all other residues (except methionine itself) were generally stable only when preceded by an upstream methionine residue and were greatly destabilized in the absence of an initiator methionine (Fig. 1, C to F).

Overall, these data provide strong support for a central role of the Arg/N-degron pathway in protein quality control. Whereas proteins bearing native N-termini [methionine itself, or C/V/G/P/T/A/S, from which methionine is normally removed (14)] are broadly stable, proteins bearing aberrant N-termini (R/K/H/W/Y/F/L/I/D/E/N/Q, without a preceding methionine) are all highly unstable. The latter residues correspond perfectly to the primary type I (R/K/H), primary type II (W/Y/F/L/I), secondary (D/E), and tertiary (N/Q) N-terminal degrons of the Arg/N-degron pathway (fig. S1A). Crucially, however, when these residues were preceded by methionine—as they would be in the context of normal protein synthesis—broad stabilization was observed (Fig. 1F).
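The residue groupings described in this section can be written down as a small lookup, shown here as a minimal sketch. The residue sets are taken from the text; the function names and the all-or-nothing methionine removal rule are simplifying assumptions, since real methionine aminopeptidase processing depends on sequence context.

```python
# Residues after which the initiator methionine is normally removed (from the text).
SMALL_RESIDUES = set("CVGPTAS")

# Arg/N-degron classes as listed in the text, keyed by N-terminal residue.
ARG_N_DEGRON_CLASS = {}
for residues, label in [
    ("RKH", "primary type I"),
    ("WYFLI", "primary type II"),
    ("DE", "secondary"),
    ("NQ", "tertiary"),
]:
    for residue in residues:
        ARG_N_DEGRON_CLASS[residue] = label


def mature_n_terminus(seq):
    """Drop a leading Met when the next residue is small (simplified rule)."""
    if len(seq) >= 2 and seq[0] == "M" and seq[1] in SMALL_RESIDUES:
        return seq[1:]
    return seq


def classify_n_terminus(seq):
    """Classify the first residue of the processed sequence."""
    if not seq:
        raise ValueError("empty sequence")
    first = mature_n_terminus(seq)[0]
    return ARG_N_DEGRON_CLASS.get(first, "not a classical Arg/N-degron")


print(classify_n_terminus("RAAA"))  # arginine exposed without a preceding Met
print(classify_n_terminus("MRAA"))  # Met retained, so Met is the first residue
```

Note that this lookup deliberately reports peptides beginning MR- or MK- as non-classical, which is the expectation that the screen results described later revise.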

Computational identification of destabilizing N-terminal motifs

Subsequently, we focused on understanding the factors that determined the stability of peptide-GFP fusions synthesized with an initiator methionine. Stability scores for these fusions were distributed bimodally, with approximately one-third of the library exhibiting significant instability (Fig. 1B, blue histogram). One key factor that strongly influenced stability was amino acid composition (Fig. 2, A and B). For example, aspartic acid and glutamic acid were depleted from unstable peptides and enriched among the stable peptides, whereas hydrophobic residues such as tryptophan, phenylalanine, isoleucine, and leucine showed the opposite pattern. This effect is not specific to the N terminus, however, because similar rules govern the stability of reporter constructs in which peptides are fused at the C terminus of GFP (13).

Fig. 2 Identification of degron motifs located at protein N-termini.

(A and B) The effect of peptide composition on protein stability. Shown are heatmaps depicting the relative depletion (blue) or enrichment (red) of each amino acid across all positions of the 24-amino acid peptide among (A) unstable peptides versus (B) stable peptides. (C to F) Computational prediction of N-terminal degrons. (C) For all possible combinations of dipeptide motifs, the mean stability of peptides that contain the motif at the extreme N terminus (that is, immediately following the initiator methionine) was compared with that of peptides that contain the motif at any other internal position in the peptide. (D) Classes of N-terminal degrons. The majority of the top 100 predicted destabilizing N-terminal motifs encoded either glycine (G2), lysine (K2), arginine (R2), or cysteine (C2) at the second position; some example motifs are annotated. (E) Boxplots showing the distribution of stability scores for all peptides in which the indicated residues were encoded at the second position (colored boxes) versus any other internal position within the peptide (gray boxes). (F) Boxplots showing the distribution of stability scores for all peptides with the indicated residues encoded at the second position.

Most amino acids exerted a similar effect on stability regardless of their position across the 24-amino acid peptide, but we noticed that certain residues exerted differing effects specifically when encoded at the second position (Fig. 2, A and B). We therefore performed a computational analysis to identify motifs that might promote instability specifically when located at the N terminus of the peptide. For all possible combinations of dipeptide motifs, we compared the mean stability of all peptide-GFP fusions harboring the motif at the extreme N terminus (that is, immediately following the initiator methionine) with those harboring the motif at an internal position in the 24-amino acid peptide (Fig. 2C and data file S2). Over 80% of the top 100 candidate destabilizing N-terminal motifs could be grouped into one of four categories solely based on the identity of the second residue: Lysine was present downstream of the initiator methionine in 26 motifs, arginine in 24 motifs, glycine in 22 motifs, and cysteine in nine motifs (Fig. 2D). Reporters that encode these residues at the second position were significantly less stable than reporters that contain these residues at any internal position (Fig. 2E), and globally, peptide-GFP fusions that begin MC-, MR-, MG-, and MK- exhibited the lowest mean stability (Fig. 2F). Thus, considering initiator methionine removal, this analysis identified N-terminal glycine and cysteine in addition to MR- and MK- as candidate destabilizing N-terminal motifs.
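The positional comparison of dipeptide motifs can be sketched as follows. This is an illustrative reconstruction that follows the Fig. 2C legend (motif immediately after the initiator methionine versus the same motif at any later position), not the authors' actual pipeline; the function name, the input dictionary, and the toy peptides are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def n_terminal_motif_effect(psi_by_peptide):
    """Mean PSI with a dipeptide right after Met minus mean PSI with it internal.

    psi_by_peptide maps peptide sequences (each starting with the initiator
    Met) to PSI scores. Negative values mean the motif is more destabilizing
    at the N terminus than elsewhere in the peptide.
    """
    n_term = defaultdict(list)
    internal = defaultdict(list)
    for peptide, psi in psi_by_peptide.items():
        body = peptide[1:]  # residues downstream of the initiator Met
        if len(body) < 2:
            continue
        n_term[body[:2]].append(psi)
        # Each peptide counts once per distinct motif found at a later position.
        for motif in {body[i:i + 2] for i in range(1, len(body) - 1)}:
            internal[motif].append(psi)
    return {
        motif: mean(scores) - mean(internal[motif])
        for motif, scores in n_term.items()
        if motif in internal
    }


# Toy data (made up): GK is unstable at the N terminus but stable internally.
toy = {"MGKAA": 1.0, "MGKSS": 2.0, "MAAGK": 5.0}
print(n_terminal_motif_effect(toy)["GK"])
```

Ranking motifs by this difference, most negative first, would give a candidate list analogous to the top 100 destabilizing motifs discussed above.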

Exploring the substrate repertoire of UBR family E3 ligases

Next, we sought to identify the cellular machinery targeting each class of putative N-terminal degron. We began by investigating a role for UBR family E3 ligases. UBR1, UBR2, and UBR4 have been shown functionally to participate in the recognition of N-degrons (15), and so, through sequential rounds of CRISPR/Cas9–mediated gene disruption, we attempted to create a single-cell clone that lacks all three of these UBR proteins. Despite screening ~40 clones, we were unable to identify a clone in which simultaneous ablation of UBR1, UBR2, and UBR4 was observed by immunoblot, suggesting that such a triple-mutant cell may not be viable. However, we were able to generate clones that express substantially reduced levels of two or more of the proteins (Fig. 3A). Ub-GPS reporters in which the initiator methionine of GFP was replaced with either arginine (R), lysine (K), or tyrosine (Y) were strongly destabilized in wild-type cells, but this effect was attenuated in UBR knockout (KO) clone 1 and clone 3 (fig. S2A) and completely abolished in clone 2 (Fig. 3B).

Fig. 3 Assessing the repertoire of UBR substrates among the human N-terminome.

(A to C) Assessing UBR-mediated degradation through N-terminal degron motifs. (A) CRISPR/Cas9–mediated generation of clones expressing reduced levels of UBR1, UBR2, and UBR4. (B) Functional validation of UBR KO clones. Optimal Arg/N-end rule substrates were highly unstable in wild-type cells but not in UBR KO clone 2, as measured with flow cytometry (fig. S2A). (C) UBR proteins target example peptide-GFP reporters in which lysine, arginine, or cysteine, but not glycine, are encoded at the second position. N-terminal peptides derived from the indicated genes were expressed in wild-type or UBR KO clone 2 by using the Ub-GPS system, and their stability was assessed by means of flow cytometry. (D to F) Global identification of N-terminal UBR substrates. (D) Schematic representation of the GPS screen. (E) Venn diagram summarizing the substrates stabilized >0.8 PSI units across the three UBR KO clones. (F) Heatmap showing the relative enrichment (red) or depletion (blue) of each amino acid across all positions of the 24-amino acid peptide comparing peptides stabilized in two or more of the UBR KO clones relative to the whole N-terminome library (fig. S4). (G to I) Characterization of N-terminal UBR degrons through saturation mutagenesis. Each of the first 10 residues of the N-terminal peptides derived from (G) ZNF334 (beginning MK-), (H) AHRR (beginning MR-), and (I) CDX1 (beginning MC-) were mutated to all other possible residues, and their stabilities were measured by means of FACS and Illumina sequencing. The darker the color, the greater the degree of stabilization as compared with that of the wild-type sequence (fig. S5).

We created a panel of Ub-GPS constructs in which either 23-amino acid peptides (Fig. 3C) or 3-amino acid peptides (fig. S2B) that harbor example degron motifs downstream of an initiator methionine were fused to the N terminus of GFP. In both cases, loss of UBR proteins resulted in the stabilization of reporters bearing three of the classes of degron motifs: MK-, MR-, and N-terminal cysteine. However, loss of UBR proteins had little or no effect on the stability of the GFP-fusion proteins bearing N-terminal glycine, suggesting a role for additional E3 ligase(s) in the recognition of this particular N-terminal degron.

It was not surprising that UBR E3 ligases targeted N-terminal cysteine, given that nitric oxide–mediated oxidation and subsequent arginylation of N-terminal cysteine renders it a substrate for the Arg/N-degron pathway (16). That said, ATE1 disruption only led to modest stabilization of two peptide-GFP substrates that expose N-terminal cysteine (fig. S2, C and D), suggesting that additional routes to UBR-mediated degradation must also exist. UBR-mediated degradation of proteins that begin MK- and MR- was unexpected, however, suggesting that in addition to targeting truncated proteins bearing abnormal N-termini, UBR ligases might also target certain intact proteins that bear their initiator methionine. To confirm that the initiator methionine of these substrates was intact, and thus rule out the possibility that methionine removal was instead exposing canonical Arg/N-degrons, we used mass spectrometry to examine the N terminus of two example peptide-GFP UBR substrates expressed in UBR KO clone 2 (fig. S3A). In both cases, we were readily able to detect the intact N-terminal peptide with the initiator methionine present, whereas we could not detect any peptides corresponding to a putative processed form without an initiator methionine (fig. S3B).

To further examine this property of UBR proteins, we directly compared the stability of the entire Ub-GPS N-terminome library in wild-type cells versus UBR KO clones 1, 2, and 3 (Fig. 3D and data file S3A). Loss of UBR proteins had little effect on the overall stability of reporters synthesized with an N-terminal methionine; only 570 peptide-GFP fusion proteins (<3% of the N-terminome library) exhibited substantial stabilization (>0.8 PSI units) in any of the UBR mutant clones compared with control cells (Fig. 3E). Sequence analysis of the UBR substrates revealed a clear preference for particular N-terminal degron motifs (Fig. 3F and fig. S4, A to H). Consistent with our previous data (Fig. 2C and fig. S2B), peptides starting MC-, MK-, and MR- were all enriched. Peptides starting ML- and MI- were also overrepresented, and for three example peptides in each case, we validated that they were stabilized in UBR KO clone 2 (fig. S4I). In Saccharomyces cerevisiae, Ubr1 has been shown to target proteins that start MΦ- (where Φ is a bulky hydrophobic residue, W/F/Y/L/I) for degradation (17); however, unlike peptides that start ML- and MI-, we did not observe enrichment for peptides that start MF- or MY- among the UBR substrates, and observed only weak enrichment for peptides that start MW-.
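The substrate-calling step described above (stabilization by more than 0.8 PSI units in a knockout clone relative to wild-type cells) can be expressed as a simple filter. The sketch below assumes PSI scores are held in plain dictionaries; the function name, the data layout, and the toy values are hypothetical.

```python
def stabilized_substrates(wt_psi, ko_psi_by_clone, threshold=0.8, min_clones=1):
    """Return {peptide: [clones]} for peptides stabilized above the threshold.

    wt_psi: peptide -> PSI in wild-type cells.
    ko_psi_by_clone: clone name -> (peptide -> PSI in that knockout clone).
    A peptide is reported when its PSI gain exceeds `threshold` in at least
    `min_clones` clones.
    """
    hits = {}
    for peptide, wt in wt_psi.items():
        clones = [
            clone
            for clone, psi in ko_psi_by_clone.items()
            if peptide in psi and psi[peptide] - wt > threshold
        ]
        if len(clones) >= min_clones:
            hits[peptide] = clones
    return hits


# Toy values (made up): only the MK- peptide passes the 0.8 PSI cutoff.
wt = {"MKAA": 2.0, "MGAA": 2.0}
ko = {
    "clone1": {"MKAA": 3.5, "MGAA": 2.1},
    "clone2": {"MKAA": 4.0, "MGAA": 2.2},
    "clone3": {"MKAA": 2.5, "MGAA": 2.0},
}
print(stabilized_substrates(wt, ko, min_clones=2))
```

Setting `min_clones=2` mirrors the Fig. 3F analysis, which considers peptides stabilized in two or more of the UBR KO clones.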

Last, only a small proportion of all peptides in the library that begin MK-, MR-, ML-, or MI- were UBR substrates, suggesting that additional residues were essential for degron recognition. Analysis of the composition of all the UBR substrates identified in each category highlighted preferred residues enriched at downstream positions (fig. S4, E to H). Furthermore, for some example peptides that start MK-, MR-, and MC-, we defined the N-terminal UBR degron in detail by performing saturation mutagenesis experiments. We created a Ub-GPS library in which each of the residues from position 2 to position 10 of the 24-amino acid peptide were mutated to all other possible amino acids and measured the stability of the resulting peptide-GFP fusions by means of FACS and Illumina sequencing (data file S4A). These experiments confirmed the critical importance of the lysine, arginine, or cysteine residue encoded at the second position but also demonstrated that certain mutations at the third or fourth position could prevent degron recognition (Fig. 3, G to I, and fig. S5). These data also confirmed the requirement for these degron motifs to be positioned at the extreme N terminus because addition of just a single upstream amino acid (that is, immediately after the initiator methionine) resulted in stabilization of the peptide-GFP fusions (Fig. 3, G to I, and fig. S5, column “add”).
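To make the design of such a saturation mutagenesis library concrete, the sketch below enumerates every single substitution at positions 2 to 10, plus the "add" variants that insert one residue immediately after the initiator methionine. The function names, the variant labels, and the example peptide are hypothetical, and whether the "add" constructs were trimmed back to 24 residues is not stated in the text, so here they are simply one residue longer.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(peptide, first=2, last=10):
    """All single substitutions at 1-based positions first..last."""
    variants = {}
    for pos in range(first, last + 1):
        wild_type = peptide[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                label = f"{wild_type}{pos}{aa}"  # e.g. G2S
                variants[label] = peptide[:pos - 1] + aa + peptide[pos:]
    return variants


def add_variants(peptide):
    """Insert each amino acid immediately after the initiator Met."""
    return {f"add{aa}": peptide[0] + aa + peptide[1:] for aa in AMINO_ACIDS}


# Made-up 24-residue peptide beginning MG-, for illustration only.
example = "MGSK" + "A" * 20
library = saturation_variants(example)
print(len(library), library["G2S"])
```

For a 24-residue peptide this yields 9 positions times 19 substitutions, or 171 substitution variants, plus 20 "add" variants.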

N-terminal glycine can act as a potent degron

We next focused on the one class of N-terminal degron motif that was not a substrate for UBR-mediated degradation: N-terminal glycine. To validate that N-terminal glycine did indeed constitute a degron motif, we performed a series of mutagenesis experiments on a panel of unstable Ub-GPS reporters in which 24-amino acid peptides starting MG- were fused to the N terminus of GFP (Fig. 4A and fig. S6A). In each case, the glycine residue was indeed critical for instability because a single substitution converting the glycine residue to serine (G2S) was sufficient to inhibit degradation (Fig. 4A and fig. S6A, left). Moreover, the position of the glycine residue at the extreme N terminus was also critical because addition of a single serine residue upstream of the glycine (add S) stabilized the peptide-GFP fusions to a similar extent (Fig. 4A and fig. S6A, center). Last, consistent with the notion that the initiator methionine is constitutively cleaved when followed by a small residue such as glycine, deletion of the initiator methionine (ΔMet) had no stabilizing effect on any of the peptide-GFP fusions (Fig. 4A and fig. S6A, right).

Fig. 4 Cul2ZYG11B and Cul2ZER1 target N-terminal glycine.

(A) Glycine at the N terminus can act as a potent degron. The N-terminal peptide derived from SNX11 and a mutant version that lacked the initiator methionine (ΔMet) were highly unstable, whereas mutant versions in which the terminal glycine was mutated to serine (G2S) or in which a serine residue was added between the initiator methionine and the glycine residue (add S) were not (fig. S6A). (B) Defining N-terminal glycine degrons through saturation mutagenesis. Each of the first 10 residues of the SNX11 peptide were mutated to all other possible residues, and their stabilities were measured by means of FACS and Illumina sequencing. The darker the color, the greater the degree of stabilization as compared with that of the wild-type sequence (fig. S6B). (C and D) Cul2 complexes target N-terminal glycine. Stabilization of the SNX11-GFP reporter (C) upon treatment with the CRL inhibitor MLN4924, and (D) after expression of dominant-negative (DN) versions of Cullins (fig. S7). (E and F) CRISPR screens identify the Cul2 substrate adaptors responsible for the recognition of N-terminal glycine. (E) Results of the SNX11-GFP reporter screen, which highlighted two CRL2 complexes (F) (fig. S8). (G to J) Cul2ZYG11B and Cul2ZER1 cooperate to target N-terminal glycine. (G) CRISPR-mediated ablation of both ZYG11B and ZER1 was required for full stabilization of the SNX11-GFP reporter. (H) Exogenous expression of either ZYG11B or ZER1 rescued degradation of the SNX11-GFP reporter in cells lacking endogenous ZYG11B and ZER1. Knockout of ZYG11B and ZER1 stabilized full-length SNX11 fused to the N terminus of GFP, both (I) when expressed in the context of the Ub-GPS system or (J) without upstream ubiquitin fusion (fig. S9).

For some example peptides, we defined the N-terminal glycine degron in detail by performing saturation mutagenesis experiments (data file S4A). This confirmed the absolute requirement for the exposure of glycine at the extreme N terminus because addition of any single amino acid upstream of the glycine resulted in stabilization of the peptide-GFP fusion (Fig. 4B and fig. S6, B to G, column “add”). The size of the degron motif appeared to be relatively small, but some substitutions at the residues immediately downstream of the exposed glycine did exert a stabilizing effect (Fig. 4B and fig. S6, B to G).

Cul2ZYG11B and Cul2ZER1 target N-terminal glycine

We began the search for the E3 ligase(s) responsible for targeting N-terminal glycine by using the small molecule MLN4924. MLN4924 acts as a broad inhibitor of Cullin-RING ligases (CRLs) by blocking Cullin neddylation (18), thus allowing us to narrow the search to either CRL or non-CRL ligase families. All our example Ub-GPS constructs bearing N-terminal glycine were stabilized upon treatment with MLN4924, implicating CRLs in the recognition of N-terminal glycine (Fig. 4C and fig. S7A).

Next, we sought to identify the specific CRL adaptor(s) responsible for recognition of the N-terminal glycine degron. Using dominant-negative constructs to inhibit each of the major Cullins, we determined that either Cul2 or Cul5 was responsible for the degradation of example Ub-GPS reporters harboring N-terminal glycine degrons (Fig. 4D and fig. S7B). Using these reporter substrates, we performed a series of CRISPR/Cas9–mediated genetic screens using a library of single-guide RNAs (sgRNAs) targeting known CRL2/5 substrate adaptor proteins (fig. S8A). Together, these screens identified ZYG11B as the CRL2 substrate adaptor responsible for recognition of the N-terminal glycine degron motif (Fig. 4E, fig. S8B, and data file S5). Intriguingly, ZER1, which is closely related to ZYG11B (29% amino acid identity) (fig. S8C), was enriched at or approaching the level of statistical significance in several screens, suggesting that these two related adaptors may collaborate in the degradation of proteins that expose N-terminal glycine (Fig. 4F). The third member of the ZYG11 family, ZYG11A, did not score in any of the screens, which is consistent with RNA-sequencing data (19) showing that it is rarely expressed across human tissues (fig. S8, C and D).

To examine the possibility of cooperation between ZYG11B and ZER1, we performed individual CRISPR/Cas9–mediated gene disruption experiments, ablating the function of ZYG11B or ZER1 either alone or in combination. Loss of ZYG11B alone did indeed stabilize all of the peptide-GFP fusion proteins (Fig. 4G and fig. S9A), but whereas complete stabilization was observed for two of the reporters, only partial stabilization was observed for the others. By contrast, loss of ZER1 alone had little stabilizing effect on any of the reporters; however, simultaneous disruption of both ZER1 and ZYG11B resulted in complete stabilization (Fig. 4G and fig. S9A). Furthermore, ZYG11B and ZER1 both associated with putative substrates that bear N-terminal glycine degrons (fig. S9B), and exogenous expression of either ZYG11B or ZER1 alone in ZYG11B/ZER1 double-mutant cells fully restored the degradation of a peptide-GFP fusion whose stabilization required ablation of both endogenous ZYG11B and ZER1 (Fig. 4H). Last, we validated that Cul2ZYG11B and Cul2ZER1 were able to mediate the degradation of full-length proteins that bear exposed glycine residues at their N termini (Fig. 4, I and J, and fig. S9, C to E).

To obtain a global view of the substrates targeted by these Cul2 complexes, we compared the stability of the Ub-GPS N-terminome library in wild-type cells with cells that lack either ZYG11B, ZER1, or both ZYG11B and ZER1 (fig. S10A and data file S3B). First, this revealed that ZYG11B and ZER1 share the majority of their substrates: There were 115 fusions stabilized in ZYG11B mutant cells and 36 stabilized in ZER1 mutant cells, whereas 488 were stabilized in the double-mutant cells. Sequence analysis of these shared substrates confirmed that N-terminal glycine was the most enriched feature while also highlighting preferred (F, G, H, K, and Y) and disfavored (D, E, I, P, S, and T) residues at the following position (fig. S10B). Of the substrates that were targeted solely by ZYG11B, over 90% encoded a glycine residue at the second position (fig. S10C). Intriguingly, there was no enrichment of N-terminal glycine among the substrates exclusively targeted by ZER1 (fig. S10D). This finding suggested that (i) any ZER1 substrates bearing an N-terminal glycine were also substrates for ZYG11B, and hence were still targeted for degradation in ZER1 mutant cells, and (ii) although N-terminal glycine was indispensable for recognition by ZYG11B, in some contexts ZER1 might recognize substrates that begin with residues other than glycine. We characterized one such substrate—the N-terminal peptide derived from KCNT2 (which begins MPYL)—in detail (fig. S11). In particular, saturation mutagenesis revealed that the hydrophobic residues encoded at the third and fourth position formed a critical part of the ZER1 degron, whereas some more flexibility was tolerated at the second position (fig. S11G). However, the location of these residues relative to the front of the peptide remained critical because the addition of a single amino acid upstream of the proline residue prevented degradation (fig. S11G).
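The second-position preferences reported above for shared ZYG11B and ZER1 substrates can be captured in a small rule-based lookup. This is only a heuristic sketch built from the residue lists in the text: it is not a predictor of degradation, it ignores residues further downstream, and the function name and category labels are hypothetical.

```python
# Residues at the position after N-terminal glycine, as listed in the text.
PREFERRED_AFTER_GLY = set("FGHKY")
DISFAVORED_AFTER_GLY = set("DEIPST")


def glycine_degron_context(mature_seq):
    """Describe the residue following an exposed N-terminal glycine.

    mature_seq is the sequence after any initiator methionine removal.
    """
    if len(mature_seq) < 2 or mature_seq[0] != "G":
        return "no N-terminal glycine"
    following = mature_seq[1]
    if following in PREFERRED_AFTER_GLY:
        return "preferred"
    if following in DISFAVORED_AFTER_GLY:
        return "disfavored"
    return "neutral"


print(glycine_degron_context("GKAA"))
print(glycine_degron_context("GSAA"))
```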

Defining the N-terminal glycine degrons recognized by ZYG11B and ZER1

To gain further insight into the specific degron motifs recognized by ZYG11B and ZER1, we examined a larger number of potential peptide-GFP substrates that begin with glycine (fig. S12). These could be divided into three categories: peptides fully stabilized upon mutation of ZYG11B alone (fig. S12A); peptides stabilized partially upon mutation of ZYG11B alone, but which required combined mutation of ZYG11B and ZER1 for complete stabilization (fig. S12B); and peptides for which full redundancy was observed between ZYG11B and ZER1 (fig. S12C). For the vast majority of the peptides in the latter two categories, an aromatic residue (H, F, or Y) was located downstream of the terminal glycine, supporting the idea that ZER1 might preferentially recognize bulky residues located further along the peptide chain (fig. S12D).

We tested this hypothesis more rigorously by repeating the saturation mutagenesis experiments in the genetic background of either ZYG11B ablation or ZER1 ablation (data file S4, B and C). The results for some representative peptides are shown in Fig. 5, A to D, and fig. S13. Mutations promoting stability in wild-type cells were identical to those promoting stability in ZER1 mutant cells (Fig. 5, A and B). Therefore, these residues comprise the minimal N-terminal glycine degron, which is recognized by ZYG11B. Conversely, the ZER1 degron (as revealed in ZYG11B mutant cells) is more extensive because mutations that were two or more residues downstream of the terminal glycine interfered with degradation (Fig. 5, C and D). Overall, these data support a model in which both ZYG11B and ZER1 target substrates with exposed glycine residues at their N-termini; however, the recognition motif for ZYG11B is relatively small, comprising just the terminal glycine and the following residue, whereas the recognition motif for ZER1 may extend three or more residues along the polypeptide chain and preferentially comprises amino acids with bulky aromatic side chains.

Fig. 5 N-terminal glycine degrons are depleted from metazoan proteomes.

(A to D) Defining the degrons recognized by ZYG11B and ZER1 through saturation mutagenesis. Each of the first 10 residues of the SNX11 N-terminal peptide was mutated to all possible amino acids; the stability of each mutant in the resulting Ub-GPS library was measured in (A) wild-type, (B) ZER1 mutant, or (C) ZYG11B mutant cells. The color scale reflects the raw PSI measurement for each peptide-GFP fusion, so that the greater the intensity of the red color, the greater the stabilizing effect of the mutation. The heatmap in (D) illustrates the difference between the PSI in ZYG11B mutant cells versus ZER1 mutant cells; thus, a dark red color indicates mutations that prevent recognition by ZER1 but not by ZYG11B, whereas a dark blue color indicates mutations that permit recognition by ZER1 but not by ZYG11B (fig. S13). (E) Normalized amino acid frequencies across the first 10 residues (following the initiator methionine) of human proteins. (F and G) Depletion of N-terminal glycine degrons in metazoan proteomes. The normalized amino acid frequency of glycine encoded at the second position in the indicated proteomes is shown by the blue dots, and is further categorized depending on whether the glycine residue is followed by a residue favoring (orange dots) or disfavoring (green dots) CRL2-mediated degradation. The relationship observed across animal proteomes (F) is not apparent across fungal proteomes (G), which do not possess a ZYG11 ortholog. (***P < 0.001, Fisher’s exact test)

N-terminal glycine degrons are depleted from metazoan proteomes

GPS-peptidome technology has already identified a suite of degron motifs lying at the C terminus of human proteins (13). All of these degron motifs are depleted from the human proteome (13), suggesting evolutionary pressure to avoid degradation by E3 ligases that target terminal degrons. We thus examined the abundance of N-terminal glycine degrons in eukaryotic proteomes. As is the case for the residue at the extreme C terminus of eukaryotic proteins (13), the identity of the residue following the initiator methionine at the N terminus was far more variable than at all neighboring positions, suggesting that its properties are particularly important (Fig. 5E). Nonetheless, glycine was encoded at almost exactly the expected frequency at the second position across a range of metazoan model organisms (Fig. 5F, blue dots). However, classifying glycine residues as those favored (G followed by F, G, H, L, M, or Y) or disfavored (G followed by D, E, I, N, P, R, S, or T) for CRL2-mediated degradation revealed that, compared with sequences located internally, N-terminal glycine degron motifs are depleted from animal proteomes (Fig. 5F, orange dots), whereas N-terminal glycine motifs that are not efficiently recognized by ZYG11B and ZER1 are correspondingly enriched (Fig. 5F, green dots). As a control, we performed a similar analysis on a panel of reference fungal proteomes, which possess Cul2 but no ZYG11B-family ortholog (20). Consistent with the idea that there should be no selective pressure to avoid N-terminal glycine degrons in the absence of Cul2ZYG11B and Cul2ZER1, the relationship seen in animal proteomes was not observed in fungal proteomes (Fig. 5G). Thus, the avoidance of N-terminal glycine motifs appears to have shaped the composition of metazoan proteomes.
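The favored/disfavored classification used in this analysis can be illustrated with a minimal sketch. It assumes a protein is supplied as a one-letter amino acid string beginning with the initiator methionine; the function name and the "neutral" and "no_glycine" labels are hypothetical and are not taken from the study.

```python
# Residue sets for the position following the glycine, as listed in the text.
FAVORED = set("FGHLMY")
DISFAVORED = set("DEINPRST")


def classify_glycine_motif(protein: str) -> str:
    """Classify a protein by the residue that follows a glycine at position 2.

    Returns 'favored', 'disfavored', 'neutral' (hypothetical label for all
    other residues), or 'no_glycine' if position 2 is not glycine.
    """
    # Require an initiator methionine, a glycine, and one following residue.
    if len(protein) < 3 or protein[0] != "M" or protein[1] != "G":
        return "no_glycine"
    following = protein[2]
    if following in FAVORED:
        return "favored"
    if following in DISFAVORED:
        return "disfavored"
    return "neutral"
```

Counting these classes at the N terminus and at internal positions would give the kind of contingency table to which a Fisher's exact test could then be applied.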

ZYG11B and ZER1 target protein fragments bearing N-terminal glycine after proteolytic cleavage

Endoproteolysis generates an additional source of terminal degrons (21–23). Caspase cleavage preferentially occurs immediately upstream of glycine residues (Fig. 6A). Of the ~1800 known human caspase cleavage sites, approximately one-third result in the exposure of glycine at the N terminus of the downstream fragment (24), suggesting a potential role for ZYG11B and ZER1 in the degradation of proteins cleaved during apoptosis. Moreover, in contrast to the situation at the native N-termini of human proteins (Fig. 5F), we found that N-terminal glycine degrons that favor CRL2-mediated degradation were enriched at caspase cleavage sites (Fig. 6B).

Fig. 6 ZYG11B and ZER1 target N-terminal glycine degrons generated through endoproteolytic cleavage.

(A and B) Caspase cleavage preferentially generates fragments that bear N-terminal glycine. (A) Logoplot depicting the consensus sequence surrounding all caspase cleavage sites annotated in Degrabase (24). (B) Compared with their frequency across the human proteome, preferred glycine degrons (orange bar) are enriched at known caspase cleavage sites, whereas disfavored glycine degrons (green bar) are depleted. (C to E) Caspase cleavage events generate N-terminal glycine degrons targeted by ZYG11B and ZER1. (C) Schematic representation of the caspase cleavage product Ub-GPS screen. (D) Heatmap showing the relative enrichment (red) or depletion (blue) of each amino acid across all positions of the 24-amino acid peptide comparing peptides stabilized in both ZYG11B/ZER1 double mutant cells with the whole caspase cleavage site library. (E) Profiles of example substrates. Residues flanking the caspase cleavage site (indicated by arrows) are shown (fig. S14A). (F and G) CRL2-mediated degradation of proteolytic cleavage products bearing N-terminal glycine degrons. (F) The full-length downstream caspase cleavage products of the indicated proteins were expressed by using the Ub-GPS system, and their stability was assessed in wild-type (gray) and dual ZYG11B/ZER1 mutant cells (red) by means of flow cytometry. (G) The caspase site in the indicated full-length open reading frames (ORFs) was replaced with a TEV protease cleavage site. Upon TEV expression (blue histograms), destabilization of the downstream cleavage products bearing N-terminal glycine degrons was observed in wild-type cells (top row) but not in dual ZYG11B/ZER1 mutant cells (bottom row) (fig. S14B).

We used GPS to assess a potential role for ZYG11B and ZER1 in the removal of proteolytic fragments. We generated a Ub-GPS peptide library in which the 24 residues downstream of all caspase cleavage events annotated in Degrabase (24) and PROSPER (25) were fused to the N terminus of GFP, and we profiled the stability of these peptide-GFP fusions in wild-type cells versus combined ZYG11B/ZER1 mutant cells (Fig. 6C and data file S6). The results confirmed that Cul2ZYG11B and Cul2ZER1 could target many caspase cleavage products: 225 substrates were stabilized >0.5 PSI units in both ZYG11B/ZER1 double-mutant lines, of which 219 (97%) harbored an N-terminal glycine residue (Fig. 6D; the GPS profiles of some example substrates are shown in Fig. 6E and fig. S14A).

We validated these findings in two ways. First, for a panel of example cleavage products exposing N-terminal glycine degrons, we verified that the full-length protein fragments downstream of the cleavage site were stabilized in ZYG11B/ZER1 double-mutant cells (Fig. 6F). Second, we demonstrated that these fragments would also be substrates for ZYG11B and ZER1 after endoproteolytic cleavage. Our initial attempts to perform these experiments by inducing the dimerization of caspase 9 (26) resulted in rapid cell death. Therefore, in order to decouple proteolytic cleavage from cell death, we engineered mutant versions of four example substrates in which the caspase cleavage site was replaced with the Tobacco Etch Virus (TEV) protease cleavage site (Fig. 6G). TEV protease recognizes the amino acid sequence ENLYFQ/G (where “/” represents the cleavage position), thus exposing an N-terminal glycine on the downstream fragment, and is active when expressed in mammalian cells (27, 28). Upon expression of TEV protease, we observed destabilization of the downstream cleavage products that bear N-terminal glycine degrons in wild-type cells, but this effect was abrogated in ZYG11B/ZER1 double-mutant cells (Fig. 6G and fig. S14B). Thus, ZYG11B and ZER1 are likely to be involved in the clearance of proteolytic fragments after caspase cleavage during apoptosis.
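The logic of the TEV replacement can be sketched in a few lines: cleavage within ENLYFQ/G leaves a downstream fragment that begins with glycine. The function name and the example sequence in the usage note are hypothetical, and the sketch considers only the first occurrence of the site.

```python
TEV_SITE = "ENLYFQG"  # cleavage occurs between Q and G
CUT_OFFSET = len("ENLYFQ")  # number of site residues left on the upstream fragment


def tev_cleave(protein: str):
    """Return (upstream, downstream) fragments after cleavage at the first
    TEV site, or None if the sequence contains no site."""
    idx = protein.find(TEV_SITE)
    if idx == -1:
        return None
    cut = idx + CUT_OFFSET
    return protein[:cut], protein[cut:]
```

For a made-up input such as `"AAAENLYFQGHKL"`, the downstream fragment is `"GHKL"`, which exposes glycine at its N terminus.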

ZYG11B and ZER1 function in the quality control of N-myristoylated proteins

Last, we considered whether the recognition of N-terminal glycine degrons might be conditionally regulated through posttranslational modifications. Intriguingly, N-myristoylation, the process through which the 14-carbon fatty acid myristate is attached to the N terminus of a subset of eukaryotic proteins (29), occurs exclusively on N-terminal glycine (Fig. 7A). Given that our mutagenesis experiments showed that addition of just a single amino acid to the N terminus prevented ZYG11B- and ZER1-mediated recognition, we reasoned that N-myristoylation would prevent CRL2-mediated degradation from N-terminal glycine. Thus, we hypothesized that ZYG11B and ZER1 might play an important role in “myristoylation quality control,” degrading proteins that bear N-terminal glycine degrons conditionally exposed after a failure of N-myristoylation.

Fig. 7 Cul2ZYG11B and Cul2ZER1 target proteins that fail to undergo N-myristoylation for proteasomal degradation.

(A) N-myristoylation occurs on N-terminal glycine. (B) Schematic representation of the GPS screen designed to assess the effect of loss of N-myristoylation on protein stability. (C) Immunoblot validation of NMT1/2 knockout clones. Arrowheads indicate bands of the expected molecular weight. (D) Loss of N-myristoylation destabilizes peptide-GFP fusions that start with glycine. The heatmap shows the relative enrichment (red) or depletion (blue) of each amino acid across all positions of the 24-amino acid peptide, comparing the 91 peptides that exhibit substantial destabilization in all three NMT1/2 mutant clones relative to the whole N-terminome library. (E and F) ZYG11B and ZER1 target N-terminal glycine degrons exposed after a failure of N-myristoylation. (E) The first 24 amino acids from the indicated proteins were expressed as N-terminal fusions to GFP, and their stability in the indicated genetic backgrounds was measured by means of flow cytometry. The destabilization observed upon loss of NMT1/2 (gold histograms) is rescued in the absence of ZYG11B and ZER1 (purple histograms). (F) The abundance of the indicated myristoylated proteins was assessed by means of immunoblot in either control (AAVS1) or ZYG11B/ZER1 double-mutant cells (DKO), either with or without simultaneous ablation of NMT1/2. Src, which did not exhibit significant destabilization in NMT1/2 mutant cells in the GPS screen, is shown as a negative control. (G) Model depicting the role of Cul2ZYG11B and Cul2ZER1 in the quality control of N-myristoylated proteins.

Given that the N-myristoyltransferase enzymes (NMT1 and NMT2 in human cells) require less than the first 20 residues for substrate recognition (29), we reasoned that the peptide-GFP fusion proteins expressed from our N-terminome Ub-GPS library should undergo native N-myristoylation. In order to examine the effect of N-myristoylation on protein stability, we profiled the N-terminome Ub-GPS library in the presence or absence of NMT1/2 (Fig. 7B and data file S3C). Although we were not able to generate clones in which both NMT1 and NMT2 were completely ablated after CRISPR/Cas9–mediated gene disruption—a finding consistent with the notion that N-myristoylation is an essential process (30)—we did isolate three clones that retained only residual levels of one NMT enzyme as assessed by means of immunoblot (Fig. 7C). When we analyzed the composition of all the peptide-GFP fusion proteins whose stability was substantially reduced in all three NMT1/2 mutant clones, N-terminal glycine was the most enriched feature (Fig. 7D). Thus, a failure to undergo N-myristoylation can lead to instability of the unmodified protein.

To investigate a possible role for ZYG11B and ZER1 in this process, we examined the stability of a panel of example substrates (fig. S15A) in which N-terminal peptides derived from proteins known to undergo N-myristoylation (31) were expressed in the presence and absence of both NMT1/2 and ZYG11B/ZER1. These peptide-GFP fusion proteins were efficiently myristoylated, as evidenced by membrane localization in wild-type cells but not in NMT1/2 mutant cells (fig. S15B). Validating the screen results, in each case we observed destabilization of the peptide-GFP fusion protein upon loss of NMT1/2 (Fig. 7E, yellow histograms); moreover, ZYG11B and ZER1 were primarily responsible for this instability because complete or near-complete restabilization was observed upon ablation of both NMT1/2 and ZYG11B/ZER1 (Fig. 7E, purple histograms). The true magnitude of this effect is likely to be even greater because addition of the small-molecule NMT1/2 inhibitor IMP-1088 (32) to the NMT1/2 mutant clones, thus inhibiting the residual N-myristoyltransferase activity remaining in the cell, further enhanced the destabilization of the peptide-GFP substrates (fig. S15C). Moreover, the small degree of stabilization observed with some of the fusion proteins upon ablation of ZYG11B and ZER1 in wild-type (that is, NMT1/2-sufficient) cells (Fig. 7E, top row) suggested that some fraction of protein molecules do normally escape N-myristoylation, emphasizing the necessity for a degradative mechanism to remove these aberrant species.

Last, we wanted to validate that endogenous N-myristoylated proteins behaved in a similar manner. We observed a significant reduction in the steady-state levels of a panel of example substrates in NMT1/2 mutant cells, which was attenuated upon concurrent ablation of ZYG11B and ZER1 (Fig. 7F). However, unlike the complete or near-complete stabilization that we observed using the peptide-GFP fusion constructs (Fig. 7E), here combined mutation of ZYG11B and ZER1 only resulted in partial restabilization. Thus, in the context of full-length proteins, multiple degrons in addition to N-terminal glycine may be exposed after a failure of N-myristoylation, rendering them substrates for additional E3 ligases. Altogether, these data demonstrate a physiological role for ZYG11B and ZER1 in the surveillance of myristoylated proteins: Successful N-myristoylation shields proteins from degradation, but a failure to undergo N-myristoylation results in the exposure of N-terminal glycine degrons and CRL2-mediated degradation (Fig. 7G).

Discussion

We exploited GPS technology to directly examine the contribution of N-terminal sequences to protein stability across the human proteome. Unexpectedly, we discovered that, in addition to targeting abnormal proteins that lack an initiator methionine, UBR-family E3 ligases also targeted proteins with a native N terminus in which an arginine or lysine residue follows an intact initiator methionine. We also found that cysteine exposed at the N terminus of GFP conferred instability in a UBR-dependent manner. Nitric oxide–mediated oxidation of N-terminal cysteine renders it a substrate for arginylation by ATE1 and hence UBR-mediated degradation (16). However, here substrates that bear N-terminal cysteine were not stabilized to the same extent in ATE1 mutant cells as in cells that lack UBR proteins; thus, if UBR proteins do not directly bind N-terminal cysteine, an ATE1-independent pathway must exist that permits this class of degrons to be recognized by UBR E3 ligases.

Most notably, we uncovered an additional N-degron pathway centered on N-terminal glycine. There are intriguing mechanistic similarities between the ZYG11B- and ZER1-mediated recognition of N-terminal glycine degrons and the KLHDC2-, KLHDC3-, and KLHDC10-mediated recognition of C-terminal glycine degrons (13), with both processes involving multiple related members of CRL2 substrate adaptor families. Like the Kelch repeats found in the KLHDC family proteins, the leucine-rich repeats and the armadillo-like repeats present in the ZYG11 family adaptors also have the propensity to form solenoid structures (33), raising the possibility of a common structural mode through which terminal glycine residues are engaged (34). Furthermore, like its C-terminal counterparts, the ZYG11 family of substrate adaptors has also shaped the proteome, with N-terminal glycine degrons being broadly avoided across metazoa.

Our data suggest two contexts in which the targeting of N-terminal glycine degrons may play an important physiological role. N-myristoylation is a posttranslational modification that regulates the membrane localization and other properties of several hundred human proteins (29), a group that comprises notable members including Arf family GTPases, G protein α subunits, and Src family tyrosine kinases (31). We propose a model in which a failure of N-myristoylation conditionally exposes N-terminal glycine degrons, which are normally occluded upon successful modification, to ZYG11B and ZER1. Further work will be required to ascertain whether other classes of terminal degrons function in analogous quality control pathways to ensure the efficient deposition of posttranslational modifications.

Furthermore, the strong enrichment for favored ZYG11B/ZER1 glycine degrons at the N-termini of known caspase cleavage products suggested a potential role for these CRL2 complexes during apoptosis, and we confirmed experimentally that many caspase cleavage events would generate substrates efficiently degraded by Cul2ZYG11B and Cul2ZER1. Following glycine, the next most commonly generated N-terminal residue after caspase cleavage is serine, accounting for ~28% of annotated caspase sites. Intriguingly, in complete contrast to glycine, serine is the most stabilizing residue when exposed at the N terminus. Our caspase cleavage product GPS screen showed that fragments bearing N-terminal serine were generally extremely stable (fig. S14C). This may be useful where caspases need to activate a target, such as in the case of ataxia-telangiectasia mutated (ATM), whose C-terminal cleavage product acts in a dominant-negative manner to prevent DNA repair during apoptosis (35), or RAD21, whose C-terminal cleavage product acts as a pro-apoptotic factor (36).

The importance of this glycine-specific N-degron pathway to human biology is underscored by the observation that heterozygous loss-of-function mutations in both ZYG11B and ZER1 are far less frequent in humans than would be predicted. ZYG11B and ZER1 both have a pLI value of 1 in the ExAC database (37), indicating that loss-of-function variants are strongly selected against in the heterozygous state, which is consistent with potent haploinsufficiency and counter selection in humans. Misregulation of Src-family tyrosine kinases could be deleterious to development. In Caenorhabditis elegans, the ZYG11 ortholog is required for the metaphase-to-anaphase transition and M phase exit at meiosis II (20, 38, 39). In humans, ZYG11B and ZER1 are both expressed in the testes and ovaries, and hence a similar role in the regulation of meiosis could also explain the strong selection against loss-of-function mutations. Altogether, the comprehensive analysis of N-terminal degrons presented here has illuminated multiple aspects of N-degron proteolytic pathways and revealed that a family of E3 ligases specific for N-terminal glycine has shaped the human proteome.

Materials and methods

Cell Culture

HEK-293T (ATCC® CRL-3216) cells were grown in Dulbecco’s Modified Eagle’s Medium (DMEM) (Life Technologies) supplemented with 10% fetal bovine serum (HyClone) and penicillin/streptomycin (Thermo Fisher Scientific).

Transfection and lentivirus production

Lentivirus was generated through the transfection of HEK-293T cells using PolyJet In Vitro DNA Transfection Reagent (SignaGen Laboratories). Cells seeded at approximately 80% confluency were transfected as recommended by the manufacturer with the lentiviral transfer vector plus four plasmids encoding Gag-Pol, Rev, Tat and VSV-G. The media was changed 24 hours post-transfection and lentiviral supernatants collected a further 24 hours later. Cell debris was removed by centrifugation (800x g, 5 min) and virus was stored in single-use aliquots at -80°C. Transduction of target cells was achieved by adding the virus in the presence of 8 μg/ml hexadimethrine bromide (Polybrene, Sigma-Aldrich).

Inhibitors

The proteasome inhibitor Bortezomib was obtained from APExBio and the pan-CRL inhibitor MLN4924 was obtained from Active Biochem; both were used at a final concentration of 1 μM. The NMT1/2 inhibitor IMP-1088 was purchased from Cayman Chemical and was used at a final concentration of 1 μM for 24 hours.

Antibodies

Primary antibodies used in this study were: rabbit anti-UBR1 (Bethyl, A302-988A), rabbit anti-UBR2 (Bethyl, A305-416A), rabbit anti-UBR4 (Bethyl, A302-278A), rabbit anti-GFP (Abcam, ab290), rabbit anti-FOXJ3 (Bethyl, A303-107A), rabbit anti-ALKBH1 (Abcam, ab195376), rabbit anti-CHMP3 (Bethyl, A305-397A), mouse anti-vinculin (Sigma, V9131), rabbit anti-Fyn (Cell Signaling, 4023T), rabbit anti-LAMTOR (Cell Signaling, 8975T), rabbit anti-Yes (Cell Signaling, 3201S), rabbit anti-Lyn (Bethyl, A302-683A-T), rabbit anti-NDUFAF4 (ABclonal, A14345), rabbit anti-Src (Cell Signaling, 2123T) and rabbit anti-GAPDH (Cell Signaling, 5174). The HA and FLAG epitope tags were detected using rat anti-HA peroxidase (Sigma-Aldrich, 12013819001) and rabbit anti-FLAG peroxidase (Cell Signaling, #2044S), respectively. HRP-conjugated goat anti-mouse IgG and goat anti-rabbit IgG secondary antibodies were obtained from Jackson ImmunoResearch (#111-035-003).

Plasmids

Lentiviral vectors encoding dominant-negative Cullin constructs were a generous gift from W. Harper. For exogenous expression of CRL2 substrate adaptors, the pHRSIN-PSFFV-GFP-WPRE-PPGK-Hygro vector was used (a gift from P. Lehner), with the constructs cloned in place of GFP using the Gibson assembly method (NEBuilder HiFi Cloning Kit). Plasmids encoding ZYG11A and ZYG11B were obtained from Addgene [plasmids #110550 and #110551, deposited by E. Kipreos (40)], while an entry vector encoding ZER1 was obtained from the Ultimate ORF Clone collection (Thermo Fisher Scientific). A plasmid encoding TEV protease was also obtained from Addgene [plasmid #64276, deposited by X. Shu (27)].

For individual CRISPR/Cas9-mediated gene disruption experiments, the lentiCRISPR v2 vector was used (Addgene #52961, deposited by Feng Zhang). Oligonucleotides encoding the top and bottom strands of the sgRNAs were synthesized (IDT), annealed and phosphorylated (T4 PNK; NEB) and cloned into the lentiCRISPR v2 vector as described (41). Nucleotide sequences of the sgRNAs used were:

sg-AAVS1: GGGGCCACTAGGGACAGGAT

sg1-UBR1: GTGAGAGGATGGAAATCAGCG

sg2-UBR1: GATTCTAACTTGTGGACCGAA

sg1-UBR2: GAGGAGGAGAGAAGATGGCGT

sg2-UBR2: GTACCCAAAATCTACTGCAG

sg1-UBR4: GCCTCTCGAAGATGAACACCG

sg2-UBR4: GCTGACCCCTGGACAGACAG

sg-ATE1: GTATCAGGATCTCATAGACCG

sg1-ZYG11B: GCGCTCGTAAGGATCCTCGA

sg2-ZYG11B: GAAGCTCGAAGGCCAGAAAGC

sg1-ZER1: GTATGAGGAGGAGAACCCAGG

sg2-ZER1: GCCGCAGCAGGGACTCCACA

sg1-NMT1: GCAGGGTGTAGGGCTCCTGG

sg2-NMT1: GAAGCTCTACCGACTGCCAG

sg1-NMT2: GATAGACGGGGACAATGAGG

sg2-NMT2: GGACACGTGCGGGATAGACG

Flow cytometry

Analysis of HEK-293T cells by flow cytometry was performed on a BD LSRII instrument (Becton Dickinson) and the resulting data were analyzed using FlowJo. Cell sorting was performed on a MoFlo Astrios (Beckman Coulter).

Generation of Ub-GPS libraries

Human protein coding sequences were downloaded from the Gencode database (release 27). The first 72 nucleotides of entries that (i) started with a methionine residue, (ii) had a transcript_support_level equal to 1 or 2 and (iii) were common to both the Ensembl and Havana databases were included in the oligonucleotide design. After removal of identical 72-nucleotide sequences, the final N-terminome library consisted of a total of 24,638 sequences. The oligonucleotide pool was synthesized by Agilent Technologies and amplified by PCR (Q5 Hot Start High-Fidelity DNA Polymerase, NEB). The PCR product was then cloned into the Ub-GPS vector between a unique SalI site engineered into the 3′ end of the ubiquitin gene and a unique NdeI site at the 5′end of GFP using the Gibson assembly method (NEBuilder HiFi Cloning Kit), such that the resulting vector encoded the peptides immediately downstream of ubiquitin, followed by a short linker (ATSALGT) and GFP (commencing SKGEEL-). At least 100-fold representation of the library was maintained at each step.
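The selection criteria above can be expressed as a short filtering sketch. The record layout (dictionary keys `cds`, `tsl`, and `ensembl_havana`) is hypothetical and does not reflect the actual Gencode file format, and skipping coding sequences shorter than 72 nucleotides is an assumption that the methods do not address.

```python
def design_library(records, length=72):
    """Collect unique leading oligos from coding sequences that pass the filters.

    Each record is a dict with hypothetical keys:
      'cds'            - coding sequence as a nucleotide string
      'tsl'            - transcript support level (int)
      'ensembl_havana' - True if annotated by both Ensembl and Havana
    """
    seen = set()
    library = []
    for rec in records:
        cds = rec["cds"].upper()
        if not cds.startswith("ATG"):      # (i) must start with methionine
            continue
        if rec["tsl"] not in (1, 2):       # (ii) transcript support level 1 or 2
            continue
        if not rec["ensembl_havana"]:      # (iii) common to both annotation sets
            continue
        if len(cds) < length:              # assumption: skip very short entries
            continue
        oligo = cds[:length]
        if oligo not in seen:              # remove identical leading sequences
            seen.add(oligo)
            library.append(oligo)
    return library
```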

Ub-GPS libraries for saturation mutagenesis were generated in an identical manner. For each peptide selected for analysis, each amino acid encoded at position 2 through to position 10 was mutated to all other possible 19 amino acids. For each peptide 9 reference sequences were also synthesized, in which the same wild-type amino acid sequence was encoded by different nucleotide sequences.
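As an illustration of the library size this design implies, the sketch below enumerates every single-residue substitution at positions 2 through 10 of a peptide, giving 9 × 19 = 171 variants. The function name and the tuple layout are hypothetical, and the synonymous reference sequences are not modeled.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(peptide: str, first: int = 2, last: int = 10):
    """Enumerate single substitutions at 1-based positions first..last.

    Returns a list of (position, wild_type, mutant, variant_sequence) tuples.
    """
    if len(peptide) < last:
        raise ValueError("peptide is shorter than the last mutated position")
    variants = []
    for pos in range(first, last + 1):
        wild_type = peptide[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # only the 19 other amino acids are substituted
            mutated = peptide[:pos - 1] + aa + peptide[pos:]
            variants.append((pos, wild_type, aa, mutated))
    return variants
```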

Two databases were used to collate cleavage products for the caspase cleavage site Ub-GPS library: Degrabase (24) and PROSPER (25). All annotated cleavage sites in human proteins occurring after aspartic acid in each database were included for oligonucleotide design, which, after removal of duplicates, resulted in a total of 2,234 sequences. Amino acid sequences were converted into nucleotide sequences using the following codons:

A: GCC, C: TGC, D: GAC, E: GAG, F: TTC, G: GGC, H: CAC, I: ATC, K: AAG, L: CTG, M: ATG, N: AAC, P: CCC, Q: CAG, R: AGA, S: TCC, T: ACC, V: GTG, W: TGG, and Y: TAC. The oligonucleotide pool was synthesized by Agilent Technologies and amplified and cloned into the Ub-GPS vector as above.
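The conversion from amino acid sequence to nucleotide sequence with this fixed codon table can be sketched as follows; the function name is hypothetical. A 24-residue peptide yields a 72-nucleotide oligonucleotide.

```python
# One codon per amino acid, exactly as listed in the text.
CODONS = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "TCC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def back_translate(peptide: str) -> str:
    """Convert a one-letter amino acid string into nucleotides.

    Raises KeyError for any character that is not one of the 20 amino acids.
    """
    return "".join(CODONS[aa] for aa in peptide.upper())
```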

Ub-GPS screens

GPS plasmid libraries were packaged into lentiviral particles which were used to transduce HEK-293T cells at a multiplicity of infection of ~0.2 (achieving approximately 20% DsRed+ cells) and at sufficient scale to achieve ~500-fold coverage of the library (a total of ~12 million transduced cells in the case of the human N-terminome library). Puromycin (1.5 μg/ml) was added two days post-transduction to eliminate untransduced cells. Surviving cells were pooled, expanded and then partitioned by FACS into six bins 7 days post-transduction based on the GFP/DsRed ratio. Genomic DNA was extracted from each of the pools separately (Gentra Puregene Cell Kit, Qiagen) and the fusion peptides amplified by PCR (Q5 Hot Start Polymerase, NEB) using a forward primer annealing to the end of the ubiquitin gene and a reverse primer annealing to the front of GFP; sufficient reactions were performed to amplify a total mass of DNA equivalent to the mass of genomic DNA from cells representing 500-fold coverage of the library. All PCR products were pooled, and one-tenth of the mix was purified using a spin column (Qiagen PCR purification kit). Finally, 200 ng of the purified PCR product was used as the template for a second PCR reaction using primers to add the Illumina P5 sequence and a 7 bp ‘stagger’ region to the 5′ end, and Illumina indexes and P7 sequence at the 3′ end. Samples to be multiplexed were then pooled, purified on an agarose gel (QIAEXII Gel Extraction Kit, Qiagen) and sequenced on an Illumina NextSeq instrument.
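The scale of the screen follows from simple arithmetic, sketched below. The Poisson model of lentiviral integration is a common approximation and is an assumption here, not something stated in the methods; the function names are hypothetical.

```python
import math


def cells_for_coverage(library_size: int, fold: int) -> int:
    """Number of transduced cells needed for the requested fold coverage."""
    return library_size * fold


def poisson_infection(moi: float):
    """Under a Poisson model, return (fraction of cells infected,
    fraction of infected cells carrying exactly one integration)."""
    p_infected = 1.0 - math.exp(-moi)
    p_single = moi * math.exp(-moi)
    return p_infected, p_single / p_infected


# 24,638 sequences at 500-fold coverage: about 12.3 million transduced cells.
needed = cells_for_coverage(24_638, 500)
# At an MOI of 0.2, roughly 18% of cells are infected under this model,
# and about 90% of those carry a single construct.
infected, single = poisson_infection(0.2)
```

Under this model, a low multiplicity of infection keeps most transduced cells to a single peptide-GFP construct, at the cost of needing several-fold more cells at the time of transduction.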

CRISPR screens

A custom sgRNA library was designed targeting 43 E2 enzymes, 11 core CRL components, and 109 CRL2/5 adaptors at a depth of 6 sgRNAs per gene. The sgRNA sequences together with flanking BbsI restriction enzyme recognition sites were synthesized by Twist Bioscience. The oligonucleotide pool was amplified by PCR (Q5 Hot Start Polymerase, NEB) and the product purified (Qiagen PCR purification kit) and digested with BbsI (NEB). The digested product was concentrated by ethanol precipitation and then visualized on a 10% TBE PAGE gel (Thermo Fisher Scientific) stained with SYBR Gold (Thermo Fisher Scientific). DNA was isolated from the 28 bp band using the “crush-and-soak” method, concentrated by ethanol precipitation, and then cloned into lentiCRISPR v2 (Addgene #52961) digested with BsmBI (NEB).

The sgRNA library DNA was packaged into lentiviral particles. HEK-293T cells stably expressing unstable peptide-GFP fusion proteins were transduced at a multiplicity of infection of ~0.3 at sufficient scale to maintain at least 1000-fold representation of the library. Untransduced cells were eliminated through puromycin selection commencing two days post-transduction. The top ~5% of the surviving cells based on the GFP/DsRed ratio were isolated by FACS, which was performed 7 days post-transduction. For each screen genomic DNA was extracted from both the sorted cells and the unselected library as a reference. The sgRNAs in both pools were amplified by PCR and sequenced on an Illumina NextSeq instrument.

Immunoprecipitation and immunoblotting

HEK-293T cells stably expressing epitope-tagged CRL2 substrate adaptors and peptide-GFP fusions were grown in 10 cm plates. Following treatment with Bortezomib (1 μM, 5 hours), cells were lysed in ice-cold lysis buffer (50 mM Tris, 100 mM NaCl, 0.5% NP-40, pH 7.5 supplemented with EDTA-free protease inhibitor tablet and Phos-Stop phosphatase inhibitor tablet (Roche)) for 30 min on ice. Nuclei were pelleted by centrifugation (14,000x g, 10 min, 4°C). Beads coated with anti-HA (Pierce anti-HA magnetic beads, Thermo Fisher Scientific) or anti-FLAG (anti-FLAG M2 magnetic beads, Sigma-Aldrich) antibodies were added to the supernatants and incubated with rotation overnight at 4°C. The beads were then washed three times with lysis buffer before bound proteins were eluted upon incubation with SDS-PAGE sample buffer (95°C, 10 min). Proteins were subsequently resolved by SDS-PAGE (NuPAGE Bis-Tris gels, Thermo Fisher Scientific) and transferred to a nitrocellulose membrane (Trans-Blot Turbo System, Bio-Rad) which was then blocked in 10% nonfat dry milk in PBS + 0.1% Tween-20 (PBS-T). The membrane was incubated with primary antibody overnight at 4°C, and then, following three washes with PBS-T, HRP-conjugated secondary antibody was added for 1 hour at room temperature. Following a further three washes in PBS-T, reactive bands were visualized using Western Lightning Plus ECL (Perkin Elmer) and HyBlot CL film (Denville Scientific).

Mass spectrometry

UBR KO clone 2 cells stably expressing peptide-GFP fusions growing in 15 cm plates were lysed as described above, and immunoprecipitation was performed in a similar way using GFP-Trap_MA magnetic agarose beads (Chromotek). Elution of the peptide-GFP fusion proteins was achieved by treatment with 2 M glycine for 1 min, followed by neutralization with 1 M Tris base, pH 10.4. Eluted proteins were reduced using DTT (Thermo Fisher) and alkylated with iodoacetamide (Sigma). Following TCA precipitation (Sigma), proteins were digested with Glu-C (Thermo Fisher) and then cleaned up on C-18 stage tips (3M).

Mass spectrometry data were collected using a Q Exactive mass spectrometer (Thermo Fisher) coupled with a Famos Autosampler (LC Packings) and an Accela600 liquid chromatography pump (Thermo Fisher). Peptides were separated on a 100 μm inner diameter microcapillary column packed with ∼25 cm of Accucore C18 resin (2.6 μm, 150 Å, Thermo Fisher), using a 120 min gradient of 5 to 25% acetonitrile in 0.125% formic acid at a flow rate of ∼300 nl/min. The scan sequence began with an Orbitrap MS1 spectrum with resolution 70,000, scan range 300−1500 Th, automatic gain control (AGC) target 1 × 10^5, maximum injection time 250 ms, and centroided data type. The top twenty precursors were selected for MS2 analysis, which consisted of HCD (high-energy collision dissociation) with the following parameters: resolution 17,500, AGC 1 × 10^5, maximum injection time 100 ms, isolation window 1.6 Th, normalized collision energy (NCE) 27, and centroid spectrum data type. Unassigned charge states were excluded from MS2 analysis, but singly charged species were included. Dynamic exclusion was set to automatic. Mass spectra were processed using a Sequest-based in-house software pipeline.

Bioinformatics

N-terminome Ub-GPS screen

Raw Illumina reads derived from each GPS bin were first trimmed of constant sequences derived from the Ub-GPS vector backbone using Cutadapt (42). The resulting 72 nt reads were mapped to the reference input library using Bowtie 2 (43), and count tables were generated from reads that aligned perfectly to the reference sequence. Following correction for sequencing depth, the protein stability index (PSI) metric was calculated for each peptide-GFP fusion. The PSI score is the sum, over all bins, of the proportion of reads in each bin multiplied by the bin number (1 to 6 in this case), thus yielding a stability score between 1 (maximally unstable) and 6 (maximally stable): $\mathrm{PSI} = \sum_{i=1}^{6} R_i \times i$ (where $i$ represents the number of the bin and $R_i$ represents the proportion of Illumina reads present for a peptide in that given bin $i$). Read counts and the associated stability score for each peptide-GFP fusion are detailed in data file S1.
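
The PSI is in effect a read-weighted mean bin number. A minimal Python sketch of the calculation follows. The depth correction shown (dividing each count by the total reads sequenced in that bin) is an assumption about how the correction might be done, and the function and argument names are illustrative rather than taken from the analysis pipeline.

```python
def protein_stability_index(bin_counts, bin_totals):
    """Compute PSI for one peptide from its raw read counts per bin.

    bin_counts: reads for this peptide in bins 1..n (ordered by bin number)
    bin_totals: total reads sequenced in each bin, used for depth correction
    """
    if len(bin_counts) != len(bin_totals):
        raise ValueError("bin_counts and bin_totals must have equal length")
    # Assumed depth correction: express each count relative to its bin's depth.
    corrected = [count / total for count, total in zip(bin_counts, bin_totals)]
    overall = sum(corrected)
    if overall == 0:
        raise ValueError("peptide has no reads in any bin")
    # R_i: proportion of this peptide's (corrected) reads that fall in bin i.
    proportions = [value / overall for value in corrected]
    # PSI = sum over bins of R_i * i, with bins numbered from 1.
    return sum(r * i for i, r in enumerate(proportions, start=1))


# A peptide found only in bin 1 scores 1; one spread evenly over six bins scores 3.5.
unstable = protein_stability_index([10, 0, 0, 0, 0, 0], [100] * 6)
even = protein_stability_index([1, 1, 1, 1, 1, 1], [100] * 6)
```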

Prediction of destabilizing N-terminal motifs

The stability data derived from the Ub-GPS N-terminome screen were used to identify potential destabilizing N-terminal degron motifs (Fig. 2, C and D). For every possible dipeptide motif (gaps allowed) formed by exactly two residues between position 2 and position 7, the mean PSI of all peptides containing that motif at the front of the peptide (that is, immediately following the initiator methionine) was compared to the mean PSI of all peptides containing the same motif at any internal location within the 24-mer peptide. The full data for all possible N-terminal motifs are tabulated in data file S2.
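
The comparison described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code. It assumes every peptide begins with an initiator methionine, writes a motif as a short regular expression in which "." marks a gap position, and counts a peptide carrying the motif at both locations as N-terminal only.

```python
import re
from statistics import mean


def motif_psi_comparison(psi_by_peptide, motif):
    """Return (mean PSI with motif at the N terminus, mean PSI with motif internal).

    psi_by_peptide: dict mapping peptide sequence (starting with M) to PSI
    motif: regular expression such as "GF" or "G.F" ("." = any residue)
    """
    pattern = re.compile(motif)
    n_terminal, internal = [], []
    for sequence, psi in psi_by_peptide.items():
        body = sequence[1:]  # drop the initiator methionine
        if pattern.match(body):
            # Motif sits immediately after the initiator methionine.
            n_terminal.append(psi)
        elif pattern.search(body, 1):
            # Motif occurs only at some later position in the peptide.
            internal.append(psi)
    n_mean = mean(n_terminal) if n_terminal else None
    i_mean = mean(internal) if internal else None
    return n_mean, i_mean


# Toy data (hypothetical sequences and scores, for illustration only).
toy = {
    "MGFKLAA": 1.0,
    "MGFSTAA": 2.0,
    "MAKGFAA": 4.0,
    "MSTAGFA": 5.0,
    "MAAAAAA": 6.0,
}
front_mean, internal_mean = motif_psi_comparison(toy, "GF")
```

A motif whose N-terminal mean PSI is much lower than its internal mean PSI is a candidate N-degron, because its destabilizing effect depends on position rather than on composition alone.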

N-terminome Ub-GPS screen in different genetic backgrounds

The Ub-GPS N-terminome screens with the UBR mutant clones and the ZYG11B and ZER1 mutant cells were performed in a similar manner as above, except that only the half of the N-terminome library comprising peptides bearing an initiator methionine was used. Read counts for all peptide-GFP fusions are detailed in data file S3A. Subsequently, comparisons between wild-type cells and combined ZYG11B/ZER1 mutant cells (data file S3B) and between wild-type, control (AAVS1) mutant and three NMT1/2 mutant clones (data file S3C) were performed in a similar way. In each case, a ΔPSI score was generated for each peptide-GFP fusion reflecting the difference in raw PSI scores between the wild-type or control mutant cells and the experimental mutant cells. For the plots shown in Fig. 3 and fig. S4, peptide-GFP fusions were defined as UBR substrates if they were stabilized >0.8 PSI units (Fig. 3E) or >0.6 PSI units (fig. S4, A to D) in the UBR KO clone compared to control cells, but also not stabilized >0.3 PSI units in either ZYG11B or ZER1 mutant cells. The logoplots shown in fig. S4, E to H were generated with iceLogo (44) and rendered using Seq2Logo (45): the “experimental set” comprised all peptides starting with the indicated motif that were identified as a UBR substrate in any of the three UBR KO clones, the “reference set” comprised all peptides in the N-terminome library starting with that same motif, and the “percentage difference” scoring system was used. Residues significantly enriched at P < 0.05 are displayed. For the heatmap shown in Fig. 7D, peptide-GFP fusions destabilized >0.5 PSI units in all three NMT1/2 mutant clones were included for analysis.
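
As an illustration of the thresholds used to define UBR substrates, the sketch below applies them to a single peptide. It assumes the ΔPSI is calculated as mutant minus control, so that positive values indicate stabilization. The dictionary keys and function names are hypothetical.

```python
def delta_psi(psi_mutant, psi_control):
    """Assumed sign convention: positive values mean stabilization in the mutant."""
    return psi_mutant - psi_control


def is_ubr_substrate(psi, ubr_threshold=0.8, crl2_threshold=0.3):
    """Classify one peptide-GFP fusion from its PSI in each genetic background.

    psi: dict with keys "control", "UBR_KO", "ZYG11B", and "ZER1"
    ubr_threshold: minimum stabilization required in the UBR KO clone
    crl2_threshold: stabilization in a ZYG11B or ZER1 mutant that excludes the peptide
    """
    stabilized_by_ubr_loss = delta_psi(psi["UBR_KO"], psi["control"]) > ubr_threshold
    stabilized_by_crl2_loss = any(
        delta_psi(psi[background], psi["control"]) > crl2_threshold
        for background in ("ZYG11B", "ZER1")
    )
    return stabilized_by_ubr_loss and not stabilized_by_crl2_loss


# Hypothetical PSI values for two peptides.
ubr_only = {"control": 2.0, "UBR_KO": 3.0, "ZYG11B": 2.1, "ZER1": 2.0}
also_crl2 = {"control": 2.0, "UBR_KO": 3.0, "ZYG11B": 2.5, "ZER1": 2.0}
```

Passing `ubr_threshold=0.6` gives the more permissive definition used for fig. S4, A to D.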

Saturation mutagenesis Ub-GPS screens

The heatmaps displayed in Figs. 3, G to I, and 4B and figs. S5, A to C, and S6 illustrate the difference between the PSI for each individual mutant peptide and the median PSI of all the unmutated peptides; the darker the red color, the greater the stabilizing effect of the mutation. For the heatmaps shown in Fig. 5, A to C, and fig. S13 the color scales indicate the raw PSI stability measurement, which lies between 1 (maximally unstable; dark blue) and 6 (maximally stable; dark red); the exception is the comparison between ZYG11B mutant and ZER1 mutant cells (right columns) where a ΔPSI score reflecting the difference between raw PSI score in ZYG11B mutant cells and ZER1 mutant cells for each peptide is depicted. The full data for all mutant peptides in all genetic backgrounds is detailed in data file S4, A to C.
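
The values underlying the first type of heatmap can be sketched as below: each mutant peptide's PSI is expressed relative to the median PSI of the unmutated peptides. The mutant labels and scores are hypothetical and serve only to show the calculation.

```python
from statistics import median


def mutation_effects(mutant_psi, unmutated_psi):
    """Return PSI of each mutant minus the median PSI of unmutated peptides.

    mutant_psi: dict mapping a mutant label to its PSI
    unmutated_psi: list of PSI values for the unmutated (wild-type) peptides
    """
    baseline = median(unmutated_psi)
    # Positive values indicate that the mutation stabilizes the peptide.
    return {label: psi - baseline for label, psi in mutant_psi.items()}


effects = mutation_effects({"G2A": 5.0, "F3A": 2.0}, [2.0, 2.5, 3.0])
```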

CRISPR screens

Constant regions derived from the backbone of the lentiCRISPR v2 expression vector were removed from Illumina reads using Cutadapt, and count tables were generated from the remaining variable portion of the sgRNA sequences using Bowtie 2. The Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) algorithm (46) was used to rank the performance of individual genes targeted by multiple sgRNAs enriched in the selected cells versus the unsorted populations. The full MAGeCK output for each screen is detailed in data file S5. For the scatterplots shown in Fig. 4E and fig. S8B, the MAGeCK score plotted on the y axis is calculated as the negative log10 of the “pos|score” value generated by MAGeCK.
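
The transformation used for the y axis of the scatterplots is a one-line calculation, sketched here with an illustrative function name. The input is the "pos|score" value reported by MAGeCK for a gene, and the gene names and scores in the example are hypothetical.

```python
import math


def mageck_plot_score(pos_score):
    """Return -log10 of a MAGeCK "pos|score" value (must lie in (0, 1])."""
    if not 0 < pos_score <= 1:
        raise ValueError("pos|score must be greater than 0 and at most 1")
    return -math.log10(pos_score)


# Rank hypothetical genes so that the strongest positive hits come first.
scores = {"GENE_A": 1e-6, "GENE_B": 0.5, "GENE_C": 1e-3}
ranked = sorted(scores, key=lambda gene: mageck_plot_score(scores[gene]), reverse=True)
```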

Proteome composition analysis

Canonical protein sequences were downloaded from the Swiss-Prot database. For each position between position 2 and position 10 at the N terminus of the proteins the total abundance of each amino acid was quantified, expressed as a proportion of the total number of protein sequences, and then normalized to the mean proportion observed across the 9 N-terminal residues between position 2 and position 10. For the analysis of N-terminal glycine degrons shown in Fig. 5, F and G, we further categorized glycine residues as either “favored” for CRL2-mediated degradation through ZYG11B and ZER1 if they were followed by F, G, H, L, M or Y, or “disfavored” if followed by D, E, I, N, P, R, S or T. The mean abundance of Gfavored and Gdisfavored across the 8 N-terminal residues between position 3 and position 10 was then compared to their abundance at position 2. For the analysis of human caspase cleavage sites presented in Fig. 6A, all unique cleavage sites occurring downstream of aspartic acid (D) annotated in Degrabase 1.0 (24) were analyzed using iceLogo. In Fig. 6B, all unique cleavage sites occurring downstream of aspartic acid (D) and upstream of glycine (G) were analyzed; the frequency of Gfavored and Gdisfavored at the N terminus of these caspase cleavage products was compared to the frequency of Gfavored and Gdisfavored at all glycine residues in the human proteome.
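
The position-wise normalization and the classification of glycine residues can be sketched as follows. This is a minimal illustration under stated assumptions: sequences shorter than 10 residues are skipped, positions are 1-based with the initiator methionine at position 1, and the function names are hypothetical.

```python
FAVORED_NEXT = set("FGHLMY")
DISFAVORED_NEXT = set("DEINPRST")


def classify_glycine(next_residue):
    """Classify a glycine by the residue that immediately follows it."""
    if next_residue in FAVORED_NEXT:
        return "favored"
    if next_residue in DISFAVORED_NEXT:
        return "disfavored"
    return "other"


def normalized_position_frequencies(sequences, amino_acid, first=2, last=10):
    """Frequency of amino_acid at each position, normalized to the mean
    frequency across positions first..last (1-based, inclusive)."""
    usable = [s for s in sequences if len(s) >= last]
    if not usable:
        raise ValueError("no sequences long enough to analyze")
    proportions = {
        position: sum(1 for s in usable if s[position - 1] == amino_acid) / len(usable)
        for position in range(first, last + 1)
    }
    mean_proportion = sum(proportions.values()) / len(proportions)
    if mean_proportion == 0:
        raise ValueError("amino acid absent from the analyzed positions")
    return {pos: p / mean_proportion for pos, p in proportions.items()}


# Toy input: glycine occurs only at position 2, in one of two sequences.
freqs = normalized_position_frequencies(["MGAAAAAAAA", "MAAAAAAAAA"], "G")
```

A normalized value above 1 at position 2 would indicate enrichment of that residue immediately after the initiator methionine relative to the neighboring positions, and a value below 1 would indicate depletion.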

Caspase cleavage site Ub-GPS screen

The Ub-GPS screen with peptides derived from human caspase cleavage sites was analyzed in the same way as above, generating a ΔPSI score for each peptide reflecting the difference in raw PSI scores between either control (AAVS1) mutant cells or combined ZYG11B/ZER1 double mutant cells versus wild-type cells (data file S6). For the heatmap shown in Fig. 6D, peptide-GFP fusions stabilized >0.5 PSI units in both ZYG11B/ZER1 double mutant cell lines but <0.25 PSI units in control knockout cells were included; the intensity of the colors represents the depletion (blue) or enrichment (red) of each amino acid when this pool of ZYG11B/ZER1 substrates is compared to all peptides detected in the caspase cleavage site library.

Supplementary Materials

science.sciencemag.org/content/365/6448/eaaw4912/suppl/DC1

Figs. S1 to S15

Data Files S1 to S6

Reference (47)

References and Notes

Acknowledgments: We are grateful to C. Araneo and his team for FACS and J. Paulo for mass spectrometry. We thank A. Varshavsky, J. Wells, and E. Tate for advice. Funding: R.T.T. is a Sir Henry Wellcome Postdoctoral Fellow (201387/Z/16/Z). Z.Z. is a Croucher Foundation Honorary Ph.D. Scholar. This work was supported by an NIH grant (AG11085) to S.J.E. and J.W.H.; S.J.E. is an investigator with the Howard Hughes Medical Institute. Author contributions: Conceptualization, R.T.T., I.K., and S.J.E.; investigation, R.T.T., I.K., D.Y.R., and Z.Z.; writing, R.T.T., I.K., and S.J.E.; supervision, J.W.H. and S.J.E. Competing interests: The authors declare no competing interests. Data and materials availability: All data are available in the main text or the supplementary materials.