Tight Regulation of Unstructured Proteins: From Transcript Synthesis to Protein Degradation


Science  28 Nov 2008:
Vol. 322, Issue 5906, pp. 1365-1368
DOI: 10.1126/science.1163581


Altered abundance of several intrinsically unstructured proteins (IUPs) has been associated with perturbed cellular signaling that may lead to pathological conditions such as cancer. Therefore, it is important to understand how cells precisely regulate the availability of IUPs. We observed that regulation of transcript clearance, proteolytic degradation, and translational rate contribute to controlling the abundance of IUPs, some of which are present in low amounts and for short periods of time. Abundant phosphorylation and low stochasticity in transcription and translation indicate that the availability of IUPs can be finely tuned. Fidelity in signaling may require that most IUPs be available in appropriate amounts and not present longer than needed.

Up to one third of all eukaryotic proteins have large segments that are unstructured and are commonly referred to as intrinsically unstructured proteins (IUPs). These proteins lack a unique structure, either entirely or in parts, when alone in solution (1). The lack of structure is thought to provide several advantages, such as (i) an increased interaction surface area, (ii) conformational flexibility to interact with several targets, (iii) the presence of molecular recognition elements that fold upon binding, (iv) accessible posttranslational modification sites, and (v) the availability of short linear interaction motifs (2–5). These and other properties are ideal for proteins that mediate signaling and coordinate regulatory events, and indeed proteins that participate in regulatory and signaling functions are enriched in unstructured segments (6–8) [supporting online material (SOM) text S1]. Because of their unusual structural and important functional properties, the presence of IUPs in a cell may need to be carefully monitored. In fact, altered abundance of IUPs is associated with several disease conditions. For instance, overexpression of thyroid cancer 1 (TC-1) (9) or underexpression of adenosine 5′-diphosphate (ADP) ribosylation factor (Arf) (10) and p27 (11) has been linked with various types of cancer. Similarly, overexpression of α-synuclein and tau proteins increases the risk of aggregate formation and has been linked to Parkinson's disease and Alzheimer's disease (12, 13). We therefore tested whether specific control mechanisms affect the availability of IUPs (that is, their abundance and residence time) within a cell.

Using the Disopred2 software (6), which reports unstructured residues based on the protein sequence, we computed the fraction of the polypeptide that is predicted to be unstructured for every protein in Saccharomyces cerevisiae (14). This allowed us to categorize 1971 sequences as highly structured (“S”; 0 to 10% of the total length is unstructured), 2711 sequences as moderately unstructured (“M”; 10 to 30% of the protein is unstructured), and 2020 sequences as highly unstructured (“U”; 30 to 100% of the protein is unstructured) (Fig. 1). This information was integrated with different genome-scale datasets that describe most of the regulatory steps that influence protein synthesis or degradation (table S1 and fig. S1), and we compared the distributions of the values for the proteins in the U and S groups with statistical tests (14).
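The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in (14): the per-residue input format ('*' for unstructured, '.' for structured) and the function names are hypothetical, and the treatment of proteins that fall exactly on the 10% or 30% boundary is an assumption, because the text does not say which group they join.

```python
def unstructured_fraction(per_residue_calls: str) -> float:
    """Fraction of residues flagged as unstructured.

    `per_residue_calls` is a hypothetical encoding with one character per
    residue: '*' for unstructured, '.' for structured.
    """
    if not per_residue_calls:
        raise ValueError("empty prediction")
    return per_residue_calls.count("*") / len(per_residue_calls)


def classify_disorder(fraction: float) -> str:
    """Return 'S', 'M', or 'U' for an unstructured fraction in [0, 1]."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction < 0.10:  # assumption: exactly 10% is counted as 'M'
        return "S"
    if fraction < 0.30:  # assumption: exactly 30% is counted as 'U'
        return "M"
    return "U"


# Group sizes reported in the text, and their share of the categorized proteome.
counts = {"S": 1971, "M": 2711, "U": 2020}
total = sum(counts.values())
shares = {group: round(100 * n / total, 1) for group, n in counts.items()}
```

With the reported group sizes, `total` is 6702 and the S and U groups each make up roughly 30% of the categorized sequences, which is the genomewide baseline used later in the kinase-substrate comparison.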

Fig. 1.

The S. cerevisiae proteome was grouped into three categories, highly structured (S), moderately unstructured (M), and highly unstructured (U), based on the fraction of unstructured residues over the entire protein length.

We compared the transcription of genes encoding highly unstructured proteins with that of genes encoding more structured proteins. Because the steady-state amount of mRNA could be affected by the rate at which the transcripts are produced or degraded, we investigated whether the transcriptional rate or the degradation rate differed between the transcripts that encode highly structured and unstructured proteins. The number of transcription factors (TFs) that regulate a gene was comparable between the two groups (P = 0.55, Wilcoxon test) (Fig. 2A). However, mRNAs encoding highly unstructured proteins were generally less abundant than transcripts encoding more structured proteins (P = 1 × 10⁻⁶, Wilcoxon test) (Fig. 2B). The mRNA half-lives of the transcripts that encode highly unstructured proteins were shorter than those of transcripts that encode more structured proteins (P < 10⁻¹⁶, Wilcoxon test) (Fig. 2C), and a comparison of the distribution of transcriptional rates showed that the difference between the two groups was less significant (P = 1 × 10⁻⁸, Wilcoxon test) (table S3). Thus, differences in decay rates appear to be a major factor leading to differences in mRNA abundance (SOM text S5).
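For readers unfamiliar with the Wilcoxon rank-sum test used throughout, the following is a minimal sketch of the two-sided test with the large-sample normal approximation and a tie-corrected variance. It is not the implementation used in (14); the function name and the half-life values are hypothetical, and no continuity correction is applied.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, z score, two-sided P value). Tied values
    receive average ranks and the variance is corrected for ties.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                     # size of this group of tied values
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are identical")
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical mRNA half-lives (minutes) for a few U-group and S-group transcripts.
half_life_u = [8, 10, 12, 14, 15]
half_life_s = [20, 22, 25, 30, 35]
u_stat, z, p = rank_sum_test(half_life_u, half_life_s)
```

In this toy example every U-group value is smaller than every S-group value, so the U statistic is 0 and z is negative. With only five values per group an exact test would be preferable; the normal approximation is shown because it is the form that scales to the thousands of proteins per group compared here.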

Fig. 2.

Box-plot of the distribution and histogram of values for various regulatory and cellular properties for the three different groups of proteins (S, M, and U) in S. cerevisiae. Each box-plot identifies the middle 50% of the data, the median, and the extreme points. The entire set of data points is divided into quartiles and the inter-quartile range (IQR) is calculated as the difference between x_0.75 and x_0.25. The range of the 25% of the data points above (x_0.75) and below (x_0.25) the median (x_0.50) is displayed as a filled box. The horizontal line and the notch represent the median and confidence intervals, respectively. Data points more than 1.5·IQR above or below the box represent outliers and are shown as dots. The horizontal line that is connected by dashed lines above and below the filled box (whiskers) represents the largest and the smallest nonoutlier data points, respectively (see also tables S3 and S4). (A) Transcriptional complexity, (B) mRNA abundance, (C) mRNA half-life, (D) poly(A) tail length, (E) ribosomal density, (F) protein abundance, (G) protein half-life, (H) PEST-sequence content, (I) TATA box content, and (J) noise in protein production. A plus sign denotes statistically significant differences between the S and U groups (table S3). ORF, open reading frame; CV, coefficient of variation.
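The box-plot quantities in the legend can be computed as in the sketch below. The quartile interpolation rule is an assumption (the `inclusive` method of Python's `statistics.quantiles`); plotting packages differ slightly in how they interpolate quartiles, and the legend does not say which rule was used. Outliers are taken to be points more than 1.5·IQR beyond the box edges.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Quartiles, IQR, whisker ends, and outliers for one group of values."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # smallest nonoutlier point
        "whisker_high": max(inside),  # largest nonoutlier point
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical values with one extreme point.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```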

We analyzed polyadenylate [poly(A)] tail length because the two major pathways of mRNA decay are initiated by removal of the poly(A) tail. A significantly larger fraction of the transcripts encoding unstructured proteins had a short poly(A) tail (table S1) than did those encoding structured proteins (P < 10⁻¹⁶, Fisher's exact test) (Fig. 2D). Analysis of transcript binding by Puf family RNA-binding proteins, which affect transcript stability, showed that Puf5p binding was enriched for transcripts that encode highly unstructured proteins. In fact, 108 of the 224 transcripts bound by Puf5p encode highly unstructured proteins, a much greater number than the 68 transcripts expected by chance (z score = 5.3; P = 5 × 10⁻¹⁰) (14). Thus, poly(A) tail length and interaction with specific RNA-binding proteins may modulate the stability of transcripts encoding IUPs (SOM text S5).
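The chance expectation quoted for Puf5p follows from simple proportions, as the sketch below illustrates. It assumes that the background is the full set of categorized proteins (the sum of the S, M, and U groups); the background and randomization procedure actually used for the reported z score and P value are described in (14) and are not reproduced here. The function name is hypothetical.

```python
def expected_overlap(n_bound: int, n_category: int, n_background: int) -> float:
    """Expected number of bound transcripts falling in a category by chance."""
    if n_background <= 0:
        raise ValueError("background must be positive")
    return n_bound * n_category / n_background


# Assumption: background = all categorized proteins (S + M + U groups).
n_background = 1971 + 2711 + 2020
expected_u = expected_overlap(224, 2020, n_background)  # Puf5p targets in group U
fold_enrichment = 108 / expected_u                      # observed / expected
```

Under this assumption the expectation rounds to the 68 transcripts given in the text, and the observed 108 corresponds to roughly a 1.6-fold enrichment.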

Unstructured proteins tend to be less abundant than structured proteins (P < 10⁻¹⁶, Wilcoxon test) (Fig. 2F, fig. S2, and SOM text S2 and S5). The rate of protein synthesis was significantly lower (P < 10⁻¹⁶, Wilcoxon test) (Fig. 2E) and protein half-life was shorter (P = 1 × 10⁻¹⁵, Wilcoxon test) (Fig. 2G and SOM text S5) for highly unstructured than for more structured proteins. Two pathways that mediate ubiquitin-proteasome–dependent degradation are the N-end–rule pathway and the PEST-mediated degradation pathway. Although the distribution of N-end residues was not significantly different (SOM text S3 and figs. S3 and S4), a significantly greater fraction of the unstructured proteins contained PEST motifs (regions rich in proline, glutamic acid, serine, and threonine) (P < 10⁻¹⁶, Fisher's exact test) (Fig. 2H and SOM text S5), as previously observed (1, 15). Therefore, it appears that the availability of many IUPs is regulated via proteolytic degradation and a reduced translational rate.

For certain IUPs (for example, p27), posttranslational modifications such as phosphorylation (11, 16) can affect their abundance or half-life in a cell. In fact, recent computational studies using phosphorylation site–prediction methods have suggested that unstructured regions are enriched for sites that can be posttranslationally modified (17). We analyzed the experimentally determined yeast kinase-substrate network and found that highly unstructured proteins are on average substrates of twice as many kinases as are structured proteins (P = 1 × 10⁻¹², Wilcoxon test) (SOM text S4 and fig. S5). On average, 51 ± 19% (SD) of all substrates of the kinases are highly unstructured, whereas only 19 ± 13% (SD) are highly structured [the remaining 30 ± 14% (SD) of all substrates are moderately unstructured]. This is a significant bias as compared with the expected genomewide distribution of ∼30% highly unstructured and ∼30% structured proteins based on our categorization (P < 10⁻¹⁶, Fisher's exact test) (14). We found that 85% of the kinases for which more than 50% of their substrates are highly unstructured (table S2) are either regulated in a cell cycle–dependent manner (for example, Cdc28) or activated upon exposure to particular stimuli (for example, Fus3) or stress (for example, Atg1). This suggests that posttranslational modification of IUPs through phosphorylation may be an important mechanism in fine-tuning their function and possibly their availability under different conditions.
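The per-kinase summary reported above (mean ± SD of the fraction of substrates in each group, and the kinases with more than 50% highly unstructured substrates) can be illustrated with the sketch below. The kinase names, protein names, and network are entirely hypothetical toy data, and the use of the sample standard deviation is an assumption.

```python
from collections import Counter
from statistics import mean, stdev


def substrate_composition(substrates, category_of):
    """Fraction of a kinase's categorized substrates in groups S, M, and U."""
    labels = [category_of[p] for p in substrates if p in category_of]
    if not labels:
        raise ValueError("no categorized substrates")
    tally = Counter(labels)
    return {group: tally.get(group, 0) / len(labels) for group in "SMU"}


# Hypothetical toy data: protein -> group, and kinase -> substrates.
category_of = {"p1": "U", "p2": "U", "p3": "M", "p4": "S", "p5": "U", "p6": "S"}
kinase_substrates = {
    "kinA": ["p1", "p2", "p3"],        # 2 of 3 substrates in group U
    "kinB": ["p4", "p5"],              # 1 of 2 substrates in group U
    "kinC": ["p1", "p5", "p6", "p2"],  # 3 of 4 substrates in group U
}

composition = {
    kinase: substrate_composition(subs, category_of)
    for kinase, subs in kinase_substrates.items()
}
u_fractions = [comp["U"] for comp in composition.values()]
mean_u, sd_u = mean(u_fractions), stdev(u_fractions)
mostly_unstructured = sorted(k for k, comp in composition.items() if comp["U"] > 0.5)
```

Averaging per-kinase fractions, rather than pooling all substrates, gives each kinase equal weight regardless of how many substrates it has.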

We investigated whether genes that encode highly unstructured proteins display low stochasticity in their expression levels among individuals in a population of genetically homogeneous cells. An important source of such stochasticity in cellular systems is random noise in transcription and translation, which results in very different amounts of transcripts and protein products. We used the presence of a TATA box in the promoter region to infer genes that might be more subject to noise in gene expression (18) and found that genes encoding highly unstructured proteins are less likely to have a TATA box than those that encode structured proteins (P = 8 × 10⁻⁷, Fisher's exact test) (Fig. 2I). At the protein level, analysis of direct experimental data revealed that unstructured proteins have lower noise levels than do structured proteins (P = 3 × 10⁻¹¹, Wilcoxon test) (Fig. 2J). These results indicate that highly unstructured proteins may display less noise in transcription and translation than more structured proteins.
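Noise in protein production is commonly summarized by the coefficient of variation (CV) of protein levels across individual cells, the quantity abbreviated in the legend of Fig. 2. The sketch below shows that calculation on hypothetical single-cell measurements; the exact noise measure in the dataset analyzed here is not specified in this text, so this is an illustration of the concept only.

```python
from statistics import mean, pstdev


def coefficient_of_variation(levels):
    """CV = standard deviation / mean of per-cell protein levels."""
    m = mean(levels)
    if m <= 0:
        raise ValueError("mean level must be positive")
    return pstdev(levels) / m


# Hypothetical per-cell levels (arbitrary units) for two proteins
# with the same mean abundance but different cell-to-cell spread.
quiet_protein = [98, 100, 102, 100]
noisy_protein = [50, 150, 100, 100]
cv_quiet = coefficient_of_variation(quiet_protein)
cv_noisy = coefficient_of_variation(noisy_protein)
```

Because the CV is normalized by the mean, it allows proteins of different abundance to be compared on the same scale.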

To assess regulation of IUPs in other organisms, we analyzed datasets (table S1) for Schizosaccharomyces pombe and Homo sapiens. Similar trends to those observed for S. cerevisiae were evident in these organisms (Fig. 3 and tables S3 and S4). Thus, both unicellular and multicellular organisms appear to regulate the availability of IUPs. The observed differences between structured and unstructured proteins were independent of the IUP prediction method used, protein length, localization within the major subcellular compartments, different grouping of proteins, or the number of interaction partners per protein (tables S6 to S11). Though the distributions of the values for the proteins in the structured and unstructured groups are broad and overlap, the differences reported here are consistently statistically significant with two independent statistical tests, the Wilcoxon rank-sum test and the Kolmogorov-Smirnov test (tables S3 to S5) (14). Thus, the reported trends appear to be attributable to the intrinsically unstructured nature of the proteins. Of all the IUPs, those that contain polyglutamine [poly(Q)] or polyasparagine [poly(N)] stretches seem to be more tightly controlled (tables S3 and S4).

Fig. 3.

Box-plot of the distribution and histogram of values for various regulatory and cellular parameters for the three groups of proteins (S, M, and U) in S. pombe (A to D) and humans (E to G). All reported differences are statistically significant (table S4). A.U., arbitrary units; mya, million years ago.

Proteins with unstructured regions predominantly have signaling or regulatory roles and are often reused in multiple pathways to produce different physiological outcomes (2, 6–8). Accordingly, increased abundance of IUPs can result in undesirable interactions (for example, titration of unrelated proteins by inappropriate interaction through exposed peptide motifs), thereby disturbing the fine balance of the signaling networks leading to dampened or inappropriate activities (19). Spatial and temporal segregation of signaling proteins as well as an increased signaling complexity may contribute to fidelity in regulation (SOM text S5 and fig. S6). In addition, tight regulation of signaling and regulatory IUPs could minimize the potentially harmful effects of ectopic interactions and may provide robustness to signaling processes by ensuring that such proteins are present in appropriate amounts and time periods. Indeed, free IκB [an IUP that interacts with and inhibits the nuclear factor κB (NF-κB) transcription factor] must be continuously degraded to allow for rapid and robust NF-κB activation and to make the pathway sensitive for a signaling input (20, 21). In contrast, stabilization of free IκB by removal of the PEST motif reduces NF-κB activation (20, 21). In mammalian cells exposed to mild endoplasmic reticulum stress, survival is favored as a direct consequence of the intrinsic instability of mRNAs and of proteins that drive apoptosis (such as Chop, an IUP) as compared with that of proteins that promote adaptation and cell longevity (for example, the chaperone protein BiP, a structured protein) (22).

Although the abundance of many IUPs is strictly controlled, certain IUPs are present in cells in large amounts or for long periods of time. In fact, in some cases (for example, the fibrous muscle protein titin), large amounts may be required throughout the lifetime of a cell. Fine-tuning of IUP availability can be achieved with posttranslational modifications and interactions with other factors (1, 3, 11, 16). Both mechanisms can promote increased abundance or longer half-life through changes in cellular localization or by protection from the degradation machinery (for example, certain phosphorylated forms of the cyclin-dependent kinase inhibitory protein p27Kip1 and the spinocerebellar ataxia type 1 protein ataxin-1) (23, 24). Although association with other proteins may increase their stability, free IUPs are likely to be rapidly degraded by the 20S proteasome via degradation by default (25), as shown for the unbound forms of p21Cip1 (26), p27Kip1 (27), α-synuclein (28), and tau (29). Certain posttranslational modifications may promote regulated degradation (for example, that of p27Kip1) (11, 16). In this context, our finding that many IUPs tend to be phosphorylated by multiple kinases and display low noise in transcription and translation suggests that their abundance and half-lives may be finely tuned in cells (SOM text S5).

Our studies reveal an evolutionarily conserved tight control of synthesis and clearance of most IUPs. The discovery was made possible by integrating multiple large-scale datasets that describe control mechanisms during transcription, translation, and post-translational modification with structural information on proteins. Besides the elucidation of general trends, the framework describing multiple levels of regulation introduced here may facilitate investigation of how individual IUPs are fine-tuned in different cell types and how perturbations to this tight control might influence disease conditions (30).

Supporting Online Material

Materials and Methods

SOM Text S1 to S5

Figs. S1 to S6

Tables S1 to S12


References and Notes
