Pervasive degeneracy and epistasis in a protein-protein interface

Science  06 Feb 2015:
Vol. 347, Issue 6222, pp. 673-677
DOI: 10.1126/science.1257360

Exploring the limits of protein sequence space

Exploring the variability of individual functional proteins is complicated by the vast number of combinations of possible amino acid sequences. Podgornaia and Laub take on this challenge by analyzing four amino acids critical for the interaction between two signaling proteins in Escherichia coli. They build all 160,000 possible variants of one of the two proteins and find that over 1650 are functional. Even though there can be very high variability in the composition of the interface between the two proteins, there are nonetheless strong context-dependent constraints for some amino acids, which suggests why many functional variants are not seen in nature.

Mapping protein sequence space is a difficult problem that necessitates the analysis of 20^N combinations for sequences of length N. We systematically mapped the sequence space of four key residues in the Escherichia coli protein kinase PhoQ that drive recognition of its substrate PhoP. We generated a library containing all 160,000 variants of PhoQ at these positions and used a two-step selection coupled to next-generation sequencing to identify 1659 functional variants. Our results reveal extensive degeneracy in the PhoQ-PhoP interface and epistasis, with the effect of individual substitutions often highly dependent on context. Together, epistasis and the genetic code create a pattern of connectivity of functional variants in sequence space that likely constrains PhoQ evolution. Consequently, the diversity of PhoQ orthologs is substantially lower than that of functional PhoQ variants.
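
The library size quoted above follows from the 20 standard amino acids raised to the number of randomized positions (20^4 = 160,000). The short Python sketch below is an illustration of that arithmetic only, not the authors' pipeline; the names AMINO_ACIDS, sequence_space_size, and enumerate_variants are hypothetical and introduced here for the example.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids (illustrative constant).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(n_positions, alphabet_size=len(AMINO_ACIDS)):
    """Number of distinct sequences when n_positions sites are fully randomized."""
    return alphabet_size ** n_positions


def enumerate_variants(n_positions):
    """Lazily yield every amino acid combination for n_positions sites."""
    return ("".join(combo) for combo in product(AMINO_ACIDS, repeat=n_positions))


# Four randomized interface positions, as described in the abstract.
total = sequence_space_size(4)

# Count of functional variants reported in the abstract.
functional = 1659
fraction = functional / total

print(total)              # 160000
print(f"{fraction:.2%}")  # 1.04%
```

Under these assumptions, roughly 1% of the four-position library is functional, which gives a sense of scale for the degeneracy the abstract describes.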

