Multiplexed gene synthesis in emulsions for exploring protein functional landscapes


Science  19 Jan 2018:
Vol. 359, Issue 6373, pp. 343-347
DOI: 10.1126/science.aao5167
  • Fig. 1 DropSynth assembly and optimization.

    (A) We amplified array-derived oligos and exposed a single-stranded region that acts as a gene-specific microbead barcode. Barcoded beads display complementary single-stranded regions that selectively pull down the oligos necessary to assemble each gene. The beads are then emulsified, and the oligos are assembled by means of polymerase cycling assembly (PCA). The emulsion is then broken, and the resultant assembled genes are barcoded and cloned. (B) We used a model gene library that allowed us to monitor the level of specificity and coverage of the assembly process. We then optimized various aspects of the protocol, including purification steps, DNA ligase, and bead couplings, to improve the specificity of the assembly reaction. Enrichment is defined as the number of specific assemblies observed relative to what would be observed by random chance in a full combinatorial assembly. (C) We attempted 96-plex gene assemblies with three, four, five, or six oligos, and the resultant libraries displayed the correct-sized band on an agarose gel. (D) The distribution of read counts for all 96 assemblies (four-oligo assembly) as determined with next-generation sequencing (NGS).
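
The enrichment definition in (B) can be illustrated with a minimal Python sketch. It rests on an assumption that is not stated in the caption: that in a full combinatorial assembly each of the k oligo positions is filled independently and uniformly from the N genes in the pool, so that N of the N^k possible products are specific by chance. The function names and the read-count inputs are hypothetical and chosen only for illustration.

```python
def expected_specific_fraction(n_genes: int, oligos_per_gene: int) -> float:
    """Chance fraction of specific assemblies under the assumed random model.

    Assumption (not from the source): every oligo position is drawn
    independently and uniformly from n_genes candidates, so n_genes of the
    n_genes ** oligos_per_gene possible products are correct.
    """
    if n_genes < 1 or oligos_per_gene < 1:
        raise ValueError("n_genes and oligos_per_gene must be positive")
    return n_genes / n_genes ** oligos_per_gene


def enrichment(specific_reads: int, total_reads: int,
               n_genes: int, oligos_per_gene: int) -> float:
    """Observed specific fraction divided by the chance fraction."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    observed = specific_reads / total_reads
    return observed / expected_specific_fraction(n_genes, oligos_per_gene)


if __name__ == "__main__":
    # Toy numbers only: 4 genes, 2 oligos each, half of all reads specific.
    print(enrichment(specific_reads=50, total_reads=100,
                     n_genes=4, oligos_per_gene=2))
```

Under this assumed model the chance fraction falls off steeply as the number of genes or oligos per gene grows, so even a modest observed specific fraction corresponds to a large enrichment.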

  • Fig. 2 DropSynth assembly of 10,752 genes.

    (A) We used DropSynth to assemble 28 libraries of 10,752 genes representing 1152 homologs of PPAT and 4992 homologs of DHFR. The number of library members with at least one perfect assembly and the median percentage of perfect assemblies, determined by using constructs with at least 100 barcodes, are shown for each library. (B) We observed that 872 PPAT homologs (75%) had at least one perfect assembly, and 1002 homologs (87%) had at least one assembly within a distance of five amino acids from design. (C) We assembled two codon variants for each designed DHFR homolog, allowing us to achieve higher coverage.

  • Fig. 3 PPAT complementation assay.

    (A) We used DropSynth to assemble a library of 1152 homologs of PPAT, an essential enzyme catalyzing the second-to-last step in CoA biosynthesis, and functionally characterized them using a pooled complementation assay. The barcoded library was transformed into E. coli ΔcoaD cells containing a curable rescue plasmid expressing E. coli coaD. The rescue plasmid was removed, allowing the homologs and their mutants to compete with each other in batch culture. We tracked assembly barcode frequencies over four serial 1000-fold dilutions and used the frequency changes to assign a fitness score. (B) This phylogenetic tree shows 451 homologs each with at least five assembly barcodes, a subset of the full data set, in which leaves are colored by fitness. Despite having a median 50% sequence identity, we found that the majority of PPAT homologs are able to complement the function of the native E. coli PPAT, with 70% having positive fitness values, whereas low-fitness homologs are dispersed throughout the tree, without much clustering of clades.
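
The idea in (A) of turning barcode frequency changes into a fitness score can be sketched in Python as follows. The caption does not give the scoring formula, so the log2 frequency ratio used here is an assumption made for illustration, as are the pseudocount, the function name, and the count dictionaries.

```python
import math
from typing import Dict, Mapping


def log2_fitness(before: Mapping[str, int], after: Mapping[str, int],
                 pseudocount: float = 0.5) -> Dict[str, float]:
    """Score each barcode by the log2 change in its relative frequency.

    Assumption (not from the source): fitness is log2(f_after / f_before).
    The pseudocount keeps barcodes that drop to zero reads finite.
    """
    barcodes = list(before)
    n = len(barcodes)
    total_before = sum(before[bc] for bc in barcodes) + pseudocount * n
    total_after = sum(after.get(bc, 0) for bc in barcodes) + pseudocount * n
    scores = {}
    for bc in barcodes:
        f_before = (before[bc] + pseudocount) / total_before
        f_after = (after.get(bc, 0) + pseudocount) / total_after
        scores[bc] = math.log2(f_after / f_before)
    return scores


if __name__ == "__main__":
    # Toy read counts for two hypothetical barcodes at two time points.
    t0 = {"bc_a": 50, "bc_b": 50}
    t1 = {"bc_a": 75, "bc_b": 25}
    print(log2_fitness(t0, t1))
```

In this sketch a barcode that gains share of the pool scores above zero and one that loses share scores below zero; per-homolog fitness could then be summarized across that homolog's barcodes.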

  • Fig. 4 Broad mutational scanning analysis.

    (A) The fitness landscape of 497 complementing PPAT homologs and their 71,061 mutants (within a distance of five amino acids) is projected onto the E. coli PPAT sequence, with each point in the heatmap showing the average fitness over all sequences containing that amino acid at each aligned position. Mutations are highly constrained at a core group of residues involved in catalytic function. Other positions show relatively little loss of function, when averaged over many homologs, despite known interactions with the substrates. The E. coli wild-type (WT) sequence is indicated by green squares, and the average position fitness, fitness of a residue deletion, mean EVmutation evolutionary statistical energy (22), site conservation, relative solvent accessibility, and secondary structure information are shown above. (B) The average fitness at each position, with blue and red representing low and high fitness, respectively, overlaid on the E. coli PPAT [Protein Data Bank 1QJC and 1GN8 (23)] structure complexed with 4′-phosphopantetheine and ATP. We observed loss of function for mutations occurring at the active site, whereas other residues involved with allosteric regulation by CoA or dimer interfaces show large promiscuity, highlighting different strategies used among homologs. (C) In addition to complementing homologs, we can also analyze mutants of the 129 low-fitness (<–2.5) homologs, finding 385 gain-of-function (GOF) mutants across 55 homologs. We project these data onto the E. coli PPAT sequence and plot the number of GOF mutants at each position, shaded by the number of different homologs represented. We found a total of eight statistically significant positions (residues 34, 35, 64, 68, 69, 103, 134, and 135) corresponding to four regions in the PPAT structure.
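
The per-position averaging described in (A) can be sketched in Python as below. The input format is an assumption: each record is a hypothetical pair of a sequence already aligned to E. coli PPAT coordinates (all the same length, with "-" marking a deleted residue) and its fitness. The sketch reports the mean, the number of data points, and the population standard deviation for every position and amino acid combination.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple

Site = Tuple[int, str]  # (1-based aligned position, amino acid or "-")


def broad_mutational_scan(
    records: Iterable[Tuple[str, float]],
) -> Dict[Site, Tuple[float, int, float]]:
    """Average fitness over all sequences sharing a residue at a position.

    Assumption (not from the source): sequences are pre-aligned to a common
    reference, and gaps ("-") are kept as their own symbol so that residue
    deletions receive an average of their own.
    """
    by_site: Dict[Site, List[float]] = defaultdict(list)
    for aligned_seq, fitness in records:
        for pos, residue in enumerate(aligned_seq, start=1):
            by_site[(pos, residue)].append(fitness)
    return {site: (mean(vals), len(vals), pstdev(vals))
            for site, vals in by_site.items()}


if __name__ == "__main__":
    # Toy two-residue alignment with made-up fitness values.
    toy = [("MK", 1.0), ("MR", -1.0), ("-K", 0.0)]
    for site, stats in sorted(broad_mutational_scan(toy).items()):
        print(site, stats)
```

Averaging over many homologs in this way means a single cell of the heatmap reflects a residue's effect across diverse sequence backgrounds rather than in one protein.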

Supplementary Materials

  • Multiplexed gene synthesis in emulsions for exploring protein functional landscapes

    Calin Plesa, Angus M. Sidore, Nathan B. Lubock, Di Zhang, Sriram Kosuri

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    • Materials and Methods
    • Figs. S1 to S25
    • Tables S1 to S7
    • References
    Tables S8 to S14
    Table S8 – DHFR oligos
    The oligo sequences used to assemble all 30 libraries of DHFR homologs, the source organism, and corresponding microbead barcodes.

    Table S9 – PPAT oligos

    The oligo sequences used to assemble all 3 libraries of PPAT homologs, the source organism, and corresponding microbead barcodes.

    Table S10 – DHFR homolog info
    The assembled gene sequences for DHFR homologs.

    Table S11 – PPAT homolog info

    The assembled gene sequences for PPAT homologs.

    Table S12 – PPAT homolog fitness

    The pooled complementation assay fitness values for each PPAT homolog with perfect assemblies.

    Table S13 – BMS

    The average fitness, number of data points, and standard deviation for all amino acid and position combinations in the PPAT broad mutational scanning (BMS) dataset.

    Table S14 – GoF

    The fitness for each gain-of-function mutant. Each distinct mutation is shown on its own row.

    Images, Video, and Other Media

    Movie S1
    An overview of the DropSynth gene assembly process.
