Report

Self-Assembling Protein Microarrays


Science  02 Jul 2004:
Vol. 305, Issue 5680, pp. 86-90
DOI: 10.1126/science.1097639

Abstract

Protein microarrays provide a powerful tool for the study of protein function. However, they are not widely used, in part because of the challenges in producing proteins to spot on the arrays. We generated protein microarrays by printing complementary DNAs onto glass slides and then translating target proteins with mammalian reticulocyte lysate. Epitope tags fused to the proteins allowed them to be immobilized in situ. This obviated the need to purify proteins, avoided protein stability problems during storage, and captured sufficient protein for functional studies. We used the technology to map pairwise interactions among 29 human DNA replication initiation proteins, recapitulate the regulation of Cdt1 binding to select replication proteins, and map its geminin-binding domain.

To exploit the growing number of expression-ready cDNA clone collections, high-throughput (HT) methods to study protein function are needed (1–5). The development of protein microarrays offers one compelling approach (6–8). Protein microarrays are currently available in two general formats. Antibody arrays contain an array of antibodies that measure the abundance of specific proteins (or other molecules) in samples (9). Our work focuses on target protein arrays, which present arrayed proteins of interest. They can be used to examine target protein interactions with other molecules, such as drugs, antibodies, nucleic acids, lipids, or other proteins. In addition, the arrays can be interrogated to find substrates for enzymes (10, 11).

The current approach to generate target protein microarrays is to produce proteins separately and then spot them on the arrays with the use of a variety of linkage chemistries (6–8, 12). Despite these demonstrations of feasibility, target protein microarrays have not been widely adopted. In part, this may be due to the labor and technical issues associated with HT protein production. Challenges remain to find HT expression systems for mammalian proteins with good yield and purity, and under conditions conducive to functional protein studies. Moreover, once isolated, there are concerns regarding protein stability during storage, either before or after spotting on the array.

Building upon the successful use of in vitro translated protein in standard scale applications (13–15), our approach substitutes the use of purified proteins with the use of cDNAs encoding the target proteins at each feature of the microarray. The proteins are transcribed and translated by a cell-free system and immobilized in situ by means of epitope tags fused to the proteins. This approach eliminates the need to express and purify proteins separately and produces proteins at the time of the assay, abrogating concerns about protein stability during storage. Mammalian proteins can be expressed in a mammalian milieu, providing access to vast collections of cloned cDNAs.

With a nucleic acid programmable protein array (NAPPA), we aimed to exploit this biochemical strategy and enable proteome-scale experiments. This required a high-density format that minimized the use of cell-free extract, and for convenience and accessibility, a readily available matrix (such as standard glass microscope slides) that did not require specially micromachined wells (10) and that used existing technology for printing and reading DNA microarrays.

Through testing a variety of cDNA printing schemes, we found that an optimum balance was required between binding DNA efficiently and maintaining a DNA conformation that supported efficient transcription and translation (fig. S1). The most efficient strategy coupled a psoralen-biotin conjugate to the expression plasmid DNA with the use of ultraviolet light, which was then captured on the surface by avidin (Fig. 1).

Fig. 1.

NAPPA approach. Biotinylation of DNA: Plasmid DNA is cross-linked to a psoralen-biotin conjugate with the use of ultraviolet light (17). (A) Printing the array. Avidin (1.5 mg/ml, Cortex), polyclonal GST antibody (50 μg/ml, Amersham), and bis(sulfosuccinimidyl) suberate (2 mM, Pierce) are added to the biotinylated plasmid DNA. Samples are arrayed onto glass slides treated with 2% 3-aminopropyltriethoxysilane (Pierce) and 2 mM dimethyl suberimidate·2HCl (Pierce). (B) In situ expression and immobilization. Microarrays were incubated with 100 μl per slide rabbit reticulocyte lysate with T7 polymerase (Promega) at 30°C for 1.5 hours then 15°C for 2 hours in a programmable chilling incubator (Torrey Pines). (C) Detection. Target proteins are expressed with a C-terminal GST tag and immobilized by the polyclonal GST antibody. All target proteins are detected using a monoclonal antibody to GST (Cell Signaling Technology) against the C-terminal tag confirming expression of full-length protein.

The addition of a C-terminal glutathione S-transferase (GST) tag to each protein enabled its capture to the array through an antibody to GST printed simultaneously with the expression plasmid (fig. S2). Other protein fusion tags and capture molecules can be easily substituted for the GST fusion and antibodies to GST used here (16). The resulting array was dried and stored at room temperature.

To activate and use the array, a cell-free coupled transcription and translation system (such as reticulocyte lysate containing T7 polymerase) was added as a single continuous layer (not discrete spots) covering the arrayed cDNAs on the microscope slide. To test the system, expression plasmids encoding eight genes were immobilized onto an array at a density of 512 spots per slide (900-μm spacing). Expression of target protein was confirmed with an antibody to GST (different from the capture GST antibody), and the signals were measured with a standard glass slide DNA-microarray scanner (Fig. 2A and fig. S3). We observed an easily detectable signal for all proteins [average signal-to-noise ratio (±SD) = 53 ± 14], demonstrating that 100 μl of reticulocyte lysate is sufficient to support protein expression in all 512 spots of the array simultaneously (17). Not surprisingly, there was modest variation in protein expression from gene to gene (coefficient of variation = ∼24%). We have subsequently found that these differences can often be corrected by adjusting the amount of printed plasmid template. By comparing signal intensities to control spots containing purified GST, we estimate that about 10 fmol (∼675 pg) of protein were produced and captured at each spot, which compares favorably to existing methods (8).
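The molar and mass figures quoted above are linked by the molecular weight of the captured fusion protein. The following is a minimal sketch of that arithmetic, not part of the published analysis; the function names are hypothetical, and the use of the sample standard deviation for the coefficient of variation is an assumption, since the text does not say which estimator was used. Under these definitions, 10 fmol corresponding to about 675 pg implies an average fusion-protein mass of about 67.5 kDa.

```python
from statistics import mean, stdev


def fmol_to_pg(fmol: float, mw_kda: float) -> float:
    """Convert femtomoles to picograms.

    1 fmol of a 1 kDa protein is 1e-15 mol * 1000 g/mol = 1e-12 g = 1 pg,
    so the conversion is a plain product.
    """
    return fmol * mw_kda


def implied_mw_kda(pg: float, fmol: float) -> float:
    """Molecular weight (kDa) implied by a mass (pg) and an amount (fmol)."""
    return pg / fmol


def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean (assumed estimator)."""
    return stdev(values) / mean(values)


if __name__ == "__main__":
    # Figures quoted in the text: about 10 fmol, about 675 pg per spot.
    print(implied_mw_kda(675, 10))
```

The same conversion can be used to compare per-spot yields reported in mass units with those reported in molar units, provided a molecular weight is available for each fusion protein.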

Fig. 2.

Expression of target proteins and detection of protein interactions on a NAPPA microarray format. (A) Eight target plasmid DNAs encoding C-terminal GST fusion proteins in pANT7_cGST (fig. S2) were immobilized onto the glass slide at a density of 512 spots per slide (900-μm spacing). The target proteins were expressed with 100 μl rabbit reticulocyte lysate supplemented with T7 polymerase. Signals were detected with antibody to GST and tyramide signal amplification (TSA) reagent (PerkinElmer). To verify that the detected proteins were the expected target proteins, and to confirm that there was no cross-talk across the slide, we used target protein–specific antibodies, which detected only their relevant spots (fig. S3). (B and C) The eight genes were queried for potential interactors with (B) Jun and (C) p16. Query DNA encoding an N-terminal HA tag was added to the reticulocyte lysate before expressing the target proteins (fig. S2). Target and query proteins were coexpressed and the interaction was detected with an antibody to HA (12CA5). The bar graphs show average intensity (+SD) from 64 samples for each interaction. Images were quantified using ScanAlyze software (Michael Eisen, Lawrence Berkeley National Laboratory, CA). The signals were corrected for local background.

NAPPA is well suited to the detection of protein-protein interactions because both the target proteins (bound to the array) and the query protein (used to probe the array) can be transcribed and translated in the same extract. As validation, the query protein Jun was tagged with a hemagglutinin (HA) epitope and coexpressed with the target proteins (Fig. 2B). The interaction was visualized with an antibody to HA, which revealed that Jun query protein bound to the Fos target [dissociation constant Kd ∼ 50 nM (12)]. To determine if binding selectivity is preserved, we tested the Cdk inhibitor p16, which binds selectively to Cdk4 and Cdk6 but not the closely related Cdk2. As shown in Fig. 2C, this specificity was recapitulated with NAPPA.

To apply NAPPA to a biological question, we studied the human DNA replication complex. Experiments in yeast, Xenopus, and human cells have led to a detailed model for the initiation of eukaryotic DNA replication. Origins of replication are “licensed” in the G1 phase of the cell cycle when the origin replication complex (ORC) recruits the initiation factors, Cdt1 and Cdc6, and the minichromosome maintenance complex (MCM2-7) to form the prereplication complex (pre-RC). In S phase, the pre-RC is converted into an active replication fork by the protein kinases Cdc7 and Cdk2, a process that involves origin binding of at least two additional initiation factors, MCM10 and Cdc45, leading to DNA synthesis (18).

Sequence-verified human genes for 29 proteins involved in DNA replication initiation (in addition to Fos and Jun as positive controls) were immobilized and expressed on NAPPA (Fig. 3A). Signals were readily detected for all of the target proteins, showed high reproducibility between duplicates, and ranged from 270 pg (4 fmols) to 2600 pg (29 fmols), a sevenfold range that falls well within the range observed in protein-spotting protein microarrays [10 to 950 pg (8)]. Each of the 29 DNA replication proteins was used as a query to probe a pair of duplicate arrays to generate a 29 × 29 protein interaction matrix. Examples of the interaction data are shown in Fig. 3, B and C.

Fig. 3.

Expression of human DNA replication proteins and interaction mapping. (A) Target DNAs representing 29 human DNA replication proteins and two positive controls were immobilized and expressed on the array in duplicate. The legend (right panel) lists all genes expressed on the array. Expression of all target proteins was confirmed by an antibody to GST (left panel). Two protein registration markers, purified recombinant GST (22 μg/ml, Sigma) and whole-mouse immunoglobulin G (IgG) (550 μg/ml, Pierce), were also printed as registration spots and were able to monitor protein expression and slide variation (inset, bottom). (B) Replicate slides from (A) were probed with each member of the DNA replication proteins expressed as HA-tagged query proteins, repeating each query protein on two slides. (B) is a superposition of images of slides probed with HA-ORC3 (red) and HA-MCM2 (green); target spots that interacted with both queries appear as yellow. Interactions were detected with an antibody to HA and quantified with ScanAlyze. (C) The signals for all interactions, including (B), were calculated by subtracting local background and then standardized with the intensity of whole-mouse IgG registration marker. Interactions were considered positive when the signal was greater than three times the standard deviation of the background for all instances of the interaction. The bar graph shows signal intensity for interactions with ORC3 (red) and MCM2 (green) shown in (B). (D) Interaction map shows interactions among the ORC and MCM complex in blue (lines and shaded oval) and green (lines and shaded oval), respectively. Intercomplex interactions are shown in dark blue. Interactions with proteins involved in the formation of pre-RC and preinitiation complex are shown in red and additional regulatory proteins are shown in brown. All other interactions are shown in orange. 
The arrows of the connector show the direction (from target to query) of the interaction and the weight given to the connector depicts the strength of the signal.
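The scoring rule described in (C) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names are hypothetical, and it is an assumption that the background standard deviation is computed from a set of background measurements expressed in the same units as the background-subtracted signals.

```python
from statistics import stdev


def normalized_signal(spot: float, local_background: float, igg_marker: float) -> float:
    """Subtract local background, then scale by the IgG registration marker."""
    return (spot - local_background) / igg_marker


def is_positive(corrected_signals, background_values, n_sd: float = 3.0) -> bool:
    """Call an interaction positive only if every instance exceeds the cutoff.

    corrected_signals: background-subtracted intensities, one per replicate spot.
    background_values: background measurements used to estimate the noise (assumed input).
    """
    threshold = n_sd * stdev(background_values)
    return all(signal > threshold for signal in corrected_signals)


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    background = [1.0, 2.0, 3.0]
    print(is_positive([50.0, 60.0], background))
    print(is_positive([50.0, 2.0], background))
```

Requiring every replicate to pass, rather than the mean, follows the phrase "for all instances of the interaction" and makes the call conservative.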

We found 110 interactions among the proteins in the replication complex, averaging 7.7 interactions per protein (range of 3 to 16; Fig. 3D and table S1). Detected interactions included 47 previously identified by any method including genetic, two-hybrid, and biochemical interactions (based on our literature survey), and 63 previously undetected interactions. Of the gold-standard interactions that had been demonstrated biochemically with purified proteins, we detected 17 of 20 (85%) (18–20). We also detected 19 of the 36 reported interactions (53%) on the basis of coimmunoprecipitation (IP) (18, 21–23). A difference here is expected because NAPPA only detects binary interactions, whereas IP also reports interactions mediated by bridging proteins. In fact, a NAPPA network in which two proteins shared a common binding partner could be identified for each of the 17 IP interactions not detected by NAPPA. Overlap was lowest (42%) with interactions reported by yeast two-hybrid (18, 20, 22, 24).
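The search for a shared binding partner can be pictured as a neighbor-set intersection on the binary interaction network. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, interactions are treated as undirected (the published map records a target-to-query direction), and the two edges in the example are the Cdt1, Cdc6, and MCM2 case described in the text rather than the full data set.

```python
def build_graph(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bridging_candidates(graph, a, b):
    """Return proteins that bind both a and b and could therefore bridge them."""
    shared = graph.get(a, set()) & graph.get(b, set())
    return shared - {a, b}


if __name__ == "__main__":
    # Two binary interactions named in the text; all others omitted.
    binary = [("Cdt1", "Cdc6"), ("Cdc6", "MCM2")]
    graph = build_graph(binary)
    print(bridging_candidates(graph, "Cdt1", "MCM2"))
```

A non-empty result for a pair that coimmunoprecipitates but shows no binary signal is compatible with an indirect interaction; it does not by itself establish one.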

A variety of biochemical experiments have identified two stable complexes, ORC and MCM2-7, in the pre-RC of many species (18). Consistent with this, the microarray experiments detected many interactions (28% of all detected interactions) within and between these two complexes (Fig. 3D) including 10 unique interactions among the six ORC subunits (Fig. 3D, blue) in agreement with the current ORC model (25). Similarly, we observed most known interactions within the MCM complex except those involving MCM6, which was among the proteins that showed low expression as both the target and query (Fig. 3D, green). Interactions among Cdc6, Cdt1, and the ORC proteins required for pre-RC formation were not previously understood. Here, we find that Cdc6 interacts directly with all of the ORC proteins except ORC4 and Cdt1 interacts specifically with ORC1 and ORC2 (Fig. 3D, red).

In the S phase, the loading of Cdc45 to the chromatin is postulated to activate the helicase activity of the bound MCM2-7 complex (26, 27). Interestingly, we did not observe any direct interactions between Cdc45 and the MCM2-7 proteins. Cdc45 interacted with MCM10, which in turn interacted with several MCM2-7 proteins (Fig. 3D, red), suggesting that MCM10 could act to recruit Cdc45 to the MCM2-7 complex. This is consistent with recent experiments showing that MCM10 is indeed required for Cdc45 binding to chromatin (28).

Cdc6 and Cdt1 are both necessary to recruit the MCM2-7 complex onto chromatin (18). We detected many interactions among these proteins but none between Cdt1 and the MCM2-7 proteins, although they coimmunoprecipitate (29, 30). Cdt1 and MCM2 share Cdc6 as a binding partner (fig. S4), suggesting that Cdc6 could bridge Cdt1 to the MCM2-7 complex. The open format of NAPPA supports the expression of proteins in addition to the target and query, allowing the examination of multiprotein complexes and their regulation. By exploiting this feature, we demonstrated that MCM2 bound to Cdt1 only in the presence of coexpressed Cdc6 (Fig. 4A). Thus, it is likely that Cdc6 acts as a bridging protein, although enzymatic or allosteric effects cannot be ruled out, showing that simple regulatory mechanisms can be recapitulated in the protein microarray format.

Fig. 4.

Characterization of Cdt1. (A) Cdt1 regulation. NAPPA was used to test whether Cdc6 could act as a bridging protein between Cdt1 and MCM2. Target proteins Cdt1, Cdc45 (negative control), and MCM5 (positive control) were expressed in duplicate (top panel) and confirmed by an antibody to GST. The target proteins were probed with either HA-MCM2 alone (left panel) or in the presence of coexpressed His-Cdc6 (right panel). The binding of MCM2 was detected with an antibody to HA. (B) Cdt1 deletion mapping. NAPPA was used to map the binding domain of geminin on Cdt1. Fragments from various regions of Cdt1 (as indicated) were generated by PCR and cloned into target expression vectors. The partial or full-length polypeptides were expressed and detected on the array with an antibody to GST (left panel). To identify the binding region of geminin, the array was queried with HA-geminin (right panel) and developed with an antibody to HA. Vertical lines (dashed) delimit the small (15-aa) region common to all the fragments that bound geminin. A small 78-aa domain (135 to 212 aa) containing the noted 15 aa (198 to 212 aa) was expressed along with full-length Cdt1 (bottom left), which was again queried with geminin (bottom right).

To further examine Cdt1 protein function, we focused on its interaction with geminin. Previous work has mapped the binding to a relatively large domain of Cdt1 [177 to 380 amino acids (aa) (20)]. We used NAPPA to map more precisely the binding domain of geminin on human Cdt1 by generating a series of end deletion fragments of Cdt1, expressing the partial length proteins on the array, and probing the array with HA-geminin as query protein (Fig. 4B). Using this approach, we localized a ∼15-aa sequence (198 to 212 aa) that was necessary for binding. We then tested a 78-aa fragment (135 to 212 aa) containing this sequence and demonstrated that it was sufficient for geminin binding, albeit somewhat more weakly.
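The logic of deletion mapping, taking the region common to all fragments that bind, amounts to intersecting residue intervals. The sketch below illustrates that idea only. The function names are hypothetical, and the fragment coordinates are made up so that the result matches the reported 198 to 212 aa region; they are not the fragments actually used in Fig. 4B.

```python
def common_region(fragments):
    """Intersect all binding fragments.

    fragments: iterable of (start, end, bound) with 1-based inclusive residues.
    Returns (start, end) of the shared region, or None if there is none.
    """
    binders = [(start, end) for start, end, bound in fragments if bound]
    if not binders:
        return None
    start = max(s for s, _ in binders)
    end = min(e for _, e in binders)
    return (start, end) if start <= end else None


def region_length(region) -> int:
    """Number of residues in an inclusive (start, end) interval."""
    start, end = region
    return end - start + 1


if __name__ == "__main__":
    # Hypothetical end-deletion fragments: (start, end, bound geminin?).
    illustrative = [
        (1, 212, True),
        (198, 400, True),
        (1, 197, False),
        (213, 400, False),
    ]
    region = common_region(illustrative)
    print(region, region_length(region))
```

The intersection identifies a region that is necessary for binding; sufficiency still has to be tested directly, as was done with the 78-aa fragment.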

There still remain technical challenges for NAPPA. First, as in virtually all other HT protein interaction techniques, there is the possibility that bridging proteins or inhibitors (e.g., from the cell-free expression system) may affect some interactions. Second, the use of peptide tags, also common to most interaction methods, may lead to steric effects that block important binding domains, although with NAPPA tags can be configured on either end of the proteins. Third, some posttranslational modifications may be absent in NAPPA. However, the open format of NAPPA allows for the addition of enzymes or extracts if needed. Finally, because NAPPA lacks the spatial and temporal compartmentalization of the cell and because the folding and activity of proteins in vitro may not always reflect protein activity in vivo, it will be important to confirm that previously unidentified interactions make biological sense.

Supporting Online Material

www.sciencemag.org/cgi/content/full/305/5680/86/DC1

Materials and Methods

Figs. S1 to S4

Table S1

