Research Article

Comprehensive serological profiling of human populations using a synthetic human virome


Science  05 Jun 2015:
Vol. 348, Issue 6239, aaa0698
DOI: 10.1126/science.aaa0698

Viral exposure—the complete history

In addition to causing illness, viruses leave indelible footprints behind, because infection permanently alters the immune system. Blood tests that detect antiviral antibodies can provide information about both past and present viral exposures. Typically, such tests measure only one virus at a time. Using a synthetic representation of all human viral peptides, Xu et al. developed a blood test that identifies antibodies against all known human viruses. They studied blood samples from nearly 600 people of differing ages and geographic locations and found that most had been exposed to about 10 viral species over their lifetime. Despite differences in the rates of exposure to specific viruses, the antibody responses in most individuals targeted the same viral epitopes.

Science, this issue 10.1126/science.aaa0698

Structured Abstract

Introduction

The collection of viruses found to infect humans can have profound effects on human health. In addition to directly causing acute or chronic illness, viral infection can alter host immunity in more subtle ways, leaving an indelible footprint on the immune system. This interplay between virome and host immunity has been implicated in the pathogenesis of complex diseases such as type 1 diabetes, inflammatory bowel disease, and asthma. Despite the growing appreciation for the importance of interactions between the virome and host, a comprehensive method to systematically characterize these interactions has yet to be developed.

Rationale

Current serological methods to detect viral infections are predominantly limited to testing one pathogen at a time and are therefore used primarily to address specific clinical hypotheses. A method that could simultaneously detect responses to all human viruses would allow hypothesis-free analysis to detect associations between past viral infections and particular diseases or population structures. Humoral responses to infection typically arise within 10 to 14 days of initial exposure and can persist over years or decades, thus providing a rich source of the history of pathogen encounters. In this work, we present VirScan, a high-throughput method that allows comprehensive analysis of antiviral antibodies in human sera. VirScan uses DNA microarray synthesis and bacteriophage display to create a uniform, synthetic representation of peptide epitopes comprising the human virome. Immunoprecipitation and high-throughput DNA sequencing reveal the peptides recognized by antibodies in the sample. The analysis requires less than 1 μl of blood.

Results

We screened sera from 569 human donors across four continents, assaying a total of over 10^8 antibody-peptide interactions for reactivity to 206 human viral species and >1000 strains. We found that VirScan’s performance in detecting known infections and distinguishing between exposures to related viruses is comparable to that of classical serum antibody tests for single viruses. We detected antibodies to an average of 10 viral species per person and 84 species in at least two individuals. Our approach maps antibody targets at 56–amino acid resolution, and our results nearly double the number of previously established viral B cell epitopes. Although rates of specific virus exposure varied depending on age, HIV status, and geographic location of the donor, we observed strong similarities in antibody responses across individuals. In particular, we found multiple instances of single peptides that were recurrently recognized by antibodies in the vast majority of donors. We performed tiling mutagenesis and found that these antibody responses targeted substantially conserved “public epitopes” for each virus, suggesting that antibodies with highly similar specificities, and possibly structures, are elicited across individuals.

Conclusion

VirScan is a method that enables human virome-wide exploration, at the epitope level, of immune responses in large numbers of individuals. We have demonstrated its effectiveness for determining viral exposure and characterizing viral B cell epitopes in high throughput and at high resolution. Our preliminary studies have revealed intriguing general properties of the human immune system, both at the individual and the population scale. VirScan may prove to be an important tool for uncovering the effect of host-virome interactions on human health and disease and could easily be expanded to include new viruses as they are discovered, as well as other human pathogens, such as bacteria, fungi, and protozoa.

Systematic viral epitope scanning (VirScan).

This method allows comprehensive analysis of antiviral antibodies in human sera. VirScan combines DNA microarray synthesis and bacteriophage display to create a uniform, synthetic representation of peptide epitopes comprising the human virome. Immunoprecipitation and high-throughput DNA sequencing reveal the peptides recognized by antibodies in the sample. The color of each cell in the heatmap depicts the relative number of antigenic epitopes detected for a virus (rows) in each sample (columns).

Abstract

The human virome plays important roles in health and immunity. However, current methods for detecting viral infections and antiviral responses have limited throughput and coverage. Here, we present VirScan, a high-throughput method to comprehensively analyze antiviral antibodies using immunoprecipitation and massively parallel DNA sequencing of a bacteriophage library displaying proteome-wide peptides from all human viruses. We assayed over 10^8 antibody-peptide interactions in 569 humans across four continents, nearly doubling the number of previously established viral epitopes. We detected antibodies to an average of 10 viral species per person and 84 species in at least two individuals. Although rates of specific virus exposure were heterogeneous across populations, antibody responses targeted strongly conserved “public epitopes” for each virus, suggesting that they may elicit highly similar antibodies. VirScan is a powerful approach for studying interactions between the virome and the immune system.

The collection of viruses found to infect humans (the “human virome”) can have profound effects on human health (1). In addition to directly causing acute or chronic illness, viral infection can also alter host immunity in more subtle ways, leaving an indelible footprint on the immune system (2). For example, latent herpesvirus infection has been shown to confer symbiotic protection against bacterial infection in mice through prolonged production of interferon-γ and systemic activation of macrophages (3). This interplay between virome and host immunity has also been implicated in the pathogenesis of complex diseases such as type 1 diabetes, inflammatory bowel disease, and asthma (4). Despite this growing appreciation for the importance of interactions between the virome and host, a comprehensive method to systematically characterize these interactions has yet to be developed (5).

Viral infections can be detected by serological or nucleic acid–based methods (6). However, nucleic acid tests fail in cases where viruses have already been cleared after causing or initiating tissue damage and can miss viruses of low abundance or viruses not normally present in the sampled fluid or surface. In contrast, humoral responses to infection typically arise within 2 weeks of initial exposure and can persist over years or decades (7). Tests detecting antiviral antibodies in peripheral blood can therefore identify ongoing and cleared infections. However, current serological methods are predominantly limited to testing one virus at a time and are therefore only used to address specific clinical hypotheses. Scaling serological analyses to encompass the complete human virome poses substantial technical challenges, but would be of great value for better understanding host-virus interactions and would overcome many of the limitations associated with current clinical technologies. In this work, we present VirScan, a programmable, high-throughput method to comprehensively analyze antiviral antibodies using immunoprecipitation and massively parallel DNA sequencing of a bacteriophage library displaying proteome-wide coverage of peptides from all human viruses.

Results

The VirScan platform

VirScan uses the phage immunoprecipitation sequencing (PhIP-seq) technology previously developed in our laboratory (8). Briefly, we used a programmable DNA microarray to synthesize 93,904 200-mer oligonucleotides, encoding 56-residue peptide tiles, with 28-residue overlaps, that together span the reference protein sequences (collapsed to 90% identity) of all viruses annotated to have human tropism in the UniProt database (Fig. 1A, a and b) (9). This library includes peptides from 206 species of virus and over 1000 different strains. We cloned the library into a T7 bacteriophage display vector for screening (Fig. 1A, c).
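The tiling scheme described above can be illustrated with a minimal sketch. The function name `tile_protein` and the handling of the final tile (anchoring it to the C terminus so that every tile is full length) are assumptions made for illustration; the text specifies only the 56-residue tile length and the 28-residue overlap. Note that 56 residues correspond to 168 nucleotides of coding sequence, so each 200-mer oligonucleotide leaves 32 nucleotides, presumably for flanking sequences.

```python
def tile_protein(sequence, tile_len=56, overlap=28):
    """Split a protein sequence into overlapping peptide tiles.

    Returns a list of (start_index, peptide) pairs with 0-based starts.
    """
    step = tile_len - overlap  # 28-residue stride between tile starts
    if len(sequence) <= tile_len:
        # Short proteins yield a single (possibly shorter) tile.
        return [(0, sequence)]
    tiles = []
    start = 0
    while start + tile_len < len(sequence):
        tiles.append((start, sequence[start:start + tile_len]))
        start += step
    # Assumption: the last tile is anchored to the C terminus so that
    # it is full length; the source does not say how ends were handled.
    last_start = len(sequence) - tile_len
    tiles.append((last_start, sequence[last_start:]))
    return tiles
```

For a hypothetical 140-residue protein, this sketch yields four tiles starting at positions 0, 28, 56, and 84.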

Fig. 1 General VirScan analysis of the human virome.

(A) Construction of the virome peptide library and VirScan screening procedure. (a) The virome peptide library consists of 93,904 56–amino acid peptides tiling, with 28–amino acid overlap, across the proteomes of all known human viruses. (b) The 200-nt DNA sequences encoding the peptides were printed on a releasable DNA microarray. (c) The released DNA was amplified and cloned into a T7 phage display vector and packaged into virus particles displaying the encoded peptide on its surface. (d) The library is mixed with a sample containing antibodies that bind to their cognate peptide antigen on the phage surface. (e) The antibodies are immobilized, and unbound phage are washed away. (f) Last, amplification of the bound DNA and high-throughput sequencing of the insert DNA from bound phage reveals peptides targeted by sample antibodies. Ab, antibody; IP, immunoprecipitation. (B) Antibody profile of randomly chosen group of donors to show typical assay results. Each row is a virus; each column is a sample. The label above each chart indicates whether the donors are over 10 years of age or at most 10 years of age. The color intensity of each cell indicates the number of peptides from the virus that were significantly enriched by antibodies in the sample. (C) Scatter plot of the number of unique enriched peptides (after applying maximum parsimony filtering) detected in each sample against the viral load in that sample. Data are shown for the HCV-positive and HIV-positive samples for which we were able to obtain viral load data. For the HIV-positive samples, red dots indicate samples from donors currently on highly active anti-retroviral therapy (HAART) at the time the sample was taken, whereas blue dots indicate different donors before undergoing therapy. IU, international units. (D) Overlap between enriched peptides detected by VirScan and human B cell epitopes from viruses in IEDB. 
The entire pink circle represents the 1392 groups of nonredundant IEDB epitopes that are also present in the VirScan library (out of 1559 clusters total). The overlap region represents the number of groups with an epitope that is also contained in an enriched peptide detected by VirScan. The purple-only region represents the number of nonredundant enriched peptides detected by VirScan that do not contain an IEDB epitope. Data are shown for peptides enriched in at least one (left) or at least two (right) samples. (E) Overlap between enriched peptides detected by VirScan and human B cell epitopes in IEDB from common human viruses. The regions represent the same values as in (D) except only epitopes corresponding to the indicated virus are considered, and only peptides from that virus that were enriched in at least two samples were considered. (F) Distribution of number of viruses detected in each sample. The histogram depicts the frequency of samples binned by the number of virus species detected by VirScan. The mean and median of the distribution are both about 10 virus species.

To perform a screen, we incubate the library with a serum sample containing antibodies, recover the antibodies by using a mixture of protein A– and G–coated magnetic beads, and remove unbound phage particles by washing (Fig. 1A, d and e). Last, we perform polymerase chain reaction (PCR) and massively parallel sequencing on the phage DNA to quantify enrichment of each library member resulting from antibody binding (Fig. 1A, f). Each sample is screened in duplicate to ensure reproducibility. VirScan requires only 2 μg of immunoglobulin (<1 μl of serum) per sample and can be automated on a 96-well liquid handling robot (10). PCR product from 96 immunoprecipitations can be individually barcoded and pooled for sequencing, reducing the cost for a comprehensive viral antibody screen to about $25 per sample.

After sequencing, we tally the read count for each peptide before (“input”) and after (“output”) immunoprecipitation. We then fit a zero-inflated generalized Poisson model to the distribution of output read counts for each input read count and regress the parameters as a function of input read count (fig. S1). With use of this model, we calculate a –log10(P value) for the significance of each peptide’s enrichment. Last, we call a peptide significantly enriched if its –log10(P value) is greater than the reproducibility threshold of 2.3 in both replicates (fig. S2).
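The final calling step can be sketched as follows. This is a minimal illustration that assumes the –log10(P value) scores have already been computed from the fitted model; it does not reproduce the zero-inflated generalized Poisson fit itself. The function name `call_enriched` and the dictionary input format are hypothetical. A threshold of 2.3 corresponds to a P value of roughly 0.005.

```python
THRESHOLD = 2.3  # reproducibility threshold on -log10(P value), from the text


def call_enriched(rep1_scores, rep2_scores, threshold=THRESHOLD):
    """Return the peptide ids enriched in both replicates.

    rep1_scores, rep2_scores: dicts mapping peptide id -> -log10(P value).
    A peptide is called only if its score exceeds the threshold in both.
    """
    hits = set()
    for peptide_id, score1 in rep1_scores.items():
        score2 = rep2_scores.get(peptide_id)
        if score2 is not None and score1 > threshold and score2 > threshold:
            hits.add(peptide_id)
    return hits
```

Requiring agreement between replicates means that a peptide with a very high score in one replicate but a score below 2.3 in the other is not called.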

Characterizing VirScan’s sensitivity and specificity

Figure 1B shows the antibody profiles of a set of human viruses in sera from a typical group of individuals in a heat map format that illustrates the number of enriched peptides from each virus. We frequently detected antibodies to multiple peptides from common human viruses, such as Epstein-Barr virus (EBV), cytomegalovirus (CMV), and rhinovirus. As expected, we observed more peptides to be enriched from viruses with larger proteomes, such as EBV and CMV, likely because there are more epitopes available for recognition. We noticed fewer enriched peptides in samples from individuals less than 10 years of age compared with their geographically matched controls, in line with an accumulation of viral infections throughout adolescence and adulthood. However, there were occasional samples from young donors with very strong responses to viruses that cause childhood illness, such as parvovirus B19 and herpesvirus 6B, which cause the “fifth disease” and “sixth disease” of the classical infectious childhood rashes, respectively (11). These observations are examined in greater detail in Fig. 2.

Fig. 2 Population stratification of the human virome immune response.

The bar graphs depict the differences in exposure to viruses between donors who are (A) less than 10 years of age versus over 10 years of age, (B) HIV-positive versus HIV-negative residing in the United States, (C) residing in Peru versus residing in the United States, (D) residing in South Africa versus residing in the United States, and (E) residing in Thailand versus residing in the United States. Asterisks indicate false discovery rate < 0.05.

We developed a computational method to identify the set of viruses to which an individual has been exposed, based on the number of enriched peptides identified per virus. Briefly, we set a threshold number of significant non-overlapping enriched peptides for each virus. We empirically determined that a threshold of three non-overlapping enriched peptides gave the best performance for detecting herpes simplex virus 1 (HSV1) compared with a commercial serologic test, described below (Table 1). For other viruses, we adjusted the threshold to account for the size of the viral proteome (fig. S3). Next, we tally the number of enriched peptides from each virus. Antibodies generated against a specific virus can cross-react with similar peptides from a related virus. This would lead to false positives, because an antibody targeted to an epitope from one virus to which a donor was exposed would also enrich a homologous peptide from a related virus to which the donor may not have been exposed. In order to address this issue, we adopted a maximum parsimony approach to infer the fewest number of virus exposures that could elicit the observed spectrum of antiviral peptide antibodies. Groups of enriched peptides that share a seven–amino acid subsequence may be recognized by a single specific antibody, so we only count them as one epitope for the virus that has the greatest number of other enriched peptides. If this adjusted peptide count is greater than the threshold for that virus, the sample is considered positive for the virus. For this analysis, we also filtered out peptides that were enriched in only 1 of the 569 samples to avoid spurious hits.
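The parsimony adjustment can be approximated by a simple greedy procedure, sketched below. This is an illustration under stated assumptions, not the authors' implementation: viruses are processed in descending order of their raw number of enriched peptides, and a peptide is skipped if it shares a seven–amino acid subsequence with any peptide already counted. The names `kmers` and `adjusted_counts` are hypothetical.

```python
from collections import defaultdict

K = 7  # length of the shared subsequence treated as one epitope


def kmers(seq, k=K):
    """Return the set of all length-k subsequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def adjusted_counts(enriched, k=K):
    """Greedy approximation of the maximum parsimony peptide count.

    enriched: list of (virus, peptide_sequence) pairs for one sample.
    Returns a dict mapping virus -> number of peptides counted after
    collapsing peptides that share a k-mer with one already counted.
    """
    by_virus = defaultdict(list)
    for virus, peptide in enriched:
        by_virus[virus].append(peptide)
    # Assumption: viruses with more enriched peptides claim shared
    # epitopes first; ties are broken alphabetically for determinism.
    order = sorted(by_virus, key=lambda v: (-len(by_virus[v]), v))
    claimed = set()
    counts = {}
    for virus in order:
        counted = 0
        for peptide in by_virus[virus]:
            peptide_kmers = kmers(peptide, k)
            if peptide_kmers & claimed:
                continue  # shares an epitope with a counted peptide
            claimed |= peptide_kmers
            counted += 1
        counts[virus] = counted
    return counts
```

The adjusted count for each virus would then be compared with that virus's threshold. Because adjacent tiles overlap by 28 residues, the same rule also collapses overlapping enriched tiles from a single virus into one count.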

Table 1 VirScan’s sensitivity and specificity on samples with known viral infections.

Sensitivity is the percentage of samples positive for the virus as determined by VirScan out of all n known positives. Specificity is the percentage of samples negative for the virus by VirScan out of all n known negatives.
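These two definitions can be written out directly. The sketch below is only an illustration; the function names and the example counts are hypothetical and are not taken from Table 1.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of known positives that the assay calls positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of known negatives that the assay calls negative."""
    return true_negatives / (true_negatives + false_positives)
```

With hypothetical counts of 19 detected out of 20 known positives, `sensitivity(19, 1)` gives 0.95, or 95%.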


With this analytical framework, we measured the performance of VirScan by using serum samples from individuals known to be infected or not infected with human immunodeficiency virus (HIV) and hepatitis C virus (HCV), based on commercial enzyme-linked immunosorbent assay (ELISA) and Western blot assays. For both viruses, VirScan achieves very high sensitivities and specificities of ~95% or higher (Table 1) over a wide range of viral loads (Fig. 1C). The viral genotype was also known for the HCV-positive samples. Despite the over 70% amino acid sequence conservation among HCV genotypes (12), which poses a problem for all antibody-based detection methods, VirScan correctly reported the HCV genotype in 69% of the samples. We also compared VirScan to a commercially available serology test that is type-specific for the highly related HSV1 and HSV2 (Table 1). These results demonstrate that VirScan performs well in distinguishing between closely related viruses and viruses that range in size from small (HIV and HCV) to very large (HSV1 and HSV2) with high sensitivity and specificity.

Population-level analysis of viral exposures

After ascertaining the performance of VirScan for a panel of viruses, we undertook a large-scale screening of samples with unknown exposure history. By using our multiplex approach, we assayed over 106 million antibody-peptide interactions with samples from 569 human donors in duplicate. We detected antibody responses to an average of 10 species of virus per sample (Fig. 1F). Each person is likely exposed to multiple distinct strains of some viral species. We detected antibody responses to 62 of the 206 species of virus in our library in at least five individuals and 84 species in at least two individuals. The most frequently detected viruses are generally those known to commonly infect humans (Table 2 and table S1). We occasionally detected what appear to be false positives that may be due to antibodies that cross-react with nonviral peptides. For example, 29% of the samples positive for cowpox virus were right at the threshold of detection and had antibodies against a peptide from the C4L gene that shares an eight–amino acid sequence (SESDSDSD; D, Asp; E, Glu; S, Ser) with the clumping factor B protein from Staphylococcus aureus, against which humans are known to generate antibodies (13). This will become less of an issue when we test more examples of sera from individuals with known infections to determine the set of likely antigenic peptides for a given virus. However, the fact that we do not detect high rates of very rare viruses strengthens our confidence in VirScan’s specificity (see supplementary discussion).

Table 2 Frequently detected viruses.

The % column indicates the percentage of samples that were positive for the virus by VirScan. Known HIV- and HCV-positive samples were excluded when performing this analysis.


We frequently detected antibodies to rhinovirus and respiratory syncytial virus, which are normally found only in the respiratory tract, indicating that VirScan using blood samples is still able to detect viruses that do not cause viremia. We also detected antibodies to influenza, which is normally cleared, and poliovirus, to which most people in modern times generate antibodies through vaccination. Because the original antigen is no longer present, we are likely detecting antibodies secreted by long-lived memory B cells (14).

We detected antibodies to certain viruses less frequently than expected based on previous seroprevalence studies that used optimized serum ELISAs. For example, the frequency at which we detect influenza (53.4%) and poliovirus (33.7%) is lower than expected given that the majority of the population has been exposed to or vaccinated against these viruses. This may be due to reduced sensitivity because of a gradual narrowing and decrease of the long-lived B cell response in the absence of persistent antigen. We also rarely detected antibody responses to small viruses, such as JC virus (JCV) and torque teno virus, which are frequently detected by using specific tests. We believe that the disparity is due to low titers of antibodies to unmodified, linear epitopes from these viruses. For example, serum antibodies against the major capsid protein of JCV are reported to only recognize conformational epitopes (15). Last, the frequency of detecting varicella zoster virus (chicken pox) antibodies is also lower than expected (24.3%), even though the frequency of detecting other latent herpesviruses, such as EBV (87.1%) and CMV (48.5%), is similar to the prevalence reported in epidemiological studies (16–18). This may reflect differences in how frequently these viruses shed antigens that stimulate B cell responses or a more limited humoral response that relies on epitopes that cannot be detected in a 56-residue peptide. It might also be possible to increase the sensitivity of detection of these viral antibodies by stimulating memory B cells in vitro to probe the history of infection more deeply.

To assess differences in viral exposure between populations, we split the samples into different groups based on age, HIV status, and geography. We first compared results from children under the age of 10 to adults within the United States (HIV-positive individuals were excluded from this analysis) (Fig. 2A). Fewer children were positive for most viruses, including EBV, HSV1, HSV2, and influenza virus, which is consistent with our preliminary observations comparing the number of enriched peptides (Fig. 1B). In addition to the fact that children may generate lower antibody titers in general, these younger donors probably have not yet been exposed to certain viruses, for example, HSV2, which is sexually transmitted (19).

When comparing results from HIV-positive to HIV-negative samples, we found more of the HIV-positive samples to also be seropositive for additional viruses, including HSV2, CMV, and Kaposi’s sarcoma–associated herpesvirus (KSHV) (false discovery rate q < 0.05, Fig. 2B). These results are consistent with prior studies indicating higher risk of these co-infections in HIV-positive patients (20–22). Patients with HIV may engage in activities that put them at higher risk for exposure to these viruses. Alternatively, these viruses may increase the risk of HIV infection. Finally, HIV infection may reduce the immune system’s ability to control reactivation of normally dormant resident viruses or to prevent opportunistic infections from taking hold and triggering a strong adaptive immune response.

Last, we compared evidence of viral exposure among samples taken from adult HIV-negative donors residing in countries (United States, Peru, Thailand, and South Africa) from four different continents. In general, donors outside the United States had higher frequencies of seropositivity (Fig. 2, C to E). For example, CMV antibodies were found in significantly higher frequencies in samples from Peru, Thailand, and South Africa. Other viruses, such as KSHV and HSV1, were detected more frequently in donors from Peru and South Africa but not Thailand. The observed detection frequency of different adenovirus species varies across populations. Adenovirus C seropositivity was found at similar frequencies in all regions, but adenovirus D seropositivity was generally higher outside the United States, whereas adenovirus B seropositivity was higher in Peru and South Africa but not in Thailand. The higher rates of virus exposure outside the United States could be due to differences in population density, cultural practices, sanitation, or genetic susceptibility. Additionally, influenza B seropositivity was more common in the United States compared with other countries, especially Peru and Thailand. The global incidence of influenza B is much lower than influenza A, but the standard influenza vaccination contains both influenza A and B strains, so the elevated frequency of individuals with seroreactivity may be due to higher rates of influenza vaccination in the United States. Other viruses, such as rhinovirus and EBV, were detected at very similar frequencies in all the geographic regions.

Analysis of viral epitope determinants

After analyzing responses on the whole-virus level, we focused our attention on the specific peptides targeted by these antibodies. We detected antibodies to a total of 8425 peptides in at least two samples, and 15,052 in at least one sample. Because of the presence of many related peptides in our library and the Immune Epitope Database (IEDB), for the following analysis we consider a peptide unique only if it does not contain a continuous seven-residue subsequence, the estimated size of a linear epitope, in common with another peptide. Analyzed as such, our VirScan database nearly doubles the 1559 unique human B cell epitopes from human viruses in the IEDB (23). The epitopes identified in our unbiased analysis demonstrate a significant overlap with those contained in the IEDB (P < 10^−30, Fisher’s exact test, Fig. 1D). The amount of overlap is even greater for epitopes from viruses that commonly cause infection (Fig. 1E). We would likely have detected even more antigenic peptides in common with the IEDB if we had tested more samples from individuals infected with rare viruses. We next analyzed the amino acid composition of recurrently enriched peptides. Enriched peptides tend to have more proline and charged amino acids and fewer hydrophobic amino acids, which is consistent with a previous analysis of B cell epitopes in the IEDB (fig. S4) (24). This trend likely reflects enrichment for amino acids that are surface-exposed or can form stronger interactions with antibodies.

B cell responses target highly similar viral epitopes across individuals

We compared the profile of peptides recognized by the antibody response in different individuals. We found that, for a given protein, each sample generally only had strong responses against one to three immunodominant peptides (Fig. 3). Unexpectedly, we found that the vast majority of seropositive samples for a given virus recognized the same immunodominant peptides, suggesting that the antiviral B cell response is highly stereotyped across individuals. For example, in glycoprotein G from respiratory syncytial virus, there is only a single immunodominant peptide comprising positions 141 to 196 that is targeted by all samples with detectable antibodies to the protein, regardless of the country of origin (Fig. 3A).

Fig. 3 The human antivirome response recognizes a similar spectrum of peptides among infected individuals.

In the heat-map charts, each row is a peptide tiling across the indicated protein, and each column is a sample. The colored bar above each column, labeled at the top of the panels, indicates the country of origin for that sample. The samples shown are a subset of individuals with antibodies to at least one peptide from the protein. The color intensity of each cell corresponds to the –log10(P value) measure of significance of enrichment for a peptide in a sample (greater values indicates stronger antibody response). Data are shown for (A) human RSV attachment glycoprotein G (G), (B) human adenovirus C penton protein (L2), and (C) EBV nuclear antigen 1 (EBNA1). Data shown are the mean of two replicates.

For other antigens, we observed interpopulation serological differences. For example, two overlapping peptides from positions 309 to 364 and 337 to 392 of the penton base protein from adenovirus C frequently elicited antibody responses (Fig. 3B). However, donors from the United States and South Africa had much stronger responses to peptide 309-364 (P < 10^−6, t test) relative to donors from Thailand and Peru. We observed that, for the EBNA1 protein from EBV, donors from all four countries frequently had strong responses to peptide 393-448 and occasionally to peptide 589-644. However, donors from Thailand and Peru had much stronger responses to peptide 57-112 (P < 10^−6, t test) (Fig. 3C). These differences may reflect variation in the strains endemic in each region. In addition, polymorphism of major histocompatibility complex (MHC) class II alleles, immunoglobulin genes, and other modifiers that shape immune responses in each population likely play a role in defining the relative immunodominance of antigenic peptides.

To determine whether the humoral responses that target an immunodominant peptide are actually targeting precisely the same epitope, we constructed single-, double-, and triple-alanine scanning mutagenesis libraries for eight commonly recognized peptides. These were introduced into the same T7 bacteriophage display vector and subjected to the same immunoprecipitation and sequencing protocol using samples from the United States. Mutants that disrupt the epitope diminish antibody binding affinity and peptide enrichment. We found that, for all eight peptides tested, there was a single, largely contiguous subsequence in which mutations disrupted binding for the majority of samples. As expected, the triple mutants abolished antibody binding to a greater extent, and the enrichment patterns were similar among single, double, and triple mutants of the same peptide (Fig. 4 and figs. S5 to S11). For four of the eight peptides, a 9– to 15–amino acid region was critical for antibody recognition in >90% of samples (Fig. 4 and figs. S5 to S7). One other peptide had a region of similar size that was critical in about half of the samples (fig. S8). In another peptide, a single region was important for antibody recognition in the majority of the samples, but the extents of the critical region varied slightly for different samples, and occasionally there were donors that recognized a completely separate epitope (fig. S9). The remaining two peptides contained a single triple mutant that abolished binding in the majority of samples, but the critical region also extended further to different extents depending on the sample (figs. S10 and S11). 
Unexpectedly, in one of these peptides, in addition to the main region surrounding positions 13 and 14 that is critical for binding, a single Gly36→Ala36 (G36A) mutation disrupted binding in almost half of the samples, whereas none of the double- or triple-alanine mutants that also included the adjacent positions [Leu35 (L35) and G37] affected binding (fig. S11). It is possible that G36 plays a role in helping the peptide adopt an antigenic conformation and multiple mutants containing the adjacent Leu or Gly residues rescue this ability. We occasionally saw other examples of mutations that resulted in patterns of disrupted binding with no simple explanation, illustrating the complexity of antibody-antigen interaction.
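The construction of the scanning libraries can be sketched as follows. The function name `alanine_scan` is hypothetical, and the sketch assumes that every window of adjacent positions is mutated to alanine. How positions that are already alanine in the wild-type peptide were handled is not stated in the text, and this sketch leaves them unchanged.

```python
def alanine_scan(peptide, width):
    """Generate alanine-scanning mutants of a peptide.

    width: number of adjacent residues replaced by Ala (1, 2, or 3 for
    single, double, and triple mutants). Returns (start, mutant) pairs,
    where start is the 0-based index of the first mutated position.
    """
    mutants = []
    for start in range(len(peptide) - width + 1):
        mutant = peptide[:start] + "A" * width + peptide[start + width:]
        mutants.append((start, mutant))
    return mutants
```

For a 56-residue peptide this gives 56 single, 55 double, and 54 triple mutants, because the mutated window cannot extend past the C terminus.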

Fig. 4 Recognition of common epitopes within an antigenic peptide from human adenovirus C penton protein (L2) across individuals.

Each row is a sample. Each column denotes the first mutated position for the (A) single-, (B) double-, and (C) triple-alanine mutant peptide starting with the N terminus on the left. Each double- and triple-alanine mutant contains two or three adjacent mutations, respectively, extending toward the C terminus from the colored cell. The color intensity of each cell indicates the enrichment of the mutant peptide relative to the wild-type. For double-mutants, the last position is blank. The same is true for the last two positions for triple mutants. Data shown are the mean of two replicates. Single-letter amino acid abbreviations are as follows: F, Phe; H, His; I, Ile; K, Lys; N, Asn; P, Pro; Q, Gln; R, Arg; T, Thr; V, Val; and Y, Tyr.

The discovery of recurring targeted epitopes led us to ask whether we could apply this knowledge to improve the sensitivity of viral detection with VirScan. We hypothesized that samples showing a strong response to a recurrently targeted “diagnostic” peptide, which we defined as a peptide enriched in at least 30% of known positive samples, are likely to be seropositive even if they do not meet our stringent cutoff requiring at least two non-overlapping enriched peptides. We tested how this modified criterion affected our sensitivity and specificity in detecting HIV and HCV and found that it reduced the number of false negatives without affecting the specificity of the assay (fig. S13). We next turned our attention to respiratory syncytial virus (RSV), a virus for which our detected seroprevalence was lower than reported epidemiological rates, suggesting imperfect sensitivity of our assay. We tested sera from 60 individuals for antibodies to RSV by ELISA and found that 95% were positive, above the reported sensitivity of the assay and consistent with near-universal exposure to this pathogen. Applying the modified criterion to these samples increased our rate of detection by VirScan from 63% to 97% (table S2). These data suggest that assigning more weight to recurrently targeted epitopes can enhance the sensitivity of VirScan and that the performance of the assay can be improved by screening known positives for a particular virus.

Discussion

We have developed VirScan, a technology for identifying viral exposure and B cell epitopes across the entire known human virome in a single, multiplex reaction using less than a drop of blood. VirScan uses DNA microarray synthesis and bacteriophage display to create a uniform, synthetic representation of peptide epitopes comprising the human virome. Immunoprecipitation and high-throughput DNA sequencing reveal the peptides recognized by antibodies in the sample. VirScan is easily automated in 96-well format to enable high-throughput sample processing. Barcoding of samples during PCR enables pooled analysis that can dramatically reduce the per-sample cost. The VirScan approach has several advantages for studying the effect of viruses on the host immune system. By detecting antibody responses, it can identify infectious agents that have been cleared after an effective host response. Current serological methods of antiviral antibody detection typically use the selection of a single optimized antigen in order to achieve high accuracy. In contrast, VirScan’s unique approach does not require such optimization in order to obtain similar performance. VirScan achieves sensitive detection by assaying each virus’s complete proteome to detect any antibodies directed to epitopes that can be captured in a 56-residue fragment and specificity by computationally eliminating cross-reactive antibodies. This unbiased approach identifies exposure to less well-studied viruses for which optimal serological antigens are not known and can be rapidly extended to include new viruses as they are discovered (25).

Although sensitive and selective, VirScan has a few limitations. First, it cannot detect epitopes that require post-translational modifications. Second, it cannot detect epitopes that involve discontinuous sequences on protein fragments greater than 56 residues. In principle, the latter can be overcome by using alternative technologies that allow for the display of full-length proteins, such as parallel analysis of translated open reading frames (PLATO) (26). Third, VirScan is likely to be less specific than certain nucleic acid tests that discern highly related virus strains. However, VirScan demonstrates excellent serological discrimination among similar virus species, such as HSV1 and HSV2, and can even distinguish the genotype of HCV 69% of the time. We envision that VirScan will become an important tool for first-pass unbiased serologic screening applications. Individual viruses or viral proteins uncovered in this way can subsequently be analyzed in further detail by using more focused assays, as we have demonstrated for a panel of immunodominant epitopes.

We have demonstrated that VirScan is a sensitive and specific assay for detecting exposure to viruses across the human virome. Because it can be performed in high-throughput and requires minimal sample and cost, VirScan enables rapid and cost-effective screening of large numbers of samples to identify population-level differences in virus exposure across the human virome. In this work, we analyzed over 106 million antibody-viral peptide interactions in a comprehensive study of pan-virus serology in a large, diverse population. In doing so we detected 84 different viral species in two or more individuals. This is likely to be an underestimate of the history of viral infection, because only low levels of circulating antibodies may remain from infections that were cleared in the distant past. In addition, an individual could be infected by multiple distinct strains of each viral species. We identified known and novel differences in virus exposure between groups differing in age, HIV status, and geographic location across four different continents. Our results are largely consistent with previous studies, validating the effectiveness of VirScan. For example, CMV antibodies were found in significantly higher frequencies in Peru, Thailand, and South Africa, whereas KSHV and HSV1 antibodies were detected more frequently in Peru and South Africa but not in Thailand (16, 27–31). We also uncovered previously undocumented serological differences, such as an increased rate of antibodies against adenovirus B and RSV in HIV-positive individuals compared with HIV-negative individuals. These differences may provide insight into how HIV co-infection alters the balance between host immunity and resident viruses, as well as help to identify pathogens that may increase susceptibility to HIV and other heterologous infections.
HIV infection may reduce the immune system’s ability to control reactivation of normally dormant resident viruses or to prevent opportunistic infections from taking hold and triggering a strong adaptive immune response. Beyond the epidemiological applications demonstrated here, VirScan could also be applied to identify viral exposures that correlate with disease or other phenotypes in virome-wide association studies.

Our results identified a large number of novel B cell epitopes, cumulatively nearly doubling the number of all previously identified viral epitopes. We have used our data to identify globally immunodominant and commonly recognized “public” epitopes. For most species of viruses, one or more peptides are individually recognized in over 70% to 95% of samples positive for that species (table S3). We identified a set of two peptides that together are recognized by >95% of all screened samples and a set of five peptides that together are recognized in >99% of screened samples. These public epitopes could be used to improve vaccine design by piggybacking on the existing antibody response against them. Fusing a public B cell epitope to a protein in a vaccine to which we hope to induce an immune response may increase a vaccine’s efficacy among a broad population by improving presentation of that protein and aiding affinity maturation. Preexisting B cells recognizing the public epitope can act as antigen presenting cells to process and present T cell epitopes of the fused vaccine target on MHC class I and II (32). Antibodies secreted by these B cells can also participate in immune complexes with the fused vaccine target, which are critical for follicular dendritic cells to prime class switching and affinity maturation of B cells recognizing other epitopes on the fused antigen (33). Last, we demonstrated that applying more weight to these public epitopes increases the sensitivity of VirScan without significantly affecting specificity, suggesting that this limited subset of peptides can serve as the basis for the next generation of our assay or for other novel diagnostics.

We also found that the precise epitopes recognized by the B cell response are highly similar among individuals across many viral proteins. One possible model for this notable similarity is that these regions possess properties favorable for antigenicity, such as accessibility. Another model is that the same or highly similar B cell receptor sequences that recognize these epitopes are commonly generated. Identical T cell receptor sequences (“public” clonotypes) have been found in multiple individuals and are thought to be the result of biases during the recombination process that favor certain amino acid sequences (34). V(D)J recombination of the immunoglobulin heavy- and light-chain loci is also heavily biased (35). Highly similar or even identical complementarity determining region 3 (CDR3) sequences have been observed in dengue virus–specific antibodies from different individuals (36). It is possible that, rather than being an exception for dengue-specific antibodies, this represents a general phenomenon: Inherent biases in V(D)J recombination generate the same or similar antibodies in multiple individuals that recognize highly similar epitopes. Slight differences in the antibody CDR3 sequence may subtly alter antibody-antigen interaction, leading to the slight variations observed in the extent of critical epitope regions. Sequencing of antigen-specific antibody genes will be required to investigate these possibilities. The same principle may also apply to T cell epitopes and their cognate T cell receptors.

VirScan is a method that enables human virome-wide exploration—at the epitope level—of immune responses in large numbers of individuals. We have demonstrated its effectiveness for determining viral exposure and characterizing viral B cell epitopes in high throughput and at high resolution. Our preliminary studies have revealed intriguing general properties of the human immune system, both at the individual and population scale. VirScan will be an important tool in uncovering the effect of host-virome interactions on human health and disease and could easily be expanded to include other human pathogens such as bacteria, fungi, and protozoa.

Materials and methods

Human donor samples

Specimens originating from human donors were collected after informed written consent was obtained and under a protocol approved by the local governing human research protection committee. Secondary use of all samples for the purposes of this work was exempted by the Brigham and Women’s Hospital Institutional Review Board (protocol number 2013P001337). Samples included donors residing in Thailand (n = 48), Peru (n = 48), South Africa (n = 48), and the United States, including HIV+ donors (n = 61) and HCV+ donors (n = 26). All serum and plasma samples were stored in aliquots at –80°C until use.

Design and cloning of viral peptide and scanning mutagenesis library sequences

For the virome peptide library, we first downloaded all protein sequences in the UniProt database from viruses with human host and collapsed on 90% sequence identity [www.uniprot.org/uniref/?query=uniprot:(host: “Human+[9606]”)+identity:0.9]. The clustering algorithm that UniProt uses represents each group of protein sequences sharing at least 90% sequence identity with a single representative sequence. Then, we created 56–amino acid (aa) peptide sequences tiling through all the proteins with 28-aa overlap. We reverse-translated these peptide sequences into DNA codons optimized for expression in Escherichia coli, making synonymous mutations when necessary to avoid restriction sites used in subsequent cloning steps (EcoRI and XhoI). Last, we added the adapter sequence AGGAATTCCGCTGCGT to the 5′ end and CAGGGAAGAGCTCGAA to the 3′ end to form the 200-nucleotide (nt) oligonucleotide sequences.
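The tiling step and the oligonucleotide length can be illustrated with a minimal sketch. The function names are hypothetical, and the handling of the C-terminal remainder (anchoring a final tile at the end of the protein) is an assumption, because the text does not state how leftover residues were treated.

```python
ADAPTER_5 = "AGGAATTCCGCTGCGT"  # 5' adapter from the text (16 nt)
ADAPTER_3 = "CAGGGAAGAGCTCGAA"  # 3' adapter from the text (16 nt)

def tile_protein(seq, tile_len=56, overlap=28):
    """Return (start, peptide) tiles of tile_len residues overlapping by `overlap`."""
    step = tile_len - overlap
    if len(seq) <= tile_len:
        return [(0, seq)]
    starts = list(range(0, len(seq) - tile_len + 1, step))
    # Assumption: add a last tile flush with the C terminus so no residues are lost
    if starts[-1] + tile_len < len(seq):
        starts.append(len(seq) - tile_len)
    return [(s, seq[s:s + tile_len]) for s in starts]

def oligo_length(tile_len=56):
    """Length in nt of one oligo: 5' adapter + 3 nt per residue + 3' adapter."""
    return len(ADAPTER_5) + 3 * tile_len + len(ADAPTER_3)
```

With 56-aa tiles, the coding region is 168 nt, and the two 16-nt adapters bring each oligonucleotide to the 200 nt stated above.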

For the scanning mutagenesis library, we first took the sequences of the peptides to be mutagenized. For each peptide, we made all single-mutant, and consecutive double- and triple-mutant, sequences scanning through the whole peptide. Non-alanine amino acids were mutated to alanine, and alanines were mutated to glycine. We reverse-translated these peptide sequences into DNA codons, making synonymous mutations when necessary to avoid restriction sites used in subsequent cloning steps (EcoRI and XhoI). We also made synonymous mutations to ensure that the 50 nt at the 5′ end of each peptide sequence are unique, to allow unambiguous mapping of the sequencing results. Last, we added the adapter sequence AGGAATTCCGCTGCGT to the 5′ end and CAGGGAAGAGCTCGAA to the 3′ end to form the 200-nt oligonucleotide sequences.
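As an illustration of the substitution rule, the following sketch enumerates the single, double, and triple scanning mutants of a peptide at the amino acid level. The function names are hypothetical, and reverse translation and restriction-site avoidance are not shown.

```python
def substitute(residue):
    """Non-alanine residues become alanine; alanine becomes glycine."""
    return "G" if residue == "A" else "A"

def scanning_mutants(peptide, max_run=3):
    """Return (first_position, run_length, mutant_sequence) for every run of
    1..max_run adjacent substitutions, scanning from the N terminus."""
    mutants = []
    for run in range(1, max_run + 1):
        for start in range(len(peptide) - run + 1):
            mutated = "".join(substitute(r) for r in peptide[start:start + run])
            mutants.append((start, run, peptide[:start] + mutated + peptide[start + run:]))
    return mutants
```

For a 56-residue peptide this yields 56 single, 55 double, and 54 triple mutants, which is why the last one or two positions carry no double or triple mutant.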

The 200-nt oligonucleotide sequences were synthesized on a releasable DNA microarray. We PCR-amplified the DNA with the primers T7-PFA (AATGATACGGCGGGAATTCCGCTGCGT) and T7-PRA (CAAGCAGAAGACTCGAGCTCTTCCCTG), digested the product with EcoRI and XhoI, and cloned the fragment into the EcoRI/SalI site of the T7FNS2 vector (8). The resulting library was packaged into T7 bacteriophage by using the T7 Select Packaging Kit (EMD Millipore) and amplified by using the manufacturer-suggested protocol.

Phage immunoprecipitation and sequencing

We performed phage immunoprecipitation and sequencing by using a slightly modified version of previously published PhIP-Seq protocols (8, 10). First, we blocked each well of a 96-deep-well plate with 1 ml of 3% bovine serum albumin in TBST overnight on a rotator at 4°C. To each preblocked well, we added sera or plasma containing about 2 μg of immunoglobulin G (IgG) [quantified using a Human IgG ELISA Quantitation Set (Bethyl Laboratories)] and 1 ml of the bacteriophage library diluted to ~2 × 10⁵-fold representation (2 × 10¹⁰ plaque-forming units for a library of 10⁵ clones) in phage extraction buffer (20 mM Tris-HCl, pH 8.0, 100 mM NaCl, 6 mM MgSO4). We performed two technical replicates for each sample. We allowed the antibodies to bind the phage overnight on a rotator at 4°C. The next day, we added 20 μl each of magnetic protein A and protein G Dynabeads (Invitrogen) to each well and allowed immunoprecipitation to occur for 4 hours on a rotator at 4°C. With a 96-well magnetic stand, we then washed the beads three times with 400 μl of PhIP-Seq wash buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.1% NP-40). After the final wash, we resuspended the beads in 40 μl of water and lysed the phage at 95°C for 10 min. We also lysed phage from the library before immunoprecipitation (“input”) and after immunoprecipitation with beads alone.

We prepared the DNA for multiplexed Illumina sequencing by using a slightly modified version of a previously published protocol (36). We performed two rounds of PCR amplification on the lysed phage material using hot start Q5 polymerase according to the manufacturer-suggested protocol (NEB). The first round of PCR used the primers IS7_HsORF5_2 (ACACTCTTTCCCTACACGACTCCAGTCAGGTGTGATGCTC) and IS8_HsORF3_2 (GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCCGAGCTTATCGTCGTCATCC). The second round of PCR used 1 μl of the first-round product and the primers IS4_HsORF5_2 (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACTCCAGT) and a different unique indexing primer for each sample to be multiplexed for sequencing (CAAGCAGAAGACGGCATACGAGATxxxxxxxGTGACTGGAGTTCAGACGTGT, where “xxxxxxx” denotes a unique 7-nt indexing sequence). After the second round of PCR, we determined the DNA concentration of each sample by quantitative PCR and pooled equimolar amounts of all samples for gel extraction. After gel extraction, the pooled DNA was sequenced by the Harvard Medical School Biopolymers Facility using a 50–base pair read cycle on an Illumina HiSeq 2000 or 2500. We pooled up to 192 samples for sequencing on each lane and generally obtained ~100 million to 200 million reads per lane (500,000 to 1,000,000 reads per sample).

Informatics and statistical analysis

We performed the initial informatics and statistical analysis by using a slightly modified version of the previously published technique (8, 10). We first mapped the sequencing reads to the original library sequences by using Bowtie and counted the frequency of each clone in the “input” and each sample “output” (37). Because the majority of clones are not enriched, we used the observed distribution of output counts as a null distribution. We found that a zero-inflated generalized Poisson distribution fits our output counts well. We used this null distribution to calculate a P value for the likelihood of enrichment for each clone. The probability mass function for the zero-inflated generalized Poisson distribution is

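$$
P(Y = y) =
\begin{cases}
\pi + (1 - \pi)\, e^{-\theta}, & y = 0 \\[4pt]
(1 - \pi)\, \dfrac{\theta\, (\theta + \lambda y)^{y-1}\, e^{-\theta - \lambda y}}{y!}, & y > 0
\end{cases}
$$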

We used maximum likelihood estimation to regress the parameters π, θ, and λ to fit the distribution of counts after immunoprecipitation for all clones present at a particular frequency count in the input. We repeated this procedure for all of the observed input counts and found that θ and λ are well fit by linear regression and π by an exponential regression as a function of input count (fig. S1). Last, for each clone we used its input count and the regression results to determine the null distribution based on the zero-inflated generalized Poisson model, which we used to calculate the –log10(P value) of obtaining the observed count.
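The P value calculation can be sketched as follows, assuming the standard zero-inflated generalized Poisson form with zero-inflation weight π, rate θ, and dispersion λ. The function names are hypothetical, and the use of an upper-tail probability P(Y ≥ observed) is an assumption; the parameter regression itself is not shown.

```python
import math

def zigp_pmf(y, pi, theta, lam):
    """PMF of the zero-inflated generalized Poisson distribution (standard form)."""
    if y == 0:
        return pi + (1 - pi) * math.exp(-theta)
    # Work in log space to avoid overflow in (theta + lam*y)**(y-1) and y!
    log_p = (math.log(theta) + (y - 1) * math.log(theta + lam * y)
             - theta - lam * y - math.lgamma(y + 1))
    return (1 - pi) * math.exp(log_p)

def neg_log10_p(observed, pi, theta, lam):
    """-log10 of P(Y >= observed) under the null (upper tail is an assumption)."""
    below = sum(zigp_pmf(y, pi, theta, lam) for y in range(observed))
    tail = max(1.0 - below, 1e-300)  # floor guards against log10(0) from rounding
    return -math.log10(tail)
```

Because the tail is computed as one minus a cumulative sum, this sketch loses resolution once the tail probability approaches double-precision rounding error; a production implementation would sum the upper tail directly.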

To call hits, we determined the threshold for reproducibility between technical replicates based on a previously published method (10). Briefly, we made scatter plots of the log10 of the –log10 (P values) and used a sliding window of width 0.005 from 0 to 2 across the axis of one replicate. For all the clones that fell within each window, we calculated the median and median absolute deviation of the log10 of the –log10 (P values) in the other replicate and plotted it against the window location (fig. S2). We called the threshold for reproducibility the first window in which the median was greater than the median absolute deviation. We found that the distribution of the threshold –log10 (P value) was centered around a mean of ~2.3 (fig. S12). We therefore called a peptide a hit if the –log10 (P value) was at least 2.3 in both replicates. We eliminated the 593 hits that came up in at least 3 of the 22 immunoprecipitations with beads alone (negative control for nonspecific binding). We also filtered out any peptides that were not enriched in at least two of the samples.
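A minimal sketch of the reproducibility threshold and the resulting hit call is given below. The function names are hypothetical. The window step (taken here to equal the window width, so that windows do not overlap) and the exclusion of clones whose –log10(P value) is zero are assumptions, because the text does not specify them.

```python
import math
from statistics import median

def reproducibility_threshold(rep1, rep2, width=0.005, lo=0.0, hi=2.0):
    """rep1, rep2: per-clone -log10(P) values from two technical replicates.
    Returns the threshold in -log10(P) units, or None if no window qualifies."""
    # log10 is undefined at zero, so such clones are skipped (assumption)
    pts = [(math.log10(a), math.log10(b)) for a, b in zip(rep1, rep2) if a > 0 and b > 0]
    n_windows = int(round((hi - lo) / width))
    for i in range(n_windows):
        left = lo + i * width
        ys = [y for x, y in pts if left <= x < left + width]
        if not ys:
            continue
        med = median(ys)
        mad = median(abs(y - med) for y in ys)
        if med > mad:  # first window where the other replicate is reproducibly high
            return 10 ** left
    return None

def call_hits(rep1, rep2, cutoff=2.3):
    """Indices of clones whose -log10(P) is at least `cutoff` in both replicates."""
    return [i for i, (a, b) in enumerate(zip(rep1, rep2)) if a >= cutoff and b >= cutoff]
```

The bead-only filter and the requirement of enrichment in at least two samples would be applied to the output of `call_hits` across the full sample set.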

To call virus exposures, we grouped peptides according to the virus the peptide is derived from. We grouped all peptides from individual viral strains for which we had complete proteomes. The sample was counted as positive for a species if it was positive for any strain from that species. For viral strains that had partial proteomes, we grouped them with other strains from the same species to form a complete set and bioinformatically eliminated homologous peptides (see next paragraph). We set a threshold number of hits per virus based on the size of the virus. We found that there is approximately a power-law relationship between size of the virus and the average number of hits per sample (fig. S3). In comparing results from VirScan to samples with known infection, we empirically determined that a threshold of three hits for HSV1 worked the best. We used this value and the slope of the best fit line to scale the threshold for other viruses. We also set a minimum threshold of at least two hits in order to avoid false positives from single spurious hits.
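The threshold scaling can be sketched under explicit assumptions. The function name is hypothetical; the measure of virus size, the fitted slope, and the rounding rule are not given in the text, so the values used in any call are placeholders rather than the fitted values from fig. S3.

```python
def hit_threshold(virus_size, ref_size, slope, ref_threshold=3, minimum=2):
    """Scale the reference threshold (three hits for HSV1) along a power law.

    Assumes hits grow as size**slope, where `slope` is the slope of the
    best-fit line on log-log axes; rounding to the nearest integer is an
    assumption, and the result never drops below `minimum`.
    """
    scaled = ref_threshold * (virus_size / ref_size) ** slope
    return max(minimum, round(scaled))
```

For example, with a placeholder slope of 0.5, a virus four times the size of the reference would need six hits, whereas a very small virus would still need the minimum of two.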

To bioinformatically remove cross-reactive antibodies, we first sorted the viruses by total number of hits in descending order. We then iterated through each virus in this order. For each virus, we iterated through each peptide hit. If the hit shared a subsequence of at least 7 aa with any hit previously observed in any of the viruses from that sample, that hit was considered to be from a cross-reactive antibody and was ignored for that virus. Otherwise, the hit was considered to be specific, and the score for that virus was incremented by one. In this way, we summed only the peptide hits that do not share any linear epitopes. We compared the final score for each virus to the threshold for that virus to determine whether the sample was positive for exposure to that virus.
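The greedy scoring procedure can be sketched as follows for the hits of one sample. The function names are hypothetical. Two details are assumptions: hits that are ignored as cross-reactive are still remembered as previously observed, and hits from the same virus are compared with one another, so overlapping tiles that share a 7-aa subsequence count once.

```python
def shares_kmer(a, b, k=7):
    """True if sequences a and b share any subsequence of length k."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))

def score_viruses(hits_by_virus, k=7):
    """hits_by_virus: {virus: [peptide sequences called as hits in one sample]}.
    Returns {virus: number of hits sharing no k-mer with any earlier hit}."""
    # Viruses with the most hits are processed first
    order = sorted(hits_by_virus, key=lambda v: len(hits_by_virus[v]), reverse=True)
    seen = []
    scores = {}
    for virus in order:
        score = 0
        for pep in hits_by_virus[virus]:
            if not any(shares_kmer(pep, prev, k) for prev in seen):
                score += 1
            seen.append(pep)  # assumption: ignored hits are also remembered
        scores[virus] = score
    return scores
```

Each score would then be compared with the per-virus threshold to call exposure.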

To identify differences between populations, we first used Fisher’s exact test to calculate a P value for the significance of association of virus exposure with one population versus another. Then, we constructed a null distribution of Fisher’s exact P values by randomly permuting the sample labels 1000 times and recalculating the Fisher’s exact P value for each virus. With use of this null distribution, we calculated the false discovery rate by dividing the number of permutation P values more extreme than the one observed by the total number of permutations.
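A sketch of the permutation procedure for a single virus is shown below, using only the standard library. The function names are hypothetical, the two-sided form of Fisher's exact test is an assumption, and whether permutation P values were pooled across viruses is not stated in the text; here each virus is treated on its own.

```python
import math
import random

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def permutation_fdr(exposed, labels, n_perm=1000, seed=0):
    """exposed: 0/1 exposure call per sample; labels: 0/1 population label per sample.
    Returns (observed P, fraction of permutations with a more extreme P)."""
    def p_for(lbls):
        a = sum(1 for e, l in zip(exposed, lbls) if e and l)
        b = sum(1 for e, l in zip(exposed, lbls) if e and not l)
        c = sum(1 for e, l in zip(exposed, lbls) if not e and l)
        d = len(exposed) - a - b - c
        return fisher_exact_p(a, b, c, d)

    observed = p_for(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if p_for(shuffled) < observed:  # strictly smaller, i.e. "more extreme"
            extreme += 1
    return observed, extreme / n_perm
```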

IEDB epitope overlap analysis

We downloaded data for all continuous human B cell epitopes from IEDB and filtered out all nonviral epitopes (22). To avoid redundancy in these 4549 viral epitopes, we grouped together epitopes that are 100% identical or share a 7-aa subsequence, giving us 1559 nonredundant epitope groups. Of these groups, 1392 contain a member epitope that is also a subsequence of a peptide in the VirScan library. This represents the total number of epitopes we could detect by VirScan. To determine the number of epitopes we detected, we tallied the number of epitope groups with at least one member that is contained in a peptide that was enriched in one or two samples. Last, to determine the number of nonredundant new epitopes we detected, we grouped non-IEDB epitopes containing peptides that share a seven-residue subsequence and counted the number of these nonredundant peptide groups.
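The redundancy grouping can be sketched with a union-find structure keyed on shared 7-aa subsequences. The function name is hypothetical. Treating the grouping as transitive (two epitopes fall in the same group if they are linked through a chain of shared subsequences) is an assumption, as is the handling of epitopes shorter than 7 aa, which here can only group with identical sequences.

```python
def group_by_shared_kmer(epitopes, k=7):
    """Group sequences that are identical or share any k-residue subsequence."""
    parent = list(range(len(epitopes)))

    def find(i):
        # Find the group representative, compressing the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # k-mer -> index of the first epitope containing it
    for idx, seq in enumerate(epitopes):
        keys = {seq[i:i + k] for i in range(len(seq) - k + 1)} or {seq}
        for key in keys:
            if key in first_owner:
                parent[find(idx)] = find(first_owner[key])
            else:
                first_owner[key] = idx

    groups = {}
    for idx, seq in enumerate(epitopes):
        groups.setdefault(find(idx), []).append(seq)
    return list(groups.values())
```

Indexing each 7-mer once avoids comparing every pair of epitopes, which matters for a set of several thousand sequences.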

Scanning mutagenesis data analysis

First, we estimated the fractional abundance of each peptide by dividing the number of reads for that peptide by the total number of reads for the sample. Then, we divided the fractional abundance of each peptide after immunoprecipitation by the fractional abundance before immunoprecipitation to get the enrichment. To calculate relative enrichment, we divided enrichment of the mutated peptide by enrichment of the wild-type peptide. Because most of the single-mutant peptides had wild-type levels of enrichment, we averaged the enrichment of the wild-type peptide with the middle two quartiles of enrichment of the single-mutant peptides to get a better estimate of the wild-type peptide enrichment.
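These calculations can be sketched as follows. The function names are hypothetical. How the wild-type value was combined with the single-mutant values is not fully specified in the text; pooling the wild-type enrichment with the middle two quartiles and taking an unweighted mean is an assumption.

```python
from statistics import mean

def enrichment(reads_after, total_after, reads_before, total_before):
    """Fractional abundance after immunoprecipitation divided by that before."""
    return (reads_after / total_after) / (reads_before / total_before)

def wild_type_estimate(wt_enrichment, single_mutant_enrichments):
    """Average the wild-type enrichment with the middle two quartiles of the
    single-mutant enrichments (unweighted pooling is an assumption)."""
    ranked = sorted(single_mutant_enrichments)
    q = len(ranked) // 4
    middle = ranked[q:len(ranked) - q]  # drop the lowest and highest quartiles
    return mean([wt_enrichment] + middle)

def relative_enrichment(mutant_enrichment, wt_estimate):
    """Enrichment of a mutant relative to the wild-type estimate."""
    return mutant_enrichment / wt_estimate
```

Discarding the outer quartiles keeps disruptive mutants, which have low enrichment, from pulling down the wild-type estimate.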

RSV and HSV1 and 2 serology

Serum from 44 donors was tested for HSV1 and HSV2 antibodies by using the HerpeSelect 1 and 2 Immunoblot IgG kit (Focus Diagnostics) according to the manufacturer’s protocol. Serum from 60 donors was tested for RSV antibodies by using the anti-RSV IgG Human ELISA Kit (ab108765) according to the manufacturer’s protocol.

Supplementary Materials

www.sciencemag.org/content/348/6239/aaa0698/suppl/DC1

Supplementary Text

Figs. S1 to S14

Tables S1 to S3

References and Notes

  1. Acknowledgments: We thank E. Unger and S. Buranapraditkun for providing reagents, K. Wucherpfennig (Harvard) and H. Ploegh (MIT) for critical reading of the manuscript, and TWIST Bioscience for providing access to their advanced oligonucleotide synthesis technology. The cohort in Durban, South Africa, was funded by the NIH (R37AI067073) and the International AIDS Vaccine Initiative (UKZNRSA1001). T.N. received additional funding from the South African Research Chairs Initiative, the Victor Daitz Foundation, and an International Early Career Scientist Award from the Howard Hughes Medical Institute. R.T.C. was funded by grants NIH DA033541 and AI082630. C.B. and J.S. were supported by NIH N01-AI-30024 and N01-AI-15422, NIH–National Institute of Dental and Craniofacial Research R01 DE018925-04, the HIVACAT program, and CUTHIVAC 241904. K.R. is supported by TRF Senior Research Scholar, the Thailand Research Fund; the Chulalongkorn University Research Professor Program, Thailand; and NIH grant N01-AI-30024. G.J.X. and T.K. were supported by the NSF Graduate Research Fellowships Program. S.J.E. and B.W. are Investigators with the Howard Hughes Medical Institute. G.J.X., T.K., H.B.L., and S.J.E. are inventors on a patent application (application no. PCT/US14/70902) filed by Brigham and Women’s Hospital, Incorporated that covers the use of phage display libraries to detect antiviral antibodies.