Research Article

Viral epitope profiling of COVID-19 patients reveals cross-reactivity and correlates of severity


Science  29 Sep 2020:
DOI: 10.1126/science.abd4250


Understanding humoral responses to SARS-CoV-2 is critical for improving diagnostics, therapeutics, and vaccines. Deep serological profiling of 232 COVID-19 patients and 190 pre-COVID-19 era controls using VirScan revealed over 800 epitopes in the SARS-CoV-2 proteome, including 10 epitopes likely recognized by neutralizing antibodies. Pre-existing antibodies in controls recognized SARS-CoV-2 ORF1, while only COVID-19 patients primarily recognized spike and nucleoprotein. A machine learning model trained on VirScan data predicted SARS-CoV-2 exposure history with 99% sensitivity and 98% specificity; a rapid Luminex-based diagnostic was developed from the most discriminatory SARS-CoV-2 peptides. Individuals with more severe COVID-19 exhibited stronger and broader SARS-CoV-2 responses, weaker antibody responses to prior infections, and higher incidence of CMV and HSV-1, possibly influenced by demographic covariates. Among hospitalized patients, males mounted stronger SARS-CoV-2 antibody responses than females.

Coronaviruses comprise a large family of enveloped, positive-sense single-stranded RNA viruses that cause diseases in birds and mammals (1). Among the strains infecting humans are the alpha-coronaviruses HCoV-229E and HCoV-NL63 and the beta-coronaviruses HCoV-OC43 and HCoV-HKU1, which cause common colds (Fig. 1A). Three additional beta-coronavirus species result in severe infections in humans: Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV), Middle East Respiratory Syndrome Coronavirus (MERS-CoV), and SARS-CoV-2, a novel coronavirus that emerged in late 2019 in Asia and quickly spread throughout the globe (2). As of late August 2020, SARS-CoV-2 had caused over 25 million confirmed infections and nearly 850,000 deaths (3).

Fig. 1 VirScan detects the humoral response to SARS-CoV-2 in sera from COVID-19 patients.

(A) Phylogenetic tree of 50 coronavirus sequences (11) constructed using MEGA X (12, 13). The scale bar indicates the estimated number of base substitutions per site (14). Coronaviruses included in the updated VirScan library are indicated in red. (B) Schematic representation of the ORFs encoded by the SARS-CoV-2 genome (10, 15). (C) Overview of the VirScan procedure (5–8). The coronavirus oligonucleotide library includes 56-mer peptides tiling every 28 amino acids across the proteomes of 10 coronavirus strains, and 20-mer peptides tiling every 5 amino acids across the SARS-CoV-2 proteome. Oligonucleotides were cloned into a T7 bacteriophage display vector and packaged into phage particles displaying the encoded peptides on their surface. The phage library was mixed with sera containing antibodies that bind to their cognate epitopes on the phage surface; bound phage were isolated by immunoprecipitation (IP) with either anti-IgG- or anti-IgA-coated magnetic beads. Lastly, PCR amplification and Illumina sequencing from the DNA of the bound phage revealed the peptides targeted by the serum antibodies. (D) Detection of antibodies targeting coronavirus epitopes by VirScan. Heatmaps depict the humoral response from COVID-19 patients (n = 232) and pre-COVID-19 era control samples (n = 190). Each column represents a sample from a unique individual. The color intensity indicates the number of 56-mer peptides from the indicated coronaviruses significantly enriched by IgG antibodies in the serum sample. (E) Boxplots illustrate the number of peptide hits from the indicated coronaviruses in COVID-19 patients and pre-COVID-19 era controls. The box indicates the interquartile range, with a line at the median. The whiskers represent 1.5 times the interquartile range.

The clinical course of Coronavirus Disease 2019 (COVID-19) – the disease resulting from SARS-CoV-2 infection – is notable for its extreme variability: while some individuals remain entirely asymptomatic, others experience fever, anosmia, diarrhea, severe respiratory distress, pneumonia, cardiac arrhythmia, blood clotting disorders, liver and kidney distress, enhanced cytokine release and, in a small percentage of cases, death (4). Understanding the factors influencing this spectrum of outcomes is therefore an intense area of research. Disease severity is correlated with advanced age, sex, ethnicity, socio-economic status, and co-morbidities including diabetes, cardiovascular disease, chronic lung disease, obesity and reduced immune function (4). Additional relevant factors likely include the inoculum of virus at infection, the individual’s genetic background and viral exposure history. The complex interplay of these elements also determines how individuals respond to therapies aimed at mitigating disease severity. Detailed knowledge of the immune response to SARS-CoV-2 could improve our understanding of diverse outcomes and inform the development of improved diagnostics, vaccines, and antibody-based therapies.

Here we describe a detailed analysis of the humoral response in COVID-19 patients using VirScan, a programmable phage-display immunoprecipitation and sequencing (PhIP-Seq) technology we developed previously to explore antiviral antibody responses across the human virome (5–8).


Development of a VirScan library targeting human coronaviruses

Our existing VirScan phage-display platform is based on an oligonucleotide library encoding 56-amino acid (56-mer) peptides tiling every 28 amino acids across the proteomes of all known pathogenic human viruses (~400 species and strains) plus many bacterial proteins (8). To interrogate the serological response to SARS-CoV-2 and other human coronaviruses (HCoVs), we supplemented this library with three additional sublibraries: Sublibrary 1 encodes a 56-mer peptide library tiling every 28 amino acids through each of the open reading frames (ORFs) expressed by the seven HCoVs and three bat coronaviruses closely related to SARS-CoV-2; Sublibrary 2 encodes 20-mer peptides tiling every 5 amino acids across the SARS-CoV-2 proteome, enabling more precise localization of epitopes; and Sublibrary 3 encodes triple-alanine scanning mutants of the 56-mer peptides tiling across the SARS-CoV-2 proteome, enabling the mapping of epitope boundaries at amino acid resolution (Fig. 1, A to C, and table S1) (9, 10).
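The tiling scheme described above can be illustrated with a minimal Python sketch. The function name `tile_peptides`, the toy sequence, and the handling of the final tile (anchoring it at the C-terminus) are assumptions made for illustration; the text specifies only the tile lengths and step sizes.

```python
def tile_peptides(protein, length, step):
    """Return (start, peptide) tiles of `length` residues taken every `step` residues.

    Start positions are 1-based. If the regular grid does not reach the end of
    the protein, one extra tile anchored at the C-terminus is appended
    (an assumption; the source does not say how the last tile is handled).
    """
    if len(protein) <= length:
        return [(1, protein)]
    starts = list(range(0, len(protein) - length + 1, step))
    last = len(protein) - length
    if starts[-1] != last:
        starts.append(last)
    return [(s + 1, protein[s:s + length]) for s in starts]


# Toy 120-residue sequence, not a real viral protein.
toy = "ACDEFGHIKLMNPQRSTVWY" * 6

tiles_56 = tile_peptides(toy, length=56, step=28)  # 56-mers, 28-residue overlap
tiles_20 = tile_peptides(toy, length=20, step=5)   # 20-mers, 15-residue overlap

for start, peptide in tiles_56:
    print(start, start + len(peptide) - 1)
```

With a 28-residue step, every internal residue is covered by two 56-mers, which is why overlapping 56-mer hits can point to an epitope in the shared region; the 5-residue step of the 20-mers narrows that localization further.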

We used VirScan (Fig. 1C) to profile the antibody repertoires of 9 cohorts of individuals from multiple locations including Baltimore, MD, Boston, MA and Seattle, WA (tables S2 to S8). These cohorts comprised longitudinal samples from individuals enrolled in prospective studies of COVID-19 infection, cross-sectional samples from patients with active COVID-19 receiving treatment either in hospital or outpatient settings, and cross-sectional samples from convalescent individuals with past history of COVID-19. Our cohorts also included a diverse set of control sera collected prior to the outbreak of COVID-19. We profiled the targets of IgG and IgA antibodies separately: IgG and IgA are the most abundant isotypes in the blood, while IgA is the principal isotype secreted on mucosal surfaces including the respiratory tract. Altogether we analyzed approximately 550 samples in duplicate, in total assessing ~100 million potential antibody repertoire-peptide interactions.

Detection of SARS-CoV-2 seropositivity with VirScan

To measure immune responses to SARS-CoV-2, we compared VirScan profiles of serum samples from COVID-19 patients to those of controls obtained before the emergence of SARS-CoV-2 in 2019. These pre-COVID-19 era controls facilitate identification of (1) SARS-CoV-2 peptides encoding epitopes specific to COVID-19 patients and (2) SARS-CoV-2 peptides encoding epitopes that are cross-reactive with antibodies developed in response to the ubiquitous common-cold HCoVs. Sera from COVID-19 patients exhibited much more SARS-CoV-2 reactivity compared to pre-COVID-19 era controls (Fig. 1, D and E). Some cross-reactivity toward SARS-CoV-2 peptides was observed in the pre-COVID-19 era samples, but this was expected since nearly everyone has been exposed to HCoVs (16).

COVID-19 patient sera also showed significant levels of cross-reactivity with the other highly pathogenic HCoVs, SARS-CoV and MERS-CoV, although less was observed against the more distantly related MERS-CoV. Extensive cross-reactivity was also observed against peptides derived from the three bat coronaviruses that share the greatest sequence identity with SARS-CoV-2 (Fig. 1, A, D, and E) (9). These responses almost certainly represent cross-reactivity because, given the low prevalence and circumscribed geographical location of SARS-CoV and MERS-CoV, none of the individuals in this study is likely to have encountered these viruses.

COVID-19 patient sera also exhibited a significantly higher level of reactivity to seasonal HCoV peptides compared to pre-COVID-19 era controls (Fig. 1, D and E). This could be due to the elicitation of novel antibodies that cross-react, or to an anamnestic response boosting B cell memory against HCoVs. The converse is not always true: many pre-COVID-19 era samples exhibit strong recognition of seasonal HCoV peptides but little or no recognition of SARS-CoV-2 peptides (Fig. 1D), although one caveat may be that the concentrations of antibodies against seasonal HCoVs may be below the threshold of detection in the pre-COVID-19 era samples.

Coronavirus proteins targeted by antibodies in COVID-19 patients

Analysis of SARS-CoV-2 proteins targeted by COVID-19 patient antibodies revealed that the primary responses to SARS-CoV-2 are reactive with peptides derived from spike (S) and nucleoprotein (N) (Fig. 2, A and B). These two proteins exhibit significant differential recognition by sera from COVID-19 patients versus pre-COVID-19 era controls, indicating that their recognition is a result of antibody responses to SARS-CoV-2. Third-most frequently recognized is the replicase polyprotein ORF1, but, unlike S and N, ORF1 is recognized to a similar extent by sera from COVID-19 patients and pre-COVID-19 era controls. This suggests that recognition of SARS-CoV-2 ORF1 is a result of cross-reactions from antibodies elicited by exposure to other pathogens, possibly HCoVs. Antibody responses to peptides from membrane glycoprotein (M), ORF3 and ORF9b were occasionally detected in COVID-19 patients.

Fig. 2 SARS-CoV-2 protein recognition in COVID-19 patient versus control sera.

(A) Antibodies targeting SARS-CoV-2 proteins. Each column represents a unique patient sample and each row represents a SARS-CoV-2 protein. The color intensity in each cell of the heatmap indicates the number of 56-mer peptides as in Fig. 1D. (B) Boxplots as in Fig. 1E illustrate the number of peptide hits from each of the indicated SARS-CoV-2 proteins detected in the IgG antibody response of COVID-19 patients and controls. (C) Longitudinal analysis of the antibody response to SARS-CoV-2 for 23 patients with confirmed COVID-19. Days on which a sample was available for analysis are indicated with a black line. Each point represents the maximum antibody fold-change score per SARS-CoV-2 peptide in each sample, colored by protein target.

We also analyzed longitudinal samples from 23 COVID-19 patients. Most patients displayed an antibody response to peptides derived from the S or N in the second week after symptom onset, with many displaying an antibody response by the end of the first week (Fig. 2C). The relative strength and onset of the antibody response to the S and N differed dramatically between individuals, and the initial immune response showed no preference for the S or N. The signal intensity of antibodies recognizing SARS-CoV-2 ORF1 epitopes did not increase over time, further suggesting that ORF1 antibodies likely represent a preexisting cross-reactive response.

Identification of immunogenic regions of SARS-CoV-2 proteins

To more precisely define the immunogenic regions of the SARS-CoV-2 proteome, we examined the specific 56-mer and 20-mer peptides that were detected by VirScan in COVID-19 patients compared to pre-COVID-19 era controls. An example IgG response from a single patient to the SARS-CoV-2 S and N is shown in Fig. 3A. We observed strong concordance between the viral regions enriched by the 56-mer and 20-mer fragments, demonstrating the robustness of VirScan. In many cases we observed recognition of overlapping 56-mer peptides, indicating an epitope in the common region.

Fig. 3 IgG and IgA recognition of immunodominant regions in SARS-CoV-2 spike and nucleoprotein.

(A) Example response to S and N proteins from a single COVID-19 patient. The y-axis indicates the strength of enrichment (Z-Score, see methods) of each 56-mer (blue) or 20-mer (red) peptide recognized by the IgG antibodies present in the serum sample. (B) Common responses to S and N proteins across COVID-19 patients. The y-axis indicates the fraction of COVID-19 patient samples (n = 348) enriching each 20-mer peptide with either IgG (top panel) or IgA (bottom panel) antibodies. (C) Comparison of the IgA and IgG responses in individual COVID-19 patients. Each set of two rows represent the IgG and IgA antibody specificities of a single patient, with ten representative COVID-19 patients displayed. Numeric values indicate the degree of enrichment (Z-Score) of each peptide tiling across the S and N proteins.

Next, we compared the protein regions recognized by IgG and IgA across COVID-19 patients (Fig. 3B). We identified four regions each in S and N that are recurrently targeted by antibodies from >15% of COVID-19 patients, with additional regions recognized less frequently. Overall, IgG and IgA recognize the same protein regions with similar frequencies across the population. However, when IgG and IgA responses were compared within individuals, we observed considerable divergence (Fig. 3C): many epitopes were recognized by only IgG, only IgA, or both IgG and IgA within an individual patient. Together, these data suggest that patients raise distinct IgG and IgA antibody responses to SARS-CoV-2, but the regions targeted are largely shared at a population level.

Machine learning guides the design of a Luminex assay for rapid COVID-19 diagnosis

To predict SARS-CoV-2 exposure history from VirScan data, we developed a gradient-boosting algorithm (XGBoost) that integrated both IgG and IgA data and predicted current or past COVID-19 disease with 99.1% sensitivity and 98.4% specificity (Fig. 4, A and B). Interrogating the model using Shapley Additive exPlanations (SHAP), a method to compute the contribution of each feature of the data to the predictive model (17), we identified peptides from SARS-CoV-2 S and N plus homologous peptides from SARS-CoV and BatCoV-HKU-3 and BatCoV-279 that were highly predictive of SARS-CoV-2 exposure (Fig. 4, C and D).

Fig. 4 Machine learning models trained on VirScan data discriminate COVID-19-positive and negative individuals with very high sensitivity and specificity.

(A) Gradient boosting machine learning models were trained on IgG and IgA VirScan data from 232 COVID-19 patients and 190 pre-COVID-19 era controls. Separate models were created for the IgG and IgA data, and then a third model (Ensemble) was trained to combine the outputs of the first two. (B) The plot shows the predicted probability that each sample is positive for COVID-19; true COVID-19 positive samples are shown as red dots, and true COVID-19 negative samples are shown as gray dots. The corresponding confusion matrix for each model is shown on the right. (C and D) SHAP analysis to identify the most discriminatory peptides informing the models in (B). The chart in (C) summarizes the relative importance of the most discriminatory peptides increased among COVID-19 patients identified by the IgG and IgA gradient boosting models. The enrichment (log2(Fold Change) of the normalized read counts in the sample IP versus in no-serum control reactions) of each of these peptides across all samples is shown in (D). (E) Luminex assay using highly discriminatory SARS-CoV-2 peptides identifies IgG antibody responses in COVID-19 patients but rarely in pre-COVID-19 era controls. Each column represents a COVID-19 individual (n = 163) or pre-COVID-19 era control (n = 165); each row is a SARS-CoV-2-specific peptide. Peptides containing public epitopes from Rhinovirus A, EBV, and HIV-1 served as positive and negative controls. The color-scale indicates the median fluorescent intensity (MFI) signals after background subtraction. (F) Receiver operating characteristic (ROC) curve for the Luminex assay predicting SARS-CoV-2 infection history, evaluated by 10-fold cross-validation. The light red lines indicate the ROC curve for each test set, the dark line indicates the average, the gray region reflects ± 1 std. dev. The average area under the curve (AUC) is shown. (G) Left, the predicted probability that each sample is positive for COVID-19 by the Luminex model as in (B). The dashed line indicates the model threshold. Right, confusion matrix for the Luminex model.

We leveraged these insights to develop a simple, rapid Luminex-based diagnostic for COVID-19. We chose 12 SARS-CoV-2 peptides predicted by VirScan data and the machine-learning model to be highly indicative of SARS-CoV-2 exposure history (table S9). These SARS-CoV-2 peptides, plus two positive control peptides from Rhinovirus A and Epstein-Barr virus (EBV) that are recognized in over 80% of seropositive individuals by VirScan (7), and a negative control peptide from HIV-1, were coupled to Luminex beads (18). We tested 163 COVID-19 patient samples and 165 pre-COVID-19 era controls for IgG reactivity to the Luminex panel. We detected clear responses to SARS-CoV-2 peptides in COVID-19 patient samples but rarely in the pre-COVID-19 era controls (Fig. 4E). Using the Luminex data, we developed a logistic regression model that predicted COVID-19 infection history with 89.6% sensitivity and 95.2% specificity (AUC = 0.97) (Fig. 4, F and G). A subset of the COVID-19 positive samples (n = 107) were also examined using an in-house ELISA using three SARS-CoV-2 antigens: N, S, and the S receptor-binding domain (RBD). Considering a sample positive if it scored above the 99% specificity threshold on any one of the three ELISA antigens, we determined that the sensitivity of the Luminex assay for this subset (88.8%) was similar to that of the ELISA (90.7%) (fig. S1). Among samples run on all three assays, VirScan significantly out-performed both the Luminex and ELISAs (fig. S1, A and C). Remarkably, our optimal model integrated only 3 SARS-CoV-2 peptides which were also the most discriminatory 20-mers in the VirScan data: N 386-406, S 810-830, and S 1146-1166. IgG responses in the COVID-19 patients were highly correlated between the Luminex and VirScan assays, providing orthogonal validation of the VirScan data and supporting the prevalence of SARS-CoV-2-induced humoral responses to these regions of the S and N (fig. S1D).
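For readers who want to relate the reported percentages to a confusion matrix, the following Python sketch computes sensitivity and specificity from raw counts. The counts used here are not stated in the text: they are an assumption, back-calculated as the integers consistent with the reported 89.6% and 95.2% given 163 patient samples and 165 controls.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Assumed counts (not given in the text), chosen so that
# tp + fn = 163 COVID-19 samples and tn + fp = 165 pre-COVID-19 era controls.
sens, spec = sensitivity_specificity(tp=146, fn=17, tn=157, fp=8)

print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

The same function applies to the ELISA comparison on the 107-sample subset, where only the sensitivity term is relevant because all samples in that subset were COVID-19 positive.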

Differential antibody responses to common viruses in hospitalized versus non-hospitalized COVID-19 patients

We next considered whether differences in the antibody response to SARS-CoV-2 or to other viruses might be associated with the severity of COVID-19 disease. We grouped the COVID-19 patients into two subsets: those who required hospitalization (n = 101), and those who did not (n = 131). We compared the responses to peptides derived from the SARS-CoV-2 S and N proteins between the hospitalized (H) and non-hospitalized (NH) groups, and found that the H group exhibited stronger and broader antibody responses to S and N peptides that might be due to epitope spreading (Fig. 5A). We then analyzed 32 NH COVID-19 samples, 32 H COVID-19 samples, and 32 pre-COVID-19 era negative controls with the Luminex assay, and similarly observed that the H group had stronger and broader antibody responses to SARS-CoV-2-specific peptides compared with the NH group (Fig. 5B).

Fig. 5 Correlates of COVID-19 disease severity.

(A) Differential recognition of peptides from SARS-CoV-2 nucleoprotein and spike between COVID-19 non-hospitalized patients (n = 131), hospitalized patients (n = 101), and pre-COVID-19 era negative controls. Each column represents a unique patient and each row represents a peptide tile; tiles are labeled by amino acid start and end position and may be duplicated for intervals for which amino acid sequence diversity is represented in the library. Color intensity represents the degree of enrichment (Z-score) of each peptide in IgG samples. Peptides exhibiting a significant increase in recognition by sera from hospitalized versus non-hospitalized patients are indicated with an asterisk (Kolmogorov-Smirnov test, Bonferroni-corrected p-value thresholds of 0.001 for S and 0.0025 for N). (B) SARS-CoV-2 Luminex assay identifies stronger IgG responses in hospitalized COVID-19 patients than in non-hospitalized COVID-19 patients. Each column represents either a non-hospitalized (n = 32) or hospitalized (n = 32) COVID-19+ patient or a pre-COVID-19 era control (n = 32); each row represents a peptide in the Luminex assay. The color-scale indicates the median fluorescent intensity (MFI) signals after background subtraction. (C) All peptides in the VirScan library are plotted by the fraction of non-hospitalized (x-axis) and hospitalized COVID-19 patient IgG samples (y-axis) in which they are recognized. A Z-score threshold of 3.5 was used as an enrichment cutoff to count a peptide as positive. Peptides that exhibit statistically significant associations with hospitalization status are colored by virus of origin (Fisher’s exact test, Bonferroni-corrected p-value threshold of 8.52 × 10⁻⁷). All peptides that do not exhibit significant association with hospitalization status are shown in gray. The significant peptides shown are collapsed for high sequence identity. (D) All peptides derived from CMV present in the VirScan library are plotted by median Z-score for the non-hospitalized (x-axis) and hospitalized COVID-19 patients (y-axis). The line y = x is shown as a dotted line. (E) Reduced recognition of mild-associated antigens with age. The histogram shows the relative recognition in healthy donors at age 58 compared to age 42 for each unique antigen that was more strongly recognized by antibodies in non-hospitalized than hospitalized COVID-19 patients.

VirScan also offers the opportunity to examine the history of previous viral infections and to determine correlates of COVID-19 outcomes. For example, prior viral exposure could provide some protection if cross-reactive neutralizing antibodies or T cell responses are stimulated upon exposure to SARS-CoV-2 (19, 20). Alternatively, cross-reactive antibodies to viral surface proteins could increase the risk of severe disease due to antibody-dependent enhancement (ADE), as has been observed for SARS-CoV (21, 22). Furthermore, exposure to certain viruses could impact the response to SARS-CoV-2 by altering the immune system. To examine these possibilities, we analyzed the virome-wide VirScan data and found that overall, the NH patients exhibited greater responses to individual peptides from common viruses such as Rhinoviruses, Influenza viruses, and Enteroviruses, while the H patients displayed greater responses to peptides from cytomegalovirus (CMV) and Herpes Simplex Virus 1 (HSV-1) (Fig. 5C). These observations may be influenced by demographic differences in the NH and H cohorts as described below.

We sought to understand whether the differential reactivity to CMV and HSV-1 between the H and NH patients was due to differences in the strength of antibody responses or the prevalence of infection (these viruses are common, but not ubiquitous as are Rhinoviruses, Enteroviruses and Influenza viruses). Using VirScan data, we found that the H group had a higher incidence of both CMV and HSV-1 infection: 82.2% (83/101) of the H group were positive for CMV versus 37.4% (49/131) of the NH group, while 92.1% (93/101) of the H group were positive for HSV-1 versus 45.8% (60/131) of the NH group. To examine the relative strength of the antibody responses, we considered only CMV or HSV-1 seropositive individuals from the NH and H groups: the antibody response to both CMV (Fig. 5D) and HSV-1 (fig. S2) was stronger among the NH individuals. Thus, the differing seroprevalence of CMV and HSV-1 in the NH versus H groups likely explains the results shown in Fig. 5C. We conclude that antibody responses to nearly all viruses, except SARS-CoV-2, were weaker in the H patients compared to the NH patients.
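The seroprevalence figures above can be recomputed directly from the reported counts. The short Python sketch below does so and, purely as an added illustration, also derives an odds ratio for each virus; the odds ratio is not a statistic reported in the text, and the helper names are hypothetical.

```python
# Seropositive counts and group sizes as reported in the text:
# (positives, total) for hospitalized (H) and non-hospitalized (NH) patients.
counts = {
    "CMV":   {"H": (83, 101), "NH": (49, 131)},
    "HSV-1": {"H": (93, 101), "NH": (60, 131)},
}


def prevalence(positives, total):
    """Fraction of a group that is seropositive."""
    return positives / total


def odds_ratio(h, nh):
    """Odds of seropositivity in H relative to NH (illustrative, not from the paper)."""
    h_pos, h_total = h
    nh_pos, nh_total = nh
    return (h_pos * (nh_total - nh_pos)) / ((h_total - h_pos) * nh_pos)


summary = {}
for virus, groups in counts.items():
    summary[virus] = {
        "H": prevalence(*groups["H"]),
        "NH": prevalence(*groups["NH"]),
        "odds_ratio": odds_ratio(groups["H"], groups["NH"]),
    }
    print(virus, f"H={summary[virus]['H']:.1%}", f"NH={summary[virus]['NH']:.1%}")
```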

These striking differences led us to examine potential demographic covariates between the NH and H groups. We found that age, sex, and race were all significantly associated with COVID-19 severity (fig. S3), as has been reported (23, 24). Higher age, male sex, and non-white ethnicity groups were significantly overrepresented in the H group compared with the NH group (fig. S3 and table S3). Furthermore, hospitalized males exhibited stronger responses to N than hospitalized females while non-hospitalized males and females did not exhibit differential responses to any SARS-CoV-2 proteins (fig. S3E). Advanced age is a dominant risk factor for severe COVID-19 and is correlated with reduced immune function (25). In light of the age difference between the H (median age 58) and NH (median age 42) patients in our cohort, we reasoned that the antigens recognized more strongly in the NH group might reflect more general age-associated changes in humoral immunity. To test this hypothesis, we examined VirScan data for a cohort of 648 healthy, pre-pandemic donors. We characterized the recognition of each NH-associated peptide in subsets of the healthy donors representing different age groups and observed a general decline in recognition with age, including a median 19% reduction in recognition from age 42 to 58 (Fig. 5E). These data suggest that age-related changes to the immune system may in part explain the observation of weaker antibody responses to most viruses in the H group. While correlative and potentially influenced by other demographic differences between the NH and H cohorts, the broad age-related diminution in immune system activity we observed could be an important aspect of the increased severity in the H group.

Cross reactivity of SARS-CoV-2 epitopes

We returned to the question of epitope cross-reactivity, this time examining antibody responses to the triple-alanine scanning library. For each 56-mer peptide spanning the SARS-CoV-2 proteome, this library contained a collection of scanning mutants: the first mutant peptide encoded 3 alanines instead of the first 3 residues, the second mutant peptide contained the 3 alanines moved one residue downstream, and so on (fig. S4). Antibodies that recognize the wild-type 56-mer peptide will not recognize mutant versions of the peptide containing alanine substitutions at critical residues; thus, the location of the linear epitope can be deduced by looking for “antibody footprints,” indicated by stretches of alanine mutants missing from the pool of immunoprecipitated phage. The first and last triple-alanine mutations to interfere with binding are expected to start two amino acids before the first residue essential for the antibody binding, and end two amino acids after the last.
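The construction of the scanning mutants and the expected position of an antibody footprint can be sketched in Python as follows. The function names, the toy peptide, and the chosen essential residues are hypothetical; the sketch also ignores positions where the wild-type residue is already alanine, for which a substitution would have no effect.

```python
def triple_alanine_mutants(peptide):
    """Return (start, mutant) pairs; residues start..start+2 (0-based) become AAA."""
    return [
        (i, peptide[:i] + "AAA" + peptide[i + 3:])
        for i in range(len(peptide) - 2)
    ]


def expected_footprint(first_essential, last_essential, peptide_len):
    """0-based start indices of mutants expected to lose antibody binding.

    A mutant starting at i replaces residues i, i+1 and i+2, so it touches the
    essential stretch whenever first_essential - 2 <= i <= last_essential.
    """
    lo = max(first_essential - 2, 0)
    hi = min(last_essential, peptide_len - 3)
    return list(range(lo, hi + 1))


# Toy 56-mer, not a real SARS-CoV-2 sequence.
wild_type = ("DEFGHIKLMNPQRSTVWY" * 4)[:56]
mutants = triple_alanine_mutants(wild_type)

# Suppose residues 10..14 (0-based) are essential for binding (hypothetical).
footprint = expected_footprint(10, 14, len(wild_type))
print(len(mutants), footprint)
```

In this toy case the first disruptive mutant starts two residues before the first essential position and the last disruptive mutant ends two residues after the last essential position, matching the rule stated above.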

With respect to cross-reactivity, IgG from COVID-19 patients recognized more 56-mer peptides from the common HCoVs HKU1, OC43, 229E, and NL63, than IgG from pre-COVID-19 era controls. This difference is primarily driven by a dramatic increase in recognition of S peptides from the HCoVs and is likely a result of cross-reactivity of antibodies developed during SARS-CoV-2 infection (Fig. 6A).

Fig. 6 Cross-reactive epitopes among human coronaviruses.

(A) Bar graphs depicting the average number of 56-mer peptides derived from SARS-CoV-2, SARS-CoV, and each of the 4 common HCoVs that are significantly enriched per sample (IgG IP). Error bars represent the 95% confidence interval. (B) Analysis of cross-reactive epitopes for HCoV S proteins. The upper plot shows the similarity of each region of the SARS-CoV-2 S protein to the corresponding region in the four common HCoVs (see Methods). The frequency of peptide recognition is shown in the bottom two plots. Peptides from each virus are indicated by the colored lines: the length of each line along the x-axis indicates the corresponding region of the SARS-CoV-2 S protein covered by each peptide according to a pairwise protein alignment, and the height of each line corresponds to the fraction of samples in which that peptide scored in either the IgG or IgA IPs. The epitopes mapped in (C) and (D) are highlighted in pink. (C and D) Mapping of recurrently recognized SARS-CoV-2 S IgG (C) and IgA (D) epitopes by triple-alanine scanning mutagenesis. Each plot represents a 20 amino acid region of the SARS-CoV-2 S protein within the regions highlighted in (B). Each column of the heatmap corresponds to an amino acid position, and each row represents a sample. The color intensity indicates the average enrichment of 56-mer peptides containing an alanine mutation at that site relative to the median enrichment of all mutants of that 56-mer in each sample. COVID-19 patients with a minimum relative enrichment below 0.6 in the specified window are shown. The amino acid sequence across each region of SARS-CoV-2 S, as well as an alignment of the corresponding sequences in the common HCoVs, is shown below each heatmap.

We mapped the position of all HCoV S peptides that display increased recognition in COVID-19 patient samples onto the SARS-CoV-2 S protein. This revealed four immunodominant regions recognized by >25% of COVID-19 patients (Fig. 6B). Comparing the frequency of peptide recognition between the COVID-19 patients and pre-COVID-19 era controls showed that two of these immunogenic regions in SARS-CoV-2 S are likely strongly cross-reactive with homologous regions of other HCoVs, as the frequency of recognition of the HCoV peptides at these regions rises in COVID-19 patients. For instance, peptides from all four seasonal HCoVs that span the region corresponding to residues 811-830 of SARS-CoV-2 S are frequently recognized by COVID-19 patients but much less so by pre-COVID-19 era controls, suggesting that this recognition is a result of antibodies developed or boosted in response to SARS-CoV-2 infection. Using triple-alanine scanning mutagenesis (fig. S4), we mapped the antibody footprints in this region to an 11 amino acid stretch that is highly conserved between SARS-CoV-2 and all four common HCoVs, which explains the cross-reactivity (Fig. 6, C and D). Similarly, both SARS-CoV-2 and HCoV-OC43 peptides corresponding to S 1144-1163 are recognized much more frequently by COVID-19 patients than pre-COVID-19 era controls, and triple-alanine-scanning mutagenesis confirmed that the antibody footprints are located within a 10 amino acid stretch conserved between SARS-CoV-2 and HCoV-OC43 but not the other HCoVs. In contrast, the epitope sequences around S 551-570 and S 766-785 are not conserved between SARS-CoV-2 and the seasonal HCoVs, and indeed these epitopes are not cross-reactive. One HCoV-HKU1 peptide spanning S 551-570 scores in both COVID-19 patients and pre-COVID-19 era control samples; however, its frequency of detection is not further boosted in COVID-19 patients, suggesting the antibody responsible for boosting the SARS-CoV-2 S 551-570 peptide is distinct from the antibody recognizing the HCoV-HKU1 peptide, consistent with differences in sequence at this location (Fig. 6C).

Interestingly, we detect antibody responses to SARS-CoV-2 S 811-830 in 79% of COVID-19 patients, but we also see responses to the corresponding peptides from OC43 and 229E in ~20% of the pre-COVID-19 era controls and these responses seem to cross-react with SARS-CoV-2. It is possible that some patients have pre-existing antibodies to this region that cross-react and are expanded during SARS-CoV-2 infection. This might explain the remarkably high prevalence of antibody responses to this epitope, and suggests that anamnestic responses to seasonal coronaviruses may influence the antibody response to SARS-CoV-2. Interestingly, this region is located directly after the predicted S2’ cleavage site for SARS-CoV-2 and overlaps the fusion peptide. A recent study showed that adding an excess of the fusion peptide reduced neutralization, implying that an antibody that binds the fusion peptide might contribute to neutralization by interfering with membrane fusion (26, 27). Given the frequency of seroreactivity toward this epitope in COVID-19 patients, it will be important to determine if the antibodies recognizing this epitope are neutralizing in future studies.

Epitope mapping reveals hundreds of distinct SARS-CoV-2 epitopes, including likely epitopes of neutralizing antibodies

We also used the triple-alanine scanning mutagenesis library to map antibody footprints across the entire SARS-CoV-2 proteome (Fig. 7, fig. S5, and tables S10 to S19). We used a Hidden Markov Model (HMM) to analyze the mutagenesis data and detect antibody footprints. By integrating signals across stretches of consecutive residues, the HMM successfully distinguished antibody footprints from random noise and thus detected regions containing epitopes with improved sensitivity and with far greater resolution than was possible with the 56-mer peptide data alone (see Methods) (figs. S6 and S7 and tables S15 to S18). We performed hierarchical clustering on the antibody footprints identified by the HMM to determine the number of distinct epitopes (here defined as unique antibody footprints) that we detected across the SARS-CoV-2 proteome (fig. S8 and table S10). Overall, we identified 3103 antibody footprints across 169 COVID-19 patient samples and mapped 823 distinct epitopes (table S19). These epitopes are not evenly distributed along the proteins but rather fall into 303 epitope clusters, each of which contains multiple overlapping epitopes (fig. S8). For example, across the 89 IgA samples that recognized the epitope cluster from S 1135–1165, we identified 9 epitopes that overlap but have distinct triple-alanine scanning profiles that indicate unique antibody-epitope interactions (fig. S8C). Individual epitopes are recognized at a wide range of frequencies in the COVID-19 patients. The average COVID-19 patient sample contained antibodies to ~18 distinct linear epitopes (fig. S9), although this is likely an underestimate of the total epitope count per person as VirScan does not efficiently detect antibodies recognizing discontinuous (conformational) epitopes (although such antibodies may retain some affinity to linear peptides comprising the epitope).
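The way an HMM integrates signal across consecutive residues can be illustrated with a small Viterbi decoder. This is a minimal sketch and not the model used in the study: it uses two states rather than three, and the state means, standard deviations, and transition probabilities are fixed, hypothetical values rather than fitted ones. The function names are illustrative only.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and s.d. sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def viterbi(obs, means, sigmas, log_start, log_trans):
    """Most likely state path for a Gaussian-emission HMM with fixed parameters."""
    n_states = len(means)
    score = [log_start[s] + gaussian_logpdf(obs[0], means[s], sigmas[s])
             for s in range(n_states)]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this position.
            prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + gaussian_logpdf(x, means[s], sigmas[s]))
        score = new_score
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def footprints(path, footprint_state=0, min_len=5):
    """Return (start, end) half-open runs of the footprint state of at least min_len."""
    runs, start = [], None
    for i, s in enumerate(list(path) + [None]):
        if s == footprint_state and start is None:
            start = i
        elif s != footprint_state and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical parameters: state 0 = footprint (low relative enrichment),
# state 1 = background (relative enrichment near 1).
MEANS, SIGMAS = [0.2, 1.0], [0.2, 0.2]
LOG_START = [math.log(0.5), math.log(0.5)]
LOG_TRANS = [[math.log(0.95), math.log(0.05)],
             [math.log(0.05), math.log(0.95)]]

# Toy profile: a six-residue dip plus an isolated single-residue dip.
profile = [1.0] * 5 + [0.2] * 6 + [1.0] * 3 + [0.2] + [1.0] * 3
path = viterbi(profile, MEANS, SIGMAS, LOG_START, LOG_TRANS)
called = footprints(path)
```

In this toy profile only the six-residue dip survives as a footprint; the isolated dip is discarded by the minimum-length filter, mirroring the 5 amino acid cutoff described in the Methods.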

Fig. 7 High-resolution mapping of SARS-CoV-2 epitopes.

(A) Mapping of antibody epitopes in the SARS-CoV-2 S protein using triple-alanine scanning mutagenesis. Each column of the heatmap corresponds to an amino acid position, and each row represents a COVID-19 patient. The color intensity indicates the average enrichment of three triple-alanine mutant 56-mer peptides containing an alanine mutation at that site, relative to the median enrichment of all mutants of that 56-mer. The upper panel shows the fraction of samples that recognized each region of S as mapped by the IgA 56-mer (gray) versus the IgA and IgG triple-alanine scanning data (blue and red, respectively). (B and C) Detailed plot of triple-alanine scanning mutagenesis in (A) to show the epitope complexity within two regions: S 766-835 (B) and S 406-520 (C). The amino acid sequence at each position is shown on the x-axis. In (B), the fusion peptide and predicted S2’ cleavage site are indicated below the sequence (26, 27); in (C) the unique epitopes identified by the HMM and clustering algorithms are depicted by colored bars. The black dots correspond to ACE2 contact residues in the crystal structure of the RBD receptor complex (6M0J) (28). Epitopes in regions E9 and E10 were not picked up by the HMM classifier because of their short length; however, these regions score in multiple samples and correspond to accessible regions in the crystal structure, suggesting they may represent true epitopes. (D) Cryo-electron microscopy (cryo-EM) structure of the partially open SARS-CoV-2 spike trimer (6VSB) (29) highlighting the locations of the antibody epitopes mapped by triple-alanine scanning mutagenesis. The three spike monomers are depicted in tan, green and gray for the two closed and single open-conformation monomers, respectively. The RBD of the open monomer is shown in light gray. Three of the RBD epitopes from (C) that overlap ACE2 contact residues and are resolved in the cryo-EM structure (E2, E5, E6) are highlighted in red, purple and blue, respectively.
The locations of additional public epitopes that were mapped in at least 10 samples across the IgG and IgA experiments are depicted in yellow, pink and cyan. (E to H) The locations of four of the epitope footprints mapped in (C) are shown in relation to the RBD-ACE2 binding interface. The upper image for each figure shows the structure (6M0J) of SARS-CoV-2 RBD (green) in complex with ACE2 (cyan). The E2, E5, E6 and E8 epitopes are highlighted in red, purple, blue, and orange, respectively. Below each image is the sequence alignment of the regions of the SARS-CoV-2 and the SARS-CoV S proteins encompassing each epitope. The colored bars indicate each epitope, the black dots indicate residues that directly interact with ACE2 in the crystal structure, and the shaded residues indicate conservation between SARS-CoV-2 and SARS-CoV.

The SARS-CoV-2 epitope landscape includes regions recognized by a large fraction of COVID-19 patients (public epitopes) and regions recognized by one or a few individuals (private epitopes). For example, we mapped 6 distinct epitopes in the region spanning N 151-175 (fig. S5C). One of these epitopes was recognized by nearly one-third of the COVID-19 patients, while the rest were detected by less than 2% of the COVID-19 patients. Similarly, the region spanning S 766-835 contained over 20 distinct epitopes, including the highly public epitope cluster near S815 and the public epitope cluster near S770 that is preferentially recognized by IgA (Fig. 7B). This epitope cluster was identified by 43% of COVID-19 patient IgA samples but only 4% of COVID-19 patient IgG samples. In another example, we detected at least 20 distinct epitopes within a stretch of just 46 residues in N 363-408, 10 of which were specific to IgA and 2 of which were specific to IgG (fig. S5D). The positions of several public epitope clusters are shown mapped onto the structure of SARS-CoV-2 in fig. S10.

We also mapped at least 12 distinct epitopes in the SARS-CoV-2 RBD, including 5 in the receptor binding motif (RBM) that binds ACE2, the human receptor for SARS-CoV-2, and 5 that are directly adjacent to ACE2 binding sites (Fig. 7, C and D, and fig. S6A). For example, S 414-427 (labeled E2 in Fig. 7) spans residue K417 in the RBD; K417 makes a direct contact with the human ACE2 protein in structures of ACE2 bound to the RBD. Thus, antibodies recognizing E2 are likely to block ACE2 binding and have neutralizing activity (Fig. 7E). Epitope S 454-463 (labeled E6 in Fig. 7) also overlaps ACE2 contact residues and partially overlaps the binding site of the neutralizing antibody CB6, suggesting that antibodies recognizing this epitope also have neutralizing potential (28–30) (Fig. 7G). Several other epitopes also span or are adjacent to critical residues contacted by ACE2 (Fig. 7, F and H). Thus, our data reveal some of the likely binding sites for neutralizing antibodies.


In this study we have provided an in-depth serological description of antibody responses to SARS-CoV-2, using VirScan to analyze sera from COVID-19 patients and pre-COVID-19 era controls. We mapped the landscape of linear epitopes in the SARS-CoV-2 proteome, characterized their specificity or cross-reactivity, and investigated serological and viral exposure history correlates of COVID-19 severity.

Identification of SARS-CoV-2 epitopes recognized by COVID-19 patients

VirScan detected robust antibody responses to SARS-CoV-2 in COVID-19 patients. These were primarily directed against the S and N proteins, with significant cross-reactivity to SARS-CoV and milder cross-reactivity with the distantly related MERS-CoV and seasonal HCoVs. Cross-reactive responses to SARS-CoV-2 ORF1 were frequently detected in pre-COVID-19 era controls, suggesting that these result from antibodies induced by other pathogens.

At the population level, most SARS-CoV-2 epitopes were recognized by both IgA and IgG antibodies. We found individuals often exhibited a “checkerboard” pattern, utilizing either IgG or IgA antibodies against a given epitope. This suggests that a given IgM clone often evolves into either an IgG or an IgA antibody, potentially influenced by local signals, and that, within an individual, there may often be a largely monoclonal response to a given epitope.

Examination of the humoral response to SARS-CoV-2 at the epitope level using the triple-alanine scanning mutagenesis library revealed 145 epitopes in S, 116 in N, and 562 across the remainder of the SARS-CoV-2 proteome (table S10). Most S epitopes were located on the surface of the protein or within unstructured regions that often abut, but seldom overlap, glycosylation sites (fig. S11). These epitopes ranged from private to highly public, with one public epitope cluster being recognized by 79% of COVID-19 patients. Triple-alanine scanning mutagenesis showed highly conserved antibody footprints for some epitope clusters and diverse antibody footprints for others, indicating varying levels of conservation at the antibody-epitope interface among individuals (fig. S8). Peptides containing public epitopes could be used to isolate and clone antibodies from B-cells bearing antigen-specific BCRs. If these antibodies are found to lack protective effects or have deleterious effects, these regions could be mutated in future vaccines to divert the immunological response to other regions of S that might have more protective effects. Epitopes also varied in cross-reactivity, which can be explained by the presence or absence of sequence conservation between seasonal HCoVs and SARS-CoV-2 at these regions. Antibodies against several conserved epitopes in HCoVs seemed to be anamnestically boosted in COVID-19 patients. Altogether these data help explain why many serological assays for SARS-CoV-2 produce false positives, and should be taken as a cautionary note for those trying to develop such assays.

Development of SARS-CoV-2 signature peptides for detecting seroconversion by Luminex

Using machine learning models trained on VirScan data, we developed a classifier that predicts SARS-CoV-2 exposure history with 99% sensitivity and 98% specificity. We identified peptides frequently and specifically recognized by COVID-19 patients and used these to create a Luminex assay that predicted SARS-CoV-2 exposure with 90% sensitivity and 95% specificity. Remarkably, the Luminex assay only required three peptides to obtain performance comparable to full antigen ELISAs and could be further optimized in the future. This highlights the utility of VirScan-based serological profiling in the development of rapid and efficient diagnostic assays based on public epitopes.

Correlates of severity in COVID-19 patients

An important goal is to uncover serological correlates of COVID-19 severity. To this end, we compared cohorts of COVID-19 patients who had (H) or had not (NH) required hospitalization. Using both VirScan and the COVID-19 Luminex assay, we noticed a striking and somewhat counterintuitive increase in recognition of peptides derived from the SARS-CoV-2 S and N proteins among the H group, with more extensive epitope spreading. Whether this is a cause or a consequence of severe disease is not clear. Individuals whose innate and adaptive immune responses are not able to quell the infection early may experience a higher viral antigen load, a prolonged period of antibody evolution and epitope spreading. Consequently, these patients might develop stronger and broader antibody responses to SARS-CoV-2 and could be more likely to have hyperinflammatory reactions such as cytokine storms that increase the probability of hospitalization. We noticed that hospitalized males had stronger antibody responses to SARS-CoV-2 than hospitalized females. This may indicate that males in this group are less able to control the virus soon after infection and is consistent with reported differences in disease outcomes for males and females (23, 24).

VirScan allowed us to examine viral exposure history, and this revealed two striking correlations. First, the seroprevalence of CMV and HSV-1 was much greater in the H group compared to the NH group. The demographic differences in our relatively small cohort of H versus NH COVID-19 patients make it impossible for us to determine with certainty if CMV or HSV-1 infection impacts disease outcome or is simply associated with other covariates such as age, race and socioeconomic status. While CMV prevalence does slightly increase with age after 40, its prevalence also differs greatly among ethnic and socioeconomic groups (31, 32). CMV is a chronic herpes virus that is known to have a profound impact on the immune system: it can skew the naïve T-cell repertoire (33), decrease T and B cell function (34), and is associated with higher systemic levels of inflammatory mediators (35) and increased mortality of people over 65 years of age (36). The effects of CMV on the immune system could potentially impact COVID-19 outcomes.

The second striking correlation we observed was a significant decrease in the levels of antibodies targeting ubiquitous viruses such as Rhinoviruses, Enteroviruses, and Influenza viruses, in COVID-19 H patients compared with NH patients. When we examined only the CMV+ or HSV-1+ individuals in the two groups, we found that the strength of the antibody response to CMV and HSV-1 peptides was also reduced in the H group. We examined the effects of age on viral antibody levels in a pre-COVID-19 era cohort and found a diminution with age in the antibody response against viral peptides differentially recognized between the H and NH groups, consistent with previous studies on the effects of aging on the immune system (25). This inferred reduced immunity during aging could impact the severity of COVID-19 outcomes.

In correlative analyses such as these, it is difficult to draw strong conclusions about causality given the demographic differences in the NH versus H groups. The NH group is younger and has a higher percentage of Caucasians and females (average age 42, 66% female) compared to the H group (average age 58, 42% female) (fig. S2), consistent with well-documented demographic skews in severely-affected COVID-19 patients (23, 24). However, even if age and other demographic factors are covariates, CMV seropositivity and age-related reduction in antibody titers against viral antigens as described here could still impact the severity of infection. To test these hypotheses, a much larger cohort of COVID-19 patients with severe and mild disease that could be matched for age, race and sex is required. Such future studies have the potential to enhance our understanding of the biological mechanisms underlying variable outcomes of COVID-19.

Deep serological profiling can provide a window into the breadth of viral responses, how they differ in patients with diverse outcomes, and how past infections may influence present responses to viral infections. Understanding the epitope landscape of SARS-CoV-2, particularly within S, provides a stepping stone to the isolation and functional dissection of both neutralizing antibodies and antibodies that might exacerbate patient outcomes through ADE and could inform the production of improved diagnostics and vaccines for SARS-CoV-2.

Materials and methods

Sources of serum used in this study

Cohort 1

Plasma samples were from volunteers recruited at Brigham and Women’s Hospital who had recovered from a confirmed case of COVID-19. All volunteers had a PCR-confirmed diagnosis of COVID-19 prior to being admitted to the study. Volunteers were invited to donate specimens after recovering from their illness and were required to be symptom free for a minimum of 7 days. Participants provided verbal and/or written informed consent and provided blood specimens for analysis. Clinical data including date of initial symptom onset, symptom type, date of diagnosis, date of symptom cessation, and severity of symptoms were recorded for all participants, as were results of COVID-19 molecular testing. Participation in these studies was voluntary and the study protocols have been approved by the respective Institutional Review Boards.

Cohort 2

Serum samples from patients with PCR-confirmed COVID-19 cases while admitted to the hospital and from patients who were actively enrolled into a prospective study of COVID-19 infection were provided by collaborators from the University of Washington. Residual clinical blood specimens were used. Clinical data, including symptom duration and comorbidities were extracted from medical records and from participant-completed questionnaires. All study procedures have been approved by the University of Washington Institutional Review Board.

Cohort 3

Plasma samples were provided by collaborators from the Ragon Institute of MGH, MIT and Harvard and Massachusetts General Hospital from study participants in three settings: 1) PCR-confirmed COVID-19 cases while admitted to the hospital; 2) PCR-confirmed SARS-CoV-2 infected cases seen in an ambulatory setting; 3) PCR-confirmed COVID-19 cases in their convalescent stage. All study participants provided verbal and/or written informed consent. Basic data on days since symptom onset were recorded for all participants, as were results of COVID-19 molecular testing. Participation in these studies was voluntary and the study protocols have been approved by the Partners Institutional Review Board.

Cohort 4

Patients were enrolled in the Emergency Department (ED) in Massachusetts General Hospital from 3/15/2020 to 4/15/2020 in Boston during the peak of the COVID-19 surge, with an institutional IRB-approved waiver of informed consent. These included patients 18 years or older with a clinical concern for COVID-19 upon ED arrival, and with acute respiratory distress with at least one of the following: 1) tachypnea ≥ 22 breaths per minute, 2) oxygen saturation ≤ 92% on room air, 3) a requirement for supplemental oxygen, or 4) positive-pressure ventilation. A blood sample was obtained in a 10 mL EDTA tube concurrent with the initial clinical blood draw in the ED. Day 3 and Day 7 blood draws were obtained if the patient was still hospitalized at those times. Clinical course was followed to 28 days post-enrollment, or until hospital discharge if that occurred after 28 days.

Enrolled subjects who were SARS-CoV-2 positive were categorized into four outcome groups: 1) Requiring mechanical ventilation with subsequent death, 2) Requiring mechanical ventilation and recovered, 3) Requiring hospitalization on supplemental oxygen but not requiring mechanical ventilation, and 4) Discharge from ED and not subsequently readmitted with supplemental oxygen. Those who were SARS-CoV-2 negative were categorized as Controls. Demographic, past medical and clinical data were collected and summarized for each outcome group, using medians with interquartile ranges and proportions with 95% confidence intervals, where appropriate.

Cohorts 5 and 6

Longitudinal Hopkins Cohort: Remnant serum specimens were collected longitudinally from PCR-confirmed COVID-19 patients seen at Johns Hopkins Hospital. Samples were de-identified prior to analysis, with linked information on time since symptom onset. Specimens were obtained and utilized in accordance with an approved IRB protocol.

Cohorts 7 and 8

Cohorts 7 and 8 were previously published (7, 8).

Cohort 9

Plasma samples were collected from consented participants of the Partners Biobank program at BWH during the period from July to August 2016 from 37 female and 51 male individuals with ages ranging from 18 to 85 years old. Plasma was harvested after a 10 min 1200 × g Ficoll density centrifugation from blood that was diluted 1:1 in phosphate buffered saline. Samples were frozen at −30°C in 1 mL aliquots. All samples were collected with Partners Institutional Review Board (IRB) approval.

Blood sample collection methods

For Cohorts 1-3: Blood samples were collected into EDTA (Ethylenediamine Tetraacetic Acid) tubes and spun for 15 min at 2600 rpm according to standard protocol. Plasma was aliquoted into 1.5 mL cryovials and stored at −80°C until analyzed. Only de-identified plasma aliquots including metadata (e.g., days since symptom onset, severity of illness, hospitalization, ICU status, survival) were shared for this study. When appropriate for non-convalescent samples, plasma/serum was also heat inactivated at 56°C for 60 min and stored at ≤20°C until analyzed.

For Cohort 4: Blood samples were collected in EDTA tubes, and processed no more than 3 hours post blood draw in a Biosafety Level 2+ laboratory on site. Whole blood was diluted with room temperature RPMI medium in a 1:2 ratio to facilitate cell separation for other analyses using the SepMate PBMC isolation tubes (STEMCELL) containing 16ml of Ficoll (GE Healthcare). Diluted whole blood was centrifuged at 1200 rcf for 20 min at 20°C. After centrifugation, plasma (5 mL) was pipetted into 15 mL conical tubes and placed on ice during PBMC separation procedures. Plasma was then centrifuged at 1000 rcf for 5 min at 4°C, pipetted in 1.5 mL aliquots into 3 cryovials (4.5 mL total), and stored at −80°C. For the current study samples (200 uL) were first randomly allocated onto a 96 well plate based on disease outcome grouping.

Design and cloning of the SARS-CoV-2 tiling and triple-alanine scanning library

Multiple VirScan libraries were constructed as described below. We created ~200 nt oligos encoding peptide sequences 56 amino acids in length, tiled with 28-amino acid overlap through the proteomes of all coronaviruses known to infect humans including HCoV-NL63, HCoV-229E, HCoV-OC43, HCoV-HKU1, SARS-CoV, MERS-CoV and SARS-CoV-2 as well as three closely related bat viruses: BatCoV-Rp3, BatCoV-HKU3 and BatCoV-279. For SARS-CoV-2 we included a number of coding variants available in early sequencing of the viruses. For SARS-CoV-2 we additionally made a 20 amino acid peptide library tiling every 5 amino acids. Additionally, for SARS-CoV-2 we made triple-alanine mutant sequences scanning through all 56-mer peptides. Non-alanine amino acids were mutated to alanine, and alanines were mutated to glycine. Each peptide in all three libraries was encoded in two distinct ways such that there were duplicate peptides that could be distinguished by DNA sequencing. We reverse-translated the peptide sequences into DNA sequences that were codon-optimized for expression in Escherichia coli, that lacked restriction sites used in downstream cloning steps (EcoRI and XhoI), and that were unique in the 50 nt at the 5′ end to allow for unambiguous mapping of the sequencing results. Then we added adapter sequences to the 5′ and 3′ ends to form the final oligonucleotide sequences (table S1); these adapter sequences facilitated downstream PCR and cloning steps. Different adapters were added to each sub-library so that they could be amplified separately. The resulting sequences were synthesized on a releasable DNA microarray (Agilent). We PCR-amplified the DNA oligo library with the primers shown below, digested the product with EcoRI and XhoI, and cloned it into the EcoRI/SalI site of the T7FNS2 vector (5). We packaged the resultant library into T7 bacteriophage using the T7 Select Packaging Kit (EMD Millipore) and amplified the library according to the manufacturer’s protocol.
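The peptide-level design described above (56-mer tiles with 28 residues of overlap, and triple-alanine mutants in which non-alanine residues become alanine and alanine becomes glycine) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the one-residue step between consecutive mutants is inferred from the statement that each site is covered by three mutants, and anchoring a final tile at the C-terminus is an assumption about how protein ends were handled.

```python
def tile_protein(seq, length=56, overlap=28):
    """Return (start, peptide) tiles of `length` residues sharing `overlap` residues."""
    step = length - overlap
    if len(seq) <= length:
        return [(0, seq)]
    starts = list(range(0, len(seq) - length + 1, step))
    # Assumption: add a final tile flush with the C-terminus so no residues are lost.
    if starts[-1] != len(seq) - length:
        starts.append(len(seq) - length)
    return [(s, seq[s:s + length]) for s in starts]


def triple_alanine_mutants(peptide, window=3):
    """Return (offset, mutant) pairs, one per window of `window` consecutive residues.

    Within the window, alanine is replaced by glycine and every other residue
    by alanine; residues outside the window are left unchanged.
    """
    mutants = []
    for i in range(len(peptide) - window + 1):
        block = "".join("G" if aa == "A" else "A" for aa in peptide[i:i + window])
        mutants.append((i, peptide[:i] + block + peptide[i + window:]))
    return mutants
```

Under these assumptions a single 56-mer yields 54 mutants, and every interior position is mutated in exactly three of them.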

Primers used for analysis of the different libraries employed.

CoV 56-mer Library

5′ Adapter: 5′- GAATTCGGAGCGGT -3′

3′ Adapter: 5′- CACTGCACTCGAGA -3′



SARS CoV-2 Triple-alanine scanning library

5′ Adapter: 5′- GAATTCCGCTGCGT -3′

3′ Adapter: 5′- CAGGGAAGAGCTCG -3′



SARS-CoV-2 20mer Library

5′ Adapter: 5′- GAATTCCGCTGCGT -3′




Phage immunoprecipitation and sequencing

We performed phage IP and sequencing as described previously or with slight modifications (5–8). For the IgA and IgG chain isotype-specific immunoprecipitations, we substituted magnetic protein A and protein G Dynabeads (Invitrogen) with 6 μg Mouse Anti-Human IgG Fc-BIOT (Southern Biotech) or 4 μg Goat Anti-Human IgA-BIOT (Southern Biotech) antibodies. We added these antibodies to the phage and serum mixture and incubated the reactions overnight at 4°C. Next, we added 25 μL or 20 μL of Pierce Streptavidin Magnetic Beads (Thermo-Fisher) to the IgG or IgA reactions, respectively, and incubated the reactions for 4 hours at room temperature, then continued with the washing steps and the remainder of the protocol, as previously described (9).

Machine learning classifiers

Gradient boosting classifier models for the VirScan data were generated using the XGBoost algorithm (version 1.0.2). Classifier models were trained to discriminate either COVID-19+ and COVID-19- patients (n = 232 and n = 190, respectively) or severe disease and mild disease (n = 101 hospitalized patients and n = 131 non-hospitalized patients). Two models were generated in each case, one using the Z-scores for each VirScan peptide from the IgG immunoprecipitation as input features, and the other using the Z-scores for each VirScan peptide from the IgA immunoprecipitation as input features. Additionally, a third logistic regression classifier was trained on the output probabilities from the IgG and IgA models to generate a combined prediction. The performance of each of the three models was assessed using a 20-fold cross-validation procedure, whereby predictions for each 5% of the data points were generated from a model trained on the remaining 95%. The SHAP package was used to identify the top discriminatory peptide features from each of the XGBoost models. The logistic regression models for the Luminex data were generated using the scikit-learn Python package. The raw MFI values were preprocessed using the RobustScaler function, then a logistic regression model was trained using the three most discriminatory SARS-CoV-2 peptides. The model performance was quantified by 10-fold cross-validation.
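The cross-validation scheme, in which each held-out 5% of samples is predicted by a model trained on the other 95%, can be sketched in plain Python. This sketch does not reproduce the XGBoost or logistic regression models: the midpoint-threshold classifier is a hypothetical stand-in on a single made-up feature, and the data are synthetic.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_predictions(X, y, fit, predict, k=20, seed=0):
    """Predict every sample with a model that never saw it during training."""
    preds = [None] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(model, X[i])
    return preds


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / positives; specificity = TN / negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


# Hypothetical stand-in classifier: threshold halfway between the class means.
def fit_midpoint(X, y):
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold, x):
    return 1 if x > threshold else 0


# Synthetic, well-separated scores for 20 controls and 20 cases.
X = [i / 10 for i in range(20)] + [3 + i / 10 for i in range(20)]
y = [0] * 20 + [1] * 20
preds = out_of_fold_predictions(X, y, fit_midpoint, predict_midpoint, k=20)
sens, spec = sensitivity_specificity(y, preds)
```

Because every prediction comes from a model that excluded that sample, the resulting sensitivity and specificity estimate performance on unseen data rather than fit to the training set.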

High-resolution epitope identification and clustering

For each position in the 56-mer, the relative enrichment for each amino acid was calculated as the mean fold-change of the three mutant peptides containing an alanine mutation at that location relative to the median fold-change of all alanine mutants for the 56-mer. Overlapping 56-mers were combined by taking the minimum value at each shared position to account for the possibility that an epitope is interrupted in one of the tiles by the peptide junction. To map the boundaries of antibody footprints from the triple-alanine scanning data for each sample we used the hmmlearn Python package to develop a three-state HMM assuming a Gaussian distribution of relative-enrichment emissions for each state. Mapped antibody footprints smaller than 5 amino acids in length were removed from the subsequent analysis. Next, we performed a two-step hierarchical clustering procedure to identify the number of unique epitopes. First, for each protein all antibody footprints identified across the 169 COVID-19+ patient samples were clustered based on the start and stop locations predicted by the HMM classifier to generate epitope clusters. Next, to identify unique epitopes, we performed an additional step of hierarchical clustering on the samples with epitopes within each epitope cluster based on the relative-enrichment values of the triple-alanine mutants spanning the epitope (fig. S8).
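The per-position relative enrichment and the minimum-based merging of overlapping tiles can be sketched as below. Several details are assumptions rather than statements from the text: "relative to" is read as a ratio, positions near the peptide ends are averaged over however many mutants cover them, and the function names and input layout are hypothetical.

```python
from statistics import mean, median


def relative_enrichment(mutant_fold_changes, peptide_len, window=3):
    """Per-position relative enrichment for one 56-mer.

    mutant_fold_changes[i] is the fold-change of the mutant whose substituted
    block starts at offset i, so there are peptide_len - window + 1 values.
    """
    baseline = median(mutant_fold_changes)
    profile = []
    for pos in range(peptide_len):
        # Mutants covering `pos` start between pos - window + 1 and pos.
        first = max(0, pos - window + 1)
        last = min(pos, len(mutant_fold_changes) - 1)
        covering = mutant_fold_changes[first:last + 1]
        # Assumption: "relative to" is interpreted as a ratio.
        profile.append(mean(covering) / baseline)
    return profile


def combine_tiles(profiles):
    """Merge (start, profile) pairs in protein coordinates.

    Where tiles overlap, the minimum value is kept, so a footprint seen in
    either tile is retained even if the other tile cuts the epitope.
    """
    combined = {}
    for start, profile in profiles:
        for offset, value in enumerate(profile):
            pos = start + offset
            combined[pos] = min(value, combined.get(pos, value))
    return combined
```

With this convention, values well below 1 mark positions where substitution reduces antibody binding, which is the signal the HMM then segments into footprints.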

Similarity-score calculation

Pairwise alignments were generated for the S proteins of SARS-CoV-2 and each of the four common HCoVs. Similarity scores were calculated separately for a 21-amino acid window centered at each position of the SARS-CoV-2 S protein. The mean similarity score between SARS-CoV-2 and the corresponding sequence of the other HCoV was calculated for each window using the BLOSUM62 substitution matrix with gap opening and extension penalties of −10 and −1, respectively. The maximum similarity score was calculated as the maximum value among the pairwise similarity scores between SARS-CoV-2 and each of the four common HCoVs for the sliding window centered at each position.
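The windowed scoring can be sketched as follows, starting from alignments that have already been computed. To keep the sketch self-contained, a toy scoring function stands in for BLOSUM62 and the affine gap penalties; it is a placeholder, not the scoring used in the study. Truncating windows at the protein ends is also an assumption, and all names are hypothetical.

```python
def per_position_scores(ref_aln, other_aln, score):
    """Score each reference residue against its aligned partner (residue or gap).

    Both inputs are rows of one pairwise alignment, with '-' for gaps.
    Columns where the reference has a gap are skipped, so the output is
    indexed by reference position.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    return [score(a, b) for a, b in zip(ref_aln, other_aln) if a != "-"]


def windowed_mean(values, window=21):
    """Mean over a window centered at each position (truncated at the ends)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def max_similarity(alignments, score, window=21):
    """Per-position maximum of the windowed scores across all comparison viruses.

    alignments maps a virus name to (ref_aln, other_aln).
    """
    tracks = [windowed_mean(per_position_scores(ref, other, score), window)
              for ref, other in alignments.values()]
    return [max(column) for column in zip(*tracks)]


def toy_score(a, b):
    """Placeholder scoring: +1 identity, 0 mismatch, -1 gap (not BLOSUM62)."""
    if b == "-":
        return -1.0
    return 1.0 if a == b else 0.0
```

Taking the maximum across the four HCoVs highlights regions where at least one seasonal coronavirus closely resembles SARS-CoV-2, which is where cross-reactivity would be expected.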

Luminex multiplex peptide epitope serology assays

Multiplexed SARS-CoV-2 peptide epitope assays were built using the peptides listed in table S9. Peptides were synthesized by the Ragon/MGH Peptide Core Facility with a propargylglycine (Pra, X) moiety at the amino terminus to facilitate crosslinking to Luminex beads using a “click” chemistry strategy as described (18). In brief, Luminex beads were first functionalized with amine-PEG4-azide and then reacted with the peptides to generate 20 different Luminex beads with attached peptides. Luminex bead-based serology assays were performed in 96-well U-bottom polypropylene plates using PBS + 0.1% bovine serum albumin as the assay buffer. Bead washes were done using PBS + 0.05% Triton X-100 by incubation for 1 min on a strong magnetic plate (Millipore-Sigma, Burlington, MA). All assay incubation times were 20 min. In the first step, beads were incubated with 20 μL of plasma samples. Samples used for the classifier were diluted 1:100; samples used to compare disease severity were diluted 1:300. After a wash step, bound IgA or IgG was detected by adding 40 μL of biotin-labeled anti-IgA or IgG antibodies at 0.1 μg/mL (Southern Biotechnology, Birmingham, AL). Next, 40 μL of phycoerythrin (PE)-labeled streptavidin (0.2 μg/mL) (Biolegend, San Diego, CA) was added, and assay plates were analyzed on a Luminex FLEXMAP 3D instrument (Luminex Corporation, Austin, Texas) to generate median fluorescence intensity (MFI) values to quantify peptide-specific IgA or IgG levels.

ELISA serology assays

ELISAs were performed separately using the SARS-CoV-2 N protein, S protein, or the S receptor-binding domain (RBD). 96-well plates were coated with antigen overnight. The plates were then blocked in PBS + 3% BSA. After washing with PBS + 0.05% Tween-20, the plasma samples were diluted 1:100, added to the plates and incubated overnight at 4°C. Following incubation, the plates were washed three times with PBS + 0.05% Tween-20. The bound IgG was detected by adding anti-Human IgG-alkaline phosphatase (Southern Biotech, Birmingham, AL) and incubating for 90 min at room temperature. The plates were washed an additional three times, after which p-nitrophenyl phosphate solution (1.6 mg/mL in 0.1 M glycine, 1 mM ZnCl2, 1 mM MgCl2, pH 10.4) was added to each well and allowed to develop for 2 hours. Bound IgG was quantified by measuring the OD405, and the reported values were calculated as the fold change over the pre-COVID-19 controls.

Supplementary Materials

Figs. S1 to S11

Tables S1 to S19

References (37–39)

MDAR Reproducibility Checklist

MGH COVID-19 Collection & Processing Team participants

Collection Team

Kendall Lavin-Parsons, Blair Parry, Brendan Lilley, Carl Lodenstein, Brenna McKaig, Nicole Charland, Hargun Khanna, Justin Margolin

Department of Emergency Medicine, Massachusetts General Hospital, Boston, MA 02115, USA.

Processing Team

Anna Gonye, Irena Gushterova, Tom Lasalle, Nihaarika Sharma

Massachusetts General Hospital Cancer Center, Boston, MA 02115, USA.

Brian C. Russo, Maricarmen Rojas-Lopez

Division of Infectious Diseases, Department of Medicine, Massachusetts General Hospital, Boston, MA 02115, USA.

Moshe Sade-Feldman, Kasidet Manakongtreecheep, Jessica Tantivit, Molly Fisher Thomas

Massachusetts General Hospital Center for Immunology and Inflammatory Diseases, Boston, MA 02115, USA.

Massachusetts Consortium on Pathogen Readiness

Betelihem A. Abayneh, Patrick Allen, Diane Antille, Katrina Armstrong, Siobhan Boyce, Joan Braley, Karen Branch, Katherine Broderick, Julia Carney, Andrew Chan, Susan Davidson, Michael Dougan, David Drew, Ashley Elliman, Keith Flaherty, Jeanne Flannery, Pamela Forde, Elise Gettings, Amanda Griffin, Sheila Grimmel, Kathleen Grinke, Kathryn Hall, Meg Healy, Deborah Henault, Grace Holland, Chantal Kayitesi, Vlasta LaValle, Yuting Lu, Sarah Luthern, Jordan Marchewka (Schneider), Brittani Martino, Roseann McNamara, Christian Nambu, Susan Nelson, Marjorie Noone, Christine Ommerborn, Lois Chris Pacheco, Nicole Phan, Falisha A. Porto, Edward Ryan, Kathleen Selleck, Sue Slaughenhaupt, Kimberly Smith Sheppard, Elizabeth Suschana, Vivine Wilson

Massachusetts General Hospital, Boston, MA, USA.

Galit Alter, Alejandro Balazs, Julia Bals, Max Barbash, Yannic Bartsch, Julie Boucau, Josh Chevalier, Fatema Chowdhury, Kevin Einkauf, Jon Fallon, Liz Fedirko, Kelsey Finn, Pilar Garcia-Broncano, Ciputra Hartana, Chenyang Jiang, Paulina Kaplonek, Marshall Karpell, Evan C. Lam, Kristina Lefteri, Xiaodong Lian, Mathias Lichterfeld, Daniel Lingwood, Hang Liu, Jinqing Liu, Natasha Ly, Ashlin Michell, Ilan Millstrom, Noah Miranda, Claire O’Callaghan, Matthew Osborn, Shiv Pillai, Yelizaveta Rassadkina, Alexandra Reissis, Francis Ruzicka, Kyra Seiger, Libera Sessa, Christianne Sharr, Sally Shin, Nishant Singh, Weiwei Sun, Xiaoming Sun, Hannah Ticheli, Alicja Trocha-Piechocka, Daniel Worrall, Alex Zhu

Ragon Institute, MGH, MIT and Harvard, Cambridge, MA, USA.

George Daley, David Golan, Howard Heller, Arlene Sharpe

Harvard Medical School, Boston, MA, USA.

Nikolaus Jilg, Alex Rosenthal, Colline Wong

Brigham and Women’s Hospital, Boston, MA, USA.

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

References and Notes

Acknowledgments: We thank Jillian Bensko, Deborah Gakpo, Geneva DeGregorio, and Sudeshna Fiscsh (BWH) for assistance organizing the human cohorts; Galit Alter (MGH) for facilitating samples from the Seattle cohort and for advice; Hugh Chen for help with the VirScan assays; and the BPF Next-Gen Sequencing Core Facility at Harvard Medical School for the expertise and instrument availability that supported this work. Funding: This work was supported by grants from the National Institutes of Health (NIH, AI121394, AI139538) and the Burroughs Wellcome Fund to D.W.H., from the Division of Intramural Research, NIAID, NIH to O.L., from the VoVRN to S.J.E., from the Executive Committee on Research, MGH, to M.B.G. and M.R.F., from the MassCPR to B.D.W., S.J.E., and D.R.W., and from the NIH/NIAID (U24 grant) to H.B.L. and S.J.E. R.T.T. is supported by the Pemberton-Trinity Fellowship and a Sir Henry Wellcome Fellowship (201387/Z/16/Z). E.S. is funded by the NSF Graduate Research Fellowship Program. The COVID-19 sample biorepository was supported by a gift from Ms. Enid Schwartz, by the Mark and Lisa Schwartz Foundation, the Massachusetts Consortium for Pathogen Readiness, and the Ragon Institute of MGH, MIT and Harvard. S.J.E. and B.D.W. are Investigators with the Howard Hughes Medical Institute. Author contributions: Experimental design, E.S., E.F., T.K., R.T.T., J.A.L., H.B.L., and S.J.E. Investigation, E.S., E.F., X.Y., A.P.-T., T.K., J.A.L., I.-H.L., M.L.R., B.M.S., M.Q.L., Y.L., R.T.T. Reagents and Samples, Y.C., A.Z., D.M., Y.C., J.L., A.Z., D.R.M., F.J.N.L., M.T., S.H., J.L., MGH COVID Collection & Processing Team, P.C., O.L., A.K., A.C.V., K.K., X.Y., A.P.-T., B.D.W., and D.R.W. Writing, E.S., E.F., R.T.T., T.K., and S.J.E. Supervision, S.J.E., D.R.W., A.C.V., K.K., M.R.F., B.W., H.Y.C., N.H., M.B.G., B.D.W., H.B.L. Competing interests: S.J.E. and T.K. are founders of TSCAN Therapeutics. S.J.E. is a founder of MAZE Therapeutics and Mirimus. S.J.E. serves on the scientific advisory boards of Homology Medicines, TSCAN Therapeutics, MAZE, and XChem and is an advisor for MPM, none of which impact this work. S.J.E., T.K., and H.B.L. are inventors on a patent application filed by the Brigham and Women's Hospital (US20160320406A) that covers the use of the VirScan library to identify pathogen antibodies in blood. All other authors declare no competing interests. Data and materials availability: All data are available in the manuscript or the supplementary materials. All reasonable requests for materials will be fulfilled. This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.