The SARS Coronavirus: A Postgenomic Era


Science  30 May 2003:
Vol. 300, Issue 5624, pp. 1377-1378
DOI: 10.1126/science.1086418

The complete sequences of the ∼30,000-nucleotide RNA genomes of two isolates of the SARS coronavirus (SARS-CoV) are reported on pages 1399 and 1394 of this issue (1, 2), a remarkable achievement given that the virus was identified less than 2 months ago. Additional sequences in GenBank and complete genome sequences of nine virus isolates from five patients (3) allow comparison between different SARS-CoV isolates.

Sequence analysis reveals the genome organization and phylogeny of SARS-CoV (1, 2). The genome has all the features characteristic of a coronavirus, but is sufficiently different from all previously known coronaviruses to represent a new coronavirus group. The genomes of the SARS-CoV Tor2 strain from Toronto (1) and the Urbani strain from Vietnam (2) differ by just eight nucleotides. Thus, the viral RNA genome appears stable during human passage. The SARS-CoV genome contains five major open reading frames (ORFs) that encode the replicase polyprotein; the spike (S), envelope (E), and membrane (M) glycoproteins; and the nucleocapsid protein (N) in the same order and of approximately the same sizes as those of other coronaviruses. (The figure shows the virion's structure.)

Coronavirus organization.

A model of the coronavirus structure showing the organization of the spike (S), membrane (M), and envelope (E) glycoproteins. The RNA is protected by a helical capsid of N protein monomers.


Coronavirus genomes also contain a variable number of nonconserved ORFs interspersed between the major ORFs. Marra et al. identified nine potential ORFs, not found in other coronaviruses, that could encode proteins unique to SARS-CoV (1). Five of these were also identified by Rota et al. (2), who only included ORFs for proteins longer than 50 amino acids. It remains to be established which of these ORFs are translated in infected cells. These proteins may be nonessential for virus replication or they may serve novel functions in virus replication and pathogenesis or modulate immune responses to infection. The Marra and Rota groups proposed different names for these ORFs (1, 2) and a common nomenclature awaits experiments showing which ORFs are expressed in infected cells.
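The length cutoff mentioned above can be illustrated with a minimal sketch of an ORF scan. This is not the method used by either group, whose ORF-calling procedures are not described here; the function name `find_orfs`, the restriction to the three forward reading frames, the ATG start rule, and the toy sequence are all assumptions made for illustration.

```python
# Hypothetical sketch: scan the three forward reading frames for ORFs
# (ATG ... stop) that encode more than `min_aa` amino acids.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_aa=50):
    """Return (start, end, frame) tuples; `end` is just past the stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues encoded, stop excluded
                if n_aa > min_aa:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

# Toy sequence (not viral): one 61-codon ORF followed by one 11-codon ORF.
toy = "CC" + "ATG" + "GCT" * 60 + "TAA" + "ATG" + "GCT" * 10 + "TGA"
long_orfs = find_orfs(toy)            # only the longer ORF passes the cutoff
all_orfs = find_orfs(toy, min_aa=5)   # a lower cutoff keeps both
```

Changing `min_aa` changes how many candidate ORFs are reported, which is one plausible reason two analyses of nearly identical genomes can list different numbers of potential ORFs.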

Coronavirus-infected cells contain a characteristic 3′ coterminal nested set of mRNAs, each of which has at its 5′ end an ∼70-nucleotide long, capped leader sequence derived from the 5′ end of the genome. Synthesis of subgenomic negative sense RNA species by discontinuous RNA transcription is regulated by a core transcription-regulating sequence (TRS) found on the genome near the beginning of each ORF and at the 3′ end of the leader (4). Marra et al. (1) suggest that the SARS-CoV TRS core sequence is 5′-CUAAAC-3′, like TRSs of murine and bovine coronaviruses in group 2. In SARS-CoV and human and porcine coronaviruses in group 1, this core sequence is flanked at its 3′ end by GAA. Rota et al. (2) suggest that the core TRS for SARS-CoV is 5′-AAACGAAC-3′, based on the 5′ sequence of the smallest mRNA. The TRSs of the known coronaviruses vary slightly from gene to gene, and candidate TRSs of SARS-CoV are not identical for each potential ORF. The consensus sequence CUAAAC is found before each of the S and M genes and ORF 10 of SARS-CoV, and the AAACGAAC sequence is found at the same three sites and also before the N gene and ORFs 3 and 9, suggesting that these six genes may be expressed in infected cells from subgenomic mRNAs. Sequences upstream of the E gene, and ORFs 7, 9, and 11 differ significantly from the consensus TRSs. Perhaps the E protein of SARS-CoV is translated from a larger mRNA by internal initiation, like the E proteins of several other coronaviruses. Rota and co-workers detected five abundant viral subgenomic mRNAs in Northern blots of cells infected with SARS-CoV, but less abundant mRNAs may not have been detected (2). Experimental data are needed to confirm the core TRS, characterize the viral mRNAs, and detect virus-encoded proteins in infected cells.
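Locating the two candidate core sequences upstream of ORFs is a simple motif search, sketched below under stated assumptions. The helper names (`find_motif`, `trs_upstream_of`), the `window` parameter, the ORF labels, and the toy sequence are hypothetical and do not come from either paper; only the two candidate motifs are taken from the text.

```python
# Candidate core TRS sequences proposed for SARS-CoV (from the text above).
CANDIDATE_TRS = {"Marra et al.": "CUAAAC", "Rota et al.": "AAACGAAC"}

def find_motif(genome, motif):
    """Return 0-based start positions of every (overlapping) motif match."""
    genome = genome.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    hits = []
    pos = genome.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(motif, pos + 1)  # step by 1 to allow overlaps
    return hits

def trs_upstream_of(genome, orf_starts, motif, window=10):
    """For each ORF start, list motif hits ending within `window` nt upstream."""
    hits = find_motif(genome, motif)
    result = {}
    for name, start in orf_starts.items():
        result[name] = [h for h in hits if 0 <= start - (h + len(motif)) <= window]
    return result

# Toy sequence (not viral). Note that CUAAACGAAC contains both candidates.
toy = "GGAU" + "CUAAACGAAC" + "AUGGCUUAA" + "GGG" + "AAACGAAC" + "UU" + "AUGCCC"
toy_orfs = {"orfA": 14, "orfB": 36}  # hypothetical ORF start positions

marra_hits = trs_upstream_of(toy, toy_orfs, CANDIDATE_TRS["Marra et al."])
rota_hits = trs_upstream_of(toy, toy_orfs, CANDIDATE_TRS["Rota et al."])
```

In the toy sequence the shorter motif precedes only one ORF while the longer motif precedes both, a small-scale analogue of the two candidates marking overlapping but different sets of genes. An exact-match search like this would miss the slight gene-to-gene TRS variation noted above.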

Although the predicted amino acid sequences of the 3CL protease, which is part of the viral replicase polyprotein, and the S, E, M, and N proteins of SARS-CoV suggest that they are structurally and functionally homologous to the proteins of known coronaviruses, the pairwise amino acid sequence identity with their homologs is less than 40 to 50%. Overall, the SARS-CoV genome appears to be equidistant from those of all known coronaviruses. The sequences of the polymerase are most closely related to bovine and murine coronaviruses in group 2, with some characteristics like avian viruses in group 3. In addition, the 3′ end of the SARS-CoV genome contains a 32-nucleotide motif that is also found in group 3 coronaviruses. Unlike group 2 viruses, SARS-CoV does not encode a hemagglutinin-esterase protein. Also, whereas group 2 viruses encode two papain-like proteinases in the replicase polyprotein, SARS-CoV, like group 3 viruses, apparently encodes a single papain-like proteinase. Based on comparison of the genomes of SARS-CoV and other coronaviruses, both the Rota and Marra groups suggest that SARS-CoV should be classified in a new coronavirus group.
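Pairwise amino acid identity figures such as those quoted above depend on how identity is computed. The sketch below shows one common convention, assuming the two sequences have already been aligned and that columns containing a gap are excluded from the denominator; the function name `percent_identity` and the toy sequences are illustrative, and other conventions give somewhat different percentages.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned peptides (not real viral proteins): 4 matches in 6 ungapped columns.
identity = percent_identity("MKT-LLV", "MRTALIV")
```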

Indeed, the genome clearly shows that SARS-CoV is neither a host-range mutant of a known coronavirus, nor a recombinant between known coronaviruses. SARS-CoV is also unlikely to have been created from known coronaviruses by genetic engineering, because at present it would be impossible to modify 50% of a coronavirus genome without abrogating viral infectivity. SARS-CoV probably evolved separately from an ancestor of the known coronaviruses and infected an unidentified animal, bird, or reptile host for a very long time before infecting humans and starting the SARS epidemic. The original host for SARS-CoV may be identified by serological studies of species near the site where the epidemic began. A coronavirus would then have to be isolated from this host and its genome sequenced in order to identify genetic changes associated with adaptation to humans. The available sequence data on the few independent isolates of SARS-CoV from humans suggest that the virus is genetically quite stable. Minor nucleotide changes found in viruses from different clinical isolates may prove useful as markers for epidemiological studies, but their significance for viral pathogenesis cannot be determined until the functions and antigenic determinants of the viral proteins have been characterized.

The sequence of the SARS-CoV genome makes it possible to identify subgenomic mRNAs by reverse transcription-polymerase chain reaction and to clone viral cDNAs, express recombinant viral proteins, and study their roles in virus replication and pathogenesis. Viral cDNAs and antibodies to recombinant viral proteins will be useful for developing sensitive and specific tests for SARS-CoV RNA and antigens in clinical specimens. The genome sequence and recombinant viral proteins will also facilitate the development of drugs and vaccines against SARS-CoV. For example, a three-dimensional model of the SARS-CoV-encoded 3CL proteinase has been made to direct the design of protease inhibitors that may block coronavirus replication (5). Passive immunization with neutralizing monoclonal antibodies may be useful for prophylaxis or therapy. Live, attenuated vaccines may be developed by serial passage of SARS-CoV in cell culture, and mutations responsible for attenuation of virus virulence could then be identified. Characterization of the SARS-CoV antigens that elicit protective immunity will facilitate development of vaccines. Fortunately, coronavirus genomes can now be manipulated using targeted RNA recombination and infectious cDNA clones in order to identify determinants of virus virulence (6–10). Genetically engineered coronaviruses that can express proteins, but not be transmitted from cell to cell, may be useful as vaccines to elicit mucosal immunity (11–13). The direction of SARS research has now moved from identifying the virus and sequencing its genome to analyzing the viral proteins and their roles in virus replication and pathogenesis, with the aim of developing new drugs and vaccines against SARS.


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.