Comparative Genomics of BCG Vaccines by Whole-Genome DNA Microarray


Science  28 May 1999:
Vol. 284, Issue 5419, pp. 1520-1523
DOI: 10.1126/science.284.5419.1520


Bacille Calmette-Guérin (BCG) vaccines are live attenuated strains of Mycobacterium bovis administered to prevent tuberculosis. To better understand the differences between M. tuberculosis, M. bovis, and the various BCG daughter strains, their genomic compositions were studied by performing comparative hybridization experiments on a DNA microarray. Regions deleted from BCG vaccines relative to the virulent M. tuberculosis H37Rv reference strain were confirmed by sequencing across the missing segment of the H37Rv genome. Eleven regions (encompassing 91 open reading frames) of H37Rv were found that were absent from one or more virulent strains of M. bovis. Five additional regions representing 38 open reading frames were present in M. bovis but absent from some or all BCG strains; this is evidence for the ongoing evolution of BCG strains since their original derivation. A precise understanding of the genetic differences between closely related Mycobacteria suggests rational approaches to the design of improved diagnostics and vaccines.

Tuberculosis is an ancient scourge of humankind, caused by the bacterium M. tuberculosis and infrequently by other subspecies of the M. tuberculosis complex, such as M. bovis. The anticipation of 80 million cases of tuberculosis in the coming decade, an increasing proportion of which are likely to be drug-resistant, has revived efforts to develop a new vaccine (1). The current vaccine was originally developed by Calmette and Guérin, who passaged a strain of M. bovis 230 times in vitro between 1908 and 1921. The resulting vaccine was thought to have struck a balance between reduced virulence and preserved immunogenicity (2). However, because of the inability to preserve viable bacteria (such as by freezing), this live vaccine required continued passage, eventually resulting in a profusion of phenotypically different daughter strains that are collectively known as BCG.

By the time lyophilized seed lots of BCG vaccines were created in the 1960s, these vaccines had been separately propagated through about 1000 additional passages (depending on the daughter strain), usually under the very conditions that effected the original attenuation. We previously hypothesized that pressures to minimize adverse effects and maintain tuberculin reactivity during this time resulted in impotent vaccines that consistently induce tuberculin sensitivity with immunization (3). If true, such evolution could in part explain why BCG efficacy has varied considerably in human trials. Moreover, such BCG strains would contain antigens that cross-react with the commonly applied diagnostic test for latent tuberculous infection.

To understand the genetic basis of this progression, one would ideally compare contemporary BCG vaccines to their progenitor strain. Because this strain was lost during World War I, the origin of current BCG vaccines can only be inferred through an evolutionary approach (4). We therefore curated a collection of BCG daughter strains representing this global dissemination for the purpose of performing genomic comparisons (5).

The recent determination of the genomic nucleotide sequence of M. tuberculosis H37Rv provides a framework for the genomic analysis of BCG strains (6), in that differences between H37Rv and BCG comprise (i) differences between M. tuberculosis H37Rv and virulent M. bovis and (ii) differences between virulent M. bovis and BCG strains. We have used the genomic sequence of H37Rv to assemble a DNA microarray (7) representing nearly all open reading frames (ORFs) of H37Rv (8) and have used this microarray to perform parallel comparative hybridizations between M. tuberculosis H37Rv and M. bovis BCG strains (9).

Because both the nucleotide sequence of individual genes and the order of genes within the whole genome of M. tuberculosis and M. bovis BCG-Pasteur are extremely similar, one can expect equivalent hybridization by M. tuberculosis and BCG strains for loci that are equally present in both organisms (10, 11). On the other hand, a mismatched hybridization signal is likely to reflect nonequivalent representation in the genome because of either a repeated element or a relative deletion in the BCG strain.

Our DNA microarray had 4896 spots, representing 3902 of the 3924 ORFs of M. tuberculosis H37Rv (99.4%); comparisons of H37Rv and BCG strains provided interpretable data for an average of 3756 ORFs (96%) (12). The results can be graphically represented as pseudocolorized spots, or mapped to their respective genomic locations, to highlight ORFs with unequal hybridization ratios (Fig. 1). In this example, BCG-Danish 1331 is shown to have significantly weaker hybridization than H37Rv for the large domain encompassing ORFs Rv1963c to Rv1988. We have used microarray-based comparisons to screen for ORFs deleted from BCG strains and then performed confirmatory analysis with polymerase chain reaction (PCR)–based sequencing across deleted regions (13). Using this strategy, we have documented 16 regions deleted in BCG strains varying in length from 1903 to 12,733 base pairs (bp) (Table 1). Four of these regions have been previously described (14, 15) and we have extended the nomenclature of the former description to name our deletion regions RD1 to RD16. Of the 16 deletion regions, nine are missing from BCG and all virulent M. bovis strains tested, two (representing prophages) are missing from BCG and some of the strains of M. bovis, one is missing from all BCG strains, and four are missing only from certain BCG strains.

Figure 1

(A) Scanning fluorimetric representation of a whole-genome DNA microarray comparison of genomic DNA from M. tuberculosis H37Rv (red) and M. bovis BCG-Danish 1331 (green). Yellow fluorescence indicates equivalent hybridization by both strains. Red fluorescence indicates unopposed hybridization by M. tuberculosis, indicating deletion of relevant ORFs in the BCG strain. (B) Microarray hybridization results comparing H37Rv to BCG-Denmark as displayed by the ProbeBrowser software, which maps microarray results to genomic location. For each spot on the array, the logarithm of the hybridization ratio of H37Rv (Cy3) to BCG-Denmark (Cy5) is displayed in blue marks on the y axis (such that ORFs not present in BCG are displayed above the mean value). The location of ORFs within the entire 4.4-megabase genome maps on the x axis. Regions where more than two contiguous ORFs have a log ratio exceeding 2 standard deviations above the mean are indicated by red bars, and these putative deletions were examined with PCR as described (13). (C) The scalable software permits an expanded view of one genomic region with each array spot similarly represented in blue on x and y coordinates, but now with the length of the region probed reflected by the length of the blue lines. PCR-based investigation confirmed two deletions, RD15 and RD2, separated by 335 bp. The ProbeBrowser software is freely available at
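The flagging rule stated for panel (B) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the ProbeBrowser implementation: the function name `flag_putative_deletions`, the use of base-10 logarithms, and the use of the sample standard deviation are all assumptions, because the caption specifies only "the logarithm" and "2 standard deviations above the mean." Input ratios are assumed to be listed in genome order.

```python
import math
from statistics import mean, stdev

def flag_putative_deletions(ratios, n_sd=2.0, min_run=3):
    """Return (first, last) index pairs for runs of contiguous ORFs whose
    log ratio exceeds mean + n_sd * SD.

    ratios  -- reference/test hybridization ratios, in genome order
    min_run -- 3 encodes "more than two contiguous ORFs"
    """
    logs = [math.log10(r) for r in ratios]      # log base is an assumption
    cutoff = mean(logs) + n_sd * stdev(logs)    # sample SD is an assumption
    runs, start = [], None
    # A sentinel below any cutoff closes a run that reaches the last ORF.
    for i, value in enumerate(logs + [float("-inf")]):
        if value > cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs

# Hypothetical toy data: 40 ORFs, four contiguous high-ratio spots and one
# isolated high-ratio spot that should not be flagged.
toy = [1.0] * 40
for idx in (10, 11, 12, 13, 25):
    toy[idx] = 8.0
flagged = flag_putative_deletions(toy)
```

On the toy data only the four-ORF run is returned; the isolated spot is ignored because it does not meet the contiguity requirement. As in the text, a flagged run is only a candidate and would still need PCR confirmation.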

Table 1

Genetic information about distribution of deletions in virulent M. bovis and BCG strains. Parentheses after M. bovis indicate how many strains of virulent M. bovis were missing the genetic element. Start and end position are based on the 162 segments of genome available at A list of the 129 deleted ORFs may be accessed at


The identification of nine regions (61 ORFs) that are present in M. tuberculosis and consistently absent from M. bovis strains (including BCG) may provide some insights into the phenotypic differences between M. bovis and M. tuberculosis. Pulmonary diseases caused by M. tuberculosis and M. bovis are clinically, radiographically, and pathologically indistinguishable. However, M. bovis appears to have a diminished propensity to reactivate and spread from person to person (16). Of particular note is the absence from M. bovis of a cluster of three phospholipase C genes (plcA, -B, and -C), containing the only two genes (of the 61 described here) for which empirical evidence exists for both antigenicity and enzymatic function (17, 18). Given that phospholipase C activity contributes to the virulence of the opportunistic lung pathogen Pseudomonas aeruginosa, it is attractive to speculate that the M. tuberculosis plcA-plcC gene cluster manifests a similar, clinically important phenotype. However, before attributing species-specific roles to these deletions, a better understanding of the phenotypic relevance of genes missing from M. bovis will require concentrated biochemical and genetic analysis of each of these 61 deleted ORFs. It must be stressed that the use of H37Rv sequence as a reference excludes studies of M. bovis regions missing from M. tuberculosis, including possible M. bovis–specific virulence determinants. Therefore, the analysis presented here is unidirectional and represents only a subset of the differences between these organisms.

This collection of ORFs absent from M. bovis isolates may prove to be of considerable practical utility. Currently available diagnostics, such as the tuberculin skin test, are unable to distinguish between individuals who have been infected with M. tuberculosis and those who have been vaccinated with BCG. This distinction may be possible with an assay based on antigens derived from these M. tuberculosis–specific genes. The availability of a diagnostic specific for wild-type infection could greatly facilitate the introduction of vaccination in countries that have a low prevalence of disease and may permit more discriminating treatment of latent tuberculous infections in countries using BCG.

To address the differences specific to BCG, we assumed that regions of H37Rv present in M. bovis strains and absent only from BCG were deleted during the derivation and maintenance of BCG vaccines in various vaccine facilities around the world. If this is true, then regions of M. bovis missing from BCG strains would indicate unidirectional genetic events from which a phylogeny of BCG strains can be inferred. Because some BCG daughter strains were obtained directly from the Institut Pasteur while other strains were derived from another vaccine facility (for example, BCG Connaught came from BCG Frappier in 1948), it is possible to reconstruct the genealogy of BCG strains and determine when and where BCG-specific deletions occurred (Fig. 2) (4). The deletions described in this report can be superimposed on the historical record, demonstrating a unique use of genomic analysis to describe a half-century of in vitro bacterial evolution (19). In comparison with M. bovis, all BCG vaccines lack one region (RD1) that presumably was lost during the 1908–1921 attenuation (14). Another deletion (RD2) occurred at the Institut Pasteur between 1927 and 1931. A further deletion (RD14) specific to BCG-Pasteur indicates an event after Aronson's receipt of BCG-Pasteur 575 in 1938 (20) and before the lyophilization of BCG-Pasteur 1173 in 1961 (21). The losses of RD8 in Montréal (between 1937 and 1948) and RD16 in Uruguay or Brazil (after 1925) indicate that ongoing evolution of BCG strains was not confined to the Institut Pasteur.
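The inference in this paragraph rests on deletions being unidirectional: once a region is lost, every descendant lacks it, so the sets of strains sharing each deletion should be nested or disjoint, and a deletion shared by more strains should be older. A minimal Python sketch of that logic follows. The deletion sets are transcribed from the text for three strains only (relative to M. bovis) and are an assumption to be checked against Table 1; the function names are hypothetical.

```python
from itertools import combinations

# Assumption: BCG-specific deletions per strain as described in the text.
DELETED = {
    "BCG-Moreau": {"RD1", "RD16"},
    "BCG-Danish 1331": {"RD1", "RD2"},
    "BCG-Pasteur": {"RD1", "RD2", "RD14"},
}

def carriers(deleted):
    """Map each deleted region to the set of strains that lack it."""
    out = {}
    for strain, regions in deleted.items():
        for region in regions:
            out.setdefault(region, set()).add(strain)
    return out

def is_tree_compatible(carrier_sets):
    """True if every pair of carrier sets is nested or disjoint, which is
    what irreversible events on a single genealogy would produce."""
    for a, b in combinations(carrier_sets.values(), 2):
        if not (a <= b or b <= a or a.isdisjoint(b)):
            return False
    return True

def oldest_first(carrier_sets):
    """Order regions by how many strains share the deletion (ties by name)."""
    return sorted(carrier_sets, key=lambda r: (-len(carrier_sets[r]), r))

by_region = carriers(DELETED)
compatible = is_tree_compatible(by_region)
order = oldest_first(by_region)
```

Deletions found in a single strain (here RD14 and RD16) tie in this ordering; sharing patterns alone cannot date them relative to each other, which is why the text turns to the historical record of strain transfers.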

Figure 2

BCG historical genealogy incorporating genetic differences previously noted and newly detected genetic deletions. The vertical axis represents time. The horizontal axis denotes different geographic locations of BCG propagation. Under this reconstruction, the M. bovis strain that was used to develop BCG would be missing RD3, RD4, RD5, RD6, RD7, RD9, RD10, RD11, RD12, RD13, and RD15. During serial propagation of this strain, RD1, RD2, RD8, RD14, RD16, and an IS6110 element (IS) were deleted.

A historical review of the BCG literature reveals reports of decreasing virulence in the Institut Pasteur at various times, consistent with (but not necessarily directly caused by) the documented ongoing evolution. During the original derivation between 1908 and 1921, RD1 was lost and concurrently Calmette described an attenuation of virulence for animals (22). Later, between 1927 and 1931, RD2 was lost and various investigators reported decreased vaccine lesions in humans (23) and reduced virulence in animals (24). As a basis for speculation on the impact of RD14 in humans, one can compare descriptions from the vaccine trials using BCG-Pasteur before and after this deletion. In the American Indian trial, which used BCG-Pasteur 575, 75% of participants developed a draining abscess at the vaccine site (20). In Madras, India, where BCG-Pasteur 1173 was used, abscesses at the vaccination sites were rare (25). However, caution must be taken in inferring causality between these documented genetic events and the evolving phenotype of BCG strains. For example, similar results were obtained in the Madras trial with BCG-Danish 1331, a strain that had not suffered any detectable deletion after being obtained from the Institut Pasteur in 1931.

To explore the potential association between BCG deletions and progressive attenuation of virulence, we analyzed the predicted function of the 38 ORFs deleted specifically from BCG strains by means of the hierarchy of gene function proposed by Cole and colleagues (26). None of the ORFs present in M. bovis but deleted from BCG strains were classified as a virulence element, and the gene function predicted by homology search spanned a variety of functions. ORFs classified as transcriptional regulators (“repressors/activators”) were overrepresented in BCG deletions relative to their frequency in the H37Rv genome (3/38 versus 37/3924, odds ratio = 9.0, P = 0.006). Transcriptional regulators were deleted in BCG strains obtained after 1927 (Rv1985c in RD2), BCG-Pasteur (Rv1773c in RD14), and BCG-Moreau (Rv3405c in RD16). Although the role of this gene family remains to be determined in M. tuberculosis, it is tempting to speculate that regulatory elements may serve a role in adapting to environmental change, such as might be experienced during in vivo infection. In contrast, the loss of such genes in laboratory conditions may have little consequence under the relatively constrained conditions of in vitro growth.
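The reported odds ratio can be reproduced directly from the counts given above. The sketch below also computes a one-sided hypergeometric tail probability as one plausible significance test; the text does not say which test produced P = 0.006, so that choice is an assumption and need not match the reported value exactly. Function names are hypothetical.

```python
from math import comb

def odds_ratio(hits_a, total_a, hits_b, total_b):
    """Odds of a hit in group A divided by odds of a hit in group B."""
    return (hits_a / (total_a - hits_a)) / (hits_b / (total_b - hits_b))

def hypergeom_upper_tail(k, sample, successes, population):
    """P(X >= k) when `sample` items are drawn without replacement from a
    population that contains `successes` items of interest."""
    denom = comb(population, sample)
    upper = min(sample, successes)
    numer = sum(
        comb(successes, i) * comb(population - successes, sample - i)
        for i in range(k, upper + 1)
    )
    return numer / denom

# Counts from the text: 3 regulators among 38 BCG-deleted ORFs,
# 37 regulators among the 3924 ORFs of H37Rv.
or_regulators = odds_ratio(3, 38, 37, 3924)            # rounds to 9.0
# Assumed test: chance of 3 or more regulators among 38 ORFs drawn at random.
p_tail = hypergeom_upper_tail(3, 38, 37, 3924)
```

With only three regulators among the deleted ORFs, the estimate is sensitive to small changes in classification, which is consistent with the cautious wording of the text.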

The association between ongoing deletion of genetic material (including transcriptional regulators) and progressive attenuation of virulence now suggests a testable hypothesis for the loss of protective efficacy over the same era. It has long been known that administration of killed BCG organisms results in a weak and transient immune response, indicating that protective immunity by BCG requires survival and replication in the vaccinated host (27). We propose that the deletions detected reflect a progressive adaptation of BCG strains to laboratory conditions that has compromised their capacity to survive within the host, impairing their ability to stimulate a durable immune response.

The basic approach we have used to determine genetic differences within the M. tuberculosis complex can be applied to study genetic variability within any species for which the genomic sequence is known. Our current mycobacterial DNA microarray can detect deletions as small as 2 kb, or 1/2000 of the genome. This array cannot detect smaller deletions, point mutations, deletions restricted to intergenic regions, genetic rearrangements, M. bovis ORFs that are not part of the H37Rv genome, and deletions of homologous repetitive elements (such as PGRS genes). However, with microarrays of greater resolution it will soon be possible to study these other sources of genetic variability and determine their relative frequency and importance. As a result of the numerous genome projects, complete sequence information will soon be available for dozens of microbial species, many derived from reference strains and laboratory isolates. The use of microarray-based comparative genomics should prove a powerful tool for understanding phenotypic variability among clinical and environmental isolates of similar genetic composition.

  • * To whom correspondence should be addressed. E-mail: mbgq{at}

  • These authors contributed equally to this report.

