Extensive DNA Inversions in the B. fragilis Genome Control Variable Gene Expression


Science  04 Mar 2005:
Vol. 307, Issue 5714, pp. 1463-1465
DOI: 10.1126/science.1107008


The obligately anaerobic bacterium Bacteroides fragilis, an opportunistic pathogen and inhabitant of the normal human colonic microbiota, exhibits considerable within-strain phase and antigenic variation of surface components. The complete genome sequence has revealed an unusual breadth (in number and in effect) of DNA inversion events that potentially control expression of many different components, including surface and secreted components, regulatory molecules, and restriction-modification proteins. Invertible promoters of two different types (12 group 1 and 11 group 2) were identified. One group has inversion crossover (fix) sites similar to the hix sites of Salmonella typhimurium. There are also four independent intergenic shufflons that potentially alter the expression and function of varied genes. The composition of the 10 different polysaccharide biosynthesis gene clusters identified (7 with associated invertible promoters) suggests a mechanism of synthesis similar to the O-antigen capsules of Escherichia coli.

Bacteroides fragilis is the major obligately anaerobic Gram-negative bacterium isolated from abscesses, soft tissue infections, and bacteraemias that arise from contamination of normally uncolonized body sites by bacteria from the resident gastrointestinal (GI) microbiota. Putative virulence attributes of B. fragilis include attachment mechanisms, aerotolerance, extracellular enzyme production, and resistance to complement-mediated killing and phagocytosis [reviewed in (1)]. The lipopolysaccharide of B. fragilis triggers inflammatory events via Toll-like receptor 2 (TLR2) and is likely to be involved in systemic inflammatory response syndrome caused by GI tract bacteria (2). B. fragilis itself accounts for only 4 to 13% of the normal human fecal microbiota but is present in 63 to 80% of Bacteroides infections. In contrast, the related B. thetaiotaomicron accounts for 15 to 29% of the fecal microbiota but is associated with only 13 to 17% of infection cases. B. fragilis is capable of a high degree of within-strain phase and antigenic variation of surface components. A single strain of B. fragilis may reversibly produce three different encapsulating surface structures: the large capsule and the small capsule, both visible by light microscopy, and an electron-dense layer (EDL) visible by electron microscopy (3). In addition, reversible within-strain antigenic variation of multiple antigenically distinct high molecular mass polysaccharides and other components is evident (4). Before the advent of the genome sequencing program, the potential mechanisms generating this variation were unknown. We determined the complete genome sequence of the nonenterotoxin-producing DNA homology group I B. fragilis, strain NCTC 9343.

The genome of B. fragilis NCTC 9343 consists of a single circular chromosome of 5,205,140 base pairs (bp), predicted to encode 4274 genes, and a plasmid, pBF9343 (fig. S1 and table S1). During the assembly of shotgun data, particular regions could not be resolved because certain segments of the sequence were present in two alternative orientations. This indicated that specific inversions of these sequences occurred at a high frequency within the clonal growth of bacteria used for DNA isolation. These fragilis invertible (fin) regions can be grouped on the basis of the inverted repeat sequences that flank them. Twelve regions (table S2A, group 1), either shown to be invertible or identified by sequence similarity, are flanked by inverted repeats, designated fragilis inversion crossover (fix) sites, similar to those acted on by the Salmonella Hin DNA invertase (5). All of these invertible regions contain a consensus promoter, suggesting that they control the expression of downstream genes. Seven fin regions (average length of 226 bp) were found upstream of 7 of the 10 polysaccharide biosynthesis gene clusters (table S3), immediately suggesting a mechanism for the observed antigenic variation. The orientation of specific promoters can be correlated experimentally with expression of specific polysaccharides (6), an observation confirmed in independent experiments (5). The remaining five related fin regions in group 1 are 161 bp in length and are associated with a variety of other putative proteins (table S2A). We identified two serine site-specific DNA invertases similar to Hin in the genome: FinA (BF2779), chromosomally located but not near an invertible region, and FinB (pBF9343.01), on the plasmid. The role of FinA (Mpi) in the inversion of these segments has been demonstrated (7), and the plasmid-encoded FinB binds to fix sites (5). In total, the genome encodes 30 enzymes potentially capable of site-specific DNA inversion: 26 tyrosine recombinases (integrase family), 3 serine recombinases (resolvase-invertase family), and 1 Piv-like transposase-invertase (IS110 family).
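
The geometry of an invertible promoter can be illustrated with a minimal Python sketch. All sequences here are invented toy strings, not fix sites or promoters from the genome, and the function names are hypothetical. The sketch shows only that reverse-complementing a segment bounded by inverted repeats leaves the repeats in place while flipping the orientation of whatever lies between them, which is why both orientations of the same region can turn up in shotgun reads from one clonal culture.

```python
# Toy model of an invertible region; every sequence below is made up.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_region(genome: str, start: int, end: int) -> str:
    """Replace genome[start:end] with its reverse complement."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


# Hypothetical flanking inverted repeats: the right repeat is the
# reverse complement of the left one.
LEFT_REPEAT = "AACCGGTATC"
RIGHT_REPEAT = reverse_complement(LEFT_REPEAT)

# Hypothetical asymmetric stand-in for a promoter.
PROMOTER = "TTTGCATAAT"


def promoter_orientation(genome: str) -> str:
    """Report which strand carries the toy promoter."""
    if PROMOTER in genome:
        return "forward"
    if reverse_complement(PROMOTER) in genome:
        return "reverse"
    return "absent"


region = LEFT_REPEAT + "GG" + PROMOTER + "CC" + RIGHT_REPEAT
genome = "AAAA" + region + "TTTTGGGG"
start, end = 4, 4 + len(region)

flipped = invert_region(genome, start, end)

print(promoter_orientation(genome))
print(promoter_orientation(flipped))
```

Because the right repeat is the reverse complement of the left, inverting the whole region reproduces both repeats at the same coordinates, so a second inversion restores the original sequence.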

A further 11 fin promoter regions (average length of 370 bp) are different from the hin-like group 1 regions and more heterogeneous in nature (table S2A, group 2). The inverted repeats that flank these regions and contain the sites of strand exchange (group 2 fix sites) are different from the hix-like regions, indicating that they are acted on by a different recombinase. These predominantly control the expression of a family of outer membrane proteins, and some might also drive the expression of divergent genes with diverse functions [Supporting Online Material (SOM) Text]. The use of DNA inversion by B. fragilis goes beyond the control of promoter sequences. Several more complex inversion events, or intergenic shufflons, that involve the inversion of complete and partial coding sequences were observed in the shotgun sequence. One example, whereby DNA inversion brings silent gene segments into an expression site, is the two-domain specificity protein of a type-I restriction-modification system (BF1839) (table S2B, IR-BB). Each domain in such proteins is responsible for recognizing half of the two-part DNA binding site. Just after the start codon and between the two domains of BF1839 are independent inverted repeats that are unrelated to group 1 or 2 fix sites, both of which are present in similar positions in the downstream convergent gene BF1842, which does not have an appropriate start codon (Fig. 1A). Independent recombination events between these inverted repeats would produce four different specificity proteins recognizing four different DNA sequences. Between these genes are two further gene cassettes, each of which encodes one C-terminal recognition domain. At the 5′ end of these cassettes are two further, different inverted repeats that allow either of the cassettes to be exchanged with the C-terminal domain of the adjacent gene (BF1839 or BF1842), increasing the number of potential recognition specificities to eight.
Three potential recombinases encoded nearby (BF1833, 1843, and 1845) may be involved in this system. A similar, although less complex, variable restriction-modification system has been described in Mycoplasma pulmonis (8). Three further independent intergenic shufflons, acting on outer membrane proteins and a signal transduction system, were observed in the shotgun sequence (Fig. 1, table S2B, and SOM Text). Other intragenomic inversions (IR-Q, IR-R, and IR-S) (table S2B) involve the inversion of complete coding sequences, often reorientating them with or against the apparent direction of transcription of surrounding genes. These may also affect the transcription levels of the genes within these regions.
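
The counts of four and eight specificities follow from simple combinatorics, sketched below in Python. The domain labels are hypothetical placeholders, and the model rests on one assumed reading of the text: the expressed protein takes one N-terminal domain from either BF1839 or BF1842, and one C-terminal domain from either gene or, once cassette exchange is allowed, from either of the two cassettes.

```python
from itertools import product

# Hypothetical labels for the recognition domains; only the counts matter.
N_TERMINAL = ["N_BF1839", "N_BF1842"]
C_TERMINAL_GENES = ["C_BF1839", "C_BF1842"]
C_TERMINAL_CASSETTES = ["C_cassette_1", "C_cassette_2"]


def specificities(n_domains, c_domains):
    """List every N-terminal/C-terminal pairing as one candidate specificity."""
    return list(product(n_domains, c_domains))


# Recombination between the two genes only: 2 x 2 pairings.
base = specificities(N_TERMINAL, C_TERMINAL_GENES)

# Adding the two cassette-borne C-terminal domains: 2 x 4 pairings.
extended = specificities(N_TERMINAL, C_TERMINAL_GENES + C_TERMINAL_CASSETTES)

print(len(base), len(extended))
```

Under this assumption the script reports 4 pairings without the cassettes and 8 with them, matching the numbers given in the text.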

Fig. 1.

Examples of invertible regions in the B. fragilis genome (18). (A) Restriction modification intergenic shufflon: restriction-modification (R/M) complex genes, gray boxes; other genes, open boxes; potential hsdS DNA binding modules, hatched boxes; different inverted repeats at the inversion ends, light gray triangles. (B) Inversion of large segments of DNA through large inverted repeats (black triangles) brings alternative outer membrane protein genes (gray boxes) downstream of an invertible promoter (gray diamond). (C) Local inversion through inverted repeats (black triangles) fuses silent alternative outer membrane protein gene cassettes (hatched boxes) to a fixed promoter and translation start.

Comparison of the B. fragilis genome with the recently sequenced B. thetaiotaomicron strain VPI 5482 (ATCC 29148) (9) reveals that there are no orthologous variable promoters or indeed operons driven by them. B. thetaiotaomicron does encode some variable systems (9), but they are unique to that organism and considerably less numerous than those in B. fragilis. This enhanced potential for variation and other genomic differences (SOM Text) may explain in part why B. fragilis is isolated more frequently from infection than B. thetaiotaomicron.

Surface polysaccharides are involved in establishing abscess formation (10). Ten separate gene clusters potentially involved in polysaccharide synthesis are evident in the genome sequence (table S3). The polysaccharide gene clusters A to H have genes similar to wzx and wzy that are involved in transfer of linked sugar repeats across the cytoplasmic membrane and repeat unit polymerization, respectively, but are lacking in genes associated with the export of polymer across the outer membrane. This suggests that these gene clusters are similar to the Escherichia coli group 4 O-antigen capsules (11) and is in keeping with the characteristic heterogeneity of the polysaccharide chain length after SDS–polyacrylamide gel electrophoresis (PAGE) and the EDL phenotype (12). A gene with some similarity to E. coli wzz (BF1708) that determines chain length is located within polysaccharide gene cluster J. Variation in the expression of BF1708 may explain the varying reports of presence (13) or absence (14) of repeating O-antigen units after PAGE.

Phase variation controlled by DNA inversion events has been reported in several other bacteria. For example, Salmonella typhimurium regulates the expression of a flagellar protein by using a single invertible promoter (15), and E. coli plasmids use shufflons to express one of several variant pilus proteins (16). Different species of Mycoplasma have been shown to regulate the expression of a number of surface proteins by using invertible promoters (17) or to use a shufflon system to express variable surface proteins (8). However, in each of these cases, the use of these mechanisms is restricted to a single system or class of surface molecules. As described here, B. fragilis uses DNA inversion to control a larger number and greater breadth of systems than any other organism described to date, including surface proteins, polysaccharides, and regulatory systems. This may be related to its niche as a commensal and opportunistic pathogen, because the resulting diversity in surface structures could increase both immune evasion and the ability to colonize novel sites.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 and S2

Tables S1 to S4

References and Notes
