Crystal Structure of Invasin: A Bacterial Integrin-Binding Protein


Science  08 Oct 1999:
Vol. 286, Issue 5438, pp. 291-295
DOI: 10.1126/science.286.5438.291


The Yersinia pseudotuberculosis invasin protein promotes bacterial entry by binding to host cell integrins with higher affinity than natural substrates such as fibronectin. The 2.3 angstrom crystal structure of the invasin extracellular region reveals five domains that form a 180 angstrom rod with structural similarities to tandem fibronectin type III domains. The integrin-binding surfaces of invasin and fibronectin include similarly located key residues, but in the context of different folds and surface shapes. The structures of invasin and fibronectin provide an example of convergent evolution, in which invasin presents an optimized surface for integrin binding, in comparison with host substrates.

Many bacterial pathogens bind and enter eukaryotic cells to establish infection. Yersinia pseudotuberculosis and Y. enterocolitica are enteropathogenic Gram-negative bacteria that cause gastroenteritis when they are translocated across the intestinal epithelium at Peyer's patches by way of M cells. Translocated bacteria enter the lymphatic system and colonize the liver and spleen, where they grow mainly extracellularly (1). Invasin is an outer membrane protein required for efficient uptake of Yersinia into M cells (2, 3). Invasin mediates entry into eukaryotic cells by binding to members of the β1 integrin family that lack I, or insertion, domains, such as α3β1, α4β1, α5β1, α6β1, and αvβ1 (3). Integrins are heterodimeric integral membrane proteins that mediate communication between the extracellular environment and the cytoskeleton by binding to cytoskeletal components and either extracellular matrix proteins or cell surface proteins (4). Invasin binding to β1 integrins is thought to activate a reorganization of the host cytoskeleton to form pseudopods that envelop the bacterium (5). Another family of enteropathogenic bacterial proteins related to invasin, the intimins, does not appear to use integrins as its primary receptors for invasion (6). Instead, intimins mediate attachment of the bacteria to host cells by binding to a bacterially secreted protein, Tir, which upon secretion becomes inserted into the host membrane (6).

Yersinia pseudotuberculosis invasin is a 986-residue protein. The NH2-terminal ∼500 amino acids, which are thought to reside in the outer membrane (7), are related (∼36% sequence identity) to the analogous regions of intimins (8). The COOH-terminal 497 residues of invasin, which make up the extracellular region, can be expressed as a soluble protein (Inv497) that binds integrins and promotes uptake when attached to bacteria or beads (9). The shortest invasin fragment capable of binding integrins consists of the COOH-terminal 192 amino acids (7). This fragment is not homologous to the integrin-binding domains of fibronectin [the fibronectin type III repeats 9 and 10 (Fn-III 9–10)] (8), although mutagenesis studies and competition assays indicate that invasin and fibronectin bind to α3β1 and α5β1 integrins at the same or overlapping sites (10). The integrin-binding region of invasin also lacks significant sequence identity with the corresponding regions of intimins (∼20% identity) (8). To gain insight into enteric bacterial pathogenesis and to compare the structural basis of integrin binding by invasin and Fn-III domains, we solved the crystal structure of Inv497.

Inv497 was expressed in Escherichia coli and purified (9). The structure was solved to 2.3 Å by multiple isomorphous replacement with anomalous scattering (MIRAS) (Table 1) (11, 12). Inv497 is a rodlike molecule with overall dimensions of ∼180 Å by 30 Å by 30 Å (Fig. 1A), consistent with analytical ultracentrifugation analyses that suggest the fragment has an extended monomeric structure in solution (13). The Inv497 structure bears an overall resemblance to that of another α5β1-binding fragment, Fn-III repeats 7 through 10 (Fn-III 7–10) (14), as they are both elongated molecules composed of tandem domains. The first four Inv497 domains (D1, D2, D3, and D4) are composed mainly of β structure, and the fifth domain (D5) includes α helices and β sheets. Despite only 20% sequence identity (8), the D3 to D5 region of Inv497 is structurally similar to a 280-residue fragment of the extracellular portion of enteropathogenic E. coli intimin (14).

Figure 1

(A) Ribbon diagram of the structure of Y. pseudotuberculosis Inv497. Residues implicated in integrin binding [Asp911, Asp811 (7, 20), and possibly Arg883] are green (24). The α-helical regions in D5 and a 3₁₀ helix in D4 are red. The disulfide bond in D5 is yellow, and β strands are blue (D4 and D5) or green (D1 through D3). (B) Topology diagrams for domains of invasin and related proteins. Inv497 D5 is shown beside a canonical C-type lectin CRD [from E-selectin (14)]; Inv497 D4 is shown beside a C1-type IgSF domain. The β strands are blue, helices are red, and disulfide bonds are yellow. The calcium-binding loop in E-selectin (residues 54 to 89) and its truncated counterpart in Inv497 (residues 956 to 959) are green. (C) (left) Hydrogen bonding pattern of the interrupted helix (18) in D5. Main-chain atoms are shown for residues in the α helix (24). Side chains are shown for those residues in which main-chain atoms form hydrogen bonds (dashed light blue lines) across the break in the helix. Other side chains have been omitted for clarity. The carbon-α trace of the loop is shown in gray. Red, blue, and black balls are oxygen, nitrogen, and carbon atoms, respectively. (right) The Inv497 model (24) in the region of the loop (gray in left panel) of the interrupted helix superimposed on a 2.3 Å σA-weighted 2|Fobs| − |Fcalc| annealed omit electron density map contoured at 1.0σ (map radius, 3.5 Å) (12). (D) Schematic model of the structure of intact invasin in which the ∼500 NH2-terminal residues reside in the Yersinia outer membrane (OM) (yellow) in a porin-like structure (7) (red), and the Inv497 portion of invasin (green and blue) projects ∼180 Å from the outer membrane.

Table 1

Summary of data collection and refinement statistics for Inv497. Inv497 crystals (space group P2₁, a = 61.1 Å, b = 50.7 Å, c = 97.9 Å, β = 98.3°; one molecule per asymmetric unit) were grown at 22°C in hanging drops by combining 1 μl of protein solution [Inv497 (5 to 10 mg/ml), 20 mM Hepes at pH 7.0, and 1 mM EDTA] with 1 μl of precipitant solution (20 mM sodium citrate at pH 5.6, 20% polyethylene glycol 4000, and 20% isopropanol). Crystals were improved by microseeding. SeMet crystals, derived from selenomethionine-substituted Inv497 protein (9), grew under similar conditions. For cryoprotection, 5 μl of mother liquor containing 25% isopropanol was added to the crystals immediately before transferring them to liquid nitrogen. A cryocooled xenon derivative was prepared by mounting a cryoprotected crystal in a nylon loop and subjecting it to 200 psi of xenon for 2.5 min in a xenon pressure cell (11). A small microfuge tube containing excess mother liquor was placed in the pressurization chamber to maintain vapor pressure and prevent cracking of the crystals. Immediately after depressurization, the crystals were transferred to liquid nitrogen. The PIP derivative was prepared by the addition of one grain of PIP to a drop containing several crystals, followed by soaking for 5 hours. Data from the native and the xenon derivative crystals were collected at −170°C at a wavelength of 0.98 Å on a MAR Research image plate detector at beam line 9-1 at SSRL. Data from the PIP and SeMet derivatives were collected at −170°C on an RAXIS IIC image plate using a Rigaku rotating anode. Statistics in parentheses refer to the highest resolution bin. Phasing, model building, and refinement were done as described (11, 12).
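The cell parameters quoted in this caption are enough for a rough packing check. The following is a minimal sketch, not part of the reported analysis: it computes the monoclinic cell volume and a Matthews coefficient, assuming two molecules per cell (P2₁ with one molecule per asymmetric unit) and an average residue mass of 110 Da for the 497-residue fragment. The residue mass, the resulting molecular weight, and all function names are illustrative assumptions.

```python
import math

# Cell parameters as quoted in the Table 1 caption (space group P2_1).
A_ANGSTROM, B_ANGSTROM, C_ANGSTROM = 61.1, 50.7, 97.9
BETA_DEGREES = 98.3

# P2_1 has two symmetry-related positions per cell; the caption states
# one molecule per asymmetric unit, so two molecules per unit cell.
MOLECULES_PER_CELL = 2

# ASSUMPTION: the molecular weight is not given in the text. It is
# approximated here as 497 residues x 110 Da (a common rule of thumb).
ASSUMED_DA_PER_RESIDUE = 110.0
ASSUMED_MW_DA = 497 * ASSUMED_DA_PER_RESIDUE


def monoclinic_cell_volume(a, b, c, beta_degrees):
    """Volume of a monoclinic cell in cubic angstroms: a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_degrees))


def matthews_coefficient(cell_volume, molecules_per_cell, mol_weight_da):
    """Crystal volume per unit of protein mass, in cubic angstroms per dalton."""
    return cell_volume / (molecules_per_cell * mol_weight_da)


def solvent_fraction(vm):
    """Matthews' estimate of solvent content, 1 - 1.23 / Vm."""
    return 1.0 - 1.23 / vm


cell_volume = monoclinic_cell_volume(A_ANGSTROM, B_ANGSTROM, C_ANGSTROM, BETA_DEGREES)
vm = matthews_coefficient(cell_volume, MOLECULES_PER_CELL, ASSUMED_MW_DA)
solvent = solvent_fraction(vm)

print(f"cell volume: {cell_volume:.0f} cubic angstroms")
print(f"Matthews coefficient: {vm:.2f} cubic angstroms per Da")
print(f"estimated solvent fraction: {solvent:.2f}")
```

Under these assumptions the cell volume comes out near 3.0 × 10⁵ Å³, with a Matthews coefficient in the range commonly seen for protein crystals. A measured molecular weight would change the last two numbers.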


The four NH2-terminal domains of Inv497 adopt folds resembling eukaryotic members of the immunoglobulin superfamily (IgSF) (15), although the Inv497 domains do not share significant sequence identity with IgSF domains and lack the disulfide bond and core residues conserved in IgSF structures (8, 15). D1 belongs to the I2 set of the IgSF, and D2 and D3 belong to the I1 set (15). D4 adopts the folding topology of the C1 set of IgSF domains, a fold seen in the constant domains of antibodies, T cell receptors, and major histocompatibility complex (MHC) molecules (15). Unlike these C1 domains, D4 of Inv497 includes a 15–amino acid insertion between strands A and B that forms two additional β strands (A" and A‴) (Fig. 1B). D1 and D2 of the intimin fragment are also Ig-like, and the second domain includes an insertion similar to that found in Inv497 D4 (14).

D5 of Inv497 has a folding topology related to that of C-type lectin-like domains (CTLDs) (Fig. 1B) (16). This superfamily includes true C-type lectins such as mannose-binding protein (14) and E-selectin (14), which contain carbohydrate recognition domains (CRDs) that bind carbohydrates in a calcium-dependent manner, and evolutionarily related proteins such as the Ly49 family of natural killer cell receptors, which bind ligands in the absence of calcium and may not recognize carbohydrates (16). A characteristic feature of C-type lectin CRDs is a long stretch of extended structure including one or two calcium-binding sites, which is required for carbohydrate recognition (16). The COOH-terminal domains of Inv497 and intimin lack these calcium-binding loops (Fig. 1B) (14, 16). Inv497 is not known to bind carbohydrates (17); thus, the importance of the CTLD fold remains to be determined. By analogy with Ly49A, which recognizes a carbohydrate-independent epitope on its class I MHC ligand (16), Inv497 may recognize an unglycosylated region of integrins.

Like CTLDs and structurally related proteins such as the COOH-terminal domain of intimin (D3) (14), Inv497 D5 is composed of two antiparallel β sheets with interspersed α-helical and loop regions and includes a disulfide bond linking helix 1 to β strand 5 (Fig. 1B). An additional disulfide bond linking β strand 3 and the loop following strand 4 is found in CTLDs and CRDs but is absent in Inv497 D5 and intimin D3. Whereas C-type lectin CRDs contain two α helices located between the first and second β strands, the region corresponding to the second helix is replaced by a loop in Inv497 D5 (Fig. 1B) and CD94, a component of the CD94/NKG2 natural killer cell receptor (14). In Inv497 D5, the loop is preceded by a two-turn α helix (residues 917 to 921 and 931 to 936) interrupted by a nine-residue loop (residues 922 to 930) (Fig. 1C) (18). The corresponding region in intimin was not interpretable in the nuclear magnetic resonance structure (14).

Extensive interactions between Inv497 D4 and D5 create a superdomain that is composed of the 192 residues identified as necessary and sufficient for integrin binding (7). The interface between D4 and D5 is significantly larger than the interfaces between tandem IgSF domains and between the Ig-like invasin domains (D4 to D5 buried surface area is 1925 Å² in comparison with ∼500 Å² for IgSF interfaces) (19). The D4-D5 interface is predominantly hydrophobic, although a number of hydrogen bonds are also present (Fig. 2). The interrupted helix in D5 and strands A" and A‴ in D4 (Fig. 1B) play a major role in the interaction between these two domains. In particular, a portion of the loop within the interrupted helix in D5 contacts the A‴ strand in D4 (Fig. 2). In addition, strand A‴ hydrogen bonds with strand 1 of D5, extending the second β sheet of the CTLD (Fig. 1B). The large buried surface area at the D4-D5 interface and the consequent rigidity of this portion of invasin contrast with the flexibility between the integrin-binding portions of fibronectin, inferred from interdomain buried surface areas that are lower than average at these interfaces (Fn-III 9–10 and Fn-III 12–13) (14, 19). Interdomain flexibility in fibronectin was proposed to facilitate integrin binding (15) and is also observed in the structures of two other integrin-binding proteins, ICAM-1 (14) and VCAM-1 (14). However, invasin, which shows little or no interdomain flexibility in its integrin-binding region, binds at least five different integrins and binds α5β1 with an affinity that is ∼100 times that of fibronectin (5, 10). High-affinity binding of invasin is necessary for bacterial internalization, as studies have shown that bacteria coated with lower affinity ligands for α5β1 bind, but do not penetrate, mammalian cells (5, 10).
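The interface comparison above can be illustrated with a short calculation. The sketch below assumes the common definition of buried surface area as the solvent-accessible area lost when two domains associate (the isolated areas summed, minus the area of the pair); the procedure actually used for the reported values is described in (19) and may differ. The three per-domain areas are hypothetical placeholders chosen only so that the arithmetic reproduces the reported 1925 Å²; only the 1925 Å² and ∼500 Å² figures come from the text.

```python
def buried_surface_area(sasa_domain_a, sasa_domain_b, sasa_pair):
    """Accessible area (square angstroms) lost when two domains pack together."""
    return sasa_domain_a + sasa_domain_b - sasa_pair


# Values quoted in the text.
INVASIN_D4_D5_BURIED = 1925.0   # square angstroms
TYPICAL_IGSF_BURIED = 500.0     # square angstroms, approximate

# HYPOTHETICAL accessible areas, not taken from the structure; they are
# placeholders that make the formula return the reported interface size.
hypothetical_sasa_d4 = 6000.0
hypothetical_sasa_d5 = 6500.0
hypothetical_sasa_d4_d5 = 10575.0

buried = buried_surface_area(
    hypothetical_sasa_d4, hypothetical_sasa_d5, hypothetical_sasa_d4_d5
)
fold_difference = INVASIN_D4_D5_BURIED / TYPICAL_IGSF_BURIED

print(f"buried area from placeholder inputs: {buried:.0f} square angstroms")
print(f"D4-D5 interface relative to a typical IgSF interface: {fold_difference:.2f}x")
```

The ratio of the two quoted values is 3.85, which falls within the three- to fivefold range given in the legend to Fig. 2.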

Figure 2

Comparison of interdomain interfaces in integrin-binding regions of Inv497 (D4–D5), fibronectin type III repeats 9 and 10 (D9–D10) (14), and VCAM-1 (D1–D2) (14). Hydrogen bonds are shown as dashed yellow lines. Additional hydrogen bonds, van der Waals contacts, and a three- to fivefold larger interdomain surface area (19) stabilize Inv497 D4–D5 and restrict interdomain flexibility, compared to the other interfaces.

Invasin residues that are important for integrin binding include 903 to 913 (7, 20), which form helix 1 and the loop after it in D5. The disulfide bond between Cys906 and Cys982, conserved in all CTLDs (Fig. 1B), is required for integrin binding (20), presumably because it is necessary for correct folding. Although invasin lacks an Arg-Gly-Asp (RGD) sequence, which is critical for the interaction of Fn-III 10 with integrins (4), an aspartate in Inv497 D5 (Asp911) is required for integrin binding (7, 20). Like the aspartate in the Fn-III RGD sequence, Asp911 is located in a loop (Figs. 1A and 3B). Other host proteins, such as VCAM-1 and MAdCAM-1, which bind integrins that lack I domains, also contain a critical aspartate residue on a protruding loop (15). By contrast, ligands of I domain–containing integrins, such as the ICAM proteins, present their acidic integrin-binding residue in the context of a β strand rather than a loop (15). A second region of invasin that is ∼100 amino acids from Asp911 contains additional residues that are implicated in integrin binding, including Asp811 (Figs. 1A and 3B) (20). This region of invasin is reminiscent of the fibronectin synergy region located in Fn-III 9, which is required for maximal α5β1 integrin-dependent cell spreading (21). Invasin Asp811 is located in D4 between strands A" and A‴ and lies on the same surface as Asp911, separated by 32 Å (measured between carbon-α atoms). The distance between Fn-III 10 Asp1495 in the RGD sequence and Fn-III 9 Asp1373 in the synergy region is also 32 Å (14), although the side-chain orientation of Asp1373 differs from that of Asp811 in invasin (Fig. 3). Within the Fn-III synergy region, a critical residue for integrin binding is Arg1379 [32 Å from Asp1495 (Fig. 3B)] (21). The invasin synergy-like region also includes a nearby arginine, Arg883 [32 Å from Asp911 (Fig. 3B)]. The overall similarity in the relative positions of these three residues suggests that invasin and host proteins share common integrin-binding features.
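The separations quoted above are distances between carbon-α atoms, which reduce to a Euclidean distance between two coordinate triples. The following minimal sketch shows that calculation. The coordinates are placeholders, not values from the Inv497 model; they are placed on the axes only so that the two separations quoted in the text (32 Å) are reproduced. With real data, the triples would be read from the carbon-α records of a coordinate file.

```python
import math


def ca_distance(coord_a, coord_b):
    """Distance in angstroms between two carbon-alpha positions (x, y, z)."""
    return math.dist(coord_a, coord_b)


# PLACEHOLDER coordinates (angstroms). These are NOT from the crystal
# structure; they are chosen so the quoted 32-angstrom separations result.
placeholder_ca = {
    "Asp811": (0.0, 0.0, 0.0),
    "Asp911": (32.0, 0.0, 0.0),
    "Arg883": (32.0, 32.0, 0.0),
}

asp811_to_asp911 = ca_distance(placeholder_ca["Asp811"], placeholder_ca["Asp911"])
arg883_to_asp911 = ca_distance(placeholder_ca["Arg883"], placeholder_ca["Asp911"])

print(f"Asp811 to Asp911: {asp811_to_asp911:.1f} angstroms")
print(f"Arg883 to Asp911: {arg883_to_asp911:.1f} angstroms")
```

The Asp811 to Arg883 separation implied by these placeholders is arbitrary and should not be read as a property of the structure.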

Figure 3

Comparison of integrin-binding regions of invasin and fibronectin. Despite different folding topologies and surface structures, the relative positions of several residues implicated in interactions with integrins are similar [Asp811, Asp911, and Arg883 in Inv497; Asp1373, Asp1495, and Arg1379 in Fn-III 9 and 10; (aspartates are red; arginines are blue)]. (A) Surface representations (12) of the structures of Inv497 and Fn-III 7–10 (14). (B) Ribbon representations of Inv497 D4–D5 and Fn-III 9–10 (24). Addition of one or more residues to the COOH-terminus of invasin (indicated as “COO”) interferes with integrin binding (25), suggesting that the rather flat region between Asp811 and Asp911 is at the integrin-binding interface. By contrast, the integrin-binding surface of fibronectin contains a cleft resulting from the narrow link between Fn-III 9 and 10.

The transmembrane regions of outer membrane proteins of known structure are β barrels, as represented by the structures of porins (7). Assuming that the membrane-associated region of invasin is also a β barrel (7), the structure of intact invasin may resemble the model shown in Fig. 1D, in which the cell-binding region projects ∼180 Å away from the bacterial surface, ideally positioned to contact host cell integrins. Similarities between invasin and fibronectin demonstrate convergent evolution of common integrin-binding properties. However, the integrin-binding surface of invasin does not include a cleft, as found on the binding surface of fibronectin (Fig. 3); thus, invasin may bind integrins with a larger interface. Together with the restricted orientation of the invasin integrin-binding domains, a larger binding interface provides a plausible explanation for the increased integrin-binding affinity of invasin as compared with fibronectin. Differences between the integrin-binding properties of invasin and fibronectin illustrate how a bacterial pathogen is able to efficiently compete with host proteins to establish contact and subsequent infection, thereby exploiting a host receptor for its own purposes.

* To whom correspondence should be addressed. E-mail: bjorkman{at}

