Research Article

The Genome of the Natural Genetic Engineer Agrobacterium tumefaciens C58


Science  14 Dec 2001:
Vol. 294, Issue 5550, pp. 2317-2323
DOI: 10.1126/science.1066804


The 5.67-megabase genome of the plant pathogen Agrobacterium tumefaciens C58 consists of a circular chromosome, a linear chromosome, and two plasmids. Extensive orthology and nucleotide colinearity between the genomes of A. tumefaciens and the plant symbiont Sinorhizobium meliloti suggest a recent evolutionary divergence. Their similarities include metabolic, transport, and regulatory systems that promote survival in the highly competitive rhizosphere; differences are apparent in their genome structure and virulence gene complement. Availability of the A. tumefaciens sequence will facilitate investigations into the molecular basis of pathogenesis and the evolutionary divergence of pathogenic and symbiotic lifestyles.

Agrobacterium tumefaciens is an α-proteobacterium of the family Rhizobiaceae and a member of the diverse Agrobacterium genus. A ubiquitous soil organism and etiological agent of the plant disease crown gall (1), A. tumefaciens infects more than 90 families of dicotyledonous plants, resulting in major agronomic losses (2, 3). The gall results from the transfer, integration, and expression of a discrete set of genes (T-DNA) located on the tumor-inducing (Ti) plasmid. Expression of these genes leads to biosynthesis of plant growth hormones as well as a bacterial nutrient source called opines (4). The processing and transfer of the T-DNA is mediated by the Ti plasmid virulence (vir) genes, and several virulence determinants initially characterized in A. tumefaciens have been found in plant symbionts and animal pathogens (5–7).

The genes within the T-DNA can be replaced by any DNA sequence, making A. tumefaciens an ideal vehicle for gene transfer and an essential tool for plant research and transgenic crop production. The research and commercial potential of A. tumefaciens has been broadened under laboratory conditions to include the transfer of T-DNA to recalcitrant plants, fungi (8), and human cells (9).

A. tumefaciens shares a similar habitat and close evolutionary relationship with the nitrogen-fixing symbionts of the Rhizobiaceae (10). Indeed, the introduction of a symbiotic plasmid from Rhizobium phaseoli into A. tumefaciens results in the weak but measurable formation of nitrogen-fixing root nodules (11), suggesting a shared genetic background. The recent publication of the genome sequences of two Rhizobiaceae, Sinorhizobium meliloti (12) and Mesorhizobium loti (13), allowed a genome-wide comparison with A. tumefaciens. We present the results of this comparison as well as a detailed analysis of the genome of A. tumefaciens strain C58 (14, 15).

General features of the genome.

The 5.67-Mb genome of A. tumefaciens C58 (16) comprises four replicons (17): a circular chromosome, a linear chromosome, and the AtC58 and TiC58 plasmids (Table 1 and Fig. 1). The genome contains 5419 predicted protein-coding genes (14), of which we have assigned a putative function to 3475 (64.1%). The remaining 1944 genes (35.9%) include 1236 conserved hypothetical genes (22.8%) whose predicted products are similar to proteins of unknown function in other genomes, and 708 hypothetical genes (13.1%) with no significant matches in the sequence databases (Table 1). Our analysis assigns the A. tumefaciens genes to 501 paralogous families containing from 2 to 206 members (14). The two largest families are composed of genes belonging to the adenosine triphosphatase (ATPase) and membrane-spanning components of the ATP binding cassette (ABC) transport family.
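The gene-category percentages quoted above follow directly from the reported counts. As a quick consistency check, the short sketch below recomputes them; the counts come from the text, while the variable and function names are illustrative only.

```python
# Gene counts reported in the text for A. tumefaciens C58.
TOTAL_GENES = 5419
counts = {
    "assigned_function": 3475,
    "conserved_hypothetical": 1236,
    "hypothetical": 708,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Genes without an assigned function are the two hypothetical classes combined.
unassigned = counts["conserved_hypothetical"] + counts["hypothetical"]

percentages = {name: percent(n, TOTAL_GENES) for name, n in counts.items()}
percentages["unassigned_total"] = percent(unassigned, TOTAL_GENES)

for name, value in percentages.items():
    print(f"{name}: {value}%")
```

The three categories sum to the 5419 predicted genes, and the two hypothetical classes sum to the 1944 genes without an assigned function.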

Figure 1

Schematic representation of the A. tumefaciens genome. Chromosomes are drawn to scale with plasmids represented at 5× or 10× magnification, as indicated. The outer two bands indicate opposing transcriptional orientations of predicted genes. Colors indicate orthology to proteins in the S. meliloti replicons: blue, chromosome; green, pSymA; gold, pSymB; red, nonorthologous. The inner circle depicts GC content for each coding region, with lower GC content indicated by darker shading. The vir and T-DNA regions of pTiC58 and the AT island of pAtC58 are indicated. Orthologs were identified by comparison of predicted proteins for each A. tumefaciens replicon with the genome of S. meliloti. Two proteins were considered orthologous if their BLASTP alignment covered at least 60% of each protein at an expect value of less than or equal to 10⁻⁵. Proteins that did not match these criteria were considered nonorthologous (14).
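The orthology criterion in the Figure 1 legend (alignment covering at least 60% of each protein, expect value at most 10⁻⁵) can be written as a simple filter. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the `Hit` record and its field names are hypothetical, and coverage is taken here as aligned residues divided by protein length.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTP alignment between two proteins (hypothetical record layout)."""
    query_len: int        # length of the A. tumefaciens protein
    subject_len: int      # length of the S. meliloti protein
    query_aligned: int    # query residues covered by the alignment
    subject_aligned: int  # subject residues covered by the alignment
    evalue: float         # BLASTP expect value


def meets_orthology_criteria(hit: Hit,
                             min_coverage: float = 0.60,
                             max_evalue: float = 1e-5) -> bool:
    """Apply the two thresholds from the figure legend to a single alignment."""
    query_cov = hit.query_aligned / hit.query_len
    subject_cov = hit.subject_aligned / hit.subject_len
    return (query_cov >= min_coverage
            and subject_cov >= min_coverage
            and hit.evalue <= max_evalue)
```

Requiring coverage of both proteins, rather than only the query, keeps a short protein that matches a single domain of a much longer one from being counted as an ortholog.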

Table 1

General features of the A. tumefaciens C58 genome.


The overall GC content of the A. tumefaciens genome is 58%. The TiC58 plasmid has two regions of distinctive GC content: the T-DNA (46%) and the vir region (54%) (Fig. 1). Low GC content was noted previously in the T-DNA of a related Ti plasmid (18). Reduced GC content (53%) is also seen within a 24-kb segment of pAtC58 (AT island, Fig. 1). This region includes 17 conserved hypothetical or hypothetical genes, an ATP-dependent DNA helicase, and an insertion sequence (IS) element. These genes are flanked by a phage integrase and a second IS element. The genes in these three regions have a distinct codon usage as compared to the rest of the genome, consistent with their recent evolutionary acquisition (14).

The genome contains 53 transfer RNAs (tRNAs) that represent all 20 amino acids (Table 1). These tRNAs are distributed unevenly between the circular and linear chromosomes. Transfer RNA species corresponding to the most frequently represented alanine, glutamine, and valine codons are found only on the linear replicon. The genome contains 25 predicted IS elements representing eight different families (14). The largest is the IS3 family comprising 10 IS elements. The IS elements are not equally distributed among the replicons but are located preferentially on the linear chromosome and pAtC58 (Table 1). The adjacent virH1 and virH2 genes of the Ti plasmid, encoding p450 mono-oxygenases (19), are flanked by IS elements, which suggests that they arrived in A. tumefaciens as part of a compound transposon. Twelve genes of probable phage origin were identified, most of which are on the circular chromosome (Table 1). Many of these genes cluster in two discrete regions and thus may represent prophage remnants. None of these clustered phage-related genes are shared with S. meliloti, which implies that they were lost from S. meliloti or entered the A. tumefaciens genome after these organisms evolutionarily diverged.

Phylogeny and whole-genome comparison.

A comparison with all sequenced organisms reveals that the A. tumefaciens proteome is most similar to that of two rhizobial species, S. meliloti and M. loti (14). This result was obtained by cataloging top BLAST hits of predicted A. tumefaciens proteins and by classifying predicted proteins into clusters of orthologous groups (Fig. 2A) (20). Of the two rhizobial species, the A. tumefaciens proteome is most similar to that of S. meliloti. Phylogenetic analyses of broadly conserved proteins indicate that this similarity results from A. tumefaciens and S. meliloti sharing a recent common ancestor, and not from gene loss or branch rate variation (Fig. 2, B and C).

Figure 2

Comparisons with fully sequenced genomes. (A) Distribution of best hits based on a comparison of predicted proteins of A. tumefaciens with proteins from all published genomes. (B and C) Phylogenetic trees generated using two broadly conserved proteins. The trees were generated using PAUP distance methods and a distance calculation based on PAM matrices (14).

Sinorhizobium meliloti has a circular chromosome (3.65 Mb) and two plasmids (1.68 Mb and 1.35 Mb), with a total genome size 1.1 Mb larger than that of A. tumefaciens (12). The circular chromosomes of these organisms show extensive nucleotide colinearity and gene order conservation (Fig. 3) (14). Previously, such extensive colinearity has only been seen between members of the same genus. Chromosome-wide conservation of gene order is less pronounced between S. meliloti and M. loti (14). The comparison of the circular chromosomes of A. tumefaciens and S. meliloti also reveals major rearrangements near the putative replication origin and termini (Fig. 3, regions A and B). Similarly located rearrangements are commonly seen between closely related bacteria (21).

Figure 3

Alignment of the proteomes of the S. meliloti chromosome and the A. tumefaciens circular chromosome. Each point in the figure is a bidirectional best hit. These hits were obtained by pairwise BLASTP searches of predicted A. tumefaciens proteins against those of S. meliloti with a maximum expect value of 10⁻⁴ (14). Putative origins (region A) and termini (region B) of replication are indicated, as well as a sizable region lacking colinearity (region C).
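A bidirectional best hit, as plotted in Figure 3, is a pair of proteins that are each other's top-scoring match when each proteome is searched against the other. The sketch below illustrates the idea under explicit assumptions: hits are supplied as hypothetical `(query, subject, evalue, bitscore)` tuples, the best hit is chosen by highest bit score, and the expect-value cutoff mirrors the maximum of 10⁻⁴ stated in the legend. The paper's actual tie-breaking and scoring details are not given in this section.

```python
def best_hits(hits, max_evalue=1e-4):
    """Map each query to its top-scoring subject among hits passing the cutoff.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard alignments weaker than the cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, max_evalue=1e-4):
    """Return sorted (a, b) pairs that are mutual best hits in both searches."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy example with made-up identifiers and scores.
a_vs_b = [("a1", "b1", 1e-50, 200.0), ("a1", "b2", 1e-10, 60.0),
          ("a2", "b2", 1e-30, 120.0), ("a3", "b3", 1e-2, 20.0)]
b_vs_a = [("b1", "a1", 1e-50, 200.0), ("b2", "a1", 1e-10, 60.0),
          ("b2", "a2", 1e-30, 120.0), ("b3", "a3", 1e-2, 20.0)]
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Plotting the chromosomal coordinate of each `a` against that of its partner `b` yields a dot plot in which colinear regions appear as diagonal runs.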

A comparison of the other replicons of A. tumefaciens with all replicons of S. meliloti reveals a mosaic pattern of ortholog distribution (Table 2 and Fig. 1). These orthologs are distributed across the A. tumefaciens elements as individual genes and small regions of gene order conservation. Two regions of the linear replicon exhibit extensive conservation of gene order with a segment of the S. meliloti chromosome (Fig. 3, region C). The first comprises 46 genes (44 kb) and the second contains 65 genes (89 kb). These regions are partially conserved in the M. loti chromosome. The large number of orthologs and the lack of extensive gene order conservation suggest that the smaller A. tumefaciens replicons underwent substantial rearrangement since the organisms diverged. This finding is consistent with differential evolutionary pressures acting on these elements. The nonorthologous genes, many of which are seen on the Ti plasmid, reflect lineage-specific gene loss or acquisition from other species. Taken together, these data support the recent evolutionary divergence of A. tumefaciens and S. meliloti.

Table 2

Number of orthologous genes of A. tumefaciens with respect to S. meliloti. The number of orthologous genes is shown in bold, with the percentage of each A. tumefaciens replicon they represent shown in square brackets. The remainder of the genes, which are not orthologs, are shown in the last row. Numbers of putative protein coding genes for each replicon are shown in parentheses.


Genus-specific genes.

Comparison of the genomes of A. tumefaciens, S. meliloti, and M. loti identified genes in each organism that likely contribute to genus-specific biology (14). Of the 5419 predicted A. tumefaciens proteins, 853 (16%) are not found in these other organisms. Of these, 97 have an assigned function, whereas 756 are hypothetical or conserved hypothetical. The predicted products of these genes are diverse and include proteins involved in cellulose production, plasmid maintenance, cell growth, transcriptional regulation, and cell wall synthesis. Several additional proteins are predicted to catabolize plant cell wall materials, sugars, and exudates. These include polygalacturonases, a glycosidase, an endoglucanase, a myo-inositol catabolism protein, and a cell wall lysis–associated protein. Additional genes, predictably found on the Ti plasmid, include those encoding virulence, T-DNA, and conjugal transfer–associated proteins. With 756 open reading frames (ORFs) yet to characterize, much remains to be elucidated regarding the genetic distinction between A. tumefaciens and its Rhizobiaceae relatives.

Linear chromosome.

Linear replicons, the predominant genetic element in eukaryotes, have been identified in only a few prokaryotes. These include members of the genera Borrelia and Streptomyces (22, 23). Although sequence analysis did not reveal distinct features associated with terminal secondary structures, Goodner et al. found that the termini of this replicon are covalently linked (15). This covalent linkage did not prevent nearly complete sequencing of the replicon termini as confirmed by Southern analysis (14). Proteins associated with the maintenance of linear ends in other systems, such as telomerases or the Streptomyces tpg proteins (24), are absent in A. tumefaciens. One notable feature of the replicon termini is the presence of IS elements near each end. The evolutionary origin of this replicon awaits investigation, as does the mechanism that A. tumefaciens uses to maintain it in a linear form.

There are 1882 protein-coding genes on the linear replicon, including those encoding ribosomal and DNA replication proteins, as well as 21 complete metabolic pathways. The presence of these genes confirms the chromosomal identity of this replicon. Additional features, however, resemble those traditionally associated with plasmids. For example, genes whose products are similar to the conjugative proteins TraA, MobC, and TraG are present, although an oriT is not apparent. Further, an intact and highly conserved repABC operon, the definitive element of the RepABC-type replicator family of circular plasmids, is located near the center of the linear chromosome. The presence of this operon, coupled with a colocalized GC-skew inversion, indicates a bidirectional plasmid-like mode of replication. If experimentally verified, this replication mechanism would prove unique among known linear replicons.
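A GC-skew inversion of the kind mentioned above is usually detected by computing (G − C)/(G + C) in windows along the replicon and looking for the position where the sign flips. The sketch below uses that common definition; the section does not state how the authors computed skew, so the window scheme and function names here are assumptions for illustration.

```python
def gc_skew(window: str) -> float:
    """(G - C) / (G + C) for one window; 0.0 if the window has no G or C."""
    window = window.upper()
    g, c = window.count("G"), window.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def skew_inversions(seq: str, window: int):
    """Return per-window skews and the start positions where the sign flips.

    Windows are non-overlapping; a flip is reported at the start of the
    first window whose skew has the opposite sign to the preceding window.
    """
    skews = [gc_skew(seq[i:i + window])
             for i in range(0, len(seq) - window + 1, window)]
    flips = [i * window for i in range(1, len(skews))
             if skews[i - 1] * skews[i] < 0]
    return skews, flips


# Toy sequence: C-rich first half, G-rich second half.
toy = "CCCA" * 5 + "GGGA" * 5
skews, flips = skew_inversions(toy, window=4)
```

On real chromosomes the skew is noisy, so larger windows or a cumulative sum of the skew is typically used before calling an inversion point near a candidate origin such as the repABC locus.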

Plasmid replication and transfer.

Replication of both pTiC58 and pAtC58 is mediated by RepABC-type systems commonly found in plasmids of the Rhizobiaceae. It is likely that the origin of replication for these plasmids is adjacent to the repC gene (25). In contrast to the pSymB plasmid of S. meliloti (12), both A. tumefaciens plasmids contain all necessary machinery for conjugation and do not contain essential genes. A new conjugal transfer system belonging to the Type IV secretion family (AvhB) (26) was identified on pAtC58. In contrast to the tight control of Ti plasmid conjugal transfer mediated by specific opines that activate quorum sensing (27), the conjugal transfer of pAtC58 appears to be constitutive.


Transporters constitute 15% of the A. tumefaciens genome, 87% of which are found on the chromosomes (14). These systems are predicted to confer broad capabilities for the transport of common nutrients found in the rhizosphere, including sugars, amino acids, and peptides. In addition, there are 11 LysE/RhtB amino acid efflux proteins, almost double the number seen in any bacterium outside of the Rhizobiaceae (12, 28). These transporters may function in the export of homoserine lactones or other signal molecules. There are also a large number of high-affinity tripartite ATP-independent periplasmic (TRAP) dicarboxylate transporters (29). Our analyses indicate that A. tumefaciens and the other sequenced members of the Rhizobiaceae have similar transport capabilities.

Like both S. meliloti and M. loti, A. tumefaciens has an abundance of ABC transporters, constituting 60% of its total transporter complement. There are 153 complete systems plus additional “orphan” subunits. The number of ABC transporters found in these organisms is greater than that found in any sequenced eukaryote and more than double the number found in any sequenced bacterium (28, 30). Predicted substrates of these ABC transport systems include sugars (53 systems), amino acids (29 systems), and peptides (25 systems). Other organisms with large ABC transporter complements include photosynthetic bacteria such as Synechocystis PCC6803 and organisms that lack a tricarboxylic acid (TCA) cycle and an electron transfer chain, like the mycoplasmas and Thermotoga maritima (28). The generation of large ATP pools in these organisms, via photosynthesis or F-type ATPases, may explain their preference for ATP-driven transport. In contrast, the preference for ABC transporters in A. tumefaciens may reflect a need for high-affinity uptake systems for the acquisition of nutrients in the highly competitive soil and rhizosphere environments.


Bacteria that inhabit diverse environments tend to have large complements of regulatory genes (31). Consistent with this, regulatory genes constitute a substantial proportion (9%) of the A. tumefaciens genome (Table 1) (14). This regulatory capacity likely facilitates survival of A. tumefaciens within the dynamic soil and rhizosphere environments. The genome encodes 11 extracellular sigma factors, proteins implicated in stress responses in other organisms (32). In addition, although several LuxR family motifs are evident, only one previously identified acyl-homoserine lactone synthase (traI) known to be involved in quorum sensing was detected (4). Several proteins are similar to eukaryotic signal transduction proteins rarely found in bacteria, including four regucalcin-like calcium-binding regulators and a serine-threonine kinase. As is true of other α-proteobacteria, no rpoS gene was identified. However, A. tumefaciens does have a homolog of the HF-1 protein known to regulate stationary phase and oxidative stress responses in Escherichia coli and Brucella abortus (33).

Our analysis identified numerous nucleotide cyclases in the plant symbionts S. meliloti and M. loti (25 and 12, respectively) and in the evolutionarily distinct human pathogen Mycobacterium tuberculosis (12). These cyclases are rarely found in other bacterial genomes. The nucleotide cyclases in S. meliloti have been noted previously and were postulated to function in signal transduction (12). Contrary to our expectation, there are only three nucleotide cyclases in A. tumefaciens. It is unclear why the nitrogen-fixing plant symbionts share similarly large numbers of nucleotide cyclases with a human pathogen, whereas few such genes are found in the evolutionarily related A. tumefaciens.

Attachment, cell surface, and secretion.

The initial interaction of Agrobacterium with its plant hosts is mediated by several attachment-related genes (34). These include the chvA, chvB, exoC, and cellulose synthesis genes as well as the pAtC58-localized att region. Several additional genes encode proteins similar to adhesins in mammalian pathogens, including BfrA of E. coli and PsaA of Streptococcus pneumoniae.

Pili are extracellular appendages often required for bacterial association with their hosts. Although only the pilus encoded by the virB operon has been experimentally confirmed (35), the trb operon required for Ti plasmid conjugation likely produces a pilus (36). The avhB and ctp clusters, identified by our analyses, may also produce pili. Additional surface components include exopolysaccharides, lipopolysaccharides, and capsular polysaccharides, whose biosynthetic genes are primarily located on the linear chromosome. Such surface polysaccharides are commonly involved in invasion, growth, and survival of plant-associated bacteria.

Five protein secretion systems are found among Gram-negative bacteria (37), at least three of which are represented in A. tumefaciens. These include four potential type I secretion systems. Although components of the main terminal branch of type II secretion appear to be absent, the Sec system for protein secretion across the inner membrane is intact. There is, however, a type IV pilus biogenesis system with components similar to those of type II secretion systems. Similar to S. meliloti (12), no type III secretion system was identified. Agrobacterium tumefaciens encodes three type IV secretion systems: VirB, Trb (27), and AvhB. The genome also contains the twin arginine targeting system, Tat/Mtt (38).


To date, most virulence determinants of A. tumefaciens have been found on the Ti plasmid. Other than the virB operon, these genes are not found in S. meliloti. The TiC58 plasmid contains a single T-DNA region, in contrast to the two found in a number of other strains (18), and the 25–base pair (bp) border regions that delineate the T-DNA are not present elsewhere in the genome.

The availability of the genome sequence has enabled the identification of genes whose products are similar to plant pathogen virulence proteins required for host cell wall degradation. These include pectinase (kdgF), ligninase (ligE), and xylanase as well as regulators of pectinase and cellulase production (pecS/M); A. tumefaciens may use such enzymes to breach the cell wall of its host before T-DNA transfer.

In addition, we have identified numerous orthologs of animal virulence genes. Examples include those involved in host survival, such as the bacA locus of Brucella (39) and two members of the widely conserved HtrA family of serine proteases implicated in response to oxidative stress in Salmonella and Yersinia (40). Interestingly, a bacA homolog is involved in S. meliloti symbiosis (41). Invasion-related homologs include the ialA and ialB genes of Bartonella henselae (42) as well as five hemolysin-like proteins with associated type I secretion systems. The highly conserved mviN gene, implicated in Salmonella virulence (43), is also present.


Agrobacterium tumefaciens grows on minimal medium and therefore possesses all pathways required for prototrophic growth, an observation confirmed by our computational pathway analysis (14). These metabolic pathways are dispersed among the four replicons. Unlike their organization in E. coli, most genes of these pathways are not tightly clustered, which suggests that they are not present in operons. We identified pathways for the synthesis of all 20 amino acids as well as numerous enzyme cofactors. At least one nonribosomal protein synthesis system for the production of polyketides was identified. Encoded energy metabolism pathways include glycolysis, TCA cycle, and Entner-Doudoroff. Agrobacterium tumefaciens can catabolize 17 amino acids, including S-adenosylhomocysteine and 4-hydroxyproline. Pathways for use or degradation of plant metabolites typically found in the rhizosphere were also detected. These include sugars such as glucose, fructose, sucrose, ribose, xylose, xylulose, and lactose as well as compounds such as myo-inositol, hydantoin, urea, and glycerol. The capacity to metabolize glucuronate, galactonate, galactarate, gluconate, ribitol, glycogen, quinate, l-idonate, creatinine, stachydrine, ribosylnicotinamide, and 4-hydroxymandelate was also detected. Chemotaxis systems responding to many of these compounds are present in A. tumefaciens (44).

Agrobacterium tumefaciens encodes a variety of proteins that may protect against toxic compounds in the environment. Examples include four cytochrome p450s, two of which have been previously identified. One of these has been shown to modify ferulic acid, an inducer of the vir genes (45). These highly oxidative enzymes may also detoxify or modify plant-derived compounds, including phytoalexins (46) and protocatechuate, and xenobiotics such as 1,2-dichloroethane, cyanate, 1,4-dichlorobenzene, and octane. In addition, antibiotic resistance genes targeted against tetracycline (47) and chloramphenicol are present.

Many components of nitrogen metabolism are conserved between A. tumefaciens and the nitrogen-fixing symbionts S. meliloti and M. loti. Examples include components of the nitrogen regulation (Ntr) system such as ntrBC, ntrXY, ntrA, glnE, glnD, glnB, and glnK. Agrobacterium tumefaciens harbors seven glutamine synthetase (GS) genes, which encode GS types I, II, and III. The presence of multiple GS genes may relate to the observation that A. tumefaciens requires high concentrations of glutamate for optimal growth. Other members of the Rhizobiaceae also contain multiple GS genes. In addition, A. tumefaciens has a gene predicted to encode the large hexameric adenosine monophosphate (AMP)–dependent glutamate dehydrogenase (48), but a gene encoding the AMP-independent glutamate dehydrogenase was not identified. In contrast, S. meliloti and M. loti contain both genes. The A. tumefaciens linear chromosome carries denitrification genes, including a periplasmic dissimilatory nitrate reductase (nap), nitrite reductase (nir), and nitric oxide reductase (nor), but lacks nitrous oxide reductase (nos). In contrast, all of these genes are present in S. meliloti. Nitrate transport genes are also located on the linear chromosome. Although A. tumefaciens is considered an aerobe, the existence of these genes implies that it could use nitrate as an electron acceptor under anaerobic conditions. As expected, A. tumefaciens lacks the subunits of nitrogenase and its cofactors. Most of the nod genes are also absent, except for three genes similar to those involved in nod factor production, nodL, nodX, and nodN.


The combination of a linear and a circular chromosome is found in only a few members of the genus Agrobacterium (49). This observation represents a key evolutionary distinction between A. tumefaciens and S. meliloti. On the basis of 16S ribosomal DNA phylogenetic analyses, it has been proposed that the genus Agrobacterium be reclassified into the genus Rhizobium (10). Combining what has been elucidated regarding genome structure with the complete genome sequence should allow a more accurate definition of the taxonomic position of A. tumefaciens in the Rhizobiaceae.

One striking finding from our analysis is the extensive similarity of the circular chromosomes of A. tumefaciens and the plant symbiont S. meliloti, which supports the view that these bacteria originated from a recent common ancestor. Galibert et al. speculate that the S. meliloti chromosome was present in a progenitor that later acquired pSymA and pSymB (12). The mosaic structure of the A. tumefaciens linear chromosome and plasmids, predominantly composed of orthologs found on each of the S. meliloti replicons, suggests that these organisms diverged after acquisition of the pSymA and pSymB ancestral molecules by this progenitor.

Recent models of bacterial evolution suggest that the differential acquisition and loss of genes in organisms that inhabit the same environment allows divergence into symbiotic and pathogenic lifestyles (50). The acquisition of such elements is apparent in both A. tumefaciens and S. meliloti. The nod genes of S. meliloti (12, 51), as well as the vir genes and T-DNA of A. tumefaciens, display GC content and codon usage distinct from the rest of the genome, which suggests recent evolutionary acquisition. In the case of the T-DNA, reduced GC content may facilitate expression in the plant host, where lower GC content is common. Moreover, none of the T-DNA and few of the vir genes of A. tumefaciens have orthologs in S. meliloti, and most nod genes are not found in A. tumefaciens. Differential selection and maintenance of such horizontally acquired genes likely led to the divergence into pathogenic and symbiotic states. Thus, these organisms provide a rich model system for further investigations into the evolutionary divergence of pathogens and symbionts.

As the central biological tool in the generation of transgenic plants for research and agriculture, A. tumefaciens, and the availability of its genome sequence, will continue to have an impact on plant biotechnology. Detailed studies, supplemented by this sequence, should lead to a directed refinement of plant transformation that increases both the host range and transformation efficiency of this versatile genetic tool. Genes likely to be targeted by such work include potential virulence factors that are shared between plant and animal pathogens. Examination of these genes in the genetically tractable Agrobacterium system may also serve to elucidate the molecular role they play in animal pathogens. It is our hope that this work will broaden the scientific foundation from which to address the worldwide debate over the production, use, and safety of genetically modified organisms.

  • * Present address: Department of Pathology, University of Washington, Box 357470, Seattle, WA 98195, USA.

  • Present address: Gene Function & Target Validation, Celltech R&D Inc., Bothell, WA 98021, USA.

  • Present address: Department of Plant Pathology, Kansas State University, 113 Waters Hall, Manhattan, KS 66506, USA.

  • § To whom correspondence should be addressed. E-mail: gnester{at}

