Research Article

Genome Sequence of the Plant Pathogen and Biotechnology Agent Agrobacterium tumefaciens C58


Science  14 Dec 2001:
Vol. 294, Issue 5550, pp. 2323-2328
DOI: 10.1126/science.1066803


Agrobacterium tumefaciens is a plant pathogen capable of transferring a defined segment of DNA to a host plant, generating a gall tumor. Replacing the transferred tumor-inducing genes with exogenous DNA allows the introduction of any desired gene into the plant. Thus, A. tumefaciens has been critical for the development of modern plant genetics and agricultural biotechnology. Here we describe the genome of A. tumefaciens strain C58, which has an unusual structure consisting of one circular and one linear chromosome. We discuss genome architecture and evolution and additional genes potentially involved in virulence and metabolic parasitism of host plants.

Agrobacterium tumefaciens is a plant pathogen with the unique ability to transfer a defined segment of DNA to eukaryotes, where it integrates into the eukaryotic genome. This ability to transfer and integrate DNA is used for random mutagenesis and has been adapted into a powerful tool for production of transgenic plants, including soybean, maize, and cotton (1, 2). A. tumefaciens was identified early in the 20th century as the causal agent of crown gall disease in plants (3). Pathogenesis is initiated when Agrobacterium detects small molecules released by actively growing cells in a plant wound. These molecules induce a series of virulence (vir) genes whose encoded products export the single-stranded transferred DNA (T-DNA) to the plant cell, where it integrates into the genome at an essentially random location. Once integrated, T-DNA gene expression alters plant hormone levels, leading to cell proliferation typical of a gall tumor. The T-DNA also encodes enzymes for the synthesis of opines, a class of nutrient molecules used almost exclusively by A. tumefaciens (4–7).

A. tumefaciens strains fall into three biovars, which differ in their host range, metabolic characteristics, relationships with other genera in the family Rhizobiaceae, and potentially their chromosome structure (4–13). The taxonomy of the Rhizobiaceae family is not without controversy, but we expect that phylogenetic analysis that includes genome sequences should help resolve this issue (12, 13). Until that time, we believe that the biovar concept is the most valuable for understanding the diversity within the genus Agrobacterium. We analyzed A. tumefaciens strain C58, a biovar 1 strain with an unusual genome structure known only in this genus: one circular chromosome and one linear chromosome. All pathogenic strains also harbor a tumor-inducing (Ti) plasmid that encodes the T-DNA and virulence genes, and they are classified by the types of opines induced and metabolized (4–7). Some strains also harbor additional plasmids, but previous studies focused heavily on Ti plasmids, including the recently completed sequences of a nopaline-agrocinopine–type plasmid, pTi-SAKURA, and a consensus plasmid based on several closely related octopine-type plasmids (4–6, 14–16). A. tumefaciens C58, isolated from a cherry tree (Prunus) tumor, carries two plasmids: pAtC58 and the nopaline-agrocinopine–type pTiC58 (17). This strain has been intensively studied and is the parent of many strains used for the genetic transformation of plants (2, 4–6).

Agrobacterium is an α-proteobacterium within the Rhizobiaceae family, an agriculturally important group that includes the nitrogen-fixing symbiotic partners of legumes (18). Six complete genomes are now available within the α-proteobacteria: Sinorhizobium meliloti and Mesorhizobium loti (both nitrogen-fixing symbionts), Caulobacter crescentus, two Rickettsia species, and A. tumefaciens (19–24). The mammalian pathogens Brucella and Bartonella are also related, so comparative genomics may reveal commonalities and differences between pathogens and symbionts (25, 26).

Overall genome structure.

The A. tumefaciens C58 genome consists of a circular chromosome, a linear chromosome, and two plasmids: the tumor-inducing plasmid pTiC58 and a second plasmid, pAtC58 (8, 10). The genome was sequenced and assembled with standard methods (27). The sequence of all four DNA molecules is available on GenBank, along with chromosomal diagrams and analysis tools (28). The essential features of the four molecules are summarized in Table 1 and in the accompanying Web materials (27). The two chromosomes contain all of the genes for stable RNAs and housekeeping proteins involved in essential cellular functions and prototrophic growth.

Table 1

General features of the A. tumefaciens genome.


The circular chromosome contains a putative origin of replication (Cori) similar to the known Cori of Caulobacter crescentus (29). The linear chromosome, on the other hand, has a plasmid-type replication system of the same type found on pTiC58 and pAtC58. This system, encoded by the repABC genes, expresses a pair of segregation proteins (RepA and RepB) and an origin-binding replication initiation protein (RepC) (30). Thus, we hypothesize that the linear chromosome is evolutionarily derived from a plasmid. The plasmid origin of an “extra” chromosome had been predicted for multichromosome genomes of the α-proteobacteria and has been found in more distantly related organisms such as Vibrio cholerae (19, 31–33).

Gene density is very similar between the two chromosomes. However, genes involved in most essential processes are significantly overrepresented on the circular chromosome (34) (Fig. 1). This asymmetry is consistent with direct descent of the circular chromosome from the primordial α-proteobacterial genome, with a minority of essential genes moving to the linear chromosome (10). Consistent with lateral transfer between chromosomes over a long period, the overall dinucleotide signatures of the two chromosomes are essentially identical but are significantly different from those of the two plasmids (27, 35). The dinucleotide signatures of the two plasmids are quite similar to each other and to related plasmids from other members of the Rhizobiaceae family (27).
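The dinucleotide comparison described above can be illustrated with a minimal sketch. It assumes one common definition of a dinucleotide signature (the odds ratio of each dinucleotide frequency to the product of its two base frequencies, computed over both strands) and an average absolute difference as the distance; the exact procedure used for this genome is described in the cited materials (27, 35), not here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_signature(seq):
    """Odds ratio f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    Counts are pooled over the sequence and its reverse complement,
    so the result does not depend on which strand was supplied.
    Characters other than A, C, G, T are ignored in the frequencies.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(strand)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))
    pairs = ["".join(p) for p in product(BASES, repeat=2)]
    n_mono = sum(mono[b] for b in BASES)
    n_di = sum(di[p] for p in pairs)
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no usable A/C/G/T dinucleotides")
    signature = {}
    for pair in pairs:
        f_x = mono[pair[0]] / n_mono
        f_y = mono[pair[1]] / n_mono
        f_xy = di[pair] / n_di
        signature[pair] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return signature


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between two dinucleotide signatures."""
    sig_a = dinucleotide_signature(seq_a)
    sig_b = dinucleotide_signature(seq_b)
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / 16
```

Under this definition, two replicons with a long shared history would be expected to give a small `signature_distance`, while a recently acquired plasmid could give a larger one.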

Figure 1

Distribution of genes from different functional classes on the circular chromosome (solid bars) versus the linear chromosome (open bars). Clusters of orthologous groups (COG) (34) classes are as follows: TR, translation; NU, nucleotide metabolism; CO, coenzyme metabolism; CE, cell envelope; LP, lipid metabolism; and AA, amino acid metabolism. Other classes are AAS, amino acid synthesis; RP, ribosomal proteins; and NUS, nucleotide synthesis. Asterisks indicate significant differences in gene distribution (goodness of fit test). The dotted line represents the null hypothesis of gene distribution based on chromosome size for the circular chromosome.
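The null hypothesis in this caption, gene counts proportional to chromosome size, can be sketched as a two-category goodness-of-fit calculation. This is a minimal illustration that assumes a chi-square test with one degree of freedom (the caption does not name the specific test); the function name is hypothetical and callers supply their own counts and sizes.

```python
import math


def chromosome_goodness_of_fit(circ_count, lin_count, circ_size, lin_size):
    """Chi-square goodness of fit for one functional class of genes.

    Null hypothesis: genes of the class are split between the circular
    and linear chromosomes in proportion to chromosome size.
    Returns (chi_square_statistic, p_value) for 1 degree of freedom.
    """
    total = circ_count + lin_count
    if total == 0 or circ_size <= 0 or lin_size <= 0:
        raise ValueError("need at least one gene and positive chromosome sizes")
    expected_circ = total * circ_size / (circ_size + lin_size)
    expected_lin = total - expected_circ
    chi2 = ((circ_count - expected_circ) ** 2 / expected_circ
            + (lin_count - expected_lin) ** 2 / expected_lin)
    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A class whose counts match the size ratio gives a statistic of zero; a class concentrated on one chromosome gives a large statistic and a small p-value, which corresponds to an asterisk in the figure.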

More than 6000 base pairs of near-perfect sequence identity extend across the two ribosomal RNA (rRNA) gene clusters on each of the two chromosomes. The chromosomes also share some shorter regions of greater than 90% sequence identity with pAtC58 (27). Transcription of all rRNA gene clusters is oriented away from the DNA replication origins, with those on the linear chromosome in the same orientation. A number of housekeeping genes are located between the linear chromosome's rRNA operons, and one might expect frequent recombination resulting in lethal events.

The telomeres of the linear chromosome are covalently closed.

The linear chromosome is a covalently closed linear molecule, apparently possessing hairpin loops at the telomeres (Fig. 2). Our sequence comes within several hundred bases of the telomeres, and additional sequence is presented by Wood et al. (24). However, the sequence of the putative hairpin loop is not yet available. The proximal regions of both telomeres are similar in overall architecture but very different in sequence. These regions contain portions of several insertion sequence (IS) elements with intervening DNA of additional repeated and unique sequence. They are rich in potential secondary structure and contain numerous short sequence repeats.

Figure 2

The linear chromosome telomeres are covalently closed, as shown by Southern blot hybridization using probes near the right (A) and left (B) telomeres. Chromosomal DNA was digested with specified restriction endonucleases in 100 mM NaCl, 50 mM tris-Cl, 10 mM MgCl2, and 1 mM dithiothreitol, and a portion of each reaction mix was boiled for 10 min, then allowed to cool slowly. After electrophoresis and blot transfer, specific probes identified DNA fragments containing intact telomeres (Nde I and Mlu I digests) or lacking telomeres (Dde I and Pvu II digests). The mobility of fragments possessing telomeres was essentially unchanged by denaturation, suggesting that telomeres contain a covalent hairpin loop. DNA lacking telomeres was denatured by boiling, creating two single-stranded molecules. Slow cooling allowed renaturation of a portion of the denatured molecules. nb, not boiled; 0, 12, 24, 36, and 48 indicate cooling time (in minutes) before freezing in a dry ice and ethanol bath. After 48 min of cooling, the temperature of the DNA samples was approximately 50°C. Numbers on the right of each figure indicate sizes (in kilobases) of double-stranded DNA molecular weight standards.

Precedent for covalently closed telomeres is established in Escherichia coli phage N15 and spirochaetes such as Borrelia burgdorferi (36, 37). DNA replication proceeds through the hairpin, and telomere duplicates are separated to permit chromosome segregation. Resolution of duplicated telomeres is best characterized in the phage N15 system, where protelomerase separates the telomere copies, then rejoins the ends of individual molecules to create hairpin loops (36). No ortholog of N15 protelomerase was identified in the A. tumefaciens C58 genome. However, several putative transposases are encoded near the telomeres, and these could separate daughter telomeres in a reaction analogous to transposon excision.

DNA replication and the cell cycle.

DNA replication is synchronized with the A. tumefaciens cell cycle (11, 38, 39), a feat requiring coordination of four DNA molecules and two different classes of replication origin: Cori and repABC. The precise signal that initiates DNA replication is not yet clear, although in the related α-proteobacterium Caulobacter, many proteins are subject to cell cycle–synchronized transcriptional control and proteolysis (40). Because the initiation signal must be interpretable by both types of replication origin, it is unlikely to be transduced by a single origin-specific binding protein.

Processive DNA replication is performed by DNA polymerase III (Pol III), and A. tumefaciens carries four paralogs of the dnaE gene encoding the Pol III α (polymerase) subunit (41). These dnaE genes fall into two distinct sequence families, designated as categories A and B (27). The category A gene of the circular chromosome is conserved in all sequenced α-proteobacteria and probably encodes the primary replication enzyme (19–23, 27, 41). Each of the A. tumefaciens repABC replicons (linear chromosome, pTiC58, and pAtC58) encodes a category B dnaE gene within an operon containing two conserved genes of unknown function. The operon is present in all fully sequenced α-proteobacteria except the Rickettsia species and may encode a novel DNA polymerase complex (19–23).

Synteny analysis of the A. tumefaciens, S. meliloti, and M. loti genomes.

Although Agrobacterium and Sinorhizobium are in different phylogenetic clades (13), there is an excellent syntenic relationship between the A. tumefaciens circular chromosome and the chromosome of S. meliloti. This further supports the idea that the circular chromosome is the original chromosome involved during the evolutionary radiation of the α-proteobacteria, with the subsequent evolution of the linear chromosome taking place from a plasmid (19, 42) (Fig. 3A). Many of the S. meliloti orthologs missing from the circular chromosome are present on the linear chromosome, as are a number of gene duplications. The region of S. meliloti retaining the most synteny with the A. tumefaciens linear chromosome contains about 300 genes, with orthologs arranged in short syntenic groups across the linear chromosome (19, 42) (Fig. 3A). These genes retain the same gene order relative to S. meliloti, despite their broad dispersal. If portions of the linear chromosome arose via an excision event from the primordial chromosome, the excision may have originated in this region, with subsequent insertions moving particular sections apart. Many other S. meliloti orthologs are also present on the linear chromosome, but their seemingly random location suggests many independent transfers. IS elements are relatively rare on the linear chromosome (with the exception of the telomeres) and cannot explain the highly distributed nature of orthologous genes. The synteny between the circular chromosome of A. tumefaciens and the chromosome of M. loti is much less pronounced and deserves further analysis in terms of the evolutionary history of the family Rhizobiaceae (20) (Fig. 3D).

Figure 3

Synteny among S. meliloti, M. loti, and the two A. tumefaciens chromosomes (19, 20). Protein pairs were generated by BLASTP analysis of predicted proteins from each genome, retaining only the best S. meliloti or M. loti match for each A. tumefaciens protein. Each protein pair is graphed according to the location of the corresponding gene on respective DNA molecules. Blue dots, alignment between A. tumefaciens circular chromosome and a designated S. meliloti or M. loti replicon. Red dots, alignment between A. tumefaciens linear chromosome and a designated S. meliloti or M. loti replicon. (A) Comparison of A. tumefaciens with S. meliloti chromosome. (B) Comparison of A. tumefaciens with S. meliloti plasmid pSymA. (C) Comparison of A. tumefaciens with S. meliloti plasmid pSymB. (D) Comparison of A. tumefaciens with M. loti chromosome. Synteny between A. tumefaciens and the M. loti pML plasmids was extremely low (27). To reduce background from members of orthologous groups, only pairs with a BLASTP expectation value ≤ 1 × 10^−80 are shown, but full BLASTP data are available (27).
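The selection of protein pairs for a plot of this kind can be sketched as follows. This is an illustration only, not the authors' pipeline: it assumes that "best match" means the lowest expectation value per query protein, and the `Hit` record, function names, and position lookups are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One protein-level match (hypothetical record layout)."""
    query: str      # A. tumefaciens protein identifier
    subject: str    # matched protein in the comparison genome
    evalue: float   # BLASTP expectation value


def best_hits(hits, max_evalue=1e-80):
    """Keep the lowest-E-value hit per query, then apply the cutoff."""
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {q: h for q, h in best.items() if h.evalue <= max_evalue}


def synteny_points(hits, query_pos, subject_pos, max_evalue=1e-80):
    """Turn retained protein pairs into (x, y) gene coordinates.

    query_pos and subject_pos map protein identifiers to the position
    of the corresponding gene on its replicon. Pairs with a missing
    position are skipped.
    """
    points = []
    for query, hit in best_hits(hits, max_evalue).items():
        if query in query_pos and hit.subject in subject_pos:
            points.append((query_pos[query], subject_pos[hit.subject]))
    return sorted(points)
```

Plotting the returned points for each replicon pair yields a dot plot in which conserved gene order appears as diagonal runs and dispersed orthologs appear as scattered dots.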

All four A. tumefaciens replicons contain some genes similar to those on the S. meliloti pSymA and pSymB plasmids (19, 27, 31, 43) (Fig. 3, B and C). Most of these shared genes encode members of large orthologous groups, such as ABC transporters. However, there are exceptions, including the fixNOQP operon found on the circular chromosome in A. tumefaciens (44). In S. meliloti, these genes lie on pSymA and encode a special cytochrome oxidase required for the microaerophilic growth found in symbiotic nodules. The role of this cytochrome cbb3-type oxidase in the biology of A. tumefaciens awaits further study. As with comparisons of the A. tumefaciens linear chromosome versus the S. meliloti chromosome, orthologous genes from the pSym plasmids are spread across both A. tumefaciens chromosomes in many short regions of similarity.

Plant transformation and tumorigenesis.

Genes involved in plant transformation and tumorigenesis are located on all four genetic elements. The circular chromosome harbors the well-studied chvAB genes required for synthesis of the extracellular β-1,2-glucan involved in binding to plant cells; the chvGI, chvE, and ros genes involved in regulation of Ti plasmid vir genes; and the chvD, chvH, and acvB genes (4–6). The linear chromosome harbors the exoC (pgm) gene required for synthesis of the extracellular β-1,2-glucan and succinoglucan polysaccharides, and the cellulose synthesis (cel) genes involved in binding to plant cells (4–6). Our cel region sequence differed from the published sequence, and we have reannotated this locus (27). Plasmid pAtC58 contains the attachment (att) genes involved in initial specific attachment of the bacterium to plant cells, as well as a second, partial att locus (4–6). pAtC58 is reportedly dispensable for virulence, raising the question of whether there is a virulence requirement for att (14).

Ti plasmids fall into several opine groups, and the three plasmid sequences now available permit detailed analyses of their relationships (4–6, 15, 16) (Fig. 4). The order of genes on the nopaline-agrocinopine–type plasmids pTiC58 and pTi-SAKURA is virtually identical. Major exceptions include one large insertion on pTiC58 and four smaller insertions on pTi-SAKURA. In contrast, the consensus octopine-type plasmid shares only five major gene clusters with the nopaline-type plasmids (Fig. 4). Many regional differences among these plasmids can be circumstantially linked to mobile DNA elements (45). Most of the pTiC58-specific genes are involved in metabolism and transport and probably allow the bacterium to scavenge additional nutrients. However, for a few genes unique to a given Ti plasmid, there is direct or circumstantial evidence that they play a role in tumorigenesis on particular hosts (for example, ligE of pTi-SAKURA, and virH1, virH2, and virJ of octopine-type plasmids) (15, 16). Both pTiC58 and pTi-SAKURA encode a probable NUDIX hydrolase, which may degrade altered nucleotides or other toxic compounds present in the plant wound environment (46).

Figure 4

Schematic linear alignment, beginning at the left T-DNA border, of pTiC58 with the two other sequenced Ti plasmids (15, 16). Blue represents areas of homologous genes in the same or similar location. Gray (trb), orange (rep), and green (tra) represent areas of homologous genes that are in different locations on various plasmids. Red represents unique regions on each plasmid. The positions of the T-DNA oncogenes (ONC), trb operon, rep operon, tra operons, and vir operons are shown for pTiC58. Arrows delineate regions of identical gene order among the plasmids.

Secretion systems and other pathogenicity factors.

A. tumefaciens lacks a dedicated type III secretion system for exported pathogenicity factors, but it does have the flagellar type III system, and there is evidence of chemotaxis in response to plant-released compounds (47, 48). Flagella are known to secrete pathogenicity factors in Yersinia enterocolitica, and flagellar mutants of A. tumefaciens exhibit moderately reduced virulence (49, 50). However, the A. tumefaciens studies do not discriminate between the potential effects of motility and protein secretion (50).

A. tumefaciens C58 contains three type IV secretion systems and is the primary model for understanding secretion of DNA and proteins through these transporters. Two systems encoded by pTiC58 have been heavily studied: the VirB system for T-DNA transfer to plant cells and the Trb/Tra system for conjugal transfer of pTiC58 (4–6). A third system is encoded by pAtC58. The utility of this system is unknown, but pAtC58 may be conjugative (51). The type IV complexes most similar to the pAtC58 system are from the related animal pathogens Brucella and Bartonella (25, 26).

Several additional loci may play a role in establishing or maintaining a pathogenic relationship with the plant. An autotransporting virulence factor family member is encoded by pAtC58. Such proteins cross the plasma membrane via the signal peptide–dependent pathway and self-insert into the outer membrane (52). Typically, a large extracellular domain is exposed, where it modifies cell adhesion or host cell functions. Other genes similar to known virulence factors include orthologs of bacA (macrophage survival in Brucella abortus and legume symbiosis in Sinorhizobium meliloti), putative adhesins, icmF (macrophage killing in Rickettsia), and as many as six different iron uptake systems (53, 54). Iron acquisition is always a priority for pathogens, and A. tumefaciens strains vary in their production of siderophores (55). Although strain C58 does not produce detectable siderophore activity in low-iron medium, the diversity of iron uptake systems may allow it to co-opt some catechol, hydroxamate, or citrate siderophores produced by other microbes or by plants (55).

Other aspects of metabolism and signaling.

A. tumefaciens establishes a proprietary carbon and nitrogen source by genetically engineering its host to produce opines (4–7). The genes for nopaline and agrocinopine utilization lie on pTiC58, but strain C58 derivatives lacking pTiC58 can take up octopine and nopaline without subsequent hydrolysis, and spontaneous mutants of these derivatives have been found with an octopine or mannopine/mannopinic acid utilization phenotype (56, 57). Genome analysis provides some clues about the basis of these activities. There are many putative amino acid, dipeptide, and oligopeptide ABC transport systems that may permit scavenging a variety of nitrogenous compounds. Also, the linear chromosome and pAtC58 both encode strong homologs of AgaE, the Ti plasmid–encoded version of which catalyzes the breakdown of mannopine to mannose and glutamate (58).

A. tumefaciens also exploits many native plant metabolites, such as sucrose, tannins, and cell wall polymers. Sucrose is the major form by which organic carbon is transported in most plants. Like its relative S. meliloti, A. tumefaciens has an α-glucoside utilization (agl) operon and an additional orphan α-glucosidase, as well as sucrose hydrolase (invertase) (59). A third route unique to A. tumefaciens involves the periplasmic oxidation of sucrose to 3-ketosucrose, transport of 3-ketosucrose, and cytoplasmic cleavage into fructose and 3-ketoglucose. With no homologs for comparison, we cannot positively identify the genes encoding the enzymes of this pathway, but mutants affecting the pathway are available (60). Plasmid pAtC58 encodes a protein with strong similarity to fungal tannases. This enzyme might allow the bacterium to use tannins as nutrients or to defend itself against the antimicrobial activities of many tannins (61). Finally, there are numerous genes that may allow degradation of cellulose, hemicellulose, pectin, and/or lignin. These include a β-endoglucanase encoded within the cellulose synthesis gene cluster, putative xylan esterases and xylanases, a previously identified inducible polygalacturonase (pectinase), and several enzymes for degrading monomeric and dimeric components of plant lignin (LigE, VirH2, and the β-ketoadipate pathway) (4–6, 15, 16, 62, 63).

The A. tumefaciens genome encodes at least 25 different two-component regulatory systems and a wide array of other regulatory proteins, including two new bacterial phytochromes (64). Typically, bacterial phytochromes contain the sensory portion of the protein, including the tetrapyrrole chromophore-binding site, attached to a histidine kinase domain. One of the A. tumefaciens phytochromes has this structure, and its gene is in a putative operon with a partner response regulator. The other phytochrome is itself a response regulator. There are no data in the literature as to the effect of light on A. tumefaciens.

Concluding remarks.

The analysis presented here and in the accompanying paper by Wood et al. illuminates many new avenues to further explore the biology and biotechnological utility of A. tumefaciens (24). These include the maintenance of a complex genome, new potential virulence mechanisms, and many additional ways in which the bacterium may parasitize a plant host. At a larger level, the availability of six α-proteobacterial genomes, with more on the way, provides a wealth of comparative data for further understanding the evolutionary history of the group, the evolution and maintenance of multichromosome genomes, and the evolution of mechanisms that interact with and exploit animal and plant hosts (19–26).

  • * To whom correspondence should be addressed. E-mail: steven.c.slater{at}

