Research Article

Unexpected conservation and global transmission of agrobacterial virulence plasmids


Science  05 Jun 2020:
Vol. 368, Issue 6495, eaba5256
DOI: 10.1126/science.aba5256

Agrobacteria virulence writ large

Plasmids are widespread among bacteria and are important because they spread virulence and antibiotic resistance traits, among others. They are horizontally transferred between strains and species, so it is difficult to work out their evolution and epidemiology. Agrobacteria, a diverse grouping of species that infect plants, carry oncogenic Ti and Ri plasmids, which cause crown gall and hairy root diseases, respectively. The upside is that these plasmids have become valuable biotechnological tools. Weisberg et al. combed through an 80-year-old collection of Agrobacterium strains but found a surprisingly low diversity of plasmids. It is puzzling that the number of plasmid lineages is so limited despite reported high levels of plasmid recombination, but what is clear is how plant production systems have influenced plasmid spread into various genomic backbones.

Science, this issue p. eaba5256

Structured Abstract


Plasmids are autonomously replicating, nonessential DNA molecules that accelerate the evolution of many important bacterial-driven processes. For example, plasmids spread antibiotic resistance genes, which are a pressing problem for human and animal health. Plasmids can also encode complex traits that allow bacteria to interact intimately with eukaryotes. Acquisition of an oncogenic tumor-inducing (Ti) or root-inducing (Ri) plasmid by saprophytic soil agrobacteria changes them into pathogens capable of genetically transforming and causing disease in a broad range of plant species.

Plasmids are also biotechnology tools that can advance our understanding of life. They can be used to generate organisms with unusual traits and innovative applications. The potential for using oncogenic plasmids to accelerate research was recognized soon after their discovery. Along with strains of agrobacteria, disarmed plasmids are mainstays as tools in plant biology and plant genetic engineering.


Inferring evolutionary relationships is foundational for classifying plasmids, accurately assessing the influence of plasmids on disease outbreaks, developing appropriate strategies for mitigating disease, and expediting efforts to leverage plasmid diversity for biotechnology. However, such research is complicated because plasmids consist of diverse structural variants and are extraordinarily dynamic, modular molecules that can be reshuffled and broadly transmitted horizontally.

We focused on oncogenic plasmids of agrobacteria because of their important roles in causing disease and as biotechnology tools. Two genomic datasets were developed. One consisted of diverse, broadly sampled historical strains and was intended to serve as the basis for an evolutionary framework. The other consisted of contemporaneous strains hierarchically sampled from managed plant production sites, for the purpose of calibrating epidemiology methods. The datasets were combined to identify epidemiological patterns.


We combined analyses of chromosomal ancestry and plasmids to uncover their contributions and accurately model the global spread of disease. Phylogenetic, genomic, and time tree analyses of thousands of strains from the Rhizobiales order yielded a robust phylogenetic history of agrobacteria. We developed a strategy that uses phylogenetic and network approaches as well as different scales of genetic information to infer the evolution of diverse oncogenic plasmids. By combining results, we uncovered global epidemiological patterns supporting movement of pathogens clonally and plasmids horizontally in space and time.

This study has three major findings: (i) Lineages of agrobacteria emerged independently and at different times from within a genus-level group that also circumscribes multiple lineages of rhizobia. (ii) Agrobacterial Ti and Ri plasmids are descended from only six and three lineages (types), respectively. Few evolutionary events are sufficient to explain the relationships observed among types. Each type is subject to different pressures and shows different degrees of within-group variation, but the evolution of all types is nonetheless guided by similar principles. The extent of modularity is high, and genes and functional modules are frequently reshuffled via recombination within conserved loci. Yet plasmid diversification is constrained by the spatial structure of loci that interact genetically. (iii) Transmission of oncogenic plasmids, especially within agricultural settings, promotes the massive spread of disease.


Our strategy for inferring the evolution and transmission of virulence plasmids has potential applications in plant and human or animal health and food safety, as well as for understanding the ecology and evolution of other plasmid-mediated processes such as mutualistic symbioses. In addition, this strategy can be applied to study other mobile and modular elements, such as integrative conjugative elements and pathogenicity or symbiosis islands. We have shown that plasmids once viewed as too diverse to be classified have distinct lineages, and that accurate modeling of the spread of disease can be accomplished by robustly defining their evolutionary relationships.

Combined genomic analyses of chromosomal and plasmid identities to model disease spread.

Genomic data from hundreds of strains of agrobacteria were parsed and analyzed to infer the evolutionary histories of chromosomes and oncogenic Ti and Ri plasmids. The data were overlaid to uncover the roles of bacteria and plasmids in the global spread of disease.


The accelerated evolution and spread of pathogens are threats to host species. Agrobacteria require an oncogenic Ti or Ri plasmid to transfer genes into plants and cause disease. We developed a strategy to characterize virulence plasmids and applied it to analyze hundreds of strains collected between 1927 and 2017, on six continents and from more than 50 host species. Given prior evidence for prolific recombination, it was surprising that oncogenic plasmids are descended from a few conserved lineages. Characterization of a hierarchy of features that promote or constrain plasticity allowed inference of the evolutionary history across the plasmid lineages. We uncovered epidemiological patterns that highlight the importance of plasmid transmission in pathogen diversification as well as in long-term persistence and the global spread of disease.

Agricultural ecosystems promote the rapid evolution and diversification of pathogens (1). These ecosystems increase pathogen population sizes, lower barriers for transmission, and increase opportunities for horizontal exchange of virulence genes. Understanding the genetic basis for how pathogens emerge and diversify in agricultural ecosystems is foundational for determining their spread and assessing risks. Such knowledge is critical to policies for improving plant health and preparing against disease outbreaks to increase global food security (2).

Plasmids that confer virulence and antimicrobial resistance are evolutionary drivers of pathogenic bacteria that affect plant, human, and animal health (3–6). Some plasmids can mediate conjugation and transmit horizontally within and across species to diversify pathogen populations. The development of strategies to infer horizontal transfer of plasmids is crucial for accurately assessing disease outbreaks and instituting measures to limit risks from disease. However, it is difficult to trace dissemination of plasmids and integrate findings with analyses of chromosomes, because plasmids can be horizontally transferred and hence can defy inference of evolutionary histories (7). In addition, plasmids tend to have few conserved regions, high numbers of repeated sequences, high rates of gene exchange, and many structural variants. Even core genes such as those associated with replication or mobility are susceptible to recombination.

Agrobacteria are a diverse, polyphyletic group, and its members are common in soils and on many species of plants. These bacteria have multipartite genomes with two chromosomes and nonvirulence plasmids, which are extremely diverse among agrobacteria (8). Pathogenic strains additionally carry conjugative oncogenic tumor-inducing (Ti) or root-inducing (Ri) plasmids and can infect plants to cause crown gall or hairy root diseases, respectively (9). These diseases are incurable and persist for the duration of the lives of infected plants, which continually release opines, metabolites that are central to the ecology and epidemiology of agrobacteria.

Oncogenic plasmids are exceptionally well characterized because of their applications in plant biotechnology and hence are models for understanding the impact of plasmids on disease ecology (10, 11). Core to oncogenic plasmids are repABC replication genes, tra and trb interbacterial conjugation genes, and vir genes. Vir proteins are necessary to process and escort a region, the transfer DNA (T-DNA), of oncogenic plasmids into host cells, where it recombines into the genome to genetically transform host plants. T-DNAs include oncogenes that reprogram transformed cells to proliferate. One oncogene present on T-DNAs in all oncogenic plasmids is tms1 (aux1/iaaM), which is involved in the synthesis of auxins, a class of plant growth–promoting hormones (12). T-DNAs also include genes responsible for the synthesis of opines. More than 20 chemically diverse opines have been identified among oncogenic plasmids (13). Opines are nutrients for the pathogen and act as signals that trigger replication and interbacterial conjugation of oncogenic plasmids. Genes necessary for uptake and catabolism of opines are spatially separated from cognate synthesis genes and are located outside of T-DNAs. This arrangement led to the hypothesis that instigating pathogens are privileged in accessing the opines (14).

Beyond the core set of genes, oncogenic plasmids vary extensively in composition and structure (15). This diversity is a major challenge, best expressed by the sentiment that it is practically impossible to reconstruct the evolution of agrobacteria and oncogenic plasmids, even with the availability of an extraordinarily large genomic dataset (16). Here, we overcame this challenge, analyzed genomic data from hundreds of strains, and applied the results to infer the evolutionary history and transmission of oncogenic plasmids.

Chromosomal ancestry

A robust phylogeny of bacteria is an essential foundation for understanding the evolution and transmission of the plasmids that they host. Hence, it was critical to reconcile controversies over the complex evolutionary relationships among agrobacteria (see data S1 for synonymous terms) (17, 18). Sequenced strains were reclassified in the context of a dataset of nearly 1500 additional strains that represent families within the order Rhizobiales (data S1 and S2). Combined phylogenetic and genomic analyses indicated that agrobacteria, Rhizobium, Allorhizobium, Neorhizobium, Ensifer/Sinorhizobium, Pararhizobium, and Shinella are related at the level of a genus, referred to as the agrobacteria-rhizobia complex (ARC; fig. S1 and data S3). Within the ARC, groups traditionally known as Agrobacterium tumefaciens, Agrobacterium rhizogenes, and Agrobacterium vitis are called biovar (BV) 1, BV2, and BV3, respectively (19). BV1 was previously subclassified into genomospecies (20). Whole-genome analysis suggested that most genomospecies (excepting G7 and G8, which each represent three groups) are commensurate with species-level groups (data S4). BV2 is homogeneous and represents a single species-level group. BV2 is sister to a newly defined clade of pathogenic agrobacteria (BV2-like). BV3 forms multiple species-level groups.

A single lineage of agrobacteria was previously estimated to have diverged from rhizobia between ~250 million and ~150 million years ago (21, 22). Considering the new interpretation of their phylogeny, we reevaluated estimates of divergence times (Fig. 1, fig. S2, and data S5). Lineages of agrobacteria emerged independently, at different times in the history of the ARC, and are more closely related to lineages of rhizobia than to each other. BV1 emerged 48 ± 10 million years ago, and its clade has substructure with long branch lengths, which suggests that species-level groups diverged early in its history. The BV2 lineage is estimated to have emerged only 1.6 ± 0.5 million years ago, and its short branch lengths and low genetic diversity are consistent with the possibility that BV2 is recovering from a recent bottleneck (fig. S2). In the time tree, BV3 diverged independently from other lineages of agrobacteria (Fig. 1). However, its relationships in the time tree are incongruent with those of the phylogenetic tree (fig. S1).

Fig. 1 Time-calibrated phylogenetic tree of the agrobacteria-rhizobia complex.

Blue horizontal bars indicate confidence intervals for each split. Clades I to IV are defined in fig. S1. Key groups are labeled; three-letter codes are for select species of rhizobia or agrobacteria without biovar classifications: Run (R. undicola), Ala (A. larrymoorei), Ask (A. skierniewicense), Aru (A. rubi), and Aar (A. arsenijevicii); synonyms are listed in data S1. Each strain is colored according to its species-level classification. Two groups (black bars) within BV1 are not associated with established genomospecies.

Plasmid identities

We next addressed the difficult task of resolving the relationships of the oncogenic plasmids. Because plasmid genes do not meet the assumptions of traditional phylogenetic methods, we used multiple approaches to compare and cross-validate findings. A total of 143 oncogenic plasmid sequences were identified. We analyzed different levels of genetic features, including signatures of subsequences of length k (k-mers), a 43-core gene phylogeny, a phylogenetic network derived from 144 single-copy genes present in at least 40% of all plasmids, and patterns of composition as well as organization. To our surprise, the results were consistent and allowed us to categorize the molecules into just nine distinct lineages of six Ti (type I–VI) and three Ri (type I–III) oncogenic plasmids (Figs. 2 and 3, figs. S3 to S9, and data S6 and S7). Type I and type IV Ti plasmids were further divided into two and three subtypes, respectively.
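
The gene-selection step described above (single-copy genes present in at least 40% of plasmids) can be expressed as a simple filter. This is a minimal sketch, not the authors' pipeline: it assumes genes have already been grouped into gene families with shared identifiers, and it counts a family as single-copy only if it never occurs twice on the same plasmid. The function name `select_single_copy_genes` and the toy input are hypothetical.

```python
from collections import Counter


def select_single_copy_genes(plasmid_genes, min_fraction=0.4):
    """Return gene families that are single-copy wherever present and that
    occur on at least `min_fraction` of all plasmids.

    `plasmid_genes` maps a plasmid name to the list of gene-family IDs
    annotated on it; a family listed twice on one plasmid is multi-copy.
    """
    n_plasmids = len(plasmid_genes)
    present_in = Counter()  # number of plasmids carrying each family
    multi_copy = set()      # families seen more than once on any plasmid
    for genes in plasmid_genes.values():
        for family, copies in Counter(genes).items():
            present_in[family] += 1
            if copies > 1:
                multi_copy.add(family)
    return sorted(
        family
        for family, n in present_in.items()
        if family not in multi_copy and n / n_plasmids >= min_fraction
    )


# Hypothetical toy input: five plasmids annotated with gene-family IDs.
toy = {
    "p1": ["fam1", "fam2", "fam3", "fam4"],
    "p2": ["fam1", "fam2", "fam3", "fam3"],  # fam3 is duplicated here
    "p3": ["fam1", "fam5"],
    "p4": ["fam1", "fam2", "fam4"],
    "p5": ["fam6"],
}
print(select_single_copy_genes(toy))  # ['fam1', 'fam2', 'fam4']
```

fam3 reaches the 40% prevalence cut-off but is excluded because it is duplicated on one plasmid.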

Fig. 2 Nine distinct lineages of oncogenic plasmid types.

(A) Weighted undirected network of oncogenic plasmids. Nodes represent individual oncogenic plasmids and are colored according to the type and subtype of Ti (top) and Ri (bottom) plasmids. Darker edges indicate greater Jaccard similarity of k-mer signatures. (B) A split network of the oncogenic plasmids. Branch thickness indicates relative support for the split. Key reference strain plasmids are indicated. (C) Maximum likelihood tree constructed on the basis of concatenated sequences from 43 single-copy core genes (data S7). Tips (circles and triangles) are color-coded according to plasmid type. Colored panels in the top row below the tree denote the type of plant from which strains were cultured. Colored panels in the bottom row indicate the classification of the strains. The same color scheme for plasmid types is used in each of the three panels. The tree is midpoint-rooted. See fig. S3 for a more detailed and larger version of (C).
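
In panel C the tree is midpoint-rooted: the root is placed halfway along the longest path between any two tips, so no outgroup is required. The sketch below illustrates that rule on a small unrooted tree stored as a hypothetical adjacency dictionary of branch lengths (`EXAMPLE_TREE`); it is illustrative only and is not the software used to draw the figure.

```python
def _distances_and_parents(adj, source):
    """Traverse the tree from `source`; return path lengths and parent links."""
    dist = {source: 0.0}
    parent = {source: None}
    stack = [source]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in dist:  # a tree has one path, so first visit is final
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(adj):
    """Return (u, v, offset): the midpoint of the longest tip-to-tip path
    lies on branch u-v at distance `offset` from u."""
    tips = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    best = (-1.0, None, None)
    for a in tips:
        dist, _ = _distances_and_parents(adj, a)
        for b in tips:
            if dist[b] > best[0]:
                best = (dist[b], a, b)
    diameter, a, b = best
    _, parent = _distances_and_parents(adj, a)
    path = [b]  # walk back from b to a, then reverse
    while path[-1] != a:
        path.append(parent[path[-1]])
    path.reverse()
    half, walked = diameter / 2, 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:
            return u, v, half - walked
        walked += adj[u][v]
    raise ValueError("tree has no positive-length path")


# Hypothetical unrooted tree: tips A-D, internal nodes X and Y.
EXAMPLE_TREE = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "C": {"Y": 2.0},
    "D": {"Y": 5.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 2.0, "D": 5.0},
}
print(midpoint_root(EXAMPLE_TREE))  # ('Y', 'D', 1.0)
```

Here the longest tip-to-tip path runs from B to D (length 8), so the root falls on the Y-D branch, 1.0 from Y.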

Fig. 3 Variations within and between types and subtypes of oncogenic plasmids.

(A) Visualization of variation within plasmid subtypes. Circles, starting from the innermost circle, are a gene synteny graph, a plasmid map, and bar graphs representing relative depth of coverage of sequencing reads (sliding window of 1 kb), number of SNPs (sliding window of 1 kb), and number of soft-clipped reads (≥5 clipped reads). Nodes in the synteny graph represent genes present in ≥1 plasmid of a type. Nodes are connected if genes are adjacent in at least one plasmid. The network cycle represents the most common order of genes within a type. Different structures, such as gene presence/absence variation, rearrangements, or inversions present in plasmids, are shown as alternative paths in the cycle. Nodes are colored according to key functions and aligned to corresponding features in the plasmid map, or colored yellow for transposase- and insertion sequence–encoding genes. Soft-clipped reads are those that must be trimmed in order to align to the reference and can be evidence for a recombination breakpoint. Type I.a and type I.b Ti plasmids are presented as examples. (B) General structure and organization of type II–VI Ti plasmids, as well as their subtypes, and type I–III Ri plasmids. Simplified maps show locations and spatial relationships of key modules. They are colored and labeled with a letter code: A, acc; O, opine genes; R, repABC; T-1 to T-4, T-DNA 1–4; Ta, tra; Tb, trb; V, vir genes. The triangle represents an insertion sequence. Visualizations are as shown in (A) for each of the plasmid types and subtypes (figs. S5 to S9).
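
The per-window tracks in panel A amount to counting events in 1-kb windows and flagging positions with at least five soft-clipped reads. A minimal sketch, assuming SNP and soft-clip positions have already been extracted from read alignments as 0-based integers. The caption does not give the step between windows, so non-overlapping windows are an assumption here, as is treating the circular plasmid as linear; the function names are hypothetical.

```python
from collections import Counter


def sliding_window_counts(positions, seq_length, window=1000, step=1000):
    """Count events (e.g. 0-based SNP positions) in windows along a sequence.

    Returns (start, end, count) tuples; the final window may be shorter.
    The plasmid is treated as linear, so windows do not wrap around.
    """
    rows = []
    for start in range(0, seq_length, step):
        end = min(start + window, seq_length)
        rows.append((start, end, sum(start <= p < end for p in positions)))
    return rows


def clip_hotspots(clip_positions, min_reads=5):
    """Positions where at least `min_reads` reads are soft-clipped; such
    sites are candidate recombination breakpoints."""
    tally = Counter(clip_positions)
    return sorted(pos for pos, n in tally.items() if n >= min_reads)


# Hypothetical toy data on a 3-kb sequence.
print(sliding_window_counts([10, 999, 1000, 2500], seq_length=3000))
# [(0, 1000, 2), (1000, 2000, 1), (2000, 3000, 1)]
print(clip_hotspots([120] * 5 + [700] * 4 + [1800] * 6))
# [120, 1800]
```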

In the k-mer network, most types segregated into distinct graphs (Fig. 2A). However, type IV.c Ti plasmids are high-degree nodes that connect type I Ti plasmids to the other subtypes of type IV Ti plasmids. In the phylogenetic network, there were many splits, yet plasmids clustered via short edges into monophyletic clades (Fig. 2B). This topology is consistent with extensive and ancient recombination events prior to divergence of plasmid lineages. The two networks show that the type I Ti plasmids are most closely related to type IV plasmids, which is consistent with their similarities in structure and gene composition (fig. S10 and data S8). Other relationships were also observed in the phylogenetic network. Type I Ti plasmids are related to the type VI Ti plasmid (Fig. 2B). Type II Ti plasmids are nested within the type III Ti plasmids and are more closely related to each other than to other types of Ti plasmids. Type V Ti plasmids are the most distinct of the sequenced Ti plasmids. Type II and type III Ri plasmids are more closely related to each other than to type I Ri plasmids.
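
The weighted k-mer network of Fig. 2A can be approximated by comparing sets of k-mers with the Jaccard index, J = |A ∩ B| / |A ∪ B|, and then reading off hub nodes (high weighted degree) and disconnected subgraphs. This sketch uses exact canonical k-mer sets; tools built for many genomes typically estimate J from MinHash sketches instead. The default k = 21, the similarity threshold, and all function names are assumptions, since this section does not state the parameters used.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (lexicographic minimum of a k-mer and its
    reverse complement), so both DNA strands give the same signature."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, rc))
    return kmers


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kmer_network(seqs, k=21, min_similarity=0.0):
    """Weighted undirected edges {(p, q): J} for plasmid pairs whose k-mer
    Jaccard similarity J exceeds `min_similarity`."""
    sigs = {name: canonical_kmers(s, k) for name, s in seqs.items()}
    edges = {}
    for p, q in combinations(sorted(sigs), 2):
        j = jaccard(sigs[p], sigs[q])
        if j > min_similarity:
            edges[(p, q)] = j
    return edges


def weighted_degree(edges):
    """Sum of edge weights per node; high values mark hub plasmids."""
    degree = {}
    for (p, q), w in edges.items():
        degree[p] = degree.get(p, 0.0) + w
        degree[q] = degree.get(q, 0.0) + w
    return degree


def components(nodes, edges):
    """Connected components (sorted lists of node names) of the network."""
    adj = {n: set() for n in nodes}
    for p, q in edges:
        adj[p].add(q)
        adj[q].add(p)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        groups.append(sorted(group))
    return groups


# Hypothetical toy "plasmids"; k=3 only because the sequences are tiny.
toy = {"a": "ACGTAC", "b": "ACGTAC", "c": "TTTTTT"}
net = kmer_network(toy, k=3)
print(net)                          # {('a', 'b'): 1.0}
print(components(list(toy), net))   # [['a', 'b'], ['c']]
```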

Within this sequenced dataset, there are relationships among plasmids, bacterial species, and plant hosts, some of which had not been previously noted and could potentially be of use in biotechnology applications of agrobacteria (Fig. 2C and fig. S3). BV1 has the most diverse spectrum of oncogenic plasmids, with four types of Ti plasmids and type I Ri plasmids. Type III Ti plasmids are exclusively in BV1. BV2 tends toward having only one of the two minor variants of type I.a Ti plasmids (fig. S15A). However, some have type II and type IV.c Ti plasmids as well as type III Ri plasmids. BV3 strains exclusively carry type IV.a, type IV.b, and type V Ti plasmids. Strains carrying type I Ti plasmids were isolated predominantly from woody plants, whereas those with type III Ti plasmids were exclusively from herbaceous plants. With the exception of only two strains, those belonging to BV2 were cultured from woody plants. BV3 strains were exclusively cultured from grapevine, which is also host to BV1 and BV2 strains.

There are different degrees of variation within each type of oncogenic plasmid. We combined gene synteny, gene annotation, and sequence data for a comprehensive comparison of patterns of variation in plasmid types (Fig. 3A, figs. S5 to S9, and data S9). Type I.a and type II Ti plasmids are relatively conserved in gene composition and structure. Type I.b and type III Ti plasmids are more diverse, with gene presence/absence variation present in T-DNAs, opine-associated loci, and regions flanking the vir loci. Despite the lower sampling depth, a range in gene presence/absence was also observed among the Ri plasmids. Across all but type II and type V Ti plasmids, higher variation in gene composition is present within T-DNAs and proximal to the right border. Variation extends to the region neighboring the right border of T-DNAs in type I.b and type III Ti plasmids and all types of Ri plasmids. In these plasmids, the region encodes proteins necessary for transport and catabolism of opines. As previously noted, noncore genes of most plasmid types are highly variable. The tra and trb loci of type I–III Ti plasmids have extensive small-scale changes.
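
The gene synteny graphs used here (Fig. 3A, and Fig. 4B for T-DNAs) follow a simple construction: genes are nodes, and an edge joins two genes that are adjacent in at least one plasmid. The sketch below builds such a graph from hypothetical ordered gene lists, treating plasmids as circular and weighting each edge by the fraction of plasmids that share that adjacency, similar in spirit to the normalized line weights of Fig. 4B. Gene orientation and the search for the most common cycle are left out, so this is a simplification rather than the authors' method; the gene names are placeholders.

```python
from collections import Counter


def synteny_graph(gene_orders, circular=True):
    """Build a gene synteny graph from ordered gene lists.

    Nodes are genes present in at least one plasmid. Two genes share an
    undirected edge if they are adjacent in at least one plasmid; each edge
    is weighted by the fraction of plasmids in which that adjacency occurs.
    Gene orientation is ignored in this sketch.
    """
    nodes = set()
    counts = Counter()
    for genes in gene_orders.values():
        nodes.update(genes)
        # A set, so each adjacency is counted at most once per plasmid.
        pairs = {frozenset(p) for p in zip(genes, genes[1:]) if p[0] != p[1]}
        if circular and len(genes) > 1 and genes[-1] != genes[0]:
            pairs.add(frozenset((genes[-1], genes[0])))
        counts.update(pairs)
    n = len(gene_orders)
    return nodes, {edge: c / n for edge, c in counts.items()}


# Hypothetical gene orders for three plasmids of one type.
orders = {
    "p1": ["g1", "g2", "g3", "g4"],
    "p2": ["g1", "g2", "g3", "g5"],  # g5 replaces g4 in this plasmid
    "p3": ["g1", "g2", "g3", "g4"],
}
nodes, edges = synteny_graph(orders)
print(round(edges[frozenset(("g3", "g4"))], 3))  # 0.667
```

In the toy input, the low-weight route g3, g5, g1 is an alternative path of the kind that marks gene presence/absence variation.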

Plasmid evolution

The range in variation across types of oncogenic plasmids suggests that each of the types has been shaped by different evolutionary processes. We modeled plasmid histories to understand these processes and infer origins and relationships. In doing so, a constant theme emerged: Core genes used to characterize oncogenic plasmids not only have low phylogenetic value but are in fact the major contributors to their variation. Their conservation among plasmids provides regions for recombination and reshuffling of genes to occur. We therefore analyzed plasmid types at different scales to infer patterns of evolution. Features of oncogenic plasmids were also accounted for separately because of their different evolutionary histories.

Different levels of modularity and variation in gene analogs contribute to oncogenic plasmids maintaining their function while adopting diverse gene composition and structure. The vir locus is central to virulence of agrobacteria yet is prone to exchange among plasmid types. Results from k-mer and phylogenetic analyses were not consistent with grouping on the basis of locus structure or with relationships among plasmid types (Fig. 4A and figs. S11 and S12A). The vir loci of type I.a, type I.b, type II, type III, and type VI Ti plasmids as well as type II and type III Ri plasmids share a recent common ancestor and follow one of the paths in the k-mer graph. Those of type IV and type V Ti plasmids and type I Ri plasmids share a recent common ancestor and follow the second path.

Fig. 4 Diversity of virulence loci of oncogenic plasmids.

(A) Compacted de Bruijn graph constructed from component k-mers of sequenced vir loci. Traces are colored according to plasmid type and subtype. Key variations and vir gene regions are indicated; IS, insertion sequence. Large regions that interrupt the vir loci were not included. Panels at bottom show the traces of pTiC58 and pRi2659, which exemplify the two main traces. Traces in the inset are colored to differentiate between gene loci. (B) Gene synteny graph of 213 T-DNAs. The main Ti and Ri paths are labeled with dotted arrows. Paths of the three T-DNAs that form the chimera of the type I.a Ti plasmids are indicated by numbers in red. The two left borders of the chimeric T-DNA-1 of the type III Ti plasmids are labeled with numbers in green. Nodes are colored according to category (dark gray, left border; light gray, right border; blue, gene; yellow, insertion sequence). The tms2* gene is an independently acquired homolog that failed to cluster with tms2. Edges are colored according to plasmid type. Line weight is normalized to the proportion of plasmids within a given type. (C) Circos plot relating oncogenic plasmids to synthesized opine variants (S-A, conjugates of sugars and amino acids; K-A, conjugates of keto acids and amino acids; S-S, conjugates of sugars). See figs. S11 and S13 for larger variations of (A) and (B).
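
A compacted de Bruijn graph, as in Fig. 4A, links overlapping k-mers and then merges every chain without branches into a single unitig, so that shared and divergent stretches of sequence appear as common and separate paths. The naive sketch below builds the graph from (k-1)-mer nodes and collapses maximal non-branching paths. It uses one strand only and tiny toy sequences; the software and k actually used are not stated in this section, and the function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(seqs, k):
    """Node-centric de Bruijn graph: nodes are (k-1)-mers and each distinct
    k-mer adds one edge from its prefix to its suffix. Only the given strand
    is used; reverse complements are not merged in this sketch."""
    kmers = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
    out_edges = defaultdict(set)
    in_degree = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        out_edges[prefix].add(suffix)
        out_edges.setdefault(suffix, set())  # register sink nodes too
        in_degree[suffix] += 1
    return out_edges, in_degree


def compact(out_edges, in_degree):
    """Collapse maximal non-branching paths into sorted unitig strings."""

    def simple(node):  # exactly one edge in and one edge out
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    def spell(path):  # first node, then the last base of each later node
        return path[0] + "".join(node[-1] for node in path[1:])

    unitigs, used = [], set()
    for start in list(out_edges):
        if simple(start):
            continue
        for nxt in out_edges[start]:
            path = [start, nxt]
            used.add((start, nxt))
            while simple(path[-1]):
                succ = next(iter(out_edges[path[-1]]))
                used.add((path[-1], succ))
                path.append(succ)
            unitigs.append(spell(path))
    # Any edge still unused lies on an isolated cycle of simple nodes.
    for start in list(out_edges):
        for nxt in out_edges[start]:
            if (start, nxt) in used:
                continue
            path = [start, nxt]
            used.add((start, nxt))
            while path[-1] != start:
                succ = next(iter(out_edges[path[-1]]))
                used.add((path[-1], succ))
                path.append(succ)
            unitigs.append(spell(path))
    return sorted(unitigs)


# Two hypothetical sequences that share a prefix and then diverge.
print(compact(*build_de_bruijn(["ACGTA", "ACGTC"], k=3)))
# ['ACGT', 'GTA', 'GTC']
```

The shared prefix ACGT becomes one unitig and the two divergent ends become separate unitigs, analogous to two traces leaving a common path in the figure.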

The vir locus is often acquired as separate modules. There are two vir loci in type IV.c Ti plasmids. The primary vir locus is hypothesized to have been acquired from a Ri plasmid. By itself, this locus is likely not functional because of multiple transposase genes interrupting virA, which encodes a regulator of the vir genes (fig. S12B). The second vir locus is a remnant that encompasses only tzs, virA, virB1, virB2, and virB3 and has overall identity of 93% to the vir locus of type I Ti plasmids. We speculate that the virA allele of this fragment is necessary to complement the disrupted allele of the primary vir locus. In addition, virE1-2 and GALLS genes are interchangeable submodules that are frequently inherited separately from the rest of the vir locus. This was observed for virE1-2 genes among type IV–VI Ti and type II Ri plasmids (fig. S12). Likewise, the vir locus of type III Ri plasmids, despite having a common ancestor with that of type I Ti plasmids, carries GALLS instead of virE1-2.

T-DNAs are extraordinarily diverse in size as well as composition and are highly recombinogenic (Fig. 4B, fig. S13, and data S6). T-DNA transfer is largely dependent on a short right border sequence and a flanking overdrive enhancer sequence (fig. S14) (23–25). Border sequences are practically invariant among 213 T-DNAs, and this strict conservation of short sequences is a low barrier for generating alternative T-DNA–vir combinations and multiplexing T-DNAs in plasmids (fig. S14, A and B, and data S6). Diversification can be driven by gene gain and loss from T-DNAs with little to no consequence to the transformation process. Chimerization is another frequently permitted mechanism of diversification (figs. S15 and S16). The original T-DNA of the type I.a Ti plasmid (e.g., pTiC58) was extended, from left to right, by invasion of two additional T-DNAs. The last of the T-DNAs is one of two prominent variants that invaded and swept through the type I–IV Ti plasmids, potentially because of a selective advantage over others. This is the path from acs through the 6b gene to the right border that cuts prominently across the gene synteny graph. The type III Ti plasmid has two left borders in T-DNA-1 because a second T-DNA displaced all but the most left-flanking acs gene of the original T-DNA. T-DNA-2 of the type VI Ti plasmid has three right border sequences and two sets of homologous, but nonparalogous, oncogenes. This T-DNA follows a complex path in the graph.
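
Because border sequences are short and practically invariant, candidate borders, including extra borders in multiplexed or chimeric T-DNAs, can be located by scanning both strands for near-exact matches of a single motif. A minimal sketch, assuming a Hamming-distance threshold of one mismatch; the 10-bp `MOTIF` is a made-up placeholder, not a real T-DNA border (real borders are longer), and the overdrive enhancer is not modeled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_border_hits(seq, motif, max_mismatches=1):
    """Return (start, strand) for every window of `seq` within
    `max_mismatches` of `motif` ("+") or of its reverse complement ("-")."""
    rc_motif = motif.translate(_COMPLEMENT)[::-1]
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if mismatches(window, motif) <= max_mismatches:
            hits.append((i, "+"))
        if mismatches(window, rc_motif) <= max_mismatches:
            hits.append((i, "-"))
    return hits


# Hypothetical placeholder motif (NOT a real T-DNA border sequence).
MOTIF = "GATTACAGGC"
# Toy sequence: one exact copy and one copy with a single mismatch.
toy = "NNNN" + MOTIF + "NNNN" + "GATTACTGGC" + "NNNN"
print(find_border_hits(toy, MOTIF))  # [(4, '+'), (18, '+')]
```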

However, within this dataset, there is little evidence for exchange of T-DNAs between classes of oncogenic plasmids. With the exception of T-DNA-2 of type III Ri plasmids, T-DNAs of the Ti and Ri classes have distinct gene compositions and segregate into different regions of the gene synteny graph prior to converging on a common subset of opine synthesis genes (Fig. 4B).

The structural arrangement of modules affects the degree of flexibility of oncogenic plasmid types. Most T-DNAs have a gene for synthesizing an opine; opines are conjugates between a keto acid and an amino acid (K-A), a sugar and an amino acid (S-A), or two sugars (S-S). All oncogenic plasmids have at least one T-DNA with an opine synthesis gene, and plasmid classification traditionally relied on the opine synthesized, despite assumptions of frequent exchange of opine genes (Fig. 4). Although variation of opine genes does exist, there is no evidence for the generalization of rampant exchange among all types of plasmids (Fig. 4C). Opine variation is influenced by the proximity of genes to the right borders of T-DNAs. When the cognate opine anabolism and catabolism/uptake genes are closely linked and separated by only the right border, swapping of opine loci and the border sequence can occur with little constraint (figs. S5, S6, S14, S17, and S18). Type I.b Ti plasmids are very promiscuous (fig. S5). We hypothesize that swapping is mediated by a process that involves nonhomologous recombination within oncogenes. In type III Ti plasmids, exchange is predicted to be mediated by homologous recombination among oncogenes (fig. S6). Because K-A and S-A genes are associated with different sets of oncogenes, swapping is most frequently limited to within their structural groups of opines (fig. S17 and data S6). Type I.a Ti plasmids are structurally similar to type I.b Ti plasmids and have the potential to diversify opines but are far more limited. We suggest that the low diversity is a consequence of the ancestral type I.a Ti plasmid experiencing a bottleneck along with BV2 (Fig. 1).

In other types of plasmids, important functional modules separate opine genes, and multiple recombination events would need to occur to acquire the two interdependent modules. For example, opine genes of type II plasmids are interrupted by tra genes (fig. S6). In type IV.a Ti plasmids, genes necessary for anabolism and catabolism of nopaline are separated by those involved in the catabolism of a second opine, agrocinopine (fig. S7). The acs gene, cognate to acc and necessary for synthesizing agrocinopine, is within and adjacent to the left border of the T-DNA (Fig. 4B and fig. S13). This organization is restrictive; for opine swapping to occur, either a larger fragment that included most of the T-DNA would also need to be exchanged, or the two modules for nopaline would need to be acquired separately.

The trb locus is associated with a high number of single-nucleotide polymorphisms (SNPs) and has diverse alleles (figs. S3 to S9 and S19). These data are consistent with the possibility of recombination within the trb locus, and because trb is adjacent to opine loci, it likely contributes to the swapping of opine cassettes in many types of Ti plasmids. Some type III Ti plasmids have additional copies of trb genes located distal to the main trb locus, likely due to duplication resulting from recombination within the locus (fig. S18B). In contrast, in the Ri class, trb is paired with tra and is separated from opine loci by large regions with no genes involved in virulence or plasmid maintenance (fig. S9). This difference in spatial relationship between classes likely constrains recombination between Ti and Ri plasmids. However, tra and trb could mediate recombination with homologous loci of the diverse non-oncogenic plasmids that are prevalent in the ARC (fig. S20).

All findings were synthesized to model the evolution of oncogenic plasmids (Fig. 5). Ti and Ri plasmids are distantly related and are hypothesized to have emerged from an ancestral proto-oncogenic plasmid (13, 26). We hypothesize that this ancient replicon carried vir, tzs, acs, and an opine synthesis gene flanked by T-DNA borders (fig. S21 and data S8). The tzs gene is a paralog of ipt (tmr) that is distal to the T-DNA and not transferred into plant cells (27). The tzs and ipt genes encode isopentenyltransferases, enzymes that catalyze the first step in the synthesis of the plant growth–promoting hormone cytokinin (28). However, cytokinins derived from Tzs have been implicated in regulating vir gene expression and promoting virulence (29, 30). The proto-oncogenic plasmid was suggested to have originated as a catabolic plasmid. The largest set of genes inferred to have a shared evolutionary history includes 52 genes (36% of those analyzed), of which 10 are implicated in the uptake of nutrients or metabolism and another 18 have unknown functions (data S8 and figs. S22 and S23). This set also includes repABC. We propose that the proto-oncogenic plasmid led to either the Ti or Ri class of plasmid and recombined with a separate repABC plasmid to yield the other class.

Fig. 5 Model of the evolution of oncogenic Ti and Ri plasmids.

Genes in boxes were acquired from unknown sources. The cluster of tzs, vir [mas] genes is predicted to have been acquired once (dotted arrows) and then transferred from one plasmid backbone to the other (solid arrows). Genes in purple were acquired horizontally from the indicated sources. Purple arrows represent major horizontal acquisition events. Circle with “X” depicts an undefined plasmid hypothesized to be the donor of the prominent T-DNAs that swept through the type I–IV Ti plasmids.

The next innovation was the acquisition of oncogenes within T-DNA border sequences. Every one of the sequenced Ti plasmids has at least one T-DNA that circumscribes tms1 and tms2 (aux2/iaaH), and all Ri plasmids have at least one T-DNA with tms1 (Fig. 4B and data S6). The tms2 gene cooperates with tms1 in auxin biosynthesis (12). Data are consistent with the ancestors of Ti and Ri plasmids independently acquiring tms1 from different nonagrobacterial sources (fig. S24). Moreover, even though tms1 and tms2 are a functional module and are consistently linked in T-DNAs of Ti plasmids, tms2 was potentially acquired separately from tms1 by an ancestral Ti plasmid (fig. S24C). The evidence suggests that in Ti plasmids, ipt was derived from a duplication of tzs and the paralog was incorporated within the T-DNA region (fig. S21).

An unexpectedly small number of events are sufficient to relate plasmid lineages (Fig. 5 and figs. S3, S4, and S12 to S24). Single-copy genes of oncogenic plasmids were grouped into just 18 sets inferred to have similar evolutionary histories (fig. S23). Five sets were sufficient to represent 80% of the genes analyzed. In contrast, tra and trb genes are distributed across 50% of the sets.

Type IV.c Ti plasmids are cointegrates of a type I Ti plasmid and a Ri plasmid. The two sequenced type IV.c Ti plasmids have edges to every type I.a node, are nearly twice the size of pTiC58, and have a second tra locus located between two vir loci (Fig. 1A and fig. S10). In a modification to a previously proposed model, we suggest that all subtypes of the type IV plasmids are derived from the same lineage of the cointegrated ancestor and that type IV.a and type IV.b Ti plasmids are streamlined variants (15). The type VI Ti plasmid is also likely a cointegrate. This plasmid is novel in that it horizontally acquired, via T-DNA invasion, a second set of tms1 and tms2 that differs from the set found in all types of Ti plasmids (fig. S24). Type II Ti plasmids are hypothesized to have recently emerged from a rearrangement event within the type III Ti plasmids. Type V Ti plasmids are more closely related to Ri plasmids and are the most genetically and structurally distinct of the Ti plasmids. Their multiple changes and mosaic structure make their evolutionary history difficult to interpret, and their unique structure is predicted to limit opportunities for productive recombination with other oncogenic plasmids. Although the Ti and Ri classes are very distantly related, they can co-reside in cells and exchange regions. Strain Di1411, for example, carries a type I.a Ti plasmid and a type II Ri plasmid.

Modeling disease spread

We next used this evolutionary framework to identify epidemiological patterns among strains collected worldwide over nearly a century. The framework allowed us to analyze strains and plasmids as independently transmitted entities to accurately model disease spread. An additional 66 genome sequences of hierarchically sampled strains were used to guide grouping of the agrobacterial strains and oncogenic plasmids to ranks below species and subtypes, respectively (data S10). Strains of G1, G4, G7 (three species-level groups), and G8 (three species-level groups) of BV1 as well as BV2 were divided into 124 unique genotypes. Type I–III Ti plasmids were divided into 40 distinct plasmid clusters (data S11 to S19).
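Grouping strains into genotypes by SNP distance is the step that turns pairwise genome comparisons into the nodes analyzed below. The sketch assumes single-linkage clustering (strains are joined whenever a chain of pairwise differences stays at or below a cutoff) and borrows the 15-SNP cutoff from the genotype threshold mentioned later in this section; the text does not state that this exact linkage rule was used, and the function name `cluster_by_snp_threshold` and the toy distances are hypothetical.

```python
from itertools import combinations


def cluster_by_snp_threshold(names, snp_distance, threshold=15):
    """Hypothetical single-linkage grouping: two strains share a cluster if a
    chain of pairwise comparisons, each <= threshold SNPs, connects them."""
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if snp_distance(a, b) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy pairwise chromosomal SNP counts (invented for illustration, not study data).
toy = {
    frozenset(("s1", "s2")): 7,
    frozenset(("s2", "s3")): 12,
    frozenset(("s1", "s3")): 19,
}


def dist(a, b):
    # Pairs not listed in the toy table are treated as very distant.
    return toy.get(frozenset((a, b)), 5000)


print(cluster_by_snp_threshold(["s1", "s2", "s3", "s4"], dist))
# [['s1', 's2', 's3'], ['s4']]
```

Because single linkage chains through intermediates, s1 and s3 land in the same cluster even though they differ by 19 SNPs. Passing `threshold=1000` gives a coarser grouping, analogous to the sensitivity check the text reports for BV2 strains.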

Multiple patterns underscore aspects of agricultural ecosystems that promote the diversification and spread of pathogens (Fig. 6 and fig. S25). Nonpathogenic, plasmid-lacking strains were isolated from both healthy and diseased individuals (data S12, S15, S16, and S17). For example, three nonpathogenic strains were cultured from symptomatic plants at facility C7_N18, a location that also had pathogenic strains, differing by ~130,000 to ~290,000 SNPs (Fig. 6A and data S17). As potential recipients of virulence plasmids, nonpathogenic strains, which are common in soils and in association with plants, represent a standing pool of genetic diversity for the evolution of new pathogen genotypes. Strain-plasmid combinations can persist over the long term within agricultural ecosystems. Strains LMG 267 (type II Ti) and B140/95 (type II Ti) were collected 60 years apart yet are members of the same genotype and plasmid cluster, with 7 and 0 SNP differences in their chromosomes and plasmids, respectively (Fig. 6B and fig. S25B). Pathogens can be transmitted between agricultural and unmanaged ecosystems. Three members of a BV2 genotype (type I.a Ti) linked a natural ecosystem, an agricultural ecosystem, and an undocumented location in a different country (Fig. 6C and fig. S25C). The three strains and their Ti plasmids have ≤5 and 0 SNP differences, respectively, among them. S6_N30 and S6_N25 are almost 240 km apart, but two other plant production facilities are located within 1.5 km of S6_N30 and could have bridged the link to S6_N25.

Fig. 6 Spatiotemporal transmission of strains and plasmids.

(A to H) Patterns revealed in undirected networks combining strain genotypes and Ti plasmid clusters (fig. S25). Key: gray square, agricultural ecosystem; large circle, species-level group; small circle, type or subtype of Ti plasmid. The location is labeled with a coded identifier (top); strain name(s) (left or right side); date isolated (bottom; unk, unknown). Double-headed arrows link locations and show approximate distance (d, kilometers) and/or time (t, years). The large circle without a small circle in (A) represents a nonpathogenic strain that lacks an oncogenic plasmid.

Agricultural ecosystems can have recurrent infections. Locations in which different genotype-plasmid combinations were detected had likely experienced independent infections. Facility S2_N7 had eight strains representing seven different genotype-plasmid combinations (Fig. 6D and fig. S25, I to M). Over a 12-year span, S8_N32 was infected by G1 (type III Ti) strains whose chromosomes and plasmids differ by >50,000 and >350 SNPs, respectively (fig. S25, N to P). An agricultural ecosystem can also have disease reservoirs; this idea is supported by the presence of multiple strains of the same genotype-plasmid combination on different individuals. Location C7_N18 had 27 strains of the same G7 genotype (type III Ti) on four individuals of three different host species (Fig. 6A and fig. S25Q).
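The two inferences in this paragraph (several distinct genotype-plasmid combinations at one location suggest independent infections; one combination on several individuals suggests a reservoir) can be expressed as a small tally. This is an illustrative sketch only: the `Isolate` record, its field names, and `summarize_locations` are hypothetical, and the labels it returns are heuristics rather than the study's procedure.

```python
from collections import defaultdict
from typing import NamedTuple


class Isolate(NamedTuple):
    """Hypothetical record for one pathogenic strain; field names are illustrative."""
    strain: str
    location: str
    genotype: str
    plasmid_cluster: str
    individual: str  # host plant the strain was cultured from


def summarize_locations(isolates):
    """Per location: count distinct genotype-plasmid combinations (read as
    independent infections) and list combinations recovered from more than
    one host individual (read as a possible local reservoir)."""
    combos = defaultdict(set)  # location -> {(genotype, plasmid_cluster)}
    hosts = defaultdict(set)   # (location, genotype, plasmid_cluster) -> {individual}
    for iso in isolates:
        combo = (iso.genotype, iso.plasmid_cluster)
        combos[iso.location].add(combo)
        hosts[(iso.location, *combo)].add(iso.individual)
    return {
        loc: {
            "independent_infections": len(found),
            "reservoirs": sorted(c for c in found if len(hosts[(loc, *c)]) > 1),
        }
        for loc, found in combos.items()
    }


# Invented example records, not study data.
isolates = [
    Isolate("a1", "L1", "G7-x", "III-1", "plant1"),
    Isolate("a2", "L1", "G7-x", "III-1", "plant2"),
    Isolate("a3", "L1", "G1-y", "II-4", "plant3"),
    Isolate("a4", "L2", "G1-y", "II-4", "plant1"),
]
print(summarize_locations(isolates))
```

Here location L1 reports two combinations, one of which (G7-x with III-1) was recovered from two plants and is therefore flagged as a possible reservoir.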

There were at least seven cases in which global distribution of plants is hypothesized to have contributed to the transmission of a strain-plasmid combination. These are patterns in which genotype and plasmid nodes are connected by multiple edges. Patterns by themselves are insufficient for differentiating between direct and indirect transmission routes. However, one case is highlighted that includes a facility that produces plants for wholesalers and could be a common source. Strains belonging to a G1 genotype (type III Ti) were identified from facility C4_N13 (Fig. 6E and fig. S25E). Strains of the same genotype-plasmid combination were later identified in two other facilities. The strains and plasmids from these three locations have ≤10 and 0 SNP differences, respectively, among them. The dataset has other instances in which sets of closely related strains that differ by only slightly more than the 15-SNP threshold carry plasmids belonging to the same cluster. More strain-plasmid transmissions may therefore have occurred than are reported here.
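A genotype-plasmid pair recovered from several locations corresponds to the multi-edge pattern described in this paragraph. A minimal sketch, assuming each strain is supplied as a hypothetical `(strain, location, genotype, plasmid_cluster)` tuple; like the network pattern itself, the output lists candidate transmissions and cannot separate direct from indirect routes.

```python
from collections import defaultdict


def multi_location_combinations(records):
    """records: iterable of (strain, location, genotype, plasmid_cluster).
    Returns genotype-plasmid pairs seen at two or more locations, i.e. pairs
    joined by multiple edges in a genotype/plasmid network (hypothetical helper)."""
    locations = defaultdict(set)
    for _strain, location, genotype, plasmid in records:
        locations[(genotype, plasmid)].add(location)
    return {pair: sorted(locs) for pair, locs in locations.items() if len(locs) > 1}


# Invented records: one combination at three facilities, another at only one.
records = [
    ("x1", "F1", "G1-a", "III-7"),
    ("x2", "F2", "G1-a", "III-7"),
    ("x3", "F3", "G1-a", "III-7"),
    ("x4", "F1", "G7-b", "III-2"),
    ("x5", "F1", "G7-b", "III-2"),
]
print(multi_location_combinations(records))
# {('G1-a', 'III-7'): ['F1', 'F2', 'F3']}
```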

Horizontal transmission of plasmids greatly diversifies and amplifies the spread of pathogens. Fifteen networks have a plasmid cluster as a central hub connected to multiple genotype nodes (fig. S25). We use the simple case of the type I.a Ti plasmid of strain Di1411 as an example (Fig. 6F and fig. S25F). This plasmid has only two SNP differences relative to the type I.a Ti plasmid of another BV2 strain. In contrast, the two strains linked by the type I.a Ti plasmid cluster have more than 22,000 SNP differences in their chromosomes. The most pronounced example has 28 BV2 strains and one G1 strain collected from more than 20 locations that have type I.a Ti plasmids with no more than 13 SNP differences among them (fig. S25R and data S11). The clustering of these 29 plasmids is not a consequence of low SNP diversity. In total, the 49 type I.a Ti plasmids in this study formed 10 distinct clusters, with differences of up to nearly 5000 SNPs. The diversity of BV2 strains is also not a consequence of the conservative threshold used to define genotypes. Even at a threshold of 1000 SNP differences, the BV2 strains would separate into 17 different genotypes.
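The hub pattern, one plasmid cluster attached to several chromosomal genotypes, amounts to counting distinct genotypes per plasmid cluster. The sketch below is hypothetical (the function name, the input shape, and the `min_genotypes` parameter are illustrative) and assumes genotypes and plasmid clusters have already been assigned.

```python
from collections import defaultdict


def plasmid_hubs(records, min_genotypes=2):
    """records: iterable of (genotype, plasmid_cluster) pairs, one per strain.
    Returns (plasmid_cluster, genotypes) for clusters shared by at least
    min_genotypes distinct genotypes, largest hubs first."""
    genotypes = defaultdict(set)
    for genotype, plasmid in records:
        genotypes[plasmid].add(genotype)
    hubs = [(p, sorted(g)) for p, g in genotypes.items() if len(g) >= min_genotypes]
    # Sort by number of connected genotypes (descending), then by cluster name.
    return sorted(hubs, key=lambda hub: (-len(hub[1]), hub[0]))


# Invented records: repeated strains of one genotype do not create a hub.
records = [
    ("BV2-a", "I.a-1"), ("BV2-b", "I.a-1"), ("G1-c", "I.a-1"),
    ("G1-d", "III-1"), ("G1-d", "III-1"),
    ("G7-e", "III-2"),
    ("G4-f", "I.a-2"), ("BV2-g", "I.a-2"),
]
print(plasmid_hubs(records))
# [('I.a-1', ['BV2-a', 'BV2-b', 'G1-c']), ('I.a-2', ['BV2-g', 'G4-f'])]
```

If each record also carried a location, keying the dictionary on `(location, plasmid_cluster)` instead of the plasmid cluster alone would flag the within-facility version of this pattern discussed in the next paragraph.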

Management practices in agricultural ecosystems have been suggested to increase opportunities for plasmid conjugation (31). Supporting evidence is found in three networks in which the same location is mapped to a plasmid node and multiple associated genotype nodes. Strains of a G1 genotype and a G7 genotype carrying type III Ti plasmids with 0 SNP differences were identified in facility S9_N46 (Fig. 6G and fig. S25G). In the second example, one strain of a BV2 genotype and three strains of a G1 genotype were collected from facility S2_N7 (fig. S25K). Their type II Ti plasmids have 0 SNP differences among them. Last, a strain of a G4 genotype and two strains of closely related BV2 genotypes have type I.a Ti plasmids with ≤1 SNP difference (fig. S25S).

Plasmid dissemination can also lead to the temporal spread and persistence of disease. The type I.a Ti plasmid of strain K27, collected prior to 1964, has ≤2 SNP differences relative to seven other type I.a Ti plasmids present in strains collected during the period 1995–2009 (Fig. 6H and fig. S25H). As previously stated, this is likely not a reflection of low SNP diversity within the type I.a Ti plasmids.

We found one instance in which two closely related strains have distantly related Ti plasmids, suggesting independent acquisition events. Strain Z4/95 and strain K27 differ by only 48 SNPs (fig. S25, L and T, and data S19). The type I.a Ti plasmid of Z4/95 has ~2400 SNP differences in comparison to the type I.a plasmid of K27 and does not cluster with any other type I.a Ti plasmid (data S11).
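The inferences drawn across this section compare how close two strains are in their chromosomes with how close their plasmids are. The rule of thumb below sketches that comparison; the function `interpret_pair` is hypothetical rather than part of the study's pipeline, and both cutoffs (`close_chromosome=100`, `close_plasmid=15`) are assumptions chosen only so that the examples quoted in the text fall into the expected categories.

```python
def interpret_pair(chromosome_snps, plasmid_snps, close_chromosome=100, close_plasmid=15):
    """Classify a pair of pathogenic strains by chromosomal and plasmid SNP
    differences. Hypothetical heuristic; both cutoffs are illustrative."""
    strains_close = chromosome_snps <= close_chromosome
    plasmids_close = plasmid_snps <= close_plasmid
    if strains_close and plasmids_close:
        return "shared strain-plasmid combination"
    if plasmids_close:
        return "plasmid shared by distinct strains (horizontal transfer)"
    if strains_close:
        return "related strains, unrelated plasmids (independent acquisition)"
    return "unrelated strains and plasmids"


# SNP counts quoted in the text (approximate where the text is approximate).
print(interpret_pair(48, 2400))   # Z4/95 vs K27: independent acquisition
print(interpret_pair(22000, 2))   # Di1411 vs another BV2 strain (>22,000 reported): horizontal transfer
print(interpret_pair(7, 0))       # LMG 267 vs B140/95: shared combination
```

A fixed pair of cutoffs is the simplest choice; as the text notes for strains just above the 15-SNP genotype threshold, real boundaries are fuzzy, so borderline pairs deserve individual inspection.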


Modularity confers phenotypic robustness and allows oncogenic plasmids to diversify. But diversification is opposed by selective forces that preserve genetic and physical links necessary to maintain functionality of the plasmids. By accounting for variation and constraints, we were able to sort oncogenic plasmids into defined lineages and infer their evolution. We revealed the most commonly observed types, which are foundational for understanding types yet to be discovered.

Diversification of pathogen populations by plasmid transmission is promoted by the agricultural industry, where large populations of susceptible host species are intensively managed and often clonally propagated within the same facilities, individuals are moved across the world, and disease can persist undetected for extended periods of time. Although clinical settings have some similar features that promote diversification, antibiotic use is suggested to result in a recurring pattern of clonal expansion of pathogens followed by global spread of “super-fit” bacterial lineages (32). However, analysis of plasmids has shown that recombination of antibiotic resistance genes can occur, and entire plasmid sequences need to be studied to avoid drawing misleading conclusions (33). In addition, transmission of plasmids can occur within and even between genera, enhancing and promoting new epidemics (34, 35). Our strategy relied on multiple methods to determine plasmid relationships and coupled them with chromosomal ancestry to unravel the mosaicism of bacterial evolution. This approach can be applied to other systems to model the impact of mobile genetic elements on bacterial evolution and help limit risks from infectious diseases.

Supplementary Materials

Materials and Methods

Figs. S1 to S25

Data S1 to S19

References (36–86)

References and Notes

Acknowledgments: We thank members of the community for their support, their eagerness in sharing ideas, data, and resources, as well as their lively discussions at Crown Gall Conferences. We thank L. Moore for foresight and endeavors in preserving the agrobacterial culture collection, and the many growers, S. Farrand, and J. Puławska for providing samples and strains. We also thank J. Anderson, T. Sharpton, C. Meehan, E.-M. Lai, W. Ream, and members of the Chang, Putnam (Plant Clinic), Loper, and Grünwald labs for their assistance and insightful comments. We acknowledge the staff of the CGRB for their services. Funding: Supported by the National Institute of Food and Agriculture, U.S. Department of Agriculture, award 2014-51181-22384 (J.H.C., M.L.P., and N.J.G.); intramural funding from Academia Sinica (C.-H.K.); USDA NIFA award 2017-67012-26126 (A.J.W.); and a Provost’s Distinguished Graduate Fellowship awarded by Oregon State University (E.W.D.). This material is based on work supported by an NSF Graduate Research Fellowship under grant DGE-1314109 to E.W.D. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Author contributions: A.J.W., E.W.D., J.T., M.M., J.E.L., N.J.G., M.L.P., and J.H.C. conceptualized and designed the experiments. A.J.W., E.W.D., J.T., and M.S.B. performed the research. A.J.W., E.W.D., J.T., N.J.G., M.L.P., and J.H.C. analyzed the data. C.-H.K. contributed resources prior to their publication. A.J.W., E.W.D., J.T., C.-H.K., N.J.G., M.L.P., and J.H.C. wrote the paper. A.J.W., E.W.D., C.-H.K., N.J.G., M.L.P., and J.H.C. acquired funding. Competing interests: The authors declare no competing interests. Data and materials availability: Short reads and assemblies have been deposited in NCBI as BioProject PRJNA607555 and accession numbers are listed in data S1 and S10. Network graphs in nexus or sif format, phylogenetic trees in Newick format, genome annotations, and scripts can be downloaded from (DOI:10.5281/zenodo.3754985). Strains sequenced in this study are available upon request. Records were deidentified to mitigate privacy risks to growers.
