Supplemental Data


Complete Genome Sequence of a Virulent Isolate of Streptococcus pneumoniae
Hervé Tettelin, Karen E. Nelson, Ian T. Paulsen, Jonathan A. Eisen, Timothy D. Read, Scott Peterson, John Heidelberg, Robert T. DeBoy, Daniel H. Haft, Robert J. Dodson, A. Scott Durkin, Michelle Gwinn, James F. Kolonay, William C. Nelson, Jeremy D. Peterson, Lowell A. Umayam, Owen White, Steven L. Salzberg, Matthew R. Lewis, Diana Radune, Erik Holtzapple, Hoda Khouri, Alex M. Wolf, Terry R. Utterback, Cheryl L. Hansen, Lisa A. McDonald, Tamara V. Feldblyum, Samuel Angiuoli, Tanja Dickinson, Erin K. Hickey, Ingeborg E. Holt, Brendan J. Loftus, Fan Yang, Hamilton O. Smith, J. Craig Venter, Brian A. Dougherty, Donald A. Morrison, Susan K. Hollingshead, and Claire M. Fraser

Supplementary Material

Supplemental Figure 1. Linear representation of the S. pneumoniae TIGR4 genome. The location of predicted coding regions, color-coded by biological role (see Fig. 1), is displayed, as well as rRNA and tRNA genes. Arrowed boxes represent the direction of transcription for each ORF. Thin arrows represent IS elements. Numbers next to the tRNA symbols represent the number of tRNAs at a locus. Numbers next to GES regions represent the number of membrane-spanning domains predicted by TopPred (displayed only for ORF products with five or more predicted membrane-spanning regions). Transcriptional terminators are represented by hairpins.




Supplemental Figure 2. Comparison of the S. pneumoniae ORFs to those of other completely sequenced genomes. All ORFs were searched with FASTA3 against all ORFs from other complete genomes, including those of plasmids, organelles and phages. The number of S. pneumoniae ORFs whose highest similarity (P < 10⁻⁵) is to an ORF from a given species is shown. Abbreviations: LACLA, Lactococcus lactis; BACHA, Bacillus halodurans; BACSU, Bacillus subtilis; STAAU, Staphylococcus aureus; ECOLI, Escherichia coli; ECO157H7, Escherichia coli O157H7; PASMU, Pasteurella multocida; HAEIN, Haemophilus influenzae; THEMA, Thermotoga maritima; SYNSP, Synechocystis sp.; PORGI, Porphyromonas gingivalis; NEIMEa, Neisseria meningitidis serogroup A; VIBCH, Vibrio cholerae; METJA, Methanococcus jannaschii; PSEAE, Pseudomonas aeruginosa; PYRFU, Pyrococcus furiosus; CAUCR, Caulobacter crescentus; DEIRA, Deinococcus radiodurans; MYCTU, Mycobacterium tuberculosis; CAMJE, Campylobacter jejuni; NEIMEb, Neisseria meningitidis serogroup B; ARCFU, Archaeoglobus fulgidus; BBUR, Borrelia burgdorferi; GEOSU, Geobacter sulfurreducens; HELP99, Helicobacter pylori J99; HELPY, Helicobacter pylori 26695; PYRHO, Pyrococcus horikoshii; UREUR, Ureaplasma urealyticum; AQUAE, Aquifex aeolicus; CELEG, Caenorhabditis elegans; METTH, Methanobacterium thermoautotrophicum; PLASMID_lacla.pMRC01, Lactococcus lactis plasmid; PYRAB, Pyrococcus abyssi; XYLFA, Xylella fastidiosa; YEAST, Saccharomyces cerevisiae; ARATH, Arabidopsis thaliana; CHLTE, Chlamydia trachomatis; PHAGE_streptococcus_thermophilus_720, Streptococcus thermophilus phage; THEAC, Thermoplasma acidophilum; TREPA, Treponema pallidum; DROME, Drosophila melanogaster; HALSP, Halobacterium sp.; HUMAN, Homo sapiens; MYCGE, Mycoplasma genitalium.
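The tally described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout `(query_orf, species_code, p_value)`, the function name `tally_best_hits`, and the toy hits are all hypothetical, not the format of the actual FASTA3 output or values from the study.

```python
from collections import Counter

P_CUTOFF = 1e-5  # significance threshold quoted in the caption (P < 10^-5)


def tally_best_hits(hits, cutoff=P_CUTOFF):
    """Count query ORFs by the species of their single best significant hit.

    `hits` is an iterable of (query_orf, species_code, p_value) tuples,
    a hypothetical flattened form of the search results.
    """
    best = {}  # query_orf -> (p_value, species_code)
    for orf, species, p in hits:
        if p >= cutoff:
            continue  # not significant under the caption's threshold
        if orf not in best or p < best[orf][0]:
            best[orf] = (p, species)
    return Counter(species for _, species in best.values())


# Toy input with made-up ORF names and P values, for illustration only.
example_hits = [
    ("orf1", "LACLA", 1e-80),
    ("orf1", "BACSU", 1e-60),
    ("orf2", "BACSU", 1e-12),
    ("orf3", "ECOLI", 1e-3),  # above the cutoff, so ignored
    ("orf4", "LACLA", 1e-30),
]
counts = tally_best_hits(example_hits)
```

Each query ORF contributes to exactly one species, the one with its lowest P value, which is what a bar chart of "highest similarity per species" needs.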




Supplemental Figure 3. Organization of the polymorphic type I restriction enzyme (hsdS) operon (SP0505-SP0510). SP0505 and SP0507 are partial hsdS (specificity subunit) genes. SP0506 is an integrase gene; SP0508-SP0510 are the specificity, modification and restriction subunits. 'A' (thick bar) is an inverted 85 bp repeat; 'B' (thin bar) is an inverted 15 bp repeat. 'A' and 'B' share the core sequence ATTATGGGAA. Clones were sequenced that were fusions of the hsdS and the hsdS' pseudogenes, with the boundary being either the 'A' or the 'B' repeat.




Supplemental Figure 4. Structure of the 21 PTS transporters. Schematic representation of the PTS Enzyme II gene clusters in S. pneumoniae. Each gene is shown as an arrow indicating the direction of transcription, and the gene number is provided below the line. The regions of each gene encoding PTS domains are color-coded (IIA, blue; IIB, red; IIC, green; and IID, magenta). Additionally, non-PTS genes are indicated in black (transcriptional regulator) or orange (sugar hydrolase). Genes flanking the PTS genes that may be co-transcribed are not shown. For each Enzyme II cluster the probable substrate specificity is provided to the right; a question mark indicates that the assignment is speculative. Where a substrate specificity assignment was not possible, the PTS family of the Enzyme II constituents is indicated.




Supplemental Table 1. S. pneumoniae lineage-specific duplications (a). Predicted proteins were grouped into clusters if at least one member of a cluster is more similar to another member of the cluster than to any other gene.
Cluster | First gene in cluster
SP0137 SP1342 ABC transporter, ATP-binding protein
SP0687 SP1957 SP1987 ABC transporter, ATP-binding protein
SP1341 SP0912 ABC transporter, ATP-binding protein
SP0148 SP0620 ABC transporter, substrate-binding protein
SP1826 SP0243 ABC transporter, substrate-binding protein
SP1002 SP2169 adhesion lipoprotein
SP0112 SP1394 amino acid ABC transporter, periplasmic amino acid-binding protein, putative
SP0533 SP0041 SP0541 bacteriocin BlpK
SP0524 SP1281 BlpT protein, fusion
SP0529 SP0043 BlpC ABC transporter
SP0527 SP2236 sensor histidine kinase BlpH, putative
SP0536 SP0543 immunity protein BlpL
SP0526 SP2235 response regulator BlpR
SP0377 SP0391 SP0390 SP0378 choline binding protein C
SP0440 SP1770 SP1765 SP1766 SP1767 SP1771 SP1764 SP1365 glycosyl transferase, degenerate
SP0658 SP0999 cytochrome c-type biogenesis protein CcdA
SP0603 SP2193 DNA-binding response regulator VncR
SP1089 SP2072 glutamine amidotransferase, class I
SP0301 SP0578 glycosyl hydrolase, family 1, truncation
SP0834 SP1923 hemolysin-related protein
SP0712 SP0715 lactate oxidase, truncation
SP1273 SP1274 SP1367 licD1 protein
SP1235 SP0740 MutT/nudix family protein
SP1330 SP1685 N-acetylmannosamine-6-P epimerase, putative
SP1326 SP1687 neuraminidase, putative
SP1325 SP1686 oxidoreductase, Gfo/Idh/MocA family
SP1471 SP1472 oxidoreductase, putative
SP1427 SP1429 peptidase, U32 family
SP1359 SP0660 peptide methionine sulfoxide reductase
SP1700 SP1701 phospho-2-dehydro-3-deoxyheptonate aldolase
SP1331 SP1674 phosphosugar-binding transcriptional regulator, putative
SP0117 SP2190 SP0930 SP2136 SP2201 SP0069 pneumococcal surface protein A
SP0667 SP1573 SP0965 pneumococcal surface protein, putative
SP1061 SP1612 protein kinase, putative
SP0645 SP1198 SP1199 PTS system IIA component, putative
SP0248 SP0308 PTS system, IIA component
SP0064 SP0284 PTS system, IIA component
SP0646 SP1197 PTS system, IIB component, putative
SP0062 SP2162 PTS system, IIC component
SP0860 SP2060 pyrrolidone-carboxylate peptidase
SP0680 SP0280 ribosomal small subunit pseudouridine synthase A
SP1324 SP1675 ROK family protein
SP0604 SP2192 sensor histidine kinase VncS
SP0466 SP0467 sortase, putative
SP0659 SP1000 thioredoxin family protein
SP0141 SP1115 transcriptional regulator
SP2006 SP0014 transcriptional regulator ComX2
SP0163 SP1057 SP1946 transcriptional regulator PlcR, putative
SP1989 SP2090 transcriptional regulator PlcR, putative
SP0584 SP1144 transcriptional regulator, putative
SP0676 SP0927 transcriptional regulator, putative
SP1858 SP2234 transcriptional regulator, TetR family
SP1615 SP2030 transketolase, authentic frameshift
SP0599 SP0601 transmembrane protein Vexp1
SP0886 SP0509 type I restriction-modification system, M subunit, putative
SP0892 SP0510 type I restriction-modification system, R subunit, putative
SP0505 SP0508 type I restriction-modification system, S subunit, putative
SP0887 SP0891 type I restriction-modification system, S subunit, putative
SP0664 SP1154 SP0071 zinc metalloprotease ZmpB, putative
SP0143 SP0144 SP0925 SP0545 conserved domain protein
SP1136 SP1575 conserved domain protein
SP1332 SP1346 conserved domain protein
SP0145 SP0379 conserved hypothetical protein
SP0619 SP1637 conserved hypothetical protein
SP0686 SP1988 SP1956 conserved hypothetical protein
SP1003 SP1174 SP1175 SP1004 conserved hypothetical protein
SP1327 SP1691 SP1680 conserved hypothetical protein
SP1334 SP1348 conserved hypothetical protein
SP0108 SP0114 hypothetical protein
SP0153 SP1436 hypothetical protein
SP0164 SP1058 hypothetical protein
SP0455 SP0917 hypothetical protein
SP0733 SP1487 SP0810 SP1302 hypothetical protein
SP0734 SP0809 SP1303 hypothetical protein
SP0833 SP2159 hypothetical protein
SP1109 SP0906 hypothetical protein
SP1333 SP1347 hypothetical protein
SP1335 SP1349 hypothetical protein
SP1480 SP0031 SP1481 hypothetical protein
SP1488 SP0087 hypothetical protein
SP1531 SP1805 hypothetical protein
SP1707 SP1708 hypothetical protein
SP1262 SP1444 SP1639 SP1792 SP0028 SP1692 IS1167, transposase
SP0361 SP0836 SP1582 SP0460 IS1167, transposase, truncation
SP0328 SP1418 SP0343 SP0495 SP0714 SP1337 SP1352 SP1439 SP1503 SP1595 SP2089 SP2179 IS1380-Spn1, transposase
SP0537 SP1927 SP0900 SP2137 SP2080 IS1381, transposase OrfA
SP0942 SP1310 IS1381, transposase OrfA
SP1729 SP1928 SP0039 SP2138 IS1381, transposase OrfA/OrfB, truncation
SP0538 SP1195 SP0941 SP1086 SP2079 IS1381, transposase OrfB
SP2154 SP1596 IS3-Spn1, hypothetical protein, truncation
SP0299 SP0345 SP0818 SP0995 IS630-Spn1, transposase Orf1, authentic frameshift
SP0132 SP1149 SP0015 SP2015 SP0086 IS630-Spn1, transposase Orf1, degenerate
SP0300 SP1148 SP0344 SP0819 SP0996 IS630-Spn1, transposase Orf2
SP1314 SP1443 IS66 family element, Orf1
SP0363 SP0811 SP2212 SP0643 transposase family protein, truncation
SP1064 SP1622 transposase, IS200 family
SP1496 SP2014 SP0016 transposase, IS630-Spn1 related, Orf2
(a) The extent of potential lineage-specific gene duplications in this genome was estimated by identification of ORFs that are more similar to other ORFs within the TIGR4 genome than to ORFs from other complete genomes, including those of plasmids, organelles, and phages. All ORFs were searched with FASTA3 against all ORFs from the complete genomes, and matches with a FASTA p value of 10⁻⁵ were considered significant.
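The selection rule in footnote (a) can be illustrated with a short Python sketch. It is not the pipeline used for the table: the hit layout `(query, subject, genome, p_value)`, the function name, the genome label "TIGR4" as a key, and the toy ORF names and p values are all assumptions made for illustration. Treating "a p value of 10⁻⁵" as an upper bound is also an assumption.

```python
P_CUTOFF = 1e-5  # threshold from footnote (a); assumed to be an upper bound


def lineage_specific_candidates(hits, own_genome="TIGR4", cutoff=P_CUTOFF):
    """Return {query_orf: best_paralog} for ORFs whose best significant
    within-genome hit beats their best hit in any other genome.

    `hits` holds (query, subject, genome, p_value) tuples (hypothetical layout).
    """
    best_self = {}   # query -> (p_value, subject) within the same genome
    best_other = {}  # query -> (p_value, subject) in any other genome
    for query, subject, genome, p in hits:
        if query == subject or p > cutoff:
            continue  # skip self-matches and non-significant hits
        table = best_self if genome == own_genome else best_other
        if query not in table or p < table[query][0]:
            table[query] = (p, subject)

    candidates = {}
    for query, (p, subject) in best_self.items():
        other = best_other.get(query)
        if other is None or p < other[0]:
            candidates[query] = subject
    return candidates


# Toy input with made-up names and p values, for illustration only.
toy_hits = [
    ("orfA", "orfA", "TIGR4", 0.0),    # self-match, ignored
    ("orfA", "orfB", "TIGR4", 1e-90),
    ("orfA", "x1", "BACSU", 1e-40),
    ("orfB", "orfA", "TIGR4", 1e-90),
    ("orfC", "orfD", "TIGR4", 1e-10),
    ("orfC", "y1", "ECOLI", 1e-50),    # better match outside the genome
    ("orfE", "orfF", "TIGR4", 1e-2),   # not significant
]
pairs = lineage_specific_candidates(toy_hits)
```

Grouping the resulting pairs into clusters, as in the table, would be a further step (for example, connected components over the pairs) that is not shown here.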


Supplemental Table 2. ORFs that do not have a 10⁻⁵ E value match in other low-GC Gram-positive species (a).
ORF | Description
SP0015 IS630-Spn1, transposase Orf1
SP0016 IS630-Spn1, transposase Orf2
SP0024 conserved hypothetical protein
SP0060 beta-galactosidase
SP0065 sugar isomerase domain protein AgaS
SP0075 phosphorylase, Pnp/Udp family
SP0086 IS630-Spn1, transposase Orf1, truncation
SP0159 conserved hypothetical protein
SP0276 conserved hypothetical protein
SP0298 conserved hypothetical protein
SP0299 IS630-Spn1, transposase Orf1, authentic frameshift
SP0300 IS630-Spn1, transposase Orf2
SP0304 conserved hypothetical protein
SP0318 carbohydrate kinase, PfkB family
SP0344 IS630-Spn1, transposase Orf2
SP0345 IS630-Spn1, transposase Orf1, authentic frameshift
SP0390 choline binding protein G
SP0409 conserved hypothetical protein
SP0481 conserved hypothetical protein
SP0574 hypothetical protein
SP0584 transcriptional regulator, putative
SP0606 oxidoreductase, putative
SP0628 HIT family protein
SP0637 membrane protein
SP0638 conserved hypothetical protein
SP0641 serine protease, subtilase family
SP0695 HesA/MoeB/ThiF family protein
SP0751 branched-chain amino acid ABC transporter, permease protein
SP0795 PEP-utilizing enzymes family protein
SP0818 IS630-Spn1, transposase Orf1, authentic frameshift
SP0819 IS630-Spn1, transposase Orf2
SP0858 membrane protein, putative
SP0859 membrane protein
SP0887 type I restriction-modification system, S subunit, putative
SP0892 type I restriction-modification system, R subunit, putative
SP0907 capsular polysaccharide biosynthesis protein, putative
SP0930 choline binding protein E
SP0939 conserved hypothetical protein
SP0962 lactoylglutathione lyase
SP0965 endo-beta-N-acetylglucosaminidase
SP0977 tellurite resistance protein TehB
SP0995 IS630-Spn1, transposase Orf1, authentic frameshift
SP1063 ABC-2 transporter, permease protein, putative
SP1068 phosphoenolpyruvate carboxylase
SP1069 conserved hypothetical protein
SP1077 conserved domain protein
SP1143 conserved hypothetical protein
SP1144 conserved hypothetical protein
SP1148 IS630-Spn1, transposase Orf2
SP1149 IS630-Spn1, transposase Orf1
SP1222 type II restriction endonuclease, putative
SP1240 conserved hypothetical protein
SP1251 endonuclease, putative
SP1261 conserved hypothetical protein
SP1264 conserved domain protein
SP1268 licB protein
SP1269 choline kinase
SP1270 alcohol dehydrogenase, zinc-containing
SP1315 v-type sodium ATP synthase, subunit D
SP1319 v-type sodium ATP synthase, subunit C
SP1321 v-type sodium ATP synthase, subunit K
SP1322 v-type sodium ATP synthase, subunit I
SP1326 neuraminidase, putative
SP1327 conserved hypothetical protein
SP1343 prolyl oligopeptidase family protein
SP1344 conserved hypothetical protein
SP1350 conserved domain protein
SP1428 conserved hypothetical protein
SP1431 type II DNA modification methyltransferase, putative
SP1442 IS66 family element, Orf2
SP1492 cell wall surface anchor family protein
SP1496 transposase, IS630-Spn1 related, Orf2
SP1543 conserved hypothetical protein, authentic point mutation
SP1546 conserved domain protein
SP1547 conserved hypothetical protein
SP1549 polypeptide deformylase
SP1550 glutathione S-transferase family protein
SP1600 hypothetical protein
SP1680 conserved hypothetical protein
SP1687 neuraminidase B
SP1691 conserved hypothetical protein
SP1740 conserved hypothetical protein
SP1765 glycosyl transferase, family 8
SP1768 conserved hypothetical protein
SP1770 glycosyl transferase, family 8
SP1783 MutT/nudix family protein
SP1809 transcriptional regulator
SP1826 ABC transporter, substrate-binding protein
SP1827 hypothetical protein
SP1850 type II restriction endonuclease DpnI
SP1851 conserved hypothetical protein
SP1894 sucrose phosphorylase
SP1916 PAP2 family protein
SP2014 IS630-Spn1, transposase Orf2
SP2015 IS630-Spn1, transposase Orf1
SP2017 membrane protein
SP2027 conserved hypothetical protein
SP2031 conserved hypothetical protein
SP2037 PTS system, IIB component
SP2063 LysM domain protein, authentic frameshift
SP2081 conserved hypothetical protein
SP2122 conserved hypothetical protein
SP2146 conserved hypothetical protein
SP2158 L-fucose isomerase
SP2165 fucose operon FucU protein
(a) All ORFs were searched with FASTA3 against all ORFs from other complete genomes, including those of plasmids, organelles and phages. Matches with a FASTA p value of 10⁻⁵ were considered significant.
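A filter of the kind behind this table can be sketched as follows. The species set below is an assumption: the table does not list which genomes were counted as low-GC Gram-positive, so the four codes are borrowed from the Supplemental Figure 2 abbreviations purely for illustration. The hit layout, function name, and toy values are likewise hypothetical.

```python
# Assumed membership of the comparison group; not stated in the table.
LOW_GC_GRAM_POSITIVE = {"LACLA", "BACHA", "BACSU", "STAAU"}


def orfs_without_match(all_orfs, hits, group=LOW_GC_GRAM_POSITIVE, cutoff=1e-5):
    """List ORFs with no significant hit in any species of `group`.

    `hits` holds (query_orf, species_code, p_value) tuples (hypothetical layout).
    """
    matched = {
        orf for orf, species, p in hits
        if species in group and p <= cutoff
    }
    return sorted(set(all_orfs) - matched)


# Toy input with made-up names and p values, for illustration only.
toy_orfs = ["orf1", "orf2", "orf3"]
toy_hits = [
    ("orf1", "BACSU", 1e-20),  # significant hit inside the group
    ("orf2", "ECOLI", 1e-30),  # significant, but outside the group
    ("orf3", "STAAU", 1e-2),   # inside the group, not significant
]
unmatched = orfs_without_match(toy_orfs, toy_hits)
```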


Supplemental Table 3. Genes involved in competence.
Operon | ORF | Gene name | Induced (a) | Required (b) | Function (c) | Closest relatives (d, e) | Other relatives (e, f)
1 | SP0954 | celA | + | + | DNA binding | BS BH e-20 | TM e-7
1 | SP0955 | celB | + | + | DNA uptake pore | BS BH e-18 | MG e-7
2 | SP0978 | coiA | + | + | Unknown | BH BS e-12 | -
3 | SP1266 | dal, cilB | + | + | DNA processing | BH e-31 | many e-25
4 | SP1808 | cclA, cilC | + | + | Prepilin processing peptidase | AQ e-10 | TM SS NM DR BS e-6
5 | SP1908 | ssbB, cilA | + | + | ssDNA binding | BH BS e-18 | many e-10
6 | SP1937 | lytA | + | - | Autolysin | SP e-18 | family of 15: SPN e-10
6 | SP1939 | dinF | + | - | Efflux pump | BH BS e-28 | many e-20
6 | SP1940 | recA | + | + | Strand assimilation | BH BS e-50 | many e-50
6 | SP1941 | cinA | + | - | Unknown | BH BS e-50 | SS e-50; many e-17
7 | SP2051 | cglC | + | + | Pilin-like wall structure | BH e-6 | -
7 | SP2052 | cglB | + | + | Prepilin transport ATPase | BH BS e-10 | -
7 | SP2053 | cglA | + | + | Prepilin transport pore | BS BH e-25 | many e-18
8 | SP2208 | cflA | + | + | Helicase? | BS BH e-50 | SS CJ SP TP AA e-6
8 | SP2207 | cflB | + | + | Uptake pilot protein? | BS BH e-21 | NM e-6
(a) Expression induced strongly in competent cells.
(b) Required for transformation (mutation decreases recombinants < 70%).
(c) Lacks, S. A., 1999. DNA uptake by transformable bacteria. In Transport of Molecules across Microbial Membranes, Broome-Smith, J. K., Baumberg, S., Stirling, C. J., and Ward, F. B. (eds.), pp.138-168, Cambridge University Press, Cambridge, UK.
(d) Among completed genomes, species with closest ortholog are given, with BLAST probability.
(e) Species abbreviations: BS B. subtilis, BH B. halodurans, SS Synechocystis spp., CJ Campylobacter jejuni, TP T. pallidum, AA Aquifex aeolicus, NM N. meningitidis, DR Deinococcus radiodurans, MG Mycoplasma genitalium, TM Thermotoga maritima, - none.
(f) Species with more distant homologs.


Supplemental Table 4. Genes related to virulence based on experimental data (a).
ORF (b) | Description | Gene name | Other known roles (b) | Supporting data (c) | Reference (d)
Adherence
SP0377 | choline binding protein C | pcpC/cbpC (smaA) |  | A,D | Gosink et al. 2000, Lau et al. 2001
SP0660 | peptide methionine sulfoxide reductase | msrA |  | A | Wizemann 1996
SP0730 | pyruvate oxidase | spxB | Host defense, cellular metabolism | A,H | Overweg et al. 2000, Spellerberg 1996
SP0933 | pyrroline-5-carboxylate reductase | proC, smmJ |  | A,C | Tuomanen et al. 2000, Lau et al. 2001
SP0966 | adherence and virulence protein | pavA |  | A | Lau et al. 2001
SP1274 | LicD2 protein | licD2 |  | A,B,C | Zhang et al. 1999
SP1650 | manganese ABC transporter, manganese-binding adhesion lipoprotein | psaA | Acquisition | A,B,C,F | Sampson et al. 1994, Dintilhac et al. 1997, Berry and Paton, 1996
SP2190 | choline binding protein A | cbpA | Host defense | A,B,C | Rosenow et al. 1997
SP2194 | ATP dependent Clp protease, ATP binding subunit | clpC |  | A | Charpentier et al. 2000
Cellular metabolism/ acquisition of nutrients
SP0044 | phosphoribosylaminoimidazole-succinocarboxamide synthase | purC, SPN-1646 |  | B | Polissi et al. 1998
SP0045 | phosphoribosylformylglycinamidine synthase, putative | purL, SPN-786 |  | B | Polissi et al. 1998
SP0047 | phosphoribosylformylglycinamide cyclo-ligase | purM, smmB |  | C,D | Lau et al. 2001
SP0048 | phosphoribosylglycinamide formyltransferase | purN, smmB |  | C,D | Lau et al. 2001
SP0053 | phosphoribosylaminoimidazole carboxylase, catalytic subunit | purE, SPN-1404 |  | B | Polissi et al. 1998
SP0054 | phosphoribosylaminoimidazole carboxylase, ATPase subunit | purK |  | B,D | Polissi et al. 1998
SP0251 | formate acetyltransferase, putative | smmF |  | C,D | Lau et al. 2001
SP0265 | glycosyl hydrolase, family 1 | SPN-1818 |  | B | Polissi et al. 1998
SP0267 | oxidoreductase, putative | SPN-962 |  | B | Polissi et al. 1998
SP0268 | alkaline amylopullulanase, putative | spuA |  | G | Zysk et al. 2000, Bongaerts et al. 2000
SP0314 | hyaluronidase | hyl |  | D | Berry et al. 1994
SP0498 | endo-beta-N-acetylglucosaminidase, putative |  |  | G | Zysk et al. 2000
SP0648 | beta-galactosidase | bgaA |  | G | Zysk et al. 2000
SP0659 | thioredoxin family protein | smsB |  | C,D | Lau et al. 2001
SP0766 | superoxide dismutase, manganese-dependent | sodA |  | D | Yesilkaya et al. 2000
SP0932 | gamma-glutamyl phosphate reductase | smmK |  | C | Lau et al. 2001
SP0948 | PhoH family protein | SPN-1585 |  | B | Polissi et al. 1998
SP0965 | endo-beta-N-acetylglucosaminidase | lytB | Colonization, Host defense | B,C,E | Garcia et al. 1999, Wizemann et al. 2001
SP0981 | protease maturation protein, putative | ppmA | Host defense | D,G,H | Overweg et al. 2000b
SP1024 | serine hydroxymethyltransferase | smmI |  | C | Lau et al. 2001
SP1068 | phosphoenolpyruvate carboxylase | smmE |  | C,D | Lau et al. 2001
SP1168 | mutator MutX protein | mutX |  | B,D | Mejean et al. 1994, Polissi et al. 1998
SP1326 | neuraminidase, putative | nanB |  | C | Berry et al. 1996
SP1445 | GMP synthase | guaA |  | B | Polissi et al. 1998
SP1469 | NADH oxidase | nox |  | C | Auzat et al. 1999
SP1498 | phosphoglucomutase | pgm |  | C | Hardy et al. 2000
SP1573 | lysozyme | lytC | Colonization, adherence, host defense | A,B,E | Lopez et al. 2000, Wizemann et al. 2001
SP1574 | triosephosphate isomerase | tpi |  | G | Zysk et al. 2000
SP1687 | neuraminidase B | nanB |  | E | Berry et al. 1996
SP1693 | neuraminidase A, authentic frameshift | nanA |  | C,D | Camara et al. 1994
SP1782 | ribosomal protein L11 methyltransferase | SPN-1583 |  | B | Polissi et al. 1998
SP1923 | pneumolysin | ply | Host defense | C,D,E,G | Walker et al. 1987, Berry et al. 2000
SP1937 | autolysin | lytA |  | C,D,E | Garcia et al. 1986, Berry et al. 2000
SP2066 | threonine synthase | smmG |  | D | Lau et al. 2001
SP2091 | glycerol-3-phosphate dehydrogenase (NAD(P)+) | smmH, gpsA | Adherence | A,D | Lau et al. 2001, Tuomanen, 2000
SP2099 | penicillin-binding protein 1B | pbp1B, smcA |  | C | Lau et al. 2001
SP2142 | ROK family protein | SPN-1808 |  | B | Polissi et al. 1998
Transporters/ proteases
SP0043 | competence factor transport protein | comB |  | C | Lau et al. 2001
SP0185 | magnesium transporter, CorA family | smtJ |  | C,D | Lau et al. 2001
SP0284 | PTS system, mannose-specific IIAB components | smtK |  | D | Lau et al. 2001
SP0366 | oligopeptide ABC transporter, oligopeptide-binding protein | aliA/plpA | Adherence | A | Tuomanen et al. 2000, Pearce 1994, Alloing 1994, Cundell et al. 1995
SP0483 | ABC transporter, ATP-binding protein | smtG |  | C,D | Lau et al. 2001
SP0609 | amino acid ABC transporter, amino-acid binding protein | glnQ, SPN-1364 |  | B | Polissi et al. 1998
SP0610 | amino acid ABC transporter, ATP binding protein | glnH, SPN-1452 |  | B | Polissi et al. 1998
SP0641 | serine protease, subtilase family | prtA |  | G,E | Zysk et al. 2000, Wizemann et al. 2001
SP0664 | zinc metalloprotease ZmpB, putative | SPN-1338 |  | B,C | Polissi et al. 1998
SP0737 | sodium-dependent transporter |  |  | G | Zysk et al. 2000
SP0820 | ATP-dependent Clp protease, ATP-binding subunit | clpE, SPN-1055 |  | B | Polissi et al. 1998
SP0823 | amino acid ABC transporter, permease protein | smtC |  | C,D | Lau et al. 2001
SP0913 | ABC transporter, permease protein, putative | smtF |  | C,D | Lau et al. 2001
SP1032 | iron-compound ABC transporter, iron compound-binding protein | smtA, pitlA |  | C,D,G | Zysk et al. 2000, Brown et al. 2001
SP1033 | iron-compound ABC transporter, permease protein | smtA, pitlB |  | C,D | Lau et al. 2001, Brown et al. 2001
SP1241 | amino acid ABC transporter, periplasmic solute-binding protein/permease protein |  |  | G | Zysk et al. 2000
SP1342 | toxin secretion ABC transporter, ATP-binding/permease protein | SPN-948 |  | B | Polissi et al. 1998
SP1386 | spermidine/putrescine ABC transporter, periplasmic spermidine/putrescine-binding protein | potD, SPN-924 |  | B | Polissi et al. 1998
SP1389 | spermidine/putrescine ABC transporter, ATP-binding protein | potA, SPN-2041 |  | B,D | Polissi et al. 1998
SP1527 | oligopeptide ABC transporter, oligopeptide-binding protein | aliB |  | C,D | Lau et al. 2001, Alloing et al. 1994
SP1580 | sugar ABC transporter, ATP-binding protein | msmK, SPN-1802 |  | B | Polissi et al. 1998
SP1623 | cation-transporting ATPase, E1-E2 family | SPN-1145 |  | B | Polissi et al. 1998
SP1715 | ABC transporter, ATP-binding protein | smtB,H |  | C,D | Lau et al. 2001
SP1889 | oligopeptide ABC transporter, permease protein | amiD | Adherence | A | Cundell et al. 1995
SP1891 | oligopeptide ABC transporter, oligopeptide-binding protein | amiA | Host defense, adherence | A,H | Cundell et al. 1995, Overweg et al. 2000
SP1957 | ABC transporter, ATP-binding protein | SPN-1113 |  | B | Polissi et al. 1998
SP2019 | ABC transporter, ATP-binding protein, truncation |  |  | C | Bartilson et al. 2001
SP2169 | zinc ABC transporter, zinc-binding lipoprotein | adcA |  | B,C | Dintilhac et al. 1997
SP2220 | ABC transporter, ATP-binding protein | smtE |  | C,D | Lau et al. 2001
SP2230 | ABC transporter, ATP-binding protein |  |  | G | Zysk et al. 2000
Host defense
SP0069 | choline binding protein I | cbpI |  |  | Gosink et al. 2000
SP0071 | immunoglobulin A1 protease | iga, SPN-1471 | Colonization | B,D,F | Poulsen et al. 1998, Polissi et al. 1998
SP0117 | pneumococcal surface protein A | pspA | Cellular metabolism | C,E,F,G,H | Yother et al. 1994, Hollingshead et al. 2000, Hammerschmidt et al. 1999, Berry et al. 2000
SP0346 | capsular polysaccharide biosynthesis protein | cps4A | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0347 | capsular polysaccharide biosynthesis protein | cps4B | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0348 | capsular polysaccharide biosynthesis protein | cps4C | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0349 | capsular polysaccharide biosynthesis protein | cps4D | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0350 | capsular polysaccharide biosynthesis protein | cps4E | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0351 | capsular polysaccharide biosynthesis protein | cps4F | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0352 | capsular polysaccharide biosynthesis protein | cps4G | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0353 | capsular polysaccharide biosynthesis protein | cps4H | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0357 | UDP-N-acetylglucosamine-2-epimerase | cps4I | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0358 | capsular polysaccharide biosynthesis protein | cps4J | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0359 | capsular polysaccharide biosynthesis protein | cps4K | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0360 | UDP-N-acetylglucosamine-2-epimerase | cps4L | Colonization | B,C | Caimano et al. 2000, Garcia et al. 2000, Paton et al. 2000
SP0390 | choline binding protein G | cbpG | Colonization, adherence | A,B,C | Gosink et al. 2000
SP0930 | choline binding protein E | cbpE, pce | Colonization, adherence, cellular metabolism | A,B | Gosink et al. 2000
SP1003 | conserved hypothetical protein | phtD |  | E,G | Adamou et al. 2001
SP1004 | conserved hypothetical protein | phtC |  | E,G | Adamou et al. 2001
SP1154 | immunoglobulin A1 protease | iga | Colonization | B,D,F | Poulsen et al. 1998
SP1174 | conserved domain protein | phtB |  | E,G | Adamou et al. 2001
SP1175 | conserved domain protein | phtA |  | E,G | Adamou et al. 2001
SP1693 | neuraminidase A, authentic frameshift | nanA |  | B | Tong et al. 2000
SP2190 | choline binding protein A | cbpA | Adhesion | A,B,C | Rosenow et al. 1997
SP2201 | choline binding protein D | cbpD | Colonization | B | Gosink et al. 2000
Other categories/ unknown
SP0023 | DNA repair protein RadA, authentic point mutation | SPN-636 |  | B | Polissi et al. 1998
SP0143 | conserved domain protein | SPN-627 |  | B | Polissi et al. 1998
SP0175 | riboflavin synthase, beta subunit | ribH |  | G | Zysk et al. 2000
SP0370 | recombination protein U | SPN-633 |  | B | Polissi et al. 1998
SP0371 | conserved hypothetical protein | SPN-631 |  | B | Polissi et al. 1998
SP0629 | conserved hypothetical protein | smcB |  | C,D | Lau et al. 2001
SP0742 | conserved hypothetical protein | SPN-641 |  | B | Polissi et al. 1998
SP0761 | ATP-dependent RNA helicase, DEAD/DEAH box family |  |  | G | Zysk et al. 2000
SP0771 | peptidyl-prolyl cis-trans isomerase, cyclophilin-type |  |  | G | Zysk et al. 2000
SP1026 | conserved hypothetical protein |  |  | E | Wizemann et al. 2001
SP1207 | exodeoxyribonuclease VII, large subunit | xseA |  | G | Zysk et al. 2000
SP1291 | Cof family protein | SPN-224 |  | B | Polissi et al. 1998
SP1482 | oxidoreductase, Gfo/Idh/MocA family | smuA |  | C,D | Lau et al. 2001
SP1637 | conserved hypothetical protein | SPN-1119 |  | B | Polissi et al. 1998
SP1654 | conserved hypothetical protein | smuB |  | C,D | Lau et al. 2001
SP1972 | membrane protein | SPN-233 |  | B | Polissi et al. 1998
SP1997 | Cof family protein | SPN-1101 |  | B | Polissi et al. 1998
SP2053 | competence protein | cglA |  | C | Bartilson et al. 2001
SP2116 | conserved domain protein | SPN-655 |  | B | Polissi et al. 1998
SP2144 | conserved hypothetical protein | SPN-1631 |  | B | Polissi et al. 1998
SP2145 | antigen, cell wall surface anchor family | smuD |  | C,D | Lau et al. 2001
SP2146 | conserved hypothetical protein | SPN-1200 |  | B | Polissi et al. 1998
SP2236 | putative sensor histidine kinase | comD |  | C,D | Bartilson et al. 2001
(a) The table is limited to genes for which experimental data support a role in virulence. This list connects data in the scientific literature prior to May 2001 with specific gene loci in the TIGR4 isolate. A broad definition of virulence as anything required for the infectious process but not for life in vitro has been used. This non-exhaustive list demonstrates the range of factors that are required to maintain the complex interaction between the bacterium and its host. We thank Dr. Gregor Zysk and Dr. Andrea Polissi for sharing specific sequence data not deposited in GenBank (those for which nucleotide sequence identity is greater than 92% are displayed).

(b) The genes have been divided into five general categories: Adherence/Colonization, Cellular Metabolism/Acquisition of Nutrients, Transporters/Proteases, Host Defense, and Other Categories/Unknown.

(c) Eight specific types of experimental data justified inclusion in the table: A- in vitro adherence assay, B- gene knockout ineffective in mouse, rat or chinchilla nasopharyngeal colonization model, C- gene knockout ineffective in mouse septicemia model, D- gene knockout ineffective in mouse S. pneumoniae respiratory tract model, E- antibodies are protective against invasive disease in animal model, F- antibodies are protective against colonization in animal model, G- antibodies are elicited during infection in humans, H- antibodies show opsonic activity with human polymorphonuclear leukocytes.

(d) References: P. Garcia, J. L. Garcia, E. Garcia, R. Lopez, Gene 43, 265 (1986); J. A. Walker, R. L. Allen, P. Falmagne, M. K. Johnson, G. J. Boulnois, Infect Immun 55, 1184 (1987); G. Alloing, P. de Philip, J. P. Claverys, J Mol Biol 241, 44 (1994); A. M. Berry et al., Infect Immun 62, 1101 (1994); M. Camara, G. J. Boulnois, P. W. Andrew, T. J. Mitchell, Infect Immun 62, 3688 (1994); V. Mejean, C. Salles, L. C. Bullions, M. J. Bessman, J. P. Claverys, Mol Microbiol 11, 323 (1994); B. J. Pearce, A. M. Naughton, H. R. Masure, Mol Microbiol 12, 881 (1994); J. S. Sampson, S. P. O'Connor, A. R. Stinson, J. A. Tharpe, H. Russell, Infect Immun 62, 319 (1994); J. Yother, J. M. White, J Bacteriol 176, 2976 (1994); D. R. Cundell, B. J. Pearce, J. Sandros, A. M. Naughton, H. R. Masure, Infect Immun 63, 2493 (1995); A. M. Berry, J. C. Paton, Infect Immun 64, 5255 (1996); A. M. Berry, R. A. Lock, J. C. Paton, J Bacteriol 178, 4854 (1996); B. Spellerberg et al., Mol Microbiol 19, 803 (1996); T. M. Wizemann et al., Proc Natl Acad Sci U S A 93, 7985 (1996); A. Dintilhac, G. Alloing, C. Granadel, J. P. Claverys, Mol Microbiol 25, 727 (1997); C. Rosenow et al., Mol Microbiol 25, 819 (1997); A. Polissi et al., Infect Immun 66, 5620 (1998); K. Poulsen et al., Infect Immun 66, 181 (1998); I. Auzat et al., Mol Microbiol 34, 1018 (1999); P. Garcia, M. P. Gonzalez, E. Garcia, R. Lopez, J. L. Garcia, Mol Microbiol 31, 1275 (1999); S. Hammerschmidt, G. Bethe, H. R. P, G. S. Chhatwal, Infect Immun 67, 1683 (1999); J. R. Zhang, I. Idanpaan-Heikkila, W. Fischer, E. I. Tuomanen, Mol Microbiol 31, 1477 (1999); R. J. Bongaerts, H. P. Heinz, U. Hadding, G. Zysk, Infect Immun 68, 7141 (2000); A. M. Berry, J. C. Paton, Infect Immun 68, 133 (2000); M. J. Caimano, G. G. Hardy, J. Yother, in Streptococcus pneumoniae - Molecular biology and mechanisms of disease, A. Tomasz, Ed. (Mary Ann Liebert, Larchmont, NY, 2000) pp. 115; E. Charpentier, R. Novak, E. Tuomanen, Mol Microbiol 37, 717 (2000); E. Garcia, C. Arrecubieta, R. Munoz, M. Mollerach, R. Lopez, in Streptococcus pneumoniae - Molecular biology and mechanisms of disease, A. Tomasz, Ed. (Mary Ann Liebert, Larchmont, NY, 2000) pp. 139; K. K. Gosink, E. R. Mann, C. Guglielmo, E. I. Tuomanen, H. R. Masure, Infect Immun 68, 5690 (2000); G. G. Hardy, M. J. Caimano, J. Yother, J Bacteriol 182, 1854 (2000); S. K. Hollingshead, R. Becker, D. E. Briles, Infect Immun 68, 5889 (2000); R. Lopez, M. P. Gonzalez, E. Garcia, J. L. Garcia, P. Garcia, Res Microbiol 151, 437 (2000); K. Overweg et al., Infect Immun 68, 4604 (2000); K. Overweg et al., Infect Immun 68, 4180 (2000b); J. C. Paton, J. K. Morona, R. Morona, in Streptococcus pneumoniae - Molecular biology and mechanisms of disease, A. Tomasz, Ed. (Mary Ann Liebert, Larchmont, NY, 2000) pp. 129; H. H. Tong, L. E. Blue, M. A. James, T. F. DeMaria, Infect Immun 68, 921 (2000); E. I. Tuomanen, H. R. Masure, in Streptococcus pneumoniae - Molecular biology and mechanisms of disease, A. Tomasz, Ed. (Mary Ann Liebert, Larchmont, NY, 2000) pp. 295; H. Yesilkaya et al., Infect Immun 68, 2819 (2000); G. Zysk et al., Infect Immun 68, 3740 (2000); J. E. Adamou et al., Infect Immun 69, 949 (2001); M. Bartilson et al., Mol Microbiol 39, 126 (2001); J. S. Brown, S. M. Gilliland, D. W. Holden, Mol Microbiol 40, 572 (2001); G. W. Lau et al., Mol Microbiol 40, 555 (2001); T. M. Wizemann et al., Infect Immun 69, 1593 (2001).
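For readers working with Supplemental Table 4 programmatically, the supporting-data letters defined in footnote (c) can be expanded with a small lookup. This is a convenience sketch only; the names `SUPPORT_CODES` and `decode_support` are hypothetical, and the descriptions are copied from footnote (c).

```python
# Letter codes and descriptions as given in footnote (c) of Supplemental Table 4.
SUPPORT_CODES = {
    "A": "in vitro adherence assay",
    "B": "gene knockout ineffective in mouse, rat or chinchilla "
         "nasopharyngeal colonization model",
    "C": "gene knockout ineffective in mouse septicemia model",
    "D": "gene knockout ineffective in mouse S. pneumoniae respiratory tract model",
    "E": "antibodies are protective against invasive disease in animal model",
    "F": "antibodies are protective against colonization in animal model",
    "G": "antibodies are elicited during infection in humans",
    "H": "antibodies show opsonic activity with human polymorphonuclear leukocytes",
}


def decode_support(field):
    """Expand a comma-separated code string such as "A,D" into descriptions.

    Empty items (for example from a trailing comma) are skipped; an
    unrecognized letter raises ValueError rather than being guessed at.
    """
    codes = [c.strip() for c in field.split(",") if c.strip()]
    unknown = [c for c in codes if c not in SUPPORT_CODES]
    if unknown:
        raise ValueError(f"unknown support code(s): {unknown}")
    return [SUPPORT_CODES[c] for c in codes]


adherence_and_lung = decode_support("A,D")
```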


Supplemental Table 5. Complete list of genes containing stretches of iterative DNA that could induce phase-variation (a).
ORF | Repeat | Region | Description
SP0001 | CCCCCCC | middle | chromosomal replication initiator protein DnaA
SP0001 | AAAAAAAA | 3prime | chromosomal replication initiator protein DnaA
SP0006 | TGATGATGATGAT | middle | transcription-repair coupling factor
SP0014 | GGGGGG | middle | transcriptional regulator ComX1
SP0035 | GGGGGG | promot | aromatic amino acid aminotransferase
SP0037 | GGGGGG | 5prime | fatty acid/phospholipid synthesis protein PlsX
SP0071 | ATATATAT | middle | immunoglobulin A1 protease
SP0071 | TATATATA | 3prime | immunoglobulin A1 protease
SP0080 | AGAGAGAG | 3prime | hypothetical protein
SP0097 | CACACACA | 3prime | conserved domain protein
SP0097 | TGTGTGTG | middle | conserved domain protein
SP0100 | TATATATA | promot | conserved hypothetical protein
SP0102 | GGGGGG | middle | glycosyl transferase
SP0106 | GAGAGAGA | middle | L-serine dehydratase, iron-sulfur-dependent, beta subunit
SP0111 | AGAGAGAG | middle | amino acid ABC transporter, ATP-binding protein, putative
SP0129 | CCCCCC | 3prime | glycoprotease family protein
SP0130 | CCCCCC | 3prime | IS1167, transposase, degenerate
SP0130 | CTCTCTCT | middle | IS1167, transposase, degenerate
SP0131 | CCCCCC | 3prime | IS630-Spn1, transposase Orf2, degenerate
SP0133 | AAAAAAAA | 3prime | hypothetical protein
SP0134 | AAAAAAAA | promot | hypothetical protein
SP0137 | TTTTTTTT | middle | ABC transporter, ATP-binding protein
SP0137 | AAAAAAAA | 3prime | ABC transporter, ATP-binding protein
SP0138 | TATATATAT | 3prime | hypothetical protein
SP0144 | TTTTTTTT | middle | hypothetical protein
SP0145 | TCTCTCTC | 3prime | conserved hypothetical protein
SP0153 | AGGAAGGAAGGAA | middle | hypothetical protein
SP0163 | ATATATATAT | 5prime | transcriptional regulator PlcR, putative
SP0167 | ATATATAT | 3prime | hypothetical protein
SP0168 | TTATTATTATTA | 5prime | macrolide efflux protein, putative
SP0178 | ATATATAT | 3prime | riboflavin biosynthesis protein RibD
SP0188 | AAAAAAAA | middle | hypothetical protein
SP0205 | CTCTCTCT | 5prime | anaerobic ribonucleoside-triphosphate reductase activating protein
SP0210 | GTGGTGGTGGTG | middle | ribosomal protein L4
SP0227 | TTGTTGTTGTTG | 5prime | ribosomal protein S5
SP0254 | GAGAGAGA | promot | leucyl-tRNA synthetase
SP0259 | TTTTTTTT | promot | Holliday junction DNA helicase RuvB
SP0274 | CTCTCTCTC | middle | DNA polymerase III, alpha subunit, Gram-positive type
SP0274 | GGGGGG | 3prime | DNA polymerase III, alpha subunit, Gram-positive type
SP0278 | CTCTCTCT | 5prime | aminopeptidase PepS
SP0288 | CCCCCC | promot | conserved hypothetical protein
SP0294 | TATATATA | promot | ribosomal protein L13
SP0312 | GAGAGAGA | 5prime | glycosyl hydrolase, family 31
SP0319 | CCCCCC | middle | conserved domain protein
SP0338 CTCTCTCTC5primeATP-dependent Clp protease, ATP-binding subunit, putative
SP0346 TATTTATTTATT5primecapsular polysaccharide biosynthesis protein Cps4A
SP0349 AAAAAAAA5primecapsular polysaccharide biosynthesis protein Cps4D
SP0350 AGAGAGAGAmiddlecapsular polysaccharide biosynthesis protein Cps4E
SP0351 AAAAAAAA5primecapsular polysaccharide biosynthesis protein Cps4F
SP0351 AAAAAAAAA5primecapsular polysaccharide biosynthesis protein Cps4F
SP0352 ATATATAT5primecapsular polysaccharide biosynthesis protein Cps4G
SP0352 TTTTTTTTmiddlecapsular polysaccharide biosynthesis protein Cps4G
SP0353 AAAAAAAA5primecapsular polysaccharide biosynthesis protein Cps4H
SP0356 GAGAGAGA5primeO-antigen transporter RfbX, putative
SP0362 GGGGGGmiddleIS66 family element, Orf3, degenerate
SP0362 TCTCTCTC5primeIS66 family element, Orf3, degenerate
SP0372 GAGAGAGApromotconserved hypothetical protein
SP0380 TGTGTGTGT5primehypothetical protein
SP0380 GGGGGG5primehypothetical protein
SP0386 TTTTTTTT5primesensor histidine kinase, putative
SP0387 TCTCTCTC3primeDNA-binding response regulator
SP0394 GGGGGGmiddlePTS system, mannitol-specific IIBC components
SP0401 TTTTTTTTmiddlehelicase, putative
SP0411 AGAGAGAG3primeseryl-tRNA synthetase
SP0415 GGGGGGmiddleenoyl-CoA hydratase/isomerase family protein
SP0419 GGGGGGmiddleenoyl-(acyl-carrier-protein) reductase
SP0422 CTCTCTCTmiddle3-oxoacyl-(acyl-carrier-protein) synthase II
SP0437 AGCAGCAGCAGC3primeglutamyl-tRNA(Gln) amidotransferase, A subunit
SP0453 AGAGAGAGmiddleamino acid ABC transporter, amino acid-binding protein/permease protein
SP0458 TCTTTCTTTCTT3primeDNA-damage inducible protein P
SP0460 AGAGAGAG3primeIS1167, transposase
SP0460 GGTAGAGGTAGAGGTAGAGGTAGAG5primeIS1167, transposase
SP0462 GAGAGAGAGmiddlecell wall surface anchor family protein
SP0481 CTCTCTCT3primeconserved hypothetical protein
SP0484 GAGAGAGAmiddleconserved hypothetical protein
SP0494 GCTGCTGCTGCT3primeCTP synthase
SP0496 AAAAAAAA5primeNa/Pi cotransporter II-related protein
SP0505 CCCCCC5primetype I restriction-modification system, S subunit, putative
SP0508 GGGGGGmiddletype I restriction-modification system, S subunit
SP0514 TCTTTCTTTCTT3primehypothetical protein
SP0523 ACTGGACTGGACTGGmiddleABC transporter, permease protein, putative
SP0532 GGGGGG5primebacteriocin BlpJ
SP0544 AAAAAAAA3primeimmunity protein BlpX
SP0547 CTCTCTCT3primeconserved domain protein
SP0559 GAGAGAGA3primehypothetical protein
SP0560 GAGAGAGA3primehypothetical protein
SP0565 GAGAGAGA5primeconserved domain protein
SP0567 ACACACACA3primeconserved domain protein
SP0568 ACACACACApromotvalyl-tRNA synthetase
SP0570 TTTTTTTT3primeconserved domain protein
SP0570 AAAAAAAA3primeconserved domain protein
SP0574 GGGGGG3primehypothetical protein
SP0575 GAGAGAGAmiddlehelicase, putative
SP0577 ATGATGATGATG5primePTS system, beta-glucosides-specific IIABC components
SP0580 GAGAGAGAmiddleacetyltransferase, GNAT family
SP0582 AAAAAAAA5primehypothetical protein
SP0590 GAGAGAGA5primeacetyltransferase, GNAT family
SP0593 TCTCTCTCT3primeleucine-rich protein
SP0595 TATATATAmiddlehypothetical protein
SP0604 CTCTCTCT5primesensor histidine kinase VncS
SP0604 GAGAGAGAmiddlesensor histidine kinase VncS
SP0609 TTTTTTTTpromotamino acid ABC transporter, amino acid-binding protein
SP0614 GGGGGG5primetributyrin esterase
SP0615 ATATATAT3primebeta-lactam resistance factor
SP0618 TGTGTGTGmiddleexcinuclease ABC, subunit C
SP0618 GGGGGG3primeexcinuclease ABC, subunit C
SP0621 ATATATATpromothypothetical protein
SP0636 CTCTCTCT3primeABC transporter, ATP-binding protein
SP0641 TATATATA5primeserine protease, subtilase family
SP0641 ATGATGATGATGmiddleserine protease, subtilase family
SP0641 AAAAAAAA3primeserine protease, subtilase family
SP0644 GAGAGAGA5primeIS66 family element, Orf3, degenerate
SP0652 GAGAGAGA3primeconserved hypothetical protein
SP0663 GAGAGAGA3primeconserved hypothetical protein
SP0664 CAAAACAAAACAAAA5primezinc metalloprotease ZmpB, putative
SP0683 AGAGAGAGpromothypothetical protein
SP0689 GGGGGG5primeUDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase
SP0689 GGGGGG5primeUDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase
SP0695 ATATATATATmiddleHesA/MoeB/ThiF family protein
SP0697 TTTTTTTTpromotABC transporter, ATP-binding protein, authentic point mutation
SP0704 TTTTTTTT5primehypothetical protein
SP0704 TTATTATTATTAmiddlehypothetical protein
SP0705 ATATATAT5primehypothetical protein
SP0705 TTTTTTTTmiddlehypothetical protein
SP0715 GGGGGGmiddlelactate oxidase
SP0719 CTCTCTCTmiddleconserved hypothetical protein
SP0729 GCTTGCTTGCTT3primecation-transporting ATPase, E1-E2 family
SP0753 GGGGGGGmiddlebranched-chain amino acid ABC transporter, ATP-binding protein
SP0758 GGGGGG5primePTS system, IIABC components
SP0762 GGGGGGpromotS-adenosylmethionine synthetase
SP0785 GGGGGG5primeconserved hypothetical protein
SP0797 CTCTCTCT5primeaminopeptidase N
SP0798 TTTTTTTTpromotDNA-binding response regulator CiaR
SP0802 CACACACA5primeDNA polymerase III, epsilon subunit/ATP-dependent helicase DinG
SP0807 AAAAAAAA5primeseptation ring formation regulator EzrA, putative
SP0818 AAAAAAAAAA3primeIS630-Spn1, transposase Orf1, authentic frameshift
SP0829 GAGAGAGA3primephosphopentomutase
SP0837 CTCTCTCT3primeDNA topology modulation protein FlaR, putative
SP0842 AAAAAAAA5primepyrimidine-nucleoside phosphorylase
SP0846 GGGGGG5primesugar ABC transporter, ATP-binding protein
SP0851 AGAGAGAG3primeconserved hypothetical protein
SP0852 GCGCGCGC3primetopoisomerase IV, subunit B
SP0872 AAAAAAAApromotD-alanyl-D-alanine carboxypeptidase
SP0887 AATAATAATAAT5primetype I restriction-modification system, S subunit, putative
SP0887 AAAAAAAA3primetype I restriction-modification system, S subunit, putative
SP0887 CCCCCC3primetype I restriction-modification system, S subunit, putative
SP0890 GAGAGAGAG5primeintegrase/recombinase, phage integrase family
SP0891 AAAAAAAA3primetype I restriction-modification system, S subunit, putative
SP0891 CCCCCC3primetype I restriction-modification system, S subunit, putative
SP0892 TCTCTCTC3primetype I restriction-modification system, R subunit, putative
SP0894 AAAAAAAAApromotX-pro dipeptidyl-peptidase
SP0895 TGATATTGATATTGATATmiddleDNA polymerase III, alpha subunit
SP0897 ATATATATApromotpyruvate kinase
SP0901 AAAAAAAA3primehypothetical protein
SP0907 GGGGGG5primecapsular polysaccharide biosynthesis protein, putative
SP0911 GTGTGTGTGTmiddlehypothetical protein
SP0931 CTCTCTCTC3primeglutamate 5-kinase
SP0966 AAAAAAAA5primeadherence and virulence protein A
SP0981 AAAAAAAA5primeprotease maturation protein, putative
SP0987 AAAAAAAApromothypothetical protein
SP0994 AGAGAGAGmiddlehypothetical protein
SP0999 TCTCTCTC5primecytochrome c-type biogenesis protein CcdA
SP1001 TGTGTGTG5primeamino acid permease family protein
SP1005 TATATATAmiddleconserved domain protein, degenerate
SP1010 TTTTTTTT3primelarge conductance mechanosensitive channel protein MscL
SP1033 TATATATA5primeiron-compound ABC transporter, permease protein
SP1045 GAGAGAGA5primeconserved hypothetical protein TIGR00147
SP1052 TTTTTTTT5primephosphoesterase, putative
SP1057 ATATATATAT5primetranscriptional regulator PlcR, putative
SP1062 AAAAAAAAmiddleABC transporter, ATP-binding protein
SP1063 TATATATATAmiddleABC-2 transporter, permease protein, putative
SP1083 AGAGAGAGA5primeconserved hypothetical protein
SP1083 GGGGGG5primeconserved hypothetical protein
SP1083 GAGAGAGAmiddleconserved hypothetical protein
SP1087 TGTGTGTG5primeATP-dependent DNA helicase PcrA
SP1090 AGAGAGAGAG5primeconserved hypothetical protein
SP1113 CAGCAGCAGCAG5primeDNA-binding protein HU
SP1130 TTTTTTTT3primetranscriptional regulator
SP1130 ATATATAT3primetranscriptional regulator
SP1151 GAGAGAGAmiddleexonuclease RexB
SP1152 CTCTCTCTmiddleexonuclease RexA
SP1153 GAGAGAGAmiddlehypothetical protein
SP1153 AAAAAAAA3primehypothetical protein
SP1160 TTTTTTTTmiddlelipoate-protein ligase, putative
SP1166 AAATAAATAAAT5primeMATE efflux family protein
SP1168 CTCTCTCT5primemutator MutT protein
SP1171 TGTGTGTG3primehydrolase, haloacid dehalogenase-like family
SP1215 TTTTTTTTmiddletransporter, FNT family, putative
SP1219 GCGCGCGCmiddleDNA gyrase subunit A
SP1219 ATTCATTCATTC5primeDNA gyrase subunit A
SP1222 TTTTTTTTmiddletype II restriction endonuclease, putative
SP1238 CCCCCC5primeexcinuclease ABC, subunit B
SP1260 CCCCCC3primecopper homeostasis protein CutC
SP1263 TTTTTTTT5primeDNA topoisomerase I
SP1264 AGAGAGAG5primeconserved domain protein
SP1267 AGAGAGAGmiddlelicC protein
SP1267 ATGATGATGATG5primelicC protein
SP1272 CTCTCTCTmiddlepolysaccharide biosynthesis protein, putative
SP1272 CTCTCTCT3primepolysaccharide biosynthesis protein, putative
SP1274 AAAAAAAA5primelicD2 protein
SP1283 CCCCCC3primeheat shock protein HtpX
SP1286 GGGGGGGG5primeuracil permease
SP1305 AAAAAAAAmiddlehypothetical protein
SP1311 GGGGGGGmiddleIS66 family element, Orf3, degenerate
SP1311 TCTCTCTC5primeIS66 family element, Orf3, degenerate
SP1316 GAGAGAGAGA3primev-type sodium ATP synthase, subunit B
SP1321 TATATATATpromotv-type sodium ATP synthase, subunit K
SP1326 TTTTTTTT5primeneuraminidase, putative
SP1336 CTCTCTCTmiddletype II DNA modification methyltransferase Spn5252IP
SP1340 AATAATAATAAT5primehypothetical protein
SP1340 CCCCCC5primehypothetical protein
SP1341 TTTTTTTTmiddleABC transporter, ATP-binding protein
SP1342 TATATATAT3primetoxin secretion ABC transporter, ATP-binding/permease protein
SP1344 TATATATAT3primeconserved hypothetical protein
SP1356 TCCTTCCTTCCT3primeAtz/Trz family protein
SP1356 TCTCTCTCmiddleAtz/Trz family protein
SP1358 CCCCCC3primeABC transporter, ATP-binding/permease protein
SP1361 TGTGTGTG3primehomoserine dehydrogenase
SP1361 TGATTGATTGATT5primehomoserine dehydrogenase
SP1364 ATTATTATTATTA5primehypothetical protein
SP1368 TCATCATCATCA3primepsr protein
SP1368 TCTCTCTC5primepsr protein
SP1374 CCCCCC5primechorismate synthase
SP1375 TTCTTTCTTTCTT5prime3-dehydroquinate synthase
SP1375 AGAGAGAG5prime3-dehydroquinate synthase
SP1380 GAGAGAGA3primehypothetical protein
SP1383 CGCGCGCGmiddlealanyl-tRNA synthetase
SP1392 CCCCCCmiddlealpha-acetolactate decarboxylase
SP1393 AAAAAAAApromotconserved hypothetical protein
SP1402 CCCCCC5primeNOL1/NOP2/sun family protein
SP1416 AAAAAAAA3primeS-adenosylmethionine:tRNA ribosyltransferase-isomerase
SP1417 TTTTTTTTmiddlepspC protein, degenerate
SP1430 AGAGAGAGpromottype II restriction endonuclease, putative, authentic point mutation
SP1431 TATATATAmiddletype II DNA modification methyltransferase, putative
SP1431 AAAAAAAA3primetype II DNA modification methyltransferase, putative
SP1441 GGGGGGmiddleIS66 family element, Orf3, degenerate
SP1445 GCTTGCTTGCTTmiddleGMP synthase
SP1450 CTCTCTCTCmiddleplatelet activating factor, putative
SP1457 TCTCTCTCT5primespoU rRNA methylase family protein
SP1458 CACACACAmiddlethioredoxin reductase
SP1465 AAAAAAAAAmiddlehypothetical protein
SP1478 TCTCTCTC3primeoxidoreductase, aldo/keto reductase family
SP1479 AGAGAGAG3primepeptidoglycan N-acetylglucosamine deacetylase A
SP1483 ACACACAC3primeATP-dependent RNA helicase, DEAD/DEAH box family
SP1492 CTCTCTCT3primecell wall surface anchor family protein
SP1493 TCTCTCTC5primehypothetical protein
SP1506 GGGGGG5primeconserved hypothetical protein
SP1506 CCCCCCmiddleconserved hypothetical protein
SP1519 TCTCTCTC3primeacetyltransferase, GNAT family
SP1526 CCCCCC3primeABC transporter, ATP-binding protein, authentic frameshift
SP1526 ATATATATmiddleABC transporter, ATP-binding protein, authentic frameshift
SP1529 CCCCCC3primepolysaccharide biosynthesis protein, putative
SP1547 AGAGAGAGAmiddleconserved hypothetical protein
SP1560 TTTTTTTT5primeconserved hypothetical protein
SP1561 TTTTTTTT3primeconserved hypothetical protein
SP1562 AAAAAAAApromothypothetical protein
SP1563 AAAAAAAApromotpyridine nucleotide-disulphide oxidoreductase family protein
SP1573 TTTTTTTTmiddlelysozyme
SP1576 CCCCCCmiddlehomoserine O-succinyltransferase
SP1577 TTTCTTTCTTTCpromotadenine phosphoribosyltransferase
SP1596 GGGGGGpromotIS3-Spn1, hypothetical protein, interruption
SP1604 CCCCCCCmiddlehypothetical protein
SP1604 TTTTTTTTTpromothypothetical protein
SP1605 TTTTTTTTTpromotferredoxin
SP1612 AAAAAAAA5primeconserved domain protein
SP1612 AATAAATAAATAmiddleconserved domain protein
SP1617 TTTTTTTTmiddlePTS system, IIC component
SP1617 CCCCCC5primePTS system, IIC component
SP1619 CTCTCTCTmiddlePTS system, IIA component
SP1621 TCTCTCTCmiddletranscription antiterminator BglG family protein, authentic frameshift
SP1623 CCCCCCmiddlecation-transporting ATPase, E1-E2 family
SP1623 TTTTTTTTpromotcation-transporting ATPase, E1-E2 family
SP1624 GGGGGGmiddleacyltransferase family protein
SP1625 AAAAAAAApromotcadmium resistance transporter, putative
SP1626 TTTTTTTT5primeribosomal protein S15
SP1631 CTCTCTCTmiddlethreonyl-tRNA synthetase
SP1645 TTGGTTGGTTGG3primeGTP pyrophosphokinase
SP1645 CCCCCC3primeGTP pyrophosphokinase
SP1648 TTTTTTTTTTpromotmanganese ABC transporter, ATP-binding protein
SP1652 CTCTCTCTmiddlehypothetical protein
SP1654 AACCAACCAACCAmiddleconserved hypothetical protein
SP1671 CTGACTGACTGA5primeD-alanine--D-alanine ligase
SP1681 TTTTTTTT5primesugar ABC transporter, permease protein
SP1686 CCCCCCmiddleoxidoreductase, Gfo/Idh/MocA family
SP1693 TTTTTTTT5primeneuraminidase A, authentic frameshift
SP1697 CCCCCC3primeATP-dependent DNA helicase RecG
SP1702 CCCCCCC5primepreprotein translocase, SecA subunit
SP1708 CCCCCCpromothypothetical protein
SP1709 ATATATAT5primephosphoglycerate dehydrogenase-related protein
SP1716 CCCCCC3primeconserved hypothetical protein
SP1731 CCCCCCmiddleconserved hypothetical protein
SP1737 TTCTTCTTCTTCTT3primeDNA-directed RNA polymerase, omega subunit, putative
SP1739 CTCTCTCT5primeKH domain protein
SP1747 CCCCCC5primeconserved hypothetical protein
SP1749 GGGGGG5primeGTP-binding protein
SP1751 TCTCTCTCmiddlemagnesium transporter, CorA family, putative
SP1761 CCCCCCmiddlehypothetical protein
SP1764 CCCCCCCC5primeglycosyl transferase, family 2
SP1766 CCCCCC5primeglycosyl transferase, family 8
SP1768 AAAAAAAAmiddleconserved hypothetical protein
SP1769 CTCTCTCTmiddleglycosyl transferase, authentic frameshift
SP1769 CCCCCCCCC5primeglycosyl transferase, authentic frameshift
SP1772 (TCAGCGTCGACAAGTGCGTCGGCC)540 5prime, middle, 3prime cell wall surface anchor family protein
SP1799 TTTTTTTT5primesugar-binding transcriptional regulator, LacI family
SP1800 TCTCTCTC5primetranscriptional activator, putative
SP1820 CACACACAmiddlehypothetical protein
SP1821 TTTTTTTT5primesugar-binding transcriptional regulator, LacI family
SP1823 CTCTCTCT5primeMgtC/SapB family protein
SP1836 TCTCTCTCTmiddlehypothetical protein
SP1844 AAAAAAAAmiddlehypothetical protein
SP1850 AAAAAAAAmiddletype II restriction endonuclease DpnI
SP1852 CCCCCCmiddlegalactose-1-phosphate uridylyltransferase
SP1855 CCCCCC5primealcohol dehydrogenase, zinc-containing
SP1865 CCCCCCmiddleglutamyl-aminopeptidase
SP1871 GGGGGGmiddleiron-compound ABC transporter, ATP-binding protein
SP1872 CAAGCAAGCAAGmiddleiron-compound ABC transporter, iron-compound-binding protein
SP1875 AGAGAGAG5primeconserved hypothetical protein
SP1883 ACACACAC5primedextran glucosidase DexS, putative
SP1891 TTTTTTTT5primeoligopeptide ABC transporter, oligopeptide-binding protein AmiA
SP1892 TTTTTTTTpromothypothetical protein
SP1898 CCACCACCACCA3primealpha-galactosidase
SP1899 AAAAAAAAmiddlemsm operon regulatory protein
SP1914 TTTTTTTTmiddlehypothetical protein
SP1914 AAAAAAAA5primehypothetical protein
SP1934 CCCCCCpromothypothetical protein
SP1945 TCTCTCTCpromothypothetical protein
SP1948 GGGGGG3primeconserved domain protein
SP1949 GGGGGGpromothypothetical protein
SP1950 TTTTTTTTTpromotbacteriocin formation protein, putative
SP1951 GGGGGGmiddleconserved hypothetical protein
SP1952 AAAAAAAA5primehypothetical protein
SP1952 TTTTTTTTmiddlehypothetical protein
SP1954 AAAAAAAAA5primeserine protease, subtilase family, authentic frameshift
SP1955 ACACACACAmiddlehypothetical protein
SP1967 TTTTTTTT5primeconserved hypothetical protein
SP1968 TTTTTTTT3primephosphopantetheine adenylyltransferase
SP1968 AAAAAAAA5primephosphopantetheine adenylyltransferase
SP1969 CTCTCTCTC3primetype II DNA modification methyltransferase, putative
SP1971 AAAAAAAAmiddlehypothetical protein
SP1973 TTTTTTTT5primespoU rRNA Methylase family protein
SP1980 CATCATCATCAT3primecmp-binding-factor 1
SP1984 CGCGCGCG5primeconserved hypothetical protein TIGR00157
SP1994 CCCCCC3primeaminotransferase, class I
SP1997 AAAAAAAA5primeCof family protein
SP2006 CCCCCCmiddletranscriptional regulator ComX2
SP2017 TCTCTCTCT5primemembrane protein
SP2020 GAGAGAGA5primetranscriptional regulator, GntR family
SP2020 TAAATAAATAAATA5primetranscriptional regulator, GntR family
SP2021 AAAAAAAA5primeglycosyl hydrolase, family 1
SP2021 AGAGAGAGAmiddleglycosyl hydrolase, family 1
SP2054 ATTCATTCATTCpromotconserved hypothetical protein
SP2059 CCCCCC5primeconserved hypothetical protein
SP2064 TATATATAmiddlehydrolase, haloacid dehalogenase-like family
SP2067 ACACACAC3primehypothetical protein
SP2072 ATGAATGAATGA5primeglutamine amidotransferase, class-I
SP2072 TTTTTTTTpromotglutamine amidotransferase, class-I
SP2077 TTTTTTTT5primetranscriptional repressor, putative
SP2079 TTTTTTTTpromotIS1381, transposase OrfB
SP2086 CTCTCTCT5primephosphate ABC transporter, permease protein
SP2098 AGAGAGAG3primemembrane protein
SP2111 GGGGGG5primemalA protein
SP2114 CGCGCGCGmiddleaspartyl-tRNA synthetase
SP2117 CCCCCCpromothypothetical protein
SP2126 CGTCCGTCCGTC3primedihydroxy-acid dehydratase
SP2133 AAAAAAAA5primeconserved domain protein
SP2136 TTTTTTTT5primecholine binding protein PcpA
SP2136 TTTTTTTT5primecholine binding protein PcpA
SP2145 GGGGGG5primeantigen, cell wall surface anchor family
SP2159 GTGTGTGTmiddlefucolectin-related protein
SP2173 CCAACCAACCAA3primedltD protein
SP2173 CCCCCC5primedltD protein
SP2178 CCCCCCpromotconserved hypothetical protein, interruption
SP2182 GGGGGGmiddlehypothetical protein
SP2190 TTTTTTTTmiddlecholine binding protein A
SP2190 TTTTTTTT5primecholine binding protein A
SP2193 TCTCTCTCpromotDNA-binding response regulator
SP2195 CTCTCTCTpromottranscriptional regulator CtsR
SP2207 CTCTCTCT3primecompetence protein ComF, putative
SP2211 TCTCTCTCmiddleIS66 family element, Orf3, degenerate
SP2216 CTGCTGCTGCTGC3primesecreted 45 kd protein
SP2220 CCCCCC3primeABC transporter, ATP-binding protein
SP2221 CTCTCTCTmiddleABC transporter, ATP-binding protein
SP2223 TTTTTTTT5primeconserved hypothetical protein
SP2224 TCTCTCTC5primepeptidase, M16 family
SP2233 TTTTTTTTpromothypothetical protein
SP2236 GAGAGAGAmiddleputative sensor histidine kinase ComD
SP2240 GAGAGAGA5primespspoJ protein
(a) Iterative DNA motifs (k-nucleotide repeats), including homopolymeric tracts, were searched for in the TIGR4 genome sequence using the REPEATS program [G. Benson, M. S. Waterman, Nucleic Acids Res 22, 4828 (1994)]. The minimum length of homopolymeric tracts was set to 8 for A and T, and 6 for G and C; the minimum copy number was 4 tandem copies for di- and trinucleotides, and 3 copies for tetra-, penta- and hexanucleotides. Repeats of heptanucleotides and longer units were not found in 3 or more copies, except for the imperfect repeats in SP1772. The ratio of the observed frequency of homopolymeric tracts to their expected frequency was computed by Markov chain analysis as described [N. J. Saunders et al., Mol Microbiol 37, 207 (2000)]. It revealed that G or C tracts of 8 bp and A or T tracts of 10 and 11 bp are slightly over-represented.
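
The search criteria in note (a) can be illustrated with a minimal Python sketch. This is not the REPEATS program, only a plain scan for perfect tandem repeats that applies the same thresholds (8 for A/T tracts, 6 for G/C tracts, 4 copies for di- and trinucleotides, 3 copies for tetra- to hexanucleotides). The function names (is_primitive, min_copies, find_iterative_motifs) and the (position, unit, copies) output format are assumptions made for illustration. Imperfect repeats such as those in SP1772 would not be reported by this sketch.

```python
def is_primitive(unit):
    """True if unit is not itself a tandem repeat of a shorter unit (e.g. 'ATAT')."""
    return (unit + unit).find(unit, 1) == len(unit)


def min_copies(unit):
    """Minimum tandem copy number for a repeat unit, following note (a)."""
    k = len(unit)
    if k == 1:
        return 8 if unit in "AT" else 6  # homopolymeric tracts
    return 4 if k <= 3 else 3            # di/tri vs. tetra/penta/hexa


def find_iterative_motifs(seq):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Units of 1 to 6 nucleotides are considered. Units that are repeats of a
    shorter unit are skipped so that each tract is reported once, under its
    shortest unit.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, 7):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if set(unit) <= set("ACGT") and is_primitive(unit):
                copies = 1
                # Extend while the next k bases repeat the unit exactly.
                while seq[i + copies * k:i + (copies + 1) * k] == unit:
                    copies += 1
                if copies >= min_copies(unit):
                    hits.append((i, unit, copies))
                    # Resume just before the tract end so that an adjacent
                    # repeat sharing its last bases is not missed.
                    i += copies * k - (k - 1)
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the TIGR4 genome.
    example = "GGC" + "A" * 8 + "CGC" + "AT" * 4 + "CC" + "TGA" * 4 + "G" * 5
    for start, unit, copies in find_iterative_motifs(example):
        print(start, unit, copies)
```

The reported unit depends on where the tract starts (for example ATATATAT versus TATATATA), which matches the way both phases appear as separate entries in the table.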


Supplemental Table 6. Comparative genome hybridizations (a).
Gene absent in strain D39    Gene absent in strain R6    Description
SP0067 SP0067 hypothetical protein
SP0069 SP0069 choline binding protein I
SP0071 SP0071 immunoglobulin A1 protease
SP0074 acetyltransferase, CysE/LacA/LpxA/NodL family
SP0163 transcriptional regulator PlcR, putative
SP0165 SP0165 flavoprotein
SP0166 SP0166 pyridoxal-dependent decarboxylase, Orn/Lys/Arg family
SP0167 hypothetical protein
SP0168 SP0168 macrolide efflux protein, putative
SP0298 conserved hypothetical protein
SP0328 IS1380-Spn1, transposase
SP0343 IS1380-Spn1, transposase
SP0347 capsular polysaccharide biosynthesis protein Cps4B
SP0349 capsular polysaccharide biosynthesis protein Cps4D
SP0350 SP0350 capsular polysaccharide biosynthesis protein Cps4E
SP0351 SP0351 capsular polysaccharide biosynthesis protein Cps4F
SP0352 SP0352 capsular polysaccharide biosynthesis protein Cps4G
SP0353 SP0353 capsular polysaccharide biosynthesis protein Cps4H
SP0354 SP0354 hypothetical protein
SP0355 SP0355 hypothetical protein
SP0356 SP0356 O-antigen transporter RfbX, putative
SP0357 SP0357 UDP-N-acetylglucosamine-2-epimerase
SP0358 SP0358 capsular polysaccharide biosynthesis protein cps4J
SP0379 conserved hypothetical protein
SP0380 hypothetical protein
SP0460 IS1167, transposase
SP0461 SP0461 transcriptional regulator, putative
SP0463 SP0463 cell wall surface anchor family protein
SP0464 cell wall surface anchor family protein
SP0466 SP0466 sortase, putative
SP0467 SP0467 sortase, putative
SP0468 SP0468 sortase, putative
SP0495 IS1380-Spn1, transposase
SP0539 bacteriocin BlpM
SP0544 immunity protein BlpX
SP0666 conserved hypothetical protein
SP0714 IS1380-Spn1, transposase
SP0826 hypothetical protein
SP0889 hypothetical protein
SP0890 SP0890 integrase/recombinase, phage integrase family
SP0891 SP0891 type I restriction-modification system, S subunit, putative
SP1055 SP1055 Tn5252, Orf 9 protein
SP1056 SP1056 Tn5252, relaxase
SP1057 SP1057 transcriptional regulator PlcR, putative
SP1059 SP1059 hypothetical protein
SP1061 SP1061 protein kinase, putative
SP1062 ABC transporter, ATP-binding protein
SP1063 SP1063 ABC-2 transporter, permease protein, putative
SP1129 integrase/recombinase, phage integrase family
SP1130 SP1130 transcriptional regulator
SP1131 transcriptional regulator, putative
SP1132 hypothetical protein
SP1134 SP1134 hypothetical protein
SP1135 SP1135 hypothetical protein
SP1136 SP1136 conserved domain protein
SP1137 SP1137 GTP-binding protein, putative
SP1139 SP1139 hypothetical protein
SP1141 hypothetical protein
SP1142 hypothetical protein
SP1143 conserved hypothetical protein
SP1189 hypothetical protein
SP1292 SAP domain protein
SP1315 v-type sodium ATP synthase, subunit D
SP1316 SP1316 v-type sodium ATP synthase, subunit B
SP1317 SP1317 v-type sodium ATP synthase, subunit A
SP1319 SP1319 v-type sodium ATP synthase, subunit C
SP1320 v-type sodium ATP synthase, subunit E
SP1321 SP1321 v-type sodium ATP synthase, subunit K
SP1322 SP1322 v-type sodium ATP synthase, subunit I
SP1324 SP1324 ROK family protein
SP1325 SP1325 oxidoreductase, Gfo/Idh/MocA family
SP1326 SP1326 neuraminidase, putative
SP1327 conserved hypothetical protein
SP1329 SP1329 N-acetylneuraminate lyase
SP1330 SP1330 N-acetylmannosamine-6-P epimerase, putative
SP1331 SP1331 phosphosugar-binding transcriptional regulator, putative
SP1336 type II DNA modification methyltransferase Spn5252IP
SP1337 IS1380-Spn1, transposase
SP1352 SP1352 IS1380-Spn1, transposase
SP1439 IS1380-Spn1, transposase
SP1503 SP1503 IS1380-Spn1, transposase
SP1616 ribulose-phosphate 3-epimerase family protein
SP1617 PTS system, IIC component
SP1618 PTS system, IIB component
SP1619 PTS system, IIA component
SP1620 PTS system, nitrogen regulatory component IIA, putative
SP1621 transcription antiterminator BglG family protein, authentic frameshift
SP1622 transposase, IS200 family
SP1755 hypothetical protein
SP1757 conserved hypothetical protein
SP1758 SP1758 glycosyl transferase, group 1
SP1759 SP1759 preprotein translocase, SecA subunit
SP1760 SP1760 conserved domain protein
SP1761 hypothetical protein
SP1762 SP1762 hypothetical protein
SP1763 SP1763 preprotein translocase SecY family protein
SP1764 SP1764 glycosyl transferase, family 2
SP1765 SP1765 glycosyl transferase, family 8
SP1766 SP1766 glycosyl transferase, family 8
SP1770 SP1770 glycosyl transferase, family 8
SP1771 SP1771 glycosyl transferase, family 2/glycosyl transferase family 8
SP1772 SP1772 cell wall surface anchor family protein
SP1793 hypothetical protein
SP1796 ABC transporter, substrate-binding protein
SP1797 ABC transporter, permease protein
(a) Comparative genome hybridization was used to identify genomic differences between the TIGR4 isolate and strains R6 and D39. All the predicted genes from the TIGR4 isolate were amplified by PCR and arrayed on glass microscope slides as previously described [S. Peterson, R. T. Cline, H. Tettelin, V. Sharov, D. A. Morrison, J Bacteriol 182, 6192 (2000)]. Genomic DNA for comparative genome hybridization studies was labeled according to protocols provided by J. DeRisi (www.microarrays.org/Pdfs/GenomicDNALabel_B.pdf), except that genomic DNA was not digested or sheared prior to labeling. Arrays were scanned using a GenePix 4000B scanner from Axon Inc., and individual hybridization signals were quantitated using TIGR SPOTFINDER [P. Hegde et al., Biotechniques 29, 548 (2000)].


Supplemental Table 7. Regions of atypical nucleotide composition (a).
Score: 895 (a), %GC: 49.5
SP0014 transcriptional regulator ComX1
SP0015 IS630-Spn1, transposase Orf1
SP0016 IS630-Spn1, transposase Orf2
Score: 749.6, %GC: 29.5
SP0131 IS630-Spn1, transposase Orf2, degenerate
SP0132 IS630-Spn1, transposase Orf1, degenerate
SP0133 hypothetical protein
SP0134 hypothetical protein
SP0135 glycosyl transferase, putative
SP0136 glycosyl transferase, family 2
SP0137 ABC transporter, ATP-binding protein
SP0138 hypothetical protein
SP0139 conserved domain protein
SP0140 UDP-glucose 6-dehydrogenase, authentic frameshift
SP0141 transcriptional regulator
Score: 938.4, %GC: 28.1
SP0163 transcriptional regulator PlcR, putative
SP0164 hypothetical protein
SP0165 flavoprotein
SP0166 pyridoxal-dependent decarboxylase, Orn/Lys/Arg family
SP0167 hypothetical protein
SP0168 macrolide efflux protein, putative
SP0169 lactose phosphotransferase system repressor, degenerate
SP0170 hypothetical protein
SP0171 ROK family protein
SP0172 hypothetical protein
SP0173 DNA mismatch repair protein HexB
Score: 645.5, %GC: 43.1
SP0210 ribosomal protein L4
SP0211 ribosomal protein L23
SP0212 ribosomal protein L2
SP0213 ribosomal protein S19
SP0214 ribosomal protein L22
SP0215 ribosomal protein S3
SP0216 ribosomal protein L16
SP0217 ribosomal protein L29
SP0218 ribosomal protein S17
Score: 651, %GC: 29.9
SP0350 capsular polysaccharide biosynthesis protein Cps4E
SP0351 capsular polysaccharide biosynthesis protein Cps4F
SP0352 capsular polysaccharide biosynthesis protein Cps4G
SP0353 capsular polysaccharide biosynthesis protein Cps4H
SP0354 hypothetical protein
Score: 663, %GC: 29.8
SP0568 valyl-tRNA synthetase
SP0569 type II DNA modification methyltransferase, truncation
SP0570 conserved domain protein
SP0571 cell filamentation protein Fic-related protein
Score: 626, %GC: 30.4
SP0575 helicase, putative
SP0576 transcription antiterminator Lict
SP0577 PTS system, beta-glucosides-specific IIABC components
Score: 620.5, %GC: 31.8
SP0664 zinc metalloprotease ZmpB, putative
SP0665 chorismate binding enzyme
Score: 947.7, %GC: 29.1
SP0690 cell division protein DivIB
SP0691 hypothetical protein
SP0692 hypothetical protein
SP0693 hypothetical protein
SP0694 conserved domain protein
SP0695 HesA/MoeB/ThiF family protein
SP0696 hypothetical protein
SP0697 ABC transporter, ATP-binding protein, authentic point mutation
SP0698 hypothetical protein
SP0699 hypothetical protein
SP0700 transposase, IS30 family, degenerate
Score: 744.7, %GC: 29.8
SP1029 RNA methyltransferase, TrmA family
SP1030 conserved hypothetical protein
SP1031 hypothetical protein
SP1032 iron-compound ABC transporter, iron compound-binding protein
SP1033 iron-compound ABC transporter, permease protein
SP1034 iron-compound ABC transporter, permease protein
SP1035 iron-compound ABC transporter, ATP-binding protein
SP1036 hypothetical protein
SP1037 type II restriction endonuclease, putative
SP1038 hypothetical protein
SP1039 hypothetical protein
SP1040 site-specific recombinase, resolvase family
Score: 921.4, %GC: 28.3
SP1056 Tn5252, relaxase
SP1057 transcriptional regulator PlcR, putative
SP1058 hypothetical protein
SP1059 hypothetical protein
SP1060 hypothetical protein
SP1061 protein kinase, putative
SP1062 ABC transporter, ATP-binding protein
SP1063 ABC-2 transporter, permease protein, putative
SP1064 transposase, IS200 family
Score: 647, %GC: 30.4
SP1129 integrase/recombinase, phage integrase family
SP1130 transcriptional regulator
SP1131 transcriptional regulator, putative
SP1132 hypothetical protein
SP1133 hypothetical protein
SP1134 hypothetical protein
Score: 721, %GC: 29.2
SP1317 v-type sodium ATP synthase, subunit A
SP1318 v-type sodium ATP synthase, subunit G
SP1319 v-type sodium ATP synthase, subunit C
SP1320 v-type sodium ATP synthase, subunit E
SP1321 v-type sodium ATP synthase, subunit K
Score: 732.8, %GC: 29.4
SP1337 IS1380-Spn1, transposase
SP1338 hypothetical protein
SP1339 hypothetical protein
SP1340 hypothetical protein
SP1341 ABC transporter, ATP-binding protein
SP1342 toxin secretion ABC transporter, ATP-binding/permease protein
SP1343 prolyl oligopeptidase family protein
Score: 639, %GC: 31.2
SP1422 hypothetical protein
SP1423 transcriptional repressor, putative
SP1424 hypothetical protein
SP1425 hypothetical protein
SP1426 ABC transporter, ATP-binding protein
SP1427 peptidase, U32 family
SP1428 conserved hypothetical protein
SP1429 peptidase, U32 family
SP1430 type II restriction endonuclease, putative, authentic point mutation
SP1431 type II DNA modification methyltransferase, putative
SP1432 hypothetical protein
SP1433 transcriptional regulator, araC family
SP1434 ABC transporter, ATP-binding/permease protein
SP1435 ABC transporter, ATP-binding protein
SP1436 hypothetical protein
SP1437 conserved domain protein
SP1438 ABC transporter, ATP-binding protein
SP1439 IS1380-Spn1, transposase
Score: 3152.6, %GC: 54.7
SP1769 glycosyl transferase, putative, authentic frameshift
SP1770 glycosyl transferase, family 8
SP1771 glycosyl transferase, family 2/glycosyl transferase family 8
SP1772 cell wall surface anchor family protein
Score: 614.5, %GC: 29.8
SP1799 sugar-binding transcriptional regulator, LacI family
SP1800 transcriptional activator, putative
SP1801 conserved hypothetical protein
SP1802 hypothetical protein
Score: 697, %GC: 29.4
SP1819 hypothetical protein
SP1820 hypothetical protein
SP1821 sugar-binding transcriptional regulator, LacI family
SP1822 conserved domain protein
SP1823 MgtC/SapB family protein
SP1824 ABC transporter, permease protein
Score: 883.8, %GC: 28.4
SP1828 UDP-glucose 4-epimerase
SP1829 galactose-1-phosphate uridylyltransferase
SP1830 phosphate transport system regulatory protein PhoU, putative
SP1831 hypothetical protein
SP1832 hypothetical protein
SP1833 cell wall surface anchor family protein
Score: 978.4, %GC: 49.8
SP1900 BirA bifunctional protein
SP1901 RNA methyltransferase, TrmA family
Score: 694, %GC: 30.1
SP1946 transcriptional regulator PlcR, putative
SP1947 hypothetical protein
SP1948 conserved domain protein
SP1949 hypothetical protein
SP1950 bacteriocin formation protein, putative
SP1951 conserved hypothetical protein
SP1952 hypothetical protein
SP1953 toxin secretion ABC transporter, ATP-binding/permease protein
SP1954 serine protease, subtilase family, authentic frameshift
SP1955 hypothetical protein
SP1956 hypothetical protein
Score: 657, %GC: 45.5
SP1961 DNA-directed RNA polymerase, beta subunit
Score: 914.1, %GC: 49.5
SP2005 hypothetical protein
SP2006 transcriptional regulator ComX2
SP2007 transcription antitermination protein NusG
Score: 939.1, %GC: 49.7
SP2067 hypothetical protein
SP2068 cytidine/deoxycytidylate deaminase family protein
SP2069 glutamyl-tRNA synthetase
Score: 700, %GC: 30.3
SP2136 choline binding protein PcpA
SP2137 IS1381, transposase OrfA, internal deletion
(a) Regions of atypical nucleotide composition were identified by χ² analysis: the distribution of all 64 trinucleotides (3-mers) was computed for the complete genome in all six reading frames, followed by the 3-mer distribution in 2000-bp windows. Windows overlapped by 1500 bp. For each window, the χ² statistic on the difference between its 3-mer content and that of the whole genome was computed. The most atypical regions, with a score of 600 and above, were considered in this analysis.
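
The windowed analysis in footnote (a) can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code. The function names (`trimer_counts`, `chi_square_windows`) are hypothetical, and two details are assumptions: overlapping 3-mers are counted on both strands to cover all six reading frames, and expected counts are obtained by scaling genome-wide 3-mer frequencies to the number of 3-mers in each window. Because scores depend on these counting conventions, the threshold of 600 may not correspond exactly to the scores in the table.

```python
from collections import Counter
from itertools import product

# All 64 trinucleotides over the DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=3)]
KMER_SET = set(KMERS)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trimer_counts(seq):
    """Count overlapping 3-mers on both strands (assumed to cover all 6 frames).

    3-mers containing characters other than A, C, G, T are skipped.
    """
    counts = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 2):
            kmer = strand[i:i + 3]
            if kmer in KMER_SET:
                counts[kmer] += 1
    return counts


def chi_square_windows(genome, window=2000, overlap=1500):
    """Return (start, chi-square score) for each window along the genome.

    Windows of `window` bp overlapping by `overlap` bp advance in steps of
    window - overlap (500 bp with the values given in the footnote).
    """
    genome = genome.upper()
    step = window - overlap
    total = trimer_counts(genome)
    n_total = sum(total.values())
    freqs = {k: total[k] / n_total for k in KMERS}

    scores = []
    for start in range(0, len(genome) - window + 1, step):
        observed = trimer_counts(genome[start:start + window])
        n_obs = sum(observed.values())
        score = 0.0
        for k in KMERS:
            expected = freqs[k] * n_obs
            if expected > 0:  # skip 3-mers absent from the whole genome
                score += (observed[k] - expected) ** 2 / expected
        scores.append((start, score))
    return scores
```

Windows scoring at or above the cutoff could then be selected with, for example, `[(s, x) for s, x in chi_square_windows(genome) if x >= 600]`, and adjacent high-scoring windows merged into regions before looking up the genes they contain.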