A National Tuberculosis Archive

Science  03 Mar 2006:
Vol. 311, Issue 5765, pp. 1245-1246
DOI: 10.1126/science.1125762

Currently, no disease has the type of large-scale, systematic biological and informatic integration that permits researchers to cross easily between field-relevant and research-relevant isolates in the context of clinical, epidemiological, and phylogenetic characterizations. This is due, in large part, to the intense demands that systematic data collection and organization place on clinicians and the public health apparatus. However, the complete population-based data collection infrastructure necessary for such a resource is already in place for tuberculosis (TB) in the United States.

About one-third of the world's population is infected with Mycobacterium tuberculosis (MTB) (1). TB disproportionately burdens the world's poorest countries (2, 3). The threat of emerging multidrug-resistant (MDR) strains (4) is severe. The number of TB cases in the United States is relatively small: just under 15,000 per year (5). Yet TB is fundamentally a “transnational” disease, with more than half of all U.S. cases occurring in non-U.S.-born persons (5). Schwartzman et al. (6) estimate that under current practices the United States will spend about $2 billion over the next 20 years just treating immigrants from Mexico. And although “only” 15,000 cases is a public health success story compared with historic epidemics, indolence in efforts to combat the disease would be unwise (7). It is estimated that cutbacks in TB-related resources in the late 1970s and 1980s contributed to a resurgence in TB among predominantly immunocompromised and socially marginalized patients that cost more than $1 billion to control in New York City alone (8).

Every verified TB case in the United States is reported to the Centers for Disease Control and Prevention (CDC), along with clinical and epidemiological information, in a document called the Report of Verified Case of Tuberculosis (RVCT) (9). In 2004, CDC began a program to genotype an MTB isolate from every patient reported in the United States under its TB Universal Genotyping Program (10, 11). Other laboratories already have substantial information on strains from countries in which epidemiologic trends are well described (12) or drug-resistant MTB is epidemic (13, 14). The genome of MTB has been sequenced (15). Collections of genotypic, epidemiological, and/or clinical data are available in electronic databases but are not integrated, and phylogenetic data relating strains are incomplete. What is missing is an integrated, comprehensive, population-based biologic and informatic resource that can drive evidence-based decision-making.

We propose creation of a National Tuberculosis Archive, a comprehensive repository of characterized M. tuberculosis isolates along with their genomic, clinical, and epidemiological data (see figure, this page). Such an integrated resource would close the loop between clinical isolates and research data, allowing users to search on metadata criteria and to obtain samples of isolates matching field-relevant criteria. Molecular variation could be readily linked with phenotypic characteristics, and geographic distribution with temporal sampling. Bench scientists could explore fundamental questions about the relation between molecular variation and clinical consequences, health-care providers could alter patient care on the basis of strain-specific pathogen properties, and public health officials could track outbreaks across jurisdictions and back through time. Disparate data would be integrated in a Web-accessible platform for easy access.
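The kind of linked search described above can be illustrated with a minimal sketch. It assumes a simple in-memory model; the record types, field names, genotype labels, and sample records are all hypothetical and invented for illustration. They do not reflect the actual RVCT form, the CDC genotyping data, or any existing database.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record types: the fields are illustrative assumptions,
# not the real RVCT or CDC genotyping schema.
@dataclass(frozen=True)
class CaseReport:
    case_id: str
    year: int
    state: str
    us_born: bool


@dataclass(frozen=True)
class Isolate:
    isolate_id: str
    case_id: str      # link back to the case report
    genotype: str     # placeholder genotype label
    mdr: bool         # multidrug-resistant flag


def link(cases: List[CaseReport],
         isolates: List[Isolate]) -> List[Tuple[Isolate, CaseReport]]:
    """Pair each isolate with its case report; skip isolates with no match."""
    by_id = {c.case_id: c for c in cases}
    return [(iso, by_id[iso.case_id]) for iso in isolates
            if iso.case_id in by_id]


def search(linked: List[Tuple[Isolate, CaseReport]], *,
           state: Optional[str] = None,
           genotype: Optional[str] = None,
           mdr: Optional[bool] = None,
           year_from: Optional[int] = None) -> List[str]:
    """Return IDs of isolates whose linked metadata match every given filter."""
    hits = []
    for iso, case in linked:
        if state is not None and case.state != state:
            continue
        if genotype is not None and iso.genotype != genotype:
            continue
        if mdr is not None and iso.mdr != mdr:
            continue
        if year_from is not None and case.year < year_from:
            continue
        hits.append(iso.isolate_id)
    return hits


# Invented sample records, for illustration only.
CASES = [
    CaseReport("c1", 2004, "NY", False),
    CaseReport("c2", 2005, "CA", True),
    CaseReport("c3", 2005, "NY", True),
]
ISOLATES = [
    Isolate("i1", "c1", "G1", True),
    Isolate("i2", "c2", "G2", False),
    Isolate("i3", "c3", "G1", False),
    Isolate("i4", "c9", "G1", True),   # no matching case report
]

LINKED = link(CASES, ISOLATES)
print(search(LINKED, state="NY", genotype="G1"))
```

In this sketch, an isolate with no matching case report is simply left out of the linked set. An actual archive would more likely flag such records for follow-up than drop them.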

Figure. Differences between the current configuration of clinical isolates, research strains, and data and the proposed National Tuberculosis Archive. (Bottom) Current unintegrated configuration.


Archiving etiologic material along with an integrated information resource has previously proved to be a prescient step in public health preparedness, as was seen in the 1993 hantavirus epidemic, when museum archives of rodent sera and tissue samples were crucial in demonstrating that the virus had been widely endemic for years (16-18). This gave public health policy-makers invaluable baseline information to determine appropriate and targeted responses, while removing biowarfare concerns.

Results from prior efforts based on molecular epidemiology are a harbinger of the value of a comprehensive national archive for TB. A population biologic analysis of 10 years of data in San Francisco suggests that strains of M. tuberculosis may spread more efficiently in human populations when they are within the sympatric populations in which they evolved (19). Knowing an outbreak's characteristic molecular and phylogenetic signature can therefore help in identifying new human ethnic groups at risk. A clinical study in New York City suggests that patients afflicted with specific clades of bacteria manifest a more profound disease (20, 21). Other public health jurisdictions are seeing the full extent of unsuspected transmission and the need for new interventions (22). For the MDRTB outbreaks caused by strain W in New York in the early 1990s, availability of archived samples linked to public health surveillance data enabled investigators to identify the origin of strain W, trace its acquisition of drug resistances, track its spread in New York City and around the country, and develop public health control measures (8, 23, 24).

The RVCT-based public health infrastructure and CDC Universal Tuberculosis Genotyping Program are already in place. We estimate the cost of integration for TB to be $15 million over 3 years.

Because M. tuberculosis is a human pathogen, but a poor candidate for bioterrorism, it is an excellent pilot for a more systematic program of human pathogen socioecological-genomic characterization. Improvements in disaster preparedness will result from a more focused and thoughtful integration of science, medicine, and public health.

