Report

In Silico Mapping of Complex Disease-Related Traits in Mice


Science  08 Jun 2001:
Vol. 292, Issue 5523, pp. 1915-1918
DOI: 10.1126/science.1058889

Abstract

Experimental murine genetic models of complex human disease show great potential for understanding human disease pathogenesis. To reduce the time required for analysis of such models from many months down to milliseconds, a computational method for predicting chromosomal regions regulating phenotypic traits and a murine database of single nucleotide polymorphisms were developed. After entry of phenotypic information obtained from inbred mouse strains, the phenotypic and genotypic information is analyzed in silico to predict the chromosomal regions regulating the phenotypic trait.

Identification of genetic susceptibility loci has promised insight into pathophysiologic mechanisms and the development of therapies for common human diseases. Analysis of experimental murine genetic models of human disease biology should greatly facilitate identification of genetic susceptibility loci for common human diseases. We present a computational method that markedly accelerates genetic analysis of murine disease models. A linkage prediction program scans a murine single nucleotide polymorphism (SNP) database and, only on the basis of known inbred strain phenotypes and genotypes, predicts the chromosomal regions that most likely contribute to complex traits. The computational prediction method does not require generation and analysis of experimental intercross progeny, but it correctly predicted the chromosomal regions identified by analysis of experimental intercross populations for multiple traits analyzed.

A Web-accessible database was developed, which contains allele information across 15 inbred strains and specifies genotyping assays for over 500 SNPs at defined locations in the mouse genome (http://mouseSNP.roche.com). These SNPs were identified in our laboratories by direct sequencing of polymerase chain reaction (PCR) amplification products from defined chromosomal locations. This database also incorporates published allele information for 2848 SNPs, 45% of which are characterized in a subset of Mus musculus strains; 55% of the SNPs are polymorphic between Mus castaneus and one or more M. musculus subspecies (1). User queries regarding SNPs found within a specified chromosomal region or between selected inbred strains are executed in real time and provided through a graphical user interface. The oligonucleotide primer sequences and conditions for performing allele-specific kinetic PCR genotyping assays (2) are also provided in the mSNP database [see supplemental material (3)].

To demonstrate the utility of this information, the genome of pooled DNA samples obtained from intercross progeny was analyzed by two different genotyping methods. At 16 weeks of age, the 1000 F2 progeny of a C57BL/6 × B6D2 intercross display a non–sex linked, normal distribution of bone mineral density (BMD) (4). Phenotypically extreme F2 progeny with the highest (n = 150 mice) and lowest (n = 149 mice) BMD (top and bottom 15%, respectively) were subjected to a whole-genome scan for association with BMD by genotyping individual DNA samples with 112 microsatellite markers. In addition, equal amounts of DNA from the high and low BMD F2 progeny were used to form two pools of DNA samples. Allele frequencies in the pooled samples were measured for 109 SNPs found in the mSNP database with the use of the previously described allele-specific kinetic PCR method (2). Differences in allele frequency between the two extremes for each marker were scored. If a marker has no association with BMD, its expected frequency is 50% for both extremes. The significance of each allele-frequency difference was calculated using the z-test and plotted as a lod score (a logarithm of the odds ratio for linkage) (Fig. 1). A significant association (lod score > 3.3) was found for four regions on chromosomes 1, 2, 4, and 11 by both the microsatellite and SNP genotyping methods. SNP-based genotyping identified a linkage region near the centromere of chromosome 13, which was not found using microsatellite markers. Two SNP markers (2.2 and 6.6 cM) were more proximal to the centromere of chromosome 13 than the most proximal (10 cM) microsatellite marker used for genotyping the intercross progeny. This region is being investigated with additional markers.
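The scoring of an allele-frequency difference between the two pools can be sketched as follows. This is only an illustration: the pooled-variance form of the two-proportion z-test, the count of two chromosomes per mouse, and the conversion lod = z²/(2 ln 10) are assumptions, since the text does not give the exact formulas. The function name and the example frequencies are hypothetical.

```python
import math

def allele_freq_lod(freq_high, freq_low, n_high, n_low):
    """Two-proportion z-test on pooled allele frequencies, reported with a lod score.

    Assumptions (not specified in the text): each mouse contributes two
    chromosomes, the variance is the pooled binomial variance, and the
    lod score is obtained as z**2 / (2 * ln 10).
    """
    chrom_high = 2 * n_high
    chrom_low = 2 * n_low
    pooled = (freq_high * chrom_high + freq_low * chrom_low) / (chrom_high + chrom_low)
    variance = pooled * (1.0 - pooled) * (1.0 / chrom_high + 1.0 / chrom_low)
    if variance == 0.0:
        # Both pools are fixed for the same allele: no evidence either way.
        return 0.0, 0.0
    z = (freq_high - freq_low) / math.sqrt(variance)
    lod = z * z / (2.0 * math.log(10.0))
    return z, lod

# Hypothetical marker: allele frequency 0.62 in the high-BMD pool and 0.40 in the low-BMD pool.
z, lod = allele_freq_lod(0.62, 0.40, n_high=150, n_low=149)
print(f"z = {z:.2f}, lod = {lod:.2f}")
```

A marker with equal frequencies in both pools gives z = 0 and lod = 0 under these assumptions, which matches the 50% expectation for an unlinked marker.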

Figure 1

Comparison of SNP-based genotyping of pooled DNA samples with microsatellite genotyping of individual DNA samples. Phenotypically extreme F2 progeny from a B6D2 intercross with the highest and lowest BMD were subjected to whole-genome scanning for association with BMD by genotyping either individual DNA samples (from 299 mice) with 112 microsatellite markers or two pooled DNA samples (150 mice per pool) with 109 SNP markers. The significance of each allele-frequency difference was calculated using the z-test and plotted as a lod score for all chromosomes. Dashed line indicates a lod score of 3.3, the threshold for genome-wide significance.

SNP-based genotyping of pooled samples required about 20-fold fewer PCR reactions and was performed much more quickly than microsatellite genotyping of individual DNA samples. Replicate determinations (four times) were performed here to assess the reproducibility of the SNP-allele frequency determination and measurement error. On average, the standard deviation in allele frequency measurement was ±1.7%. In the future, it should be possible to reduce the number of replicate PCR assays.

We wanted to determine whether chromosomal regions regulating quantitative traits (QTL intervals) could be computationally predicted with the use of the mSNP database and available phenotypic information on inbred strains. Using the allelic distributions across inbred strains contained in the mSNP database, the computational method calculates genotypic distances between loci for a pair of mouse strains. These genotypic distances are then compared with phenotypic differences between the two mouse strains. The process is repeated for all mouse strain pairs for which phenotypic information is available. Lastly, a correlation value is derived using linear regression on the phenotypic and genotypic distances for each genomic locus.
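A minimal sketch of this procedure for a single genomic interval is shown below. It assumes that the genotypic distance between two strains is the number of SNPs in the interval at which their alleles differ, that the phenotypic distance is the absolute difference of a numeric trait, and that the correlation value is the Pearson coefficient of the regression; none of these details is stated in the text. The strain names, alleles, and phenotype values are hypothetical.

```python
from itertools import combinations
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def locus_correlation(alleles_by_strain, phenotype_by_strain):
    """Correlate genotypic and phenotypic distances over all strain pairs for one interval."""
    strains = sorted(set(alleles_by_strain) & set(phenotype_by_strain))
    genotypic, phenotypic = [], []
    for a, b in combinations(strains, 2):
        # Assumed genotypic distance: count of SNPs in the interval where the strains differ.
        genotypic.append(sum(x != y for x, y in zip(alleles_by_strain[a], alleles_by_strain[b])))
        # Assumed phenotypic distance: absolute difference in the measured trait.
        phenotypic.append(abs(phenotype_by_strain[a] - phenotype_by_strain[b]))
    return pearson(genotypic, phenotypic)

# Hypothetical data: four strains, four SNPs per interval, one quantitative trait.
phenotype = {"S1": 1.0, "S2": 1.2, "S3": 5.0, "S4": 5.3}
interval_a = {"S1": "AAGT", "S2": "AAGT", "S3": "CCTA", "S4": "CCTA"}  # tracks the trait
interval_b = {"S1": "AAGT", "S2": "CAGT", "S3": "AAGT", "S4": "CAGT"}  # does not

r_a = locus_correlation(interval_a, phenotype)
r_b = locus_correlation(interval_b, phenotype)
print(f"interval_a r = {r_a:.2f}, interval_b r = {r_b:.2f}")
```

Running the same function over every interval in the genome would give the per-locus correlation profile that the scan ranks.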

As a first example, we used the computational method to predict the chromosomal location of the major histocompatibility complex (MHC), which has been mapped to murine chromosome 17, using the known H2 haplotypes for the MHC K locus for 10 inbred strains (5). Phenotypic distances for strains that shared a haplotype were set to zero, and a distance of one was used for strains of different haplotypes. The SNPs within and near the MHC region had a genotypic distribution that was highly correlated with the phenotypic distances; the correlation value for this interval was 5.3 standard deviations above the average for all loci analyzed. No other peaks in the mouse genome exhibited a comparable correlation with this phenotype (Fig. 2A). This computational analysis, which required less than 1 s to run on a standard desktop computer, excluded 98% of the mouse genome from consideration without missing the genomic region known to contain the MHC.
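The zero/one haplotype distance and the "standard deviations above the average" score can be illustrated as follows. This is a sketch under assumptions: the population standard deviation is used because the text does not say which estimator was applied, and the strain labels, haplotypes, locus names, and correlation values are all hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def haplotype_distances(haplotype_by_strain):
    """Distance 0 for strain pairs sharing a haplotype, 1 otherwise."""
    strains = sorted(haplotype_by_strain)
    return {
        (a, b): 0 if haplotype_by_strain[a] == haplotype_by_strain[b] else 1
        for a, b in combinations(strains, 2)
    }

def sd_above_mean(correlations, locus):
    """How many standard deviations a locus's correlation lies above the genome-wide mean."""
    values = list(correlations.values())
    spread = pstdev(values)  # assumption: population standard deviation
    if spread == 0:
        return 0.0
    return (correlations[locus] - mean(values)) / spread

# Hypothetical haplotype assignments for four strains.
distances = haplotype_distances({"S1": "b", "S2": "b", "S3": "k", "S4": "d"})

# Hypothetical per-locus correlation values from a genome scan.
correlations = {"locus_1": 0.90, "locus_2": 0.10, "locus_3": 0.00, "locus_4": 0.20, "locus_5": 0.05}
print(f"locus_1 is {sd_above_mean(correlations, 'locus_1'):.2f} SD above the mean")
```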

Figure 2

Computational prediction of chromosomal regions regulating (A) MHC haplotype and (B) airway hyperresponsiveness. The correlation between the genotypic and phenotypic distributions is graphically shown for each trait; segments are arranged from centromeric to telomeric for all 19 autosomes. Each bar represents a 30-cM interval, and neighboring bars are offset by 10 cM. The dotted line represents a useful cutoff for analyzing this data; the most highly correlated 10% of the loci are above this line. Striped bars represent locations of experimentally verified QTLs.

In addition to the MHC locus, we tested the computational method using nine quantitative traits known from published studies that provided mapped QTL intervals and phenotypic data across multiple inbred strains for each trait (Table 1) (3). The ability of this algorithm to identify chromosomal regions regulating susceptibility to experimental allergic asthma was investigated. Analysis of intercross progeny between susceptible (A/J) and resistant (C3H/HeJ) mouse strains identified a QTL interval on chromosome 2 and a suggested interval on chromosome 7 (6). Analysis of a different experimental intercross identified QTL intervals on chromosomes 10 and 11 (7). Phenotypic measurements for allergen-induced airway hyperresponsiveness (AHR) in four inbred strains were used for a computational genome scan. The experimentally identified QTL intervals on chromosomes 2, 7, 10, and 11 were among the strongest peaks identified by the computational genome scan (Fig. 2B). The computational method excluded 85% of the mouse genome from consideration without missing the experimentally mapped QTL regions.

Table 1

Comparison of experimentally identified QTL intervals with computationally predicted chromosomal regions for 10 phenotypic traits. The experimentally identified QTL intervals and phenotypic information used for computational prediction are described in the references indicated and are summarized in supplementary tables 1 and 2 (3). PKC, protein kinase C; Exp., total number of experimentally verified QTL intervals; Correct, number of computationally predicted regions that overlap with the experimentally verified locus; Predicted, total number of predicted regions for each phenotype; Cutoff, percentage of the mouse genome included within the computationally predicted regions.


The ability of the computational method to correctly predict chromosomal regions containing experimentally verified QTL intervals was evaluated using 10 phenotypic traits (Table 1) (3). The percentage of correct predictions was characterized as a function of the percentage of the mouse genome contained within the predicted chromosomal regions. If predicted regions contained 10% of the mouse genome (by selecting 10% of the peaks with the highest correlation), then 15 of the 26 experimentally verified QTL intervals were correctly identified. As the threshold was raised, limiting the number of predicted candidate regions, more experimentally verified QTL intervals were missed. In summary, at cutoff values ranging from 2 to 16%, 19 of 26 experimentally verified QTL intervals regulating 10 phenotypic traits were correctly identified (Table 1).
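The trade-off between the cutoff and the number of recovered QTL intervals can be sketched with a simple ranking step. The function names, locus labels, scores, and the set of "verified" loci below are hypothetical, and rounding the number of retained loci to the nearest integer is an assumption made for illustration.

```python
def top_fraction(scores, fraction):
    """Return the loci whose correlation scores fall in the top `fraction` of all loci."""
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])

def count_hits(predicted, verified):
    """Number of experimentally verified loci that appear among the predicted loci."""
    return len(predicted & verified)

# Hypothetical correlation scores for 20 loci.
values = [0.12, 0.05, 0.30, 0.91, 0.22, 0.18, 0.02, 0.77, 0.15, 0.09,
          0.11, 0.40, 0.25, 0.08, 0.19, 0.03, 0.14, 0.28, 0.07, 0.21]
scores = {f"locus_{i}": value for i, value in enumerate(values)}
verified = {"locus_3", "locus_11"}  # hypothetical experimentally verified QTL loci

for cutoff in (0.10, 0.20):
    predicted = top_fraction(scores, cutoff)
    print(f"cutoff {cutoff:.0%}: {count_hits(predicted, verified)} of {len(verified)} verified loci recovered")
```

With these made-up scores, the stricter 10% cutoff keeps fewer candidate loci and misses one verified locus that the 20% cutoff recovers, which is the pattern described in the text.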

We applied a Fisher Exact test to assess the significance of the computational predictions. The average size of a predicted genomic region was 38 cM, segmenting the 1500-cM mouse genome into 40 regions. Therefore, a total of 400 genomic intervals were analyzed for the 10 quantitative traits examined. At a 10% genome-wide threshold, the computational method correctly identified 15 (true positive) and missed 11 (false negative) of the 26 experimentally verified QTL intervals. The algorithm further predicted that 24 genomic intervals (false positive) contributed to a phenotypic trait where no QTL had yet been experimentally characterized, and the predictions agreed with available experimental data that 350 regions (true negative) were not QTL intervals for the 10 phenotypes examined. The Fisher Exact test yields a highly significant P value (1.0 × 10⁻¹⁰), confirming significant agreement between the computationally predicted and experimentally determined chromosomal regions.
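The 2 × 2 table described above (15 true positives, 24 false positives, 11 false negatives, 350 true negatives) can be tested with a short standard-library sketch. The one-sided hypergeometric tail used here is an assumption, because the text does not state whether a one- or two-sided test was applied; the function name is hypothetical.

```python
from math import comb

def fisher_exact_greater(tp, fp, fn, tn):
    """One-sided Fisher exact test.

    Returns the probability, under the hypergeometric null, of observing at
    least `tp` verified intervals among the predicted intervals.
    """
    total = tp + fp + fn + tn
    predicted = tp + fp   # intervals called by the computational scan
    verified = tp + fn    # intervals confirmed experimentally
    upper = min(predicted, verified)
    tail = sum(
        comb(verified, k) * comb(total - verified, predicted - k)
        for k in range(tp, upper + 1)
    )
    return tail / comb(total, predicted)

# Counts taken from the text: 400 intervals in total.
p_value = fisher_exact_greater(tp=15, fp=24, fn=11, tn=350)
print(f"one-sided P = {p_value:.2e}")
```

The exact figure depends on the sidedness convention, so this sketch should be read as a consistency check on the order of magnitude rather than a reproduction of the reported value.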

Computational analysis of the murine SNP database using phenotypic data from inbred parental strains rapidly identifies candidate QTL intervals. This can eliminate many months to years of laboratory work required to generate, characterize, and genotype intercross progeny, reducing the time required for QTL interval identification to milliseconds. In addition to its rapidity and low cost, the computational prediction method has a substantial advantage over QTL analysis using intercross progeny or recombinant inbred strains (8). Because it performs multiple comparisons across a range of inbred strains, the computational method takes advantage of the total genetic variation provided by available inbred mouse strains.

The ability of the computational genome scan to perform whole-genome association studies using the mouse SNP database indicates that linkage disequilibrium may extend over large regions among inbred mouse strains. Our computational results were unexpected because the number of different inbred strains for which phenotypic data was available (4 to 10) was quite limited. Positional cloning and case-control studies in human populations are routinely performed with hundreds to thousands of individuals (9). Several factors contribute to the successful QTL predictions by computational scanning of the mouse SNP database. The use of inbred mouse strains limits variability due to environment, and timed experimental intervention and sampling limits error in phenotypic assessment. The inbred strains are homozygous at all loci, which eliminates confounding effects due to heterozygosity found in human populations.

Recently, there has been increased emphasis on using chemical mutagenesis in the mouse as a method for studying complex biology. This has occurred as a result of the difficulties noted by investigators using standard methods for QTL analysis [reviewed in (10)]. However, these studies can be markedly accelerated by application of the genotyping method and computational tools described here. Of course, specific gene candidates must be identified to understand the genetic basis of complex disease. We have already shown how integration of gene expression data obtained with high-density oligonucleotide microarrays can be used in conjunction with the SNP genotyping method to accelerate QTL analysis (11). Therefore, databases with tissue-specific gene expression and phenotypic information across mouse strains could be used in conjunction with the murine SNP database to computationally identify candidate disease genes. In a hypothetical experiment, the expression of 40,000 murine genes in an affected tissue obtained from different mouse strains can be profiled. As many as 1% of the genes will be reliably demonstrated to be differentially expressed in the tissue of the mouse strains with a different phenotype. The resulting list of 400 gene candidates could be computationally reduced by 90% by searching for genes that are encoded within computationally predicted chromosomal regions, providing a reasonable starting point for analysis of complex disease biology. The application of this approach should reduce the frustrations and overcome the difficulties associated with QTL analysis in murine complex disease models.
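The arithmetic of the hypothetical experiment, and the filtering step it implies, can be sketched as follows. The data layout (gene positions in cM, regions as chromosome/start/end triples), the function name, and the example genes are assumptions made purely for illustration.

```python
def genes_in_regions(gene_positions, regions):
    """Keep genes whose map position falls inside any predicted chromosomal region.

    gene_positions: gene name -> (chromosome, position in cM)
    regions: list of (chromosome, start cM, end cM)
    """
    return [
        gene
        for gene, (chrom, pos) in gene_positions.items()
        if any(chrom == c and start <= pos <= end for c, start, end in regions)
    ]

# Arithmetic from the text: 1% of 40,000 profiled genes, then a 90% reduction.
profiled = 40_000
differential = profiled * 1 // 100                      # differentially expressed candidates
remaining = differential - differential * 90 // 100     # candidates left after filtering
print(differential, remaining)

# Hypothetical candidates and predicted regions.
candidates = {"geneA": ("2", 35.0), "geneB": ("7", 12.5), "geneC": ("11", 60.0)}
predicted_regions = [("2", 20.0, 50.0), ("11", 0.0, 30.0)]
print(genes_in_regions(candidates, predicted_regions))
```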

  • * These authors contributed equally to this work.

  • To whom correspondence should be addressed: gary.peltz{at}roche.com

REFERENCES AND NOTES
