Selection at Linked Sites Shapes Heritable Phenotypic Variation in C. elegans

Science  15 Oct 2010:
Vol. 330, Issue 6002, pp. 372-376
DOI: 10.1126/science.1194208


Mutation generates the heritable variation that genetic drift and natural selection shape. In classical quantitative genetic models, drift is a function of the effective population size and acts uniformly across traits, whereas mutation and selection act trait-specifically. We identified thousands of quantitative trait loci (QTLs) influencing transcript abundance traits in a cross of two Caenorhabditis elegans strains; although trait-specific mutation and selection explained some of the observed pattern of QTL distribution, the pattern was better explained by trait-independent variation in the intensity of selection on linked sites. Our results suggest that traits in C. elegans exhibit different levels of variation less because of their own attributes than because of differences in the effective population sizes of the genomic regions harboring their underlying loci.

Some phenotypes exhibit abundant heritable variation and others almost none. As heritable variation is the raw material for adaptation, the forces that shape its distribution across traits are a central concern of evolutionary genetics (1). Among wild strains of the partially selfing nematode Caenorhabditis elegans, transcript abundance traits, which serve as model quantitative phenotypes (2–7), differ in their levels of heritable variation (4, 8). On the basis of experimental measurements of the rate at which mutation increases their variance, they also exhibit lower levels of heritable variation than expected under neutral mutation-drift equilibrium (4). These findings and similar results in other species are consistent with the prediction that trait-dependent stabilizing selection should result in different levels of variation among traits (3–7).

To genetically dissect the causes of different variabilities among C. elegans traits, we measured transcript abundances by microarray in developmentally synchronized young adult hermaphrodites of 208 recombinant inbred advanced intercross lines from a cross between the laboratory strain, N2, and a wild isolate from Hawaii, CB4856 (9). These strains, though relatively divergent for C. elegans, are closely related, differing at roughly 1 base pair per 900 (10). Each line was genotyped at 1455 single-nucleotide polymorphism (SNP) markers. Interval mapping for each of 15,888 traits identified 2309 quantitative trait loci (QTLs) at a false discovery rate (FDR) of 5% (Fig. 1A) (11).

Fig. 1

(A) QTLs for each transcript abundance phenotype, significant at an FDR of 5%, are plotted in rows located at the genomic positions of the transcripts. Gray bars represent 1-lod support intervals. The diagonal includes local QTLs, those that colocalize with the transcript they affect. Three robust QTL hotspots are indicated with arrows. (B) Local lod score is plotted for each probe at its physical position along the chromosomes. Points in blue are significant at a 5% FDR according to a single-marker linkage test. Points are scaled according to the fraction of variance in transcript abundance explained by the local QTL.

The majority of QTLs (65%) are local; that is, these QTLs occur at the genomic locations of the genes whose transcript abundances they influence [the spatial coincidence is defined here by overlap between the 1-lod (logarithm of the odds ratio for linkage) QTL support interval and the gene]. Nearly a quarter of the remaining QTLs (distant QTLs) map to three statistically robust hotspots (11) (Fig. 1A and fig. S1). The X-linked hotspot encompasses more than a megabase and probably contains multiple causal variants, one of which may be the known pleiotropic mutation of phenylalanine to valine at residue 215 in the neuropeptide receptor npr-1 (12). Candidate genes for the other hotspots include Y17G9B.8, a putative component of a chromatin regulatory complex whose transcript abundance maps strongly to a local QTL at its position in the hotspot on the left side of chromosome IV, and Y105C5A.15, a putative zinc-finger transcription factor whose transcript abundance maps locally to a QTL at its position in the hotspot on the right side of chromosome IV.
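The definition of a local QTL given above amounts to an interval-overlap test. A minimal sketch follows, assuming that positions are base-pair coordinates and that a QTL and a gene can only coincide on the same chromosome. The function name, the tuple layout, and the example coordinates are illustrative assumptions, not details of the study's analysis pipeline.

```python
def is_local_qtl(qtl, gene):
    """Return True if a QTL support interval overlaps a gene.

    Both arguments are (chromosome, start, end) tuples with start <= end.
    This layout is a hypothetical convention chosen for illustration.
    """
    q_chrom, q_start, q_end = qtl
    g_chrom, g_start, g_end = gene
    if q_chrom != g_chrom:
        return False
    # Two closed intervals overlap when each one starts before the other ends.
    return q_start <= g_end and g_start <= q_end


# Made-up coordinates, not data from the study.
support_interval = ("IV", 1_200_000, 1_450_000)
print(is_local_qtl(support_interval, ("IV", 1_400_000, 1_405_000)))  # gene inside the interval
print(is_local_qtl(support_interval, ("IV", 9_000_000, 9_004_000)))  # same chromosome, far away
print(is_local_qtl(support_interval, ("X", 1_400_000, 1_405_000)))   # different chromosome
```

Under this definition a wide support interval can make a QTL "local" to several neighboring genes at once, so the classification depends on mapping resolution as well as on biology.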

The global distribution of QTLs is markedly nonuniform. Both local and distant QTLs are strongly enriched in the arms of the chromosomes relative to the centers (Table 1). C. elegans lacks heterochromatic centromeres, and the chromosomes are structured in semidiscrete domains that exhibit correlated variation in gene density, evolutionary conservation, repeat sequence density, and recombination rate (9, 13, 14). The chromosomal centers have high gene density and low recombination rates, whereas chromosome arms have lower gene density and higher recombination rates. Chromosome tips have an intermediate gene density but effectively no recombination (9). Under a simple mutational null model, QTL density is expected to correlate with the density of potentially functional sites and hence to be higher in chromosomal centers than in arms, contrary to the observed pattern. Furthermore, as QTL detection is most favored in low-recombination areas (15, 16), the observed pattern also runs counter to the expected effect of mapping bias.

Table 1

Both distant and local QTLs are overrepresented in chromosome arms relative to centers.

The chromosomal patterning of causal variants is particularly pronounced for local QTLs. We confirmed this in a focused single-marker analysis (17), which increased detection power over our initial genome scan. We identified 2538 transcripts affected by QTLs that are linked to their own genomic locations at a 5% FDR (Fig. 1B). We found that 23.7% of transcripts in chromosome arms and 20.1% of those in chromosome tips have local QTLs, compared to only 10.2% of those in chromosome centers (χ² with 2 degrees of freedom = 495.7, P < 10⁻¹⁰⁷). The chromosomal patterning is robust to confounding by potential hybridization artifacts, as demonstrated by analysis of only the 7694 transcripts for which the CB4856 genotype is associated with higher expression than the reference N2 genotype. The 1057 significant local QTLs among these exhibit the same pattern of enrichment: 20.0% of arm transcripts, 17.9% of tip transcripts, and 9.6% of center transcripts have significant local QTLs (χ² with 2 degrees of freedom = 162.7, P < 10⁻³⁵).
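The two test statistics above can be related to their P values by hand. This assumes they come from a 3-by-2 table (domain type against presence or absence of a local QTL), which gives 2 degrees of freedom. For 2 degrees of freedom the upper-tail probability of a chi-square statistic x is exactly exp(−x/2), so the logarithm of P can be computed without numerical underflow. The sketch below also includes a generic Pearson statistic for a contingency table; the toy counts are invented for illustration, since the per-domain counts are not given in the text.

```python
import math


def log10_p_chi2_2df(stat):
    """log10 of the upper-tail P value for a chi-square statistic with 2 df.

    With 2 df the survival function is exp(-stat / 2), so the log is exact.
    """
    return -stat / (2.0 * math.log(10.0))


def pearson_chi2(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Statistics as reported in the text.
for label, stat in [("all transcripts", 495.7), ("CB4856-high subset", 162.7)]:
    print(label, "log10(P) =", round(log10_p_chi2_2df(stat), 1))

# Invented counts (rows: arm, tip, center; columns: with, without local QTL).
toy = [[30, 70], [20, 80], [10, 90]]
print("toy statistic =", pearson_chi2(toy))
```

The computed exponents fall just below −107 and −35, in line with the reported bounds.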

We corroborated the results of linkage mapping by estimating the amount of heritable phenotypic variation attributable to each type of chromosomal domain, using a genome-partitioning approach that avoids assumptions about the number, location, and effect sizes of QTLs (11, 18). We estimated the amount of genetic variance attributable to chromosomal arms versus centers for each of the 1191 traits that are significantly heritable by this method (FDR = 0.05; fig. S2), and we observed an excess of both arm-biased and center-biased traits (fig. S3), consistent with contributions from large-effect or spatially clustered loci. A significant majority of heritable traits are arm-biased (permutation two-tailed P = 0.0325). The arm bias remains when the effects of local QTLs are removed by linear regression (P ≤ 0.0025), and the pattern is not driven by the QTL hotspots (11) (fig. S7).

Several nonexclusive models may explain global patterns of variation in the density of functional variants influencing transcript abundance traits (1, 3–7, 19–21). In standard multivariate quantitative genetic models, equilibrium trait variation results from mutation, selection, and drift, the last governed by effective population size (Ne) and acting uniformly across traits (22). We asked whether mutation and selection could explain why some transcript abundance traits are influenced by their own genomic loci and why others are not. We focused on these local QTLs because they represent largely independent genetic variants, are precisely localized, and account for a large fraction of the phenotypic variance in traits with local QTLs (Fig. 1B).

Variation in local QTL density should reflect variation in rates of local mutational input. In C. elegans, the rate of spontaneous single-base mutation has been directly measured and is uniform on a chromosomal scale, with no dependence on recombination rate or domain structure (23). Consequently, the rate of mutation that generates local QTLs probably depends on the local mutational target size. Indeed, genes with local QTLs are longer than those without (t test on log-transformed lengths, P = 0.004).

Variation in QTL density should also reflect variation in the intensity of purifying selection, which eliminates mutations that adversely affect the phenotype. We used measurable correlates of purifying selection to test this model. Genes that exhibit phenotypes when their expression is knocked down by RNA interference (RNAi) [effectively essential genes; nearly all characterized RNAi phenotypes would be lethal in nature (11)] are less likely to have local QTLs than genes with no RNAi phenotype (χ² = 55.1, P < 2 × 10⁻¹³). Moreover, we observed fewer evolutionarily constrained nucleotides in genes with local QTLs [(11); genes include introns and flanking sequence] than in genes without (t test on Box-Cox transformed values, P < 4 × 10⁻²³).

Phenotypic variance not attributable to local QTLs, including measurement error and environmental variance as well as distant genetic effects, does not differ significantly between transcripts with and without local QTLs (t test on log-transformed data, P = 0.93). However, traits with local QTLs are more likely than traits without to also map to additional QTLs (χ² = 63.2, P < 2 × 10⁻¹⁵). Thus, traits that can withstand local genetic variation can also withstand other genetic perturbations, consistent with these transcript abundances experiencing weaker stabilizing selection compared to other genes.

To determine whether the variables associated with mutational target size and strength of selection have independent effects on local QTL probability, we tested their explanatory value in multiple logistic regression models. Gene interval length, number of conserved bases, RNAi phenotype, and presence of distant QTLs are all significant predictors of local QTL probability in a model that includes them all (model M1 in Table 2).

Table 2

Logistic regression models implicate mutation, stabilizing selection, and linked selection in explaining the distribution of local linkages. LRT: likelihood ratio test statistic comparing the logistic regression model in which the specified term has been dropped to the model in which all terms are included. LRT is equivalent to the drop in explained deviance due to excluding the term from the model. The null deviance is 12897.5. The LRT was tested against a χ² distribution to yield the associated P values. ∆df: difference in degrees of freedom between the specified model and the null model, which includes only the intercept. Chromosomal domain is a factor with three levels and hence contributes two degrees of freedom. AIC: Akaike information criterion. Model M4 includes all two-, three-, and four-way interactions among the variables; consequently, the LRT and P values for dropping single terms cannot be calculated for it.

However, when the chromosomal domain of each gene (tip, arm, or center) was included as a factor (model M2), it was by far the most explanatory variable. Indeed, chromosomal domain alone (model M3) explained the QTL data better than a model incorporating all of the gene-level attributes, even when all interactions among the variables were included (model M4). Genic point estimates of the recombination rate, although significant if domain type was excluded (model M5), had no significant explanatory value after taking the domains into account (M6). Thus, the domain patterning of local QTLs is not explained by gene-level measures of mutation, selection, or recombination.
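The likelihood ratio test described in the Table 2 legend can be sketched for the simplest case, a logistic regression whose only predictor is a categorical factor, as in a domain-only model such as M3. For such a model the maximum-likelihood fitted probabilities are the observed proportions within each level, so no iterative fitting is needed. The counts below are invented, and the function names are hypothetical; this is not a reconstruction of the authors' fit.

```python
import math


def _xlogp(k, p):
    """k * log(p), defined as 0 when k == 0 so empty cells contribute nothing."""
    return 0.0 if k == 0 else k * math.log(p)


def bernoulli_deviance(successes, total, p):
    """Deviance (-2 log-likelihood) of binary outcomes under probability p."""
    return -2.0 * (_xlogp(successes, p) + _xlogp(total - successes, 1.0 - p))


def one_factor_lrt(counts):
    """Likelihood ratio test for a single categorical predictor.

    counts maps each factor level to (genes_with_local_qtl, genes_total).
    Returns (null_deviance, model_deviance, lrt, delta_df, aic).
    """
    hits = sum(s for s, _ in counts.values())
    total = sum(n for _, n in counts.values())
    null_dev = bernoulli_deviance(hits, total, hits / total)
    model_dev = sum(bernoulli_deviance(s, n, s / n) for s, n in counts.values())
    lrt = null_dev - model_dev        # drop in deviance explained by the factor
    delta_df = len(counts) - 1        # three levels give two degrees of freedom
    aic = model_dev + 2 * len(counts) # binary data: AIC = deviance + 2 * parameters
    return null_dev, model_dev, lrt, delta_df, aic


# Invented counts, for illustration only.
toy = {"arm": (30, 100), "tip": (20, 100), "center": (10, 100)}
null_dev, model_dev, lrt, delta_df, aic = one_factor_lrt(toy)
p_value = math.exp(-lrt / 2.0)  # exact chi-square upper tail when delta_df == 2
print(round(lrt, 3), delta_df, round(aic, 3), p_value)
```

Models with continuous predictors, such as gene length or number of conserved bases, require iterative fitting, but the comparison of deviances works the same way.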

Although the effective population size (Ne), which governs genetic drift, is shared by all measured traits, natural selection can cause variation in apparent Ne along the genome. Selection, whether positive or negative, causes alleles in future generations to be descended from a smaller subset of current alleles than would occur without selection, decreasing the Ne of the linked genomic interval (24–26). In C. elegans, high levels of self-fertilization reduce the effective recombination rate, increasing the effect of selection at linked sites on standing variation at the level of sequence polymorphism (23, 27–29).

Background selection is the reduction in neutral variation due to linkage between neutral variants and deleterious mutations undergoing deterministic elimination from the population (26). In primarily selfing species with small effective population sizes, such as C. elegans, it is likely to be the predominant form of linked selection (28, 30), and it provides a parsimonious explanation for patterns of variation, given the certainty that deleterious mutations arise and are eliminated by selection. Although hitchhiking due to positive selection may also be operating, data from C. briggsae, a nematode that shares C. elegans’s mating system, strongly favor background selection over the alternative models of selection at linked sites (30). Under background selection, the level of neutral variation at a gene is a function of the number of linked sites susceptible to deleterious mutation and the effective rate of recombination between each such site and the gene. We fitted an explicit model of background selection to each gene (26, 31), estimating the physical distribution of deleterious mutations from comparative genomic data and considering a range of values for two poorly constrained parameters: the strength of selection against deleterious mutations and the inbreeding coefficient, F, whose complement (P = 1 − F) rescales the meiotic recombination rate to yield the effective rate in partially selfing species (11).
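The dependence described above can be illustrated with a widely used textbook approximation for background selection. In it, expected neutral diversity relative to a selection-free baseline is B = exp(−Σ u·t / (t + r_eff·(1 − t))²), summed over linked sites, where u is the deleterious mutation rate at a site, t is the selection coefficient against heterozygotes, and r_eff is the effective recombination fraction between the site and the focal gene. Whether the fitted model had exactly this form is an assumption; the details are in the supporting material (11). The sketch applies only the rescaling r_eff = r·(1 − F) mentioned in the text and uses invented parameter values.

```python
import math


def effective_recombination(r, inbreeding_f):
    """Rescale a meiotic recombination fraction by the index of panmixis 1 - F."""
    return r * (1.0 - inbreeding_f)


def relative_diversity(sites, t, inbreeding_f):
    """Expected neutral diversity relative to no background selection.

    sites: iterable of (u, r) pairs, one per linked site, where u is the
           deleterious mutation rate and r the meiotic recombination
           fraction between that site and the focal gene.
    t:     selection coefficient against deleterious mutations, assumed
           equal across sites (a simplification for illustration).
    """
    exponent = 0.0
    for u, r in sites:
        r_eff = effective_recombination(r, inbreeding_f)
        exponent += u * t / (t + r_eff * (1.0 - t)) ** 2
    return math.exp(-exponent)


# Invented values: 100 linked sites, each at recombination fraction 0.01.
linked = [(1e-4, 0.01)] * 100
for f in (0.0, 0.9, 0.99):
    print("F =", f, "B =", round(relative_diversity(linked, t=0.01, inbreeding_f=f), 3))
```

With these toy values, higher inbreeding shrinks the effective recombination rate and lowers B, and removing linked sites (for example, on one side of a gene near a chromosome end) raises it.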

Background selection was a highly significant (P < 10⁻⁸⁰, model M7) predictor of local QTL probability in logistic regression analyses that include all of the gene-specific mutation and selection variables, and it entirely accounts for the effect of domain type (model M8). Background selection accounts for more of the explained deviance than all gene-specific variables combined, across nearly all of the parameter space of inbreeding and selection intensity (Fig. 2A and fig. S4).

Fig. 2

(A) The significance of background selection in a logistic regression model (which includes gene-specific mutation and selection variables) is plotted as a function of the index of panmixis and strength of selection against deleterious mutations. Background selection is significant at P < 0.01 across all but a small slice of parameter space corresponding to very low rates of outcrossing (black). The red lines bracket the region of parameter space over which background selection explains more of the local linkage probability than any other variable in the model. See fig. S4. (B) Effects of background selection on levels of variation along the chromosomes under the best-fitting background selection model.

These results were robust to variation in deleterious mutation rate, alternative treatments of the genetic map and genic variables, different significance thresholds for linkage, alternative modeling methods, and exclusion of all genes susceptible to hybridization artifacts (fig. S5). Although our model omits the effects of Hill-Robertson interference between linked mutations, such effects are expected to operate primarily as a scaling factor on the expected reduction in variation due to background selection (32). The background selection model that best explains the data predicts high levels of neutral variation on the chromosome arms and low levels in the centers (Fig. 2B). The low-recombination chromosome tips are more similar to the high-recombination arms than to the low-recombination centers because they are linked to deleterious mutations only on one side.

Although the effects of selection on linked neutral nucleotide polymorphism are widely recognized, we have shown that such selection at linked sites is also a major factor shaping heritable phenotypic variation. Consequently, quantitative genetic models predicated on uniform effects of genetic drift across traits are not valid in C. elegans.

Transcript abundances in C. elegans, as in other species, are undoubtedly shaped by trait-specific mutation rates and selection pressures (3–7, 19–21). At the global level, however, the propensity of traits to vary in C. elegans is explained by processes independent of the functions of the individual transcripts. These findings provide an alternative explanation for the observed discordance between standing phenotypic variation in C. elegans and that predicted from neutral mutation-drift equilibrium (4). They may also explain the fine-scale correlation between cis-acting regulatory polymorphism and gene density in humans (20).

Natural selection and quantitative genetic analyses both rely on replicated measurements of the marginal effects of alleles across randomized genetic backgrounds. We have used quantitative genetics in C. elegans to show that randomization in this partially selfing species is ineffective, diminishing the ability of natural selection to evaluate individual alleles. Consequently the evolutionary fates of alleles—and hence phenotypes—are determined less by their own effects than by the genomic company they keep.

Supporting Online Material

Materials and Methods

Figs. S1 to S7


References and Notes

  1. Information on materials and methods is available on Science Online.
  2. We thank E. Andersen, A. Chang, J. Gerke, R. Ghosh, D. Gresham, M. Hahn, A. Paaby, D. Pollard, H. Seidel, and J. Shapiro for comments on the manuscript. We thank the Caenorhabditis Genetics Center, funded by the NIH National Center for Research Resources, for strains. Our work was supported by the NIH (grants R01 HG004321 to L.K., R01 GM089972 to M.V.R., and P50 GM071508 to the Lewis-Sigler Institute), a Jane Coffin Childs Fellowship (M.V.R.), an Ellison Foundation New Scholar Award (M.V.R.), and a James S. McDonnell Foundation Centennial Fellowship (L.K.). L.K. is an investigator of the Howard Hughes Medical Institute. Microarray data have been deposited at the Gene Expression Omnibus with accession number GSE23857.