An Allele of COL9A2 Associated with Intervertebral Disc Disease

Science  16 Jul 1999:
Vol. 285, Issue 5426, pp. 409-412
DOI: 10.1126/science.285.5426.409


Intervertebral disc disease is one of the most common musculoskeletal disorders. A number of environmental and anthropometric risk factors may contribute to it, and recent reports have suggested the importance of genetic factors as well. The COL9A2 gene, which codes for one of the polypeptide chains of collagen IX that is expressed in the intervertebral disc, was screened for sequence variations in individuals with intervertebral disc disease. The analysis identified a putative disease-causing sequence variation that converted a codon for glutamine to one for tryptophan in six out of the 157 individuals but in none of 174 controls. The tryptophan allele cosegregated with the disease phenotype in the four families studied, giving a lod score (logarithm of odds ratio) for linkage of 4.5, and subsequent linkage disequilibrium analysis conditional on linkage gave an additional lod score of 7.1.

Intervertebral disc disease is among the most common musculoskeletal disorders. It is a major cause of work disability and an extremely costly health care problem. It is typically associated with sciatica, which has a prevalence of about 5% in Finland (1). Sciatica is defined as pain caused by a lesion of the spinal nerve root radiating along the femoral or sciatic nerve from the back into the dermatome of the root. It is usually related to either disc protrusion or herniation, which will cause both chemical and mechanical irritation of the nerve root. Even though there is an association between various environmental and anthropometric risk factors and sciatica, their effects are modest (2, 3). A recent twin study suggests that genetic factors may be involved in the pathogenesis of intervertebral disc disease and sciatica (3). This is supported by findings of a considerable genetic predisposition to early-onset sciatica and lumbar disc herniation in certain families (4). Therefore, intervertebral disc disease appears to be similar to other common complex diseases with multiple genetic forms and a high phenocopy rate, such as breast cancer and Alzheimer's disease.

Intervertebral discs contain an abundant extracellular matrix of proteoglycans and collagens (5). The outer layer, the annulus fibrosus, consists mainly of collagen I, and the interior structure of the disc, the nucleus pulposus, is about 50% proteoglycan, mainly aggrecan, and 20% collagen II. Both contain small amounts of collagen IX. Recent results indicate that mutations in collagen IX and aggrecan can cause age-related disc degeneration and herniation in mice (6, 7).

Collagen IX is a heterotrimer of three α chains, α1(IX), α2(IX), and α3(IX), encoded by the genes COL9A1, COL9A2, and COL9A3, respectively. It consists of three collagenous (COL1 to COL3) and four noncollagenous (NC1 to NC4) domains (8). The COL2 domain is covalently linked to collagen II fibrils (9). Collagen IX is thought to serve as a bridge between collagens and noncollagenous proteins in tissues.

To study the role of collagen IX in intervertebral disc disease and associated sciatica, we selected 157 unrelated Finnish individuals (ages 19 to 78; mean = 44, SD = 13) with unilateral pain of duration over 1 month radiating from the back to below the knee (dermatomes L4, L5, and S1). Therefore, the subjects had the most characteristic symptom of herniated intervertebral disc (1). After clinical evaluation, 156 of them were examined by magnetic resonance imaging (MRI) and one was evaluated by computerized tomography (CT) (10). Radiologically detectable intervertebral disc disease was present at the time of the examination in 73% of the cases. Initially, COL9A2 was screened for sequence variations by conformation-sensitive gel electrophoresis (CSGE) in 10 patients (11), yielding a unique CSGE pattern in the polymerase chain reaction (PCR) products for exon 19 in one case (Fig. 1). Sequencing (12) indicated a heterozygous substitution of Trp for either Gln326 or Arg326 in the COL2 domain (Fig. 2). This finding was surprising, because Trp is rarely found in collagenous domains and there are no Trp residues in the collagenous domains of collagen IX in humans or in the mouse (13). The remaining 147 patients with sciatica were then analyzed, and the Trp allele was found in five of them (Table 1).

Figure 1

CSGE analysis of exon 19 of the COL9A2 gene. The exon and its flanking sequences were amplified by PCR from the proband (P) and a control (C) using primers specific for intron 18 (5′-TGGATCTCAGTTTCCCTACCTG, −92 to −71 in intron 18) and intron 19 (5′-CAAGAGGTGGTGATTGAGCAAGAGC, +99 to +75 in intron 19). The analysis indicated heteroduplexes in the proband's sample.

Figure 2

Sequences for exon 19 of the COL9A2 gene. The PCR products were cloned and sequenced, indicating three alleles, containing a CAG codon for Gln, a CGG codon for Arg, and a TGG codon for Trp.

Table 1

Allele counts and frequencies of the sequence variations at α2-326 in collagen IX.

Exon 19 was also analyzed in 174 unrelated Finnish controls. All patients and controls were from the same region of Finland. The control group consisted of 101 asymptomatic subjects (ages 21 to 73; mean = 37, SD = 10), 54 with osteoarthritis, and 19 with various chondrodysplasias but no history of sciatica. Not a single Trp allele was found among the 348 chromosomes analyzed (Table 1).
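The carrier counts reported in the text (6 of 157 patients, 0 of 174 controls) can be checked with a simple contingency-table test. The sketch below is an illustration only and is not the analysis used in this report, which relies on linkage and linkage disequilibrium methods; it applies a one-sided Fisher's exact test, and the function name `fisher_one_sided` is a hypothetical helper introduced here.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the count in the top-left cell under the
    hypergeometric null with all row and column totals held fixed.
    """
    row1 = a + b          # patients
    col1 = a + c          # Trp carriers (patients + controls)
    n = a + b + c + d     # all individuals
    total = comb(n, col1)
    upper = min(row1, col1)
    return sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    ) / total


# Counts taken from the text: 6 carriers among 157 patients,
# 0 carriers among 174 controls.
p_value = fisher_one_sided(6, 157 - 6, 0, 174 - 0)
print(f"one-sided Fisher exact P = {p_value:.4f}")
```

With these counts the one-sided P value is on the order of 0.01. A test of this kind ignores the family data, which is one reason a joint pedigree-based analysis can be more informative.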

Coinheritance of the Trp allele and the phenotype was studied in the families of four original patients. The two other families were not available for the study. The families were evaluated clinically and by CT (one family) or MRI (three families) (10) and analyzed for the presence of the Trp allele. All family members who had inherited the allele had intervertebral disc disease (Fig. 3).

Figure 3

MRI findings of a 45-year-old male who has the Trp allele, showing endplate degeneration in the L2-L3, L4-L5, and L5-S1 interspaces, and protrusions in all interspaces except L2-L3.

To evaluate the statistical evidence for a connection between the Trp allele in the COL9A2 gene and intervertebral disc disease, we conducted linkage and linkage disequilibrium analysis. Artificial pedigrees were created from the case-control data (14, 15), which made it possible to perform the linkage and linkage disequilibrium analysis on the pedigrees and the singletons jointly, using standard analysis software (16). This is more satisfactory than splitting the data into subsets for different analyses, as is generally done, because joint analysis of linkage and linkage disequilibrium can give more information than the sum of the parts (15, 17).

A dominant inheritance model with full penetrance and a high phenocopy rate [disease allele (D) frequency 0.0024; penetrances f_DD = 1, f_D+ = 1, f_++ = 0.0434] was chosen for the analysis. The model predicts a disease prevalence of 4.8%, which is the estimated frequency of sciatica in Finland (1). A high phenocopy rate was chosen, as the etiology of intervertebral disc disease is believed to be multifactorial (2). The modeled disease locus explains only 10% of all cases. The pedigree members were assigned affected status according to the clinical and radiological findings (18). All singleton subjects without intervertebral disc disease were used as controls, as no statistically significant differences in allele frequencies at the polymorphism were found among the three groups (Table 1).
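The prevalence and the proportion of cases explained by the locus follow arithmetically from the model parameters. A minimal sketch of that calculation is given below; it assumes Hardy-Weinberg genotype proportions, which the text does not state explicitly, and the function name `model_prevalence` is hypothetical.

```python
def model_prevalence(q: float, f_dd: float, f_dp: float, f_pp: float):
    """Return (prevalence, fraction of cases attributable to the D allele).

    q     : frequency of the disease allele D
    f_dd  : penetrance of genotype DD
    f_dp  : penetrance of genotype D+
    f_pp  : penetrance of genotype ++ (the phenocopy rate)
    Genotype frequencies are assumed to follow Hardy-Weinberg proportions.
    """
    p = 1.0 - q
    carriers_affected = q * q * f_dd + 2.0 * p * q * f_dp
    phenocopies = p * p * f_pp
    prevalence = carriers_affected + phenocopies
    return prevalence, carriers_affected / prevalence


# Parameters as given in the text.
prevalence, fraction_explained = model_prevalence(0.0024, 1.0, 1.0, 0.0434)
print(f"prevalence = {prevalence:.3%}")
print(f"fraction of cases explained by the locus = {fraction_explained:.1%}")
```

Under these assumptions the result is a prevalence of about 4.8%, with about 10% of cases attributable to the modeled locus, in agreement with the figures quoted above.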

Table 2 summarizes the linkage and linkage disequilibrium results. Linkage analysis gave a lod score of Z1 = 4.5 (P < 10^−5) at a recombination fraction of 0.12 (19, 20). After linkage had been demonstrated, linkage disequilibrium analysis given the presence of linkage was performed (21). An additional lod score for linkage disequilibrium of Z2 = 7.1 (P < 10^−7) was obtained (22). The joint lod score is therefore Z1 + Z2 = 11.6 (P < 10^−10).
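The additivity of the two lod scores can be made explicit with likelihood ratios. The notation below is an assumption introduced for illustration and does not appear in the original: L denotes the likelihood of the data, θ the recombination fraction with estimate θ̂, and "LD" or "no LD" indicates whether linkage disequilibrium is allowed in the model. The sketch assumes the same θ̂ is used in both steps.

```latex
\begin{align}
Z_1 &= \log_{10}\frac{L(\hat\theta,\ \text{no LD})}{L(\theta = 1/2,\ \text{no LD})} = 4.5,\\
Z_2 &= \log_{10}\frac{L(\hat\theta,\ \text{LD})}{L(\hat\theta,\ \text{no LD})} = 7.1,\\
Z_1 + Z_2 &= \log_{10}\frac{L(\hat\theta,\ \text{LD})}{L(\theta = 1/2,\ \text{no LD})} = 11.6 .
\end{align}
```

Because the numerator of the first ratio is the denominator of the second, the sum compares the full model (linkage and linkage disequilibrium) directly with the null model of no linkage and no linkage disequilibrium.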

Table 2

Summary of lod scores obtained in linkage and linkage disequilibrium analyses.

To investigate the extent to which the results were dependent on the assumed disease model, we varied individual model parameters while keeping the predicted population prevalence unchanged at 4.8%. The linkage and linkage disequilibrium lod scores remained high for a dominant model when the proportion of cases explained by the disease locus was varied and when high but incomplete penetrance was used (Table 2).

Because we speculated that the Trp allele may actually itself be the disease-predisposing allele, the analyses were repeated in a way that addressed this question more directly. Under this hypothesis, recombination between the disease locus and the polymorphism is clearly impossible, as they are the same locus. Furthermore, under this hypothesis, the frequency of the Trp allele is necessarily equal to the frequency of the disease-causing allele, as they are really the same, and the disease allele and the Trp allele are in complete linkage disequilibrium. The previous analyses were therefore repeated under these constraints (23). The results are shown in Table 3. Although these lod scores are much lower than those in Table 2, this does not necessarily contradict the hypothesis of causality: The reason is that the results obtained in “parametric” linkage or linkage disequilibrium analysis depend on the chosen disease model. If the disease model is incorrect—and in practice it is impossible to specify the model with 100% accuracy, in particular for common diseases such as intervertebral disc disease—the obtained lod scores may be greatly reduced and the recombination fraction overestimated (24, 25). This could easily explain why the lod scores are lower in Table 3 than in Table 2. The same phenomenon explains the high rate of false negative results in multipoint “parametric” linkage analysis (24, 25). On the other hand, an incorrect disease model does not systematically lead to false positive evidence of linkage (26).

Table 3

Summary of lod scores obtained in linkage and linkage disequilibrium analyses when testing whether the Trp allele is itself a disease-causing allele.

The statistical analyses provided strong evidence that the Trp allele is associated with the phenotype. All 26 individuals carrying the Trp allele had intervertebral disc disease. The results do not, however, prove a direct causal role for the Trp allele in the etiology of the disease, and other interpretations remain open. It is conceivable, for instance, that the true disease locus may lie in close physical proximity to COL9A2, with the disease-predisposing allele in linkage disequilibrium with the Trp allele. The fact that the families had some individuals who were symptomatic but did not have the Trp allele does not contradict the hypothesis that the Trp allele causes the disease. Indeed, this finding is not even surprising, as sciatica and intervertebral disc disease are very common and have a prevalence of about 5% in the Finnish population (1). Any one locus is therefore likely to be responsible for only a small proportion of all affected individuals, and different disease-predisposing alleles will often segregate within one pedigree. In the disease model chosen here, the locus accounted for 10% of the disease prevalence, as stated above.

To exclude the possibility that other sequence variations in the gene might cause the disease, we used CSGE to analyze all the exons and exon boundaries of COL9A2 in patients with the Trp substitution (11). The analysis did not identify any other possible causes of disease in the coding sequences or RNA splice sites, but it indicated two additional, presumably neutral polymorphisms: an A to G change affecting the third nucleotide in the codon for proline (CCA to CCG) at nucleotide 9 in exon 21, and a G to A change at nucleotide +17 in intron 30. These were analyzed in 90 and 286 individuals, respectively, and were found in all six original patients with the Trp allele but in none of the other individuals analyzed. The coexistence of the Trp allele and the two other rare sequence variants indicates that the individuals with the Trp allele have inherited the same, relatively rare, ancestral haplotype.

There are several possible mechanisms by which Trp substitution could contribute to the disease. As the most hydrophobic amino acid, Trp may disrupt the collagen triple helix, and it is also possible that it could interfere with the interaction between collagens IX and II or prevent the action of lysyl oxidase, which catalyzes cross-link formation (27), because the Trp in the α2(IX) chain of collagen IX is located only three amino acid residues NH2-terminal of the covalent lysine-derived cross-link between the α3(IX) chain and collagen II. In addition, the role of collagen IX in intervertebral disc disease is supported by the finding of intervertebral disc degeneration and herniation in a long-term follow-up of transgenic mice expressing a Col9a1 gene with a large in-frame deletion (6).

Thus far, only two collagen IX mutations, one in the COL9A2 and one in the COL9A3 gene, have been reported in humans (28). Both mutations cause a skipping of exon 3 and result in multiple epiphyseal dysplasia. They lead to similar deletions of 12 amino acids in the COL3 domain of the molecule, which suggests the importance of this domain in the pathogenesis of the dysplasia.

There are now a large number of examples that illustrate the difficulty of relating genotypes to phenotypes caused by mutated genes, including mutated collagen genes (27, 29). For example, different mutations in collagen II cause phenotypes ranging from only ocular manifestations or osteoarthritis to various chondrodysplasias with or without ocular symptoms (29). Thus, it is not surprising that a Trp substitution in the COL2 domain and a splicing mutation in the COL3 domain of the collagen IX molecule can cause different phenotypes.

* To whom correspondence should be addressed (in Finland). E-mail: leena.ala-kokko{at}

