Report

Genome Sequence, Comparative Analysis, and Population Genetics of the Domestic Horse


Science  06 Nov 2009:
Vol. 326, Issue 5954, pp. 865-867
DOI: 10.1126/science.1178158

Abstract

We report a high-quality draft sequence of the genome of the horse (Equus caballus). The genome is relatively repetitive but has little segmental duplication. Chromosomes appear to have undergone few historical rearrangements: 53% of equine chromosomes show conserved synteny to a single human chromosome. Equine chromosome 11 is shown to have an evolutionary new centromere devoid of centromeric satellite DNA, suggesting that centromeric function may arise before satellite repeat accumulation. Linkage disequilibrium, showing the influences of early domestication of large herds of female horses, is intermediate in length between dog and human, and there is long-range haplotype sharing among breeds.

As one of the earliest domesticated species, the horse, Equus caballus, has played an important role in human exploration of novel territories. Belonging to the order Perissodactyla (i.e., odd-toed animals with hooves), the genus Equus radiated into 8 or 9 species around three million years ago (1). Members of the family Equidae exhibit diverged karyotypes (2) and variable centromeric positioning (1). With more than 90 hereditary conditions that may serve as models for human disorders (3, 4), such as infertility, inflammatory diseases, and muscle disorders, the horse has much to offer as a model species.

DNA from a single mare of the Thoroughbred breed was sequenced to 6.8× coverage [supporting online material (SOM) text], resulting in a high-quality draft assembly (designated EquCab2.0) with a 112-kb N50 contig size (SOM) and a 46-Mb N50 scaffold size (tables S1 and S2), and >95% of the sequence anchored to the 64 (2N) equine chromosomes. The 2.5- to 2.7-Gb genome size is somewhat larger than the dog genome (2.5 Gb) and smaller than the human and bovine genomes (2.9 Gb) (5–7). Segmental duplications (8) make up <1% of the equine genome, and most are intrachromosomal duplications, as seen in many other mammalian genomes (SOM). Repetitive sequences, many equine-specific, make up 46% of the genome assembly (SOM). The predominant repeat classes include long interspersed nuclear elements, dominated by L1 and L2 types (tables S3 and S4) (19% of bases), and short interspersed nuclear elements, including the recent ERE1 and ERE2 and the ancestral mammalian-wide interspersed repeats (MIRs) (7% of bases). Comparison of horse and human chromosomes reveals strong conserved synteny between these species (fig. S1). Indeed, 17 horse chromosomes (53%) comprise material from a single human chromosome, compared with 29% in the dog.
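The N50 contig and scaffold sizes quoted above can be illustrated with a minimal Python sketch. It assumes the common definition of N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function name `n50` and the toy contig lengths are illustrative only and are not taken from the EquCab2.0 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and
    accumulate them; N50 is the length at which the running total
    first reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, arbitrary units).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # 8: the two longest contigs already cover 16 of 32
```

Under this definition, a 112-kb N50 contig size means that at least half of the assembled bases lie in contigs of 112 kb or longer.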

One unexpected feature of the horse genome landscape was the identification of an evolutionary new centromere (ENC) on chromosome 11 (ECA11), captured in an immature state. Several ENCs have been generated in the genus Equus by centromere repositioning (a shift of centromeric position without chromosome rearrangement) (1). Mammalian centromeres are typically complex structures characterized by the presence of satellite tandem repeats. ENCs are believed to form initially by unknown mechanisms in repeat-free regions and then progressively acquire extended arrays of satellite tandem repeats that may contribute to functional stability (9). The centromere of ECA11 resides in a large region of conserved synteny in many mammals, where the horse is the only species with a centromere present, strongly suggesting that this centromere is evolutionarily new. The ECA11 centromere is the only horse centromere lacking any hybridization signal in fluorescence in situ hybridization experiments probing with the two major horse satellite sequences (fig. S2A, table S8, and SOM text), as if it had not had enough time to acquire satellite DNA. We cytogenetically localized the primary constriction (fig. S2B), then precisely mapped, at the sequence level, the centromeric function using chromatin immunoprecipitation (ChIP)–on–chip experiments (fig. S5). In this region, we found only five sequence gaps [none >200 base pairs (bp)], no protein coding sequences, normal levels of noncoding conserved elements, and typical levels of interspersed repetitive sequences, but no satellite tandem repeated sequences (Fig. 1A). We also found no evidence of accumulation of L1 transposons (10) or KERV-1 elements (11), which were previously hypothesized to influence ENC formation. We propose that the ECA11 centromere was formed very recently during the evolution of the horse lineage, and, in spite of being functional and stable in all horses, has not yet acquired the marks typical of mammalian centromeres.

Fig. 1

Major findings of the genome analysis. (A) Analysis of the primary centromeric constriction of ECA11: 26,000,000 to 30,000,000 bases. ChIP-on-chip analysis with antibodies against centromeric proteins (CENP-A and CENP-C) shows two regions (136 and 99 kb) bound by kinetochore proteins. There are no uncaptured and few captured gaps, a normal fraction of bases in repeat sequences, no satellite tandem repeats, no protein-coding sequences present nearby, and normal levels of noncoding conserved elements (29 eutherians). (B) Horse LD is intermediate between human and dog. (C) Horses exhibit more long-range across-breed haplotype sharing than dogs. Haplotypes have the same color across breeds. Haplotypes in <5% of all individuals are light gray, and haplotypes in >5% of all individuals but a single breed are dark gray. Data show LD regions on ECA18 (first 100 kb) and dog chromosome 12 (first 100 kb), which are representative. Full data are in table S11.

The equine gene set is similar to those of other eutherian mammals and has a predicted 20,322 protein-coding genes (ENSEMBL build 52.2b), of which 16,617, 17,106, and 17,106 have evidenced orthology to human, mouse, and dog, respectively. The remainder is composed of projected protein-coding genes, novel protein-coding genes, and pseudogenes. One-to-one orthologs with the human account for 15,027 horse gene predictions (SOM). Transcriptome analysis of eight equine samples confirmed the expression of 87% of the 18,039 nonoverlapping genes predicted by ENSEMBL and 88% of the 169,073 predicted exons. Gene family analysis shows paralogous expansion in horses as compared to both human and bovine (SOM) for several interesting families, such as keratin genes related to the condition of pachyonychia (nail bed thickening) in humans (12), perhaps affecting hoof formation; and opsin genes for photoreception, possibly advantageous for visual perception of predators (table S9).

The history of horse domestication, which has important implications for trait mapping strategies, differs in important ways from that of the domestic dog but is perhaps similar to that of the cow. Horses do not appear to have undergone a tight domestication bottleneck, and the presence of many matrilines in domestic horse history has been postulated (13). Screening the horse Y chromosome revealed a limited number of patrilines, consistent with a strong sex bias in the domestication process (14).

We first generated a single-nucleotide polymorphism (SNP) map of more than one million markers at an average density of one SNP per 2 kb by lightly sequencing seven horses from different breeds and by mining the assembly for SNPs (table S10).
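As a rough consistency check on the stated density, the following Python sketch divides the genome size by the SNP count. It assumes the 1,163,466 discovery SNPs listed in the notes and the 2.5- to 2.7-Gb genome size reported earlier, and it further assumes that the markers are spread evenly across the whole genome, which is a simplification. The function name `mean_snp_spacing_bp` is hypothetical.

```python
def mean_snp_spacing_bp(genome_size_bp, snp_count):
    """Average number of bases per SNP, assuming uniform marker spacing."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return genome_size_bp / snp_count


SNP_COUNT = 1_163_466  # discovery SNPs reported in the notes

# Lower and upper genome-size estimates from the text, in base pairs.
for genome_size in (2.5e9, 2.7e9):
    spacing_kb = mean_snp_spacing_bp(genome_size, SNP_COUNT) / 1000
    print(f"{genome_size / 1e9:.1f} Gb -> one SNP per {spacing_kb:.2f} kb")
```

Both estimates come out slightly above 2 kb per SNP, in line with the approximate density given in the text.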

We characterized the haplotype structure within and across breeds by genotyping 1,007 SNPs from 10 regions of the genome (SOM) in 12 populations, including 11 breed sets (each with 24 representatives) and 1 set of individual representatives from 24 other breeds and equids. Of these SNPs, 98% were validated, with an average of 69% being polymorphic in alternate breeds (SOM). As in cattle (15), within-breed linkage disequilibrium (LD) is moderate, dropping to twice the background levels (r²) at 100 to 150 kb (Fig. 1B). The majority of breeds showed similar LD (SOM and fig. S7), and major haplotypes were frequently shared among diverse populations (Fig. 1C). Based on the length of LD in the horse, the number of haplotypes within haplotype blocks, and the polymorphism rate, power calculations suggest that ~100,000 SNPs are sufficient for association mapping within all breeds as well as across breeds (SOM and fig. S8).
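The LD measure used above, the squared correlation between alleles at two SNPs, can be sketched in Python as follows. This is a minimal illustration that assumes phased haplotypes at two biallelic SNPs, with alleles coded 0 or 1. It is not the analysis pipeline used in this study, and the function name `ld_r2` and the toy haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation between two biallelic SNPs.

    haplotypes: list of (allele_at_snp_a, allele_at_snp_b) pairs,
    each allele coded 0 or 1, one pair per phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Alleles always travel together: complete LD.
print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))  # 1.0
# All four allele combinations equally common: no LD.
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))  # 0.0
```

Computing this value for SNP pairs at increasing physical distance and averaging within distance bins gives a decay curve of the kind summarized in Fig. 1B.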

Phylogenetic relationships among breeds were inconsistent across resequenced regions (fig. S9), which is most likely a consequence of the close relationships of horse breeds worldwide. We were unable to phylogenetically separate E. przewalskii from the domesticated horses, despite its different karyotype (2N = 66 versus 2N = 64 for the domesticated horse), which is in agreement with recent findings (16), whereas the donkey (E. africanus) is clearly a distinct taxon (fig. S9, table S14, and SOM text). This suggests that either intermixing of E. przewalskii and E. caballus occurred after subspecies separation or that E. przewalskii is recently derived from E. caballus.

We demonstrated the utility of the equine genome sequence and the SNP map by applying these resources to mutation detection for the Leopard Complex (LP) spotting locus (SOM). LP (Appaloosa spotting) is defined by patterns of white occurring with or without pigmented spots (fig. S10). Homozygosity confers a phenotype associated with congenital stationary night blindness in the Appaloosa breed (17). Fine mapping of a 2-Mb region followed by regional sequence capture and sequencing (300 kb) found no indications of associated copy number variants, insertions, or deletions but found 42 associated SNPs. Of these, 21 reside within an associated haplotype near the candidate gene melastatin 1 (TRPM1), which is expressed in the eye and melanocytes (18). Two conserved SNPs may be good candidates for the causal mutation.

Our analysis of the first high-quality draft sequence of a horse (E. caballus) distinguishes this genome from earlier eutherian genomes by its extensive conserved synteny with the human genome and by the identification of a centromere repositioning event that may provide an effective model to study epigenetic factors responsible for centromere function. Our results demonstrate that horse population history has led to across-breed haplotype sharing, increasing the feasibility of across-breed mapping. Mapping projects in the horse are likely to accelerate in the coming years and will identify mutations in genes related to morphology, immunology, and metabolism, which may benefit human health.

Supporting Online Material

www.sciencemag.org/cgi/content/full/326/5954/865/DC1

Materials and Methods

SOM Text

Figs. S1 to S11

Tables S1 to S14

References

References and Notes

  1. We thank the Kentucky Horse Park and L. Chemnick for samples, L. Gaffney for graphics, and M. Daly for useful discussions. Supported by the National Human Genome Research Institute, the Dorothy Russell Havemeyer Foundation, the Volkswagen Foundation, the Morris Animal Foundation, the Centro di Eccellenza di Genomica in Campo Biomedico e Agrario, and the Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale (PRIN-2006). K.L.T. is the recipient of a European Young Investigator award funded by the European Science Foundation. Sequences have GenBank accession numbers AAWR02000001 to AAWR02055316. 1,163,466 discovery SNPs have accession numbers rs6844103 to rs69617090 in dbSNP 130.