Supplemental Data
Web Fig. 1: Annotation of the Celera Human Genome Assembly
Supplementary Material
Supplemental Table 1. Chromosomal distribution of intronless paralogs.
IP PROTEIN | SOURCE PROTEIN | IP CHROMOSOME | SOURCE CHROMOSOME | DEFINITION/PROTEIN NAME | SOURCE GENBANK INDEX NUMBER |
---|---|---|---|---|---|
hCP48693 | hCP37465 | chr20 | chr6 | BCL2-antagonist/killer 1; cell death inhibitor 1 | 4502363 |
hCP45781 | hCP44963 | chr1 | chr13 | RNA polymerase I 16 kDa subunit | 7705740 |
hCP45175 | hCP40330 | chr9 | chr6 | eukaryotic translation elongation factor 1 alpha 1 | 4503471 |
hCP47770 | hCP40330 | chr5 | chr6 | eukaryotic translation elongation factor 1 alpha 1 | 4503471 |
hCP48153 | hCP40330 | chr7 | chr6 | eukaryotic translation elongation factor 1 alpha 1 | 4503471 |
hCP45176 | hCP40330 | chr9 | chr6 | eukaryotic translation elongation factor 1 alpha 1-like 14 | 4503473 |
hCP47771 | hCP40330 | chr5 | chr6 | eukaryotic translation elongation factor 1 alpha 1-like 14 | 4503473 |
hCP48154 | hCP40330 | chr7 | chr6 | eukaryotic translation elongation factor 1 alpha 1-like 14 | 4503473 |
hCP43102 | hCP45627 | chr19 | chr8 | eukaryotic translation elongation factor 1 delta | 4503479 |
hCP35783 | hCP52348 | chr2 | chr11 | eukaryotic translation initiation factor 3, subunit 5 | 4503519 |
hCP34934 | hCP39451 | chr2 | chr12 | heterogeneous nuclear ribonucleoprotein A1 | 4504445 |
hCP44529 | hCP39451 | chr13 | chr12 | heterogeneous nuclear ribonucleoprotein A1 | 4504445 |
hCP43335 | hCP44582 | chr12 | chr17 | non-metastatic cells 2, protein (NM23B) | 4505409 |
hCP50486 | hCP40136 | chr8 | chr14 | proteasome (prosome, macropain) 26S subunit, ATPase, 6 | 4506215 |
hCP42158 | hCP46912 | chr12 | chr2 | prothymosin, alpha (gene sequence 28) | 4506277 |
hCP45662 | hCP37866 | chr2 | chr18 | ras homolog gene family, member B; RhoB; | 4757764 |
hCP36390 | hCP37104 | chr8 | chr7 | H2A histone family, member Z | 4504255 |
hCP43150 | hCP37104 | chr15 | chr7 | H2A histone family, member Z | 4504255 |
hCP46278 | hCP39295 | chr4 | chr3 | NADH dehydrogenase (ubiquinone) 1 beta | 6041669 |
hCP36897 | hCP34863 | chr5 | chr12 | RAP1A, member of RAS oncogene | 4506413 |
hCP46034 | hCP35342 | chr17 | chrX | adaptor-related protein complex 1, sigma | 4506957 |
hCP35830 | hCP51363 | chr6 | chr4 | alcohol dehydrogenase 5 (class III), | 4501937 |
hCP44848 | hCP40594 | chr9 | chr1 | cell division cycle 20 | 4557437 |
hCP50246 | hCP49760 | chr4 | chr1 | cell division cycle 42 | 4757952 |
hCP48857 | hCP48263 | chr15 | chr7 | chromobox homolog 3 | 6005780 |
hCP46158 | hCP42461 | chr13 | chr11 | diacylglycerol kinase, zeta (104kD) | 4503317 |
hCP37376 | hCP49621 | chr7 | chr11 | eukaryotic translation elongation factor 1 | 4503481 |
hCP36628 | hCP39069 | chr15 | chr8 | fatty acid binding protein 5 | 4557581 |
hCP43166 | hCP39069 | chr15 | chr8 | fatty acid binding protein 5 | 4557581 |
hCP49159 | hCP35287 | chr20 | chr19 | ferritin, light polypeptide; hypothetical protein | 4503797 |
hCP38447 | hCP34306 | chr19 | chr16 | glycine cleavage system protein H | 4758424 |
hCP51879 | hCP34306 | chr1 | chr16 | glycine cleavage system protein H | 4758424 |
hCP42685 | hCP43793 | chr20 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
hCP42865 | hCP43793 | chr3 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
hCP49883 | hCP43793 | chr20 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
hCP50559 | hCP43793 | chr15 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
hCP50984 | hCP43793 | chr22 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
hCP37795 | hCP42486 | chrX | chr10 | phosphoglycerate mutase 1 (brain) | 4505753 |
hCP49866 | hCP42486 | chr12 | chr10 | phosphoglycerate mutase 1 (brain) | 4505753 |
hCP48871 | hCP33485 | chr4 | chr5 | pituitary tumor-transforming protein 1 | 4758980 |
hCP43053 | hCP201144 | chr3 | chr6 | pre-B-cell leukemia transcription factor 2 | 4505625 |
hCP47725 | hCP49987 | chr5 | chr14 | proteasome (prosome, macropain) activator subunit | 4506237 |
hCP38333 | hCP35418 | chrX | chr7 | ras-related C3 botulinum toxin substrate | 5902042 |
hCP40764 | hCP36922 | chr3 | chr6 | ribosomal protein L10a; neural precursor | 6325472 |
hCP45150 | hCP34348 | chr17 | chr16 | ribosomal protein L13 | 4506599 |
hCP42475 | hCP35262 | chr10 | chr19 | ribosomal protein L13a | 6912634 |
hCP43258 | hCP35262 | chr12 | chr19 | ribosomal protein L13a | 6912634 |
hCP43889 | hCP35262 | chr13 | chr19 | ribosomal protein L13a | 6912634 |
hCP48078 | hCP35262 | chr12 | chr19 | ribosomal protein L13a | 6912634 |
hCP51648 | hCP35262 | chr10 | chr19 | ribosomal protein L13a | 6912634 |
hCP40196 | hCP41680 | chr3 | chr18 | ribosomal protein L17 | 4506617 |
hCP35655 | hCP44971 | chr1 | chr13 | ribosomal protein L21 | 4506611 |
hCP39305 | hCP44971 | chr14 | chr13 | ribosomal protein L21 | 4506611 |
hCP41351 | hCP44971 | chr11 | chr13 | ribosomal protein L21 | 4506611 |
hCP47660 | hCP44971 | chr4 | chr13 | ribosomal protein L21 | 4506611 |
hCP47833 | hCP44971 | chr7 | chr13 | ribosomal protein L21 | 4506611 |
hCP48726 | hCP44971 | chr4 | chr13 | ribosomal protein L21 | 4506611 |
hCP49814 | hCP44971 | chr10 | chr13 | ribosomal protein L21 | 4506611 |
hCP50215 | hCP44971 | chr10 | chr13 | ribosomal protein L21 | 4506611 |
hCP34220 | hCP45368 | chr3 | chr17 | ribosomal protein L23a | 4506615 |
hCP43068 | hCP45090 | chr12 | chr17 | ribosomal protein L26 | 4506621 |
hCP38068 | hCP41270 | chr11 | chr11 | ribosomal protein L27a | 4506625 |
hCP38885 | hCP41270 | chr6 | chr11 | ribosomal protein L27a | 4506625 |
hCP34480 | hCP51948 | chr3 | chr3 | ribosomal protein L29 | 4506629 |
hCP36437 | hCP43819 | chr7 | chr3 | ribosomal protein L32 | 4506635 |
hCP39494 | hCP43819 | chr6 | chr3 | ribosomal protein L32 | 4506635 |
hCP201498 | hCP44392 | chr1 | chr9 | ribosomal protein L35 | 6005860 |
hCP51162 | hCP44392 | chr7 | chr9 | ribosomal protein L35 | 6005860 |
hCP39467 | hCP43627 | chr14 | chr9 | ribosomal protein L7a | 4506661 |
hCP41685 | hCP48439 | chr15 | chr4 | ribosomal protein L9 | 4506665 |
hCP201561 | hCP35286 | chr12 | chr19 | ribosomal protein S11 | 4506681 |
hCP42984 | hCP39006 | chr11 | chr6 | ribosomal protein S12 | 4506683 |
hCP42446 | hCP47279 | chr1 | chr16 | ribosomal protein S15a | 4506689 |
hCP201365 | hCP52071 | chr19 | chr19 | ribosomal protein S16 | 4506691 |
hCP42269 | hCP52071 | chr1 | chr19 | ribosomal protein S16 | 4506691 |
hCP40118 | hCP42669 | chr5 | chr15 | ribosomal protein S17 | 4506693 |
hCP51382 | hCP42669 | chr22 | chr15 | ribosomal protein S17 | 4506693 |
hCP35240 | hCP42007 | chr8 | chr12 | ribosomal protein S26 | 4506709 |
hCP35551 | hCP42007 | chr2 | chr12 | ribosomal protein S26 | 4506709 |
hCP36459 | hCP42007 | chrX | chr12 | ribosomal protein S26 | 4506709 |
hCP38950 | hCP42007 | chr4 | chr12 | ribosomal protein S26 | 4506709 |
hCP39533 | hCP42007 | chr8 | chr12 | ribosomal protein S26 | 4506709 |
hCP41719 | hCP42007 | chr13 | chr12 | ribosomal protein S26 | 4506709 |
hCP43701 | hCP42007 | chr9 | chr12 | ribosomal protein S26 | 4506709 |
hCP44708 | hCP42007 | chr9 | chr12 | ribosomal protein S26 | 4506709 |
hCP45758 | hCP42007 | chr15 | chr12 | ribosomal protein S26 | 4506709 |
hCP46725 | hCP42007 | chr7 | chr12 | ribosomal protein S26 | 4506709 |
hCP50862 | hCP42007 | chr8 | chr12 | ribosomal protein S26 | 4506709 |
hCP51315 | hCP42007 | chr10 | chr12 | ribosomal protein S26 | 4506709 |
hCP38914 | hCP34424 | chr16 | chr2 | ribosomal protein S27a | 4506713 |
hCP50884 | hCP34424 | chr1 | chr2 | ribosomal protein S27a | 4506713 |
hCP50962 | hCP38574 | chr22 | chr19 | ribosomal protein S9 | 4506745 |
hCP37989 | hCP35777 | chrX | chr3 | teratocarcinoma-derived growth factor 1 | 4507425 |
hCP40506 | hCP35239 | chr10 | chr8 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation | 4507953 |
hCP50191 | hCP35239 | chr2 | chr8 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation | 4507953 |
Supplemental Table 2. Examples of paralogs with disease associations on duplicated genome segments. (Both panels should be viewed in a linear format to appreciate the similarities in the diseases caused by the paralogous duplications.)
Gene name | Protein | Chr | Disease | OMIM |
---|---|---|---|---|
complement component 8 | hCP40779 | 1 | C8 deficiency, type II | 120960 |
nuclear receptor subfamily 3, group C | hCP48259 | 4 | Pseudohypoaldosteronism type I, | 600983 |
complement component 9 | hCP47512 | 5 | C9 deficiency Immune | 120940 |
glucocorticoid receptor | hCP47558 | 5 | Cortisol resistance Metabolic | 138040 |
hexosaminidase B (beta polypeptide) | hCP48123 | 5 | Sandhoff disease, Neurological | 268800 |
homeo box A13 | hCP48915 | 7 | Hand-foot-uterus syndrome | 142959 |
tyrosine hydroxylase | hCP35709 | 11 | Segawa syndrome, | 191290 |
phenylalanine hydroxylase | hCP39374 | 12 | Hyperphenylalaninemia, | 261600 |
connexin 26 | hCP38170 | 13 | Deafness, autosomal dominant | 121011 |
coagulation factor VII | hCP43441 | 13 | Factor VII deficiency | 227500 |
coagulation factor X | hCP43442 | 13 | Factor X deficiency | 227600 |
insulin promoter transcription factor 1 | hCP44966 | 13 | Pancreatic agenesis | 600733 |
hexosaminidase A | hCP50228 | 15 | Hex A pseudodeficiency | 272800 |
coagulation factor IX | hCP35448 | X | Hemophilia B Hematological | 306900 |
coagulation factor IX | hCP35448 | X | Hemophilia B Hematological | 306900 |
connexin 32 | hCP37674 | X | Charcot-Marie-Tooth neuropathy | 304040 |

Duplicated gene name | Homolog | Chr | Disease | OMIM |
---|---|---|---|---|
complement component 9 | hCP47512 | 5 | C9 deficiency | 120940 |
glucocorticoid receptor | hCP47558 | 5 | Cortisol resistance Metabolic | 138040 |
complement component 8 | hCP40779 | 1 | C8 deficiency, type II Immune | 120960 |
nuclear receptor subfamily 3 | hCP48259 | 4 | Pseudohypoaldosteronism type I | 600983 |
hexosaminidase A | hCP50228 | 15 | Hex A pseudodeficiency Neurological | 272800 |
insulin promoter TF-1 | hCP44966 | 13 | Pancreatic agenesis Gastrointestinal | 600733 |
phenylalanine hydroxylase | hCP39374 | 12 | Hyperphenylalaninemia, mild Neurological | 261600 |
tyrosine hydroxylase | hCP35709 | 11 | Segawa syndrome, recessive Neurological | 191290 |
connexin 32 | hCP37674 | X | Charcot-Marie-Tooth neuropathy | 304040 |
coagulation factor IX | hCP35448 | X | Hemophilia B Hematological | 306900 |
coagulation factor IX | hCP35448 | X | Hemophilia B Hematological | 306900 |
homeo box A13 | hCP48915 | 7 | Hand-foot-uterus syndrome Renal | 142959 |
hexosaminidase B | hCP48123 | 5 | Sandhoff disease, infantile | 268800 |
coagulation factor VII | hCP43441 | 13 | Factor VII deficiency | 227500 |
coagulation factor X | hCP43442 | 13 | Factor X deficiency | 227600 |
connexin 26 | hCP38170 | 13 | Deafness, autosomal dominant 3 | 121011 |
Supplemental Figure 2A. Karyotype analysis of donors.
Supplemental Figure 2B.
Supplemental Figure 2C.
Supplemental Figure 2D.
Supplemental Figure 2E.
Supplemental Figure 3 - Chromosome 1. Comparison of the CSA and PFP assemblies. To generate the figure, Celera fragment sequences were mapped onto each assembly. The PFP assembly is shown in the upper third of each panel; the Celera assembly is shown in the lower third. In the center of the panel, green lines show Celera sequences that are in the same order and orientation in both assemblies and form the longest consistently ordered run of sequences. Yellow lines indicate sequence blocks that are in the same orientation but out of order. Red lines indicate sequence blocks that are not in the same orientation. For clarity, in the latter two cases, lines are drawn only between segments of matching sequence that are at least 50 kbp long. The top and bottom thirds of each panel show the extent of Celera mate-pair violations (red, misoriented; yellow, incorrect distance between the mates) for each assembly, grouped by library size. (Mate pairs that are within the correct distance, as expected from the mean library insert size, are omitted from the figure for clarity.) Predicted breakpoints, corresponding to stacks of violated mate pairs of the same type, are shown as blue ticks on each assembly axis. Runs of more than 10,000 Ns are shown as cyan bars. Plots for each of the 24 chromosomes can be seen as separate files.
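The color coding described in this legend can be illustrated with a minimal Python sketch. It is not the procedure used to produce the figures. It assumes that "the longest consistently ordered run" can be approximated by a longest increasing subsequence over same-orientation blocks, counted by number of blocks rather than by length. The `Block` record, its field names, and the function names are hypothetical and chosen only for illustration; the two thresholds (50 kbp and 10,000 Ns) are taken from the legend.

```python
import re
from bisect import bisect_left
from dataclasses import dataclass

MIN_BLOCK_BP = 50_000  # legend: yellow/red lines only for segments of at least 50 kbp
MIN_N_RUN = 10_000     # legend: runs of more than 10,000 Ns are shown as cyan bars


@dataclass(frozen=True)
class Block:
    """Hypothetical record for one block of sequence matched in both assemblies."""
    pfp_start: int          # start coordinate in the PFP assembly
    csa_start: int          # start coordinate in the Celera assembly
    length: int             # block length in base pairs
    same_orientation: bool  # True if the block has the same orientation in both


def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tails, tail_vals = [], []   # tails[k]: index ending the best run of length k + 1
    prev = [-1] * len(values)   # back-pointers used to rebuild the run
    for i, v in enumerate(values):
        k = bisect_left(tail_vals, v)
        if k > 0:
            prev[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
            tail_vals.append(v)
        else:
            tails[k] = i
            tail_vals[k] = v
    run, i = [], tails[-1] if tails else -1
    while i != -1:
        run.append(i)
        i = prev[i]
    return run[::-1]


def classify(blocks):
    """Assign green/yellow/red to blocks, following the legend's description."""
    forward = sorted((b for b in blocks if b.same_orientation),
                     key=lambda b: b.pfp_start)
    green = set(longest_increasing_subsequence([b.csa_start for b in forward]))
    result = []
    for i, b in enumerate(forward):
        if i in green:
            result.append((b, "green"))    # same order and orientation
        elif b.length >= MIN_BLOCK_BP:
            result.append((b, "yellow"))   # same orientation, out of order
    for b in blocks:
        if not b.same_orientation and b.length >= MIN_BLOCK_BP:
            result.append((b, "red"))      # different orientation
    return result


def n_runs(sequence, min_len=MIN_N_RUN):
    """Return (start, end) spans of runs of N strictly longer than min_len."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", sequence)
            if m.end() - m.start() > min_len]


# Made-up coordinates, for illustration only.
demo = [
    Block(0, 0, 60_000, True),
    Block(100_000, 500_000, 70_000, True),
    Block(200_000, 100_000, 80_000, True),
    Block(400_000, 300_000, 90_000, False),
]
for block, color in classify(demo):
    print(block.pfp_start, block.csa_start, color)
```

Counting blocks rather than summing their lengths keeps the sketch short; a length-weighted variant would favor long blocks when choosing the green run and might match the figures more closely.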
Supplemental Figure 3 - Chromosome 2.
Supplemental Figure 3 - Chromosome 3.
Supplemental Figure 3 - Chromosome 4.
Supplemental Figure 3 - Chromosome 5.
Supplemental Figure 3 - Chromosome 6.
Supplemental Figure 3 - Chromosome 7.
Supplemental Figure 3 - Chromosome 8.
Supplemental Figure 3 - Chromosome 9.
Supplemental Figure 3 - Chromosome 10.
Supplemental Figure 3 - Chromosome 11.
Supplemental Figure 3 - Chromosome 12.
Supplemental Figure 3 - Chromosome 13.
Supplemental Figure 3 - Chromosome 14.
Supplemental Figure 3 - Chromosome 15.
Supplemental Figure 3 - Chromosome 16.
Supplemental Figure 3 - Chromosome 17.
Supplemental Figure 3 - Chromosome 18.
Supplemental Figure 3 - Chromosome 19.
Supplemental Figure 3 - Chromosome 20.
Supplemental Figure 3 - Chromosome 21.
Supplemental Figure 3 - Chromosome 22.
Supplemental Figure 3 - Chromosome 23.
Supplemental Figure 3 - Chromosome 24.