Supplemental Data


The Sequence of the Human Genome
J. Craig Venter, Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D. Gocayne, Peter Amanatides, Richard M. Ballew, Daniel H. Huson, Jennifer Russo Wortman, Qing Zhang, Chinnappa D. Kodira, Xiangqun H. Zheng, Lin Chen, Marian Skupski, Gangadharan Subramanian, Paul D. Thomas, Jinghui Zhang, George L. Gabor Miklos, Catherine Nelson, Samuel Broder, Andrew G. Clark, Joe Nadeau, Victor A. McKusick, Norton Zinder, Arnold J. Levine, Richard J. Roberts, Mel Simon, Carolyn Slayman, Michael Hunkapiller, Randall Bolanos, Arthur Delcher, Ian Dew, Daniel Fasulo, Michael Flanigan, Liliana Florea, Aaron Halpern, Sridhar Hannenhalli, Saul Kravitz, Samuel Levy, Clark Mobarry, Knut Reinert, Karin Remington, Jane Abu-Threideh, Ellen Beasley, Kendra Biddick, Vivien Bonazzi, Rhonda Brandon, Michele Cargill, Ishwar Chandramouliswaran, Rosane Charlab, Kabir Chaturvedi, Zuoming Deng, Valentina Di Francesco, Patrick Dunn, Karen Eilbeck, Carlos Evangelista, Andrei E. Gabrielian, Weiniu Gan, Wangmao Ge, Fangcheng Gong, Zhiping Gu, Ping Guan, Thomas J. Heiman, Maureen E. Higgins, Rui-Ru Ji, Zhaoxi Ke, Karen A. Ketchum, Zhongwu Lai, Yiding Lei, Zhenya Li, Jiayin Li, Yong Liang, Xiaoying Lin, Fu Lu, Gennady V. Merkulov, Natalia Milshina, Helen M. Moore, Ashwinikumar K Naik, Vaibhav A. Narayan, Beena Neelam, Deborah Nusskern, Douglas B. Rusch, Steven Salzberg, Wei Shao, Bixiong Shue, Jingtao Sun, Zhen Yuan Wang, Aihui Wang, Xin Wang, Jian Wang, Ming-Hui Wei, Ron Wides, Chunlin Xiao, Chunhua Yan, Alison Yao, Jane Ye, Ming Zhan, Weiqing Zhang, Hongyu Zhang, Qi Zhao, Liansheng Zheng, Fei Zhong, Wenyan Zhong, Shiaoping C. Zhu, Shaying Zhao, Dennis Gilbert, Suzanna Baumhueter, Gene Spier, Christine Carter, Anibal Cravchik, Trevor Woodage, Feroze Ali, Huijin An, Aderonke Awe, Danita Baldwin, Holly Baden, Mary Barnstead, Ian Barrow, Karen Beeson, Dana Busam, Amy Carver, Angela Center, Ming Lai Cheng, Liz Curry, Steve Danaher, Lionel Davenport, Raymond Desilets, Susanne Dietz, Kristina Dodson, Lisa Doup, Steven Ferriera, Neha Garg, Andres Gluecksmann, Brit Hart, Jason Haynes, Charles Haynes, Cheryl Heiner, Suzanne Hladun, Damon Hostin, Jarrett Houck, Timothy Howland, Chinyere Ibegwam, Jeffery Johnson, Francis Kalush, Lesley Kline, Shashi Koduru, Amy Love, Felecia Mann, David May, Steven McCawley, Tina McIntosh, Ivy McMullen, Mee Moy, Linda Moy, Brian Murphy, Keith Nelson, Cynthia Pfannkoch, Eric Pratts, Vinita Puri, Hina Qureshi, Matthew Reardon, Robert Rodriguez, Yu-Hui Rogers, Deanna Romblad, Bob Ruhfel, Richard Scott, Cynthia Sitter, Michelle Smallwood, Erin Stewart, Renee Strong, Ellen Suh, Reginald Thomas, Ni Ni Tint, Sukyee Tse, Claire Vech, Gary Wang, Jeremy Wetter, Sherita Williams, Monica Williams, Sandra Windsor, Emily Winn-Deen, Keriellen Wolfe, Jayshree Zaveri, Karena Zaveri, Josep F. Abril, Roderic Guigó, Michael J. Campbell, Kimmen V. Sjolander, Brian Karlak, Anish Kejariwal, Huaiyu Mi, Betty Lazareva, Thomas Hatton, Apurva Narechania, Karen Diemer, Anushya Muruganujan, Nan Guo, Shinji Sato, Vineet Bafna, Sorin Istrail, Ross Lippert, Russell Schwartz, Brian Walenz, Shibu Yooseph, David Allen, Anand Basu, James Baxendale, Louis Blick, Marcelo Caminha, John Carnes-Stine, Parris Caulk, Yen-Hui Chiang, My Coyne, Carl Dahlke, Anne Deslattes Mays, Maria Dombroski, Michael Donnelly, Dale Ely, Shiva Esparham, Carl Fosler, Harold Gire, Stephen Glanowski, Kenneth Glasser, Anna Glodek, Mark Gorokhov, Ken Graham, Barry Gropman, Michael Harris, Jeremy Heil, Scott Henderson, Jeffrey Hoover, Donald Jennings, Catherine Jordan, James Jordan, John Kasha, Leonid Kagan, Cheryl Kraft, Alexander Levitsky, Mark Lewis, Xiangjun Liu, John Lopez, Daniel Ma, William Majoros, Joe McDaniel, Sean Murphy, Matthew Newman, Trung Nguyen, Ngoc Nguyen, Marc Nodell, Sue Pan, Jim Peck, William Rowe, Robert Sanders, John Scott, Michael Simpson, Thomas Smith, Arlan Sprague, Timothy Stockwell, Russell Turner, Eli Venter, Mei Wang, Meiyuan Wen, David Wu, Mitchell Wu, Ashley Xia, Ali Zandieh, Xiaohong Zhu

Web Fig. 1: Annotation of the Celera Human Genome Assembly


Supplementary Material

Supplemental Table 1. Chromosomal distribution of intronless paralogs.
| IP | Source | IP chromosome | Source chromosome | Definition/protein name | Source GenBank index number |
|---|---|---|---|---|---|
| hCP48693 | hCP37465 | chr20 | chr6 | BCL2-antagonist/killer 1; cell death inhibitor 1 | 4502363 |
| hCP45781 | hCP44963 | chr1 | chr13 | RNA polymerase I 16 kDa subunit | 7705740 |
| hCP45175 | hCP40330 | chr9 | chr6 | eukaryotic translation elongation factor 1 alpha 1 | 4503471 |
| hCP47770 | hCP40330 | chr5 | chr6 | eukaryotic translation elongation factor 1 alpha 1 | 4503471 |
| hCP48153 | hCP40330 | chr7 | chr6 | eukaryotic translation elongation factor 1 alpha 1 | 4503471 |
| hCP45176 | hCP40330 | chr9 | chr6 | eukaryotic translation elongation factor 1 alpha 1-like 14 | 4503473 |
| hCP47771 | hCP40330 | chr5 | chr6 | eukaryotic translation elongation factor 1 alpha 1-like 14 | 4503473 |
| hCP48154 | hCP40330 | chr7 | chr6 | eukaryotic translation elongation factor 1 alpha 1-like 14 | 4503473 |
| hCP43102 | hCP45627 | chr19 | chr8 | eukaryotic translation elongation factor 1 delta | 4503479 |
| hCP35783 | hCP52348 | chr2 | chr11 | eukaryotic translation initiation factor 3, subunit 5 | 4503519 |
| hCP34934 | hCP39451 | chr2 | chr12 | heterogeneous nuclear ribonucleoprotein A1 | 4504445 |
| hCP44529 | hCP39451 | chr13 | chr12 | heterogeneous nuclear ribonucleoprotein A1 | 4504445 |
| hCP43335 | hCP44582 | chr12 | chr17 | non-metastatic cells 2, protein (NM23B) | 4505409 |
| hCP50486 | hCP40136 | chr8 | chr14 | proteasome (prosome, macropain) 26S subunit, ATPase, 6 | 4506215 |
| hCP42158 | hCP46912 | chr12 | chr2 | prothymosin, alpha (gene sequence 28) | 4506277 |
| hCP45662 | hCP37866 | chr2 | chr18 | ras homolog gene family, member B; RhoB; | 4757764 |
| hCP36390 | hCP37104 | chr8 | chr7 | H2A histone family, member Z | 4504255 |
| hCP43150 | hCP37104 | chr15 | chr7 | H2A histone family, member Z | 4504255 |
| hCP46278 | hCP39295 | chr4 | chr3 | NADH dehydrogenase (ubiquinone) 1 beta | 6041669 |
| hCP36897 | hCP34863 | chr5 | chr12 | RAP1A, member of RAS oncogene | 4506413 |
| hCP46034 | hCP35342 | chr17 | chrX | adaptor-related protein complex 1, sigma | 4506957 |
| hCP35830 | hCP51363 | chr6 | chr4 | alcohol dehydrogenase 5 (class III), | 4501937 |
| hCP44848 | hCP40594 | chr9 | chr1 | cell division cycle 20 | 4557437 |
| hCP50246 | hCP49760 | chr4 | chr1 | cell division cycle 42 | 4757952 |
| hCP48857 | hCP48263 | chr15 | chr7 | chromobox homolog 3 | 6005780 |
| hCP46158 | hCP42461 | chr13 | chr11 | diacylglycerol kinase, zeta (104kD) | 4503317 |
| hCP37376 | hCP49621 | chr7 | chr11 | eukaryotic translation elongation factor 1 | 4503481 |
| hCP36628 | hCP39069 | chr15 | chr8 | fatty acid binding protein 5 | 4557581 |
| hCP43166 | hCP39069 | chr15 | chr8 | fatty acid binding protein 5 | 4557581 |
| hCP49159 | hCP35287 | chr20 | chr19 | ferritin, light polypeptide; hypothetical protein | 4503797 |
| hCP38447 | hCP34306 | chr19 | chr16 | glycine cleavage system protein H | 4758424 |
| hCP51879 | hCP34306 | chr1 | chr16 | glycine cleavage system protein H | 4758424 |
| hCP42685 | hCP43793 | chr20 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
| hCP42865 | hCP43793 | chr3 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
| hCP49883 | hCP43793 | chr20 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
| hCP50559 | hCP43793 | chr15 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
| hCP50984 | hCP43793 | chr22 | chr13 | high-mobility group (nonhistone chromosomal) protein | 4504425 |
| hCP37795 | hCP42486 | chrX | chr10 | phosphoglycerate mutase 1 (brain) | 4505753 |
| hCP49866 | hCP42486 | chr12 | chr10 | phosphoglycerate mutase 1 (brain) | 4505753 |
| hCP48871 | hCP33485 | chr4 | chr5 | pituitary tumor-transforming protein 1 | 4758980 |
| hCP43053 | hCP201144 | chr3 | chr6 | pre-B-cell leukemia transcription factor 2 | 4505625 |
| hCP47725 | hCP49987 | chr5 | chr14 | proteasome (prosome, macropain) activator subunit | 4506237 |
| hCP38333 | hCP35418 | chrX | chr7 | ras-related C3 botulinum toxin substrate | 5902042 |
| hCP40764 | hCP36922 | chr3 | chr6 | ribosomal protein L10a; neural precursor | 6325472 |
| hCP45150 | hCP34348 | chr17 | chr16 | ribosomal protein L13 | 4506599 |
| hCP42475 | hCP35262 | chr10 | chr19 | ribosomal protein L13a | 6912634 |
| hCP43258 | hCP35262 | chr12 | chr19 | ribosomal protein L13a | 6912634 |
| hCP43889 | hCP35262 | chr13 | chr19 | ribosomal protein L13a | 6912634 |
| hCP48078 | hCP35262 | chr12 | chr19 | ribosomal protein L13a | 6912634 |
| hCP51648 | hCP35262 | chr10 | chr19 | ribosomal protein L13a | 6912634 |
| hCP40196 | hCP41680 | chr3 | chr18 | ribosomal protein L17 | 4506617 |
| hCP35655 | hCP44971 | chr1 | chr13 | ribosomal protein L21 | 4506611 |
| hCP39305 | hCP44971 | chr14 | chr13 | ribosomal protein L21 | 4506611 |
| hCP41351 | hCP44971 | chr11 | chr13 | ribosomal protein L21 | 4506611 |
| hCP47660 | hCP44971 | chr4 | chr13 | ribosomal protein L21 | 4506611 |
| hCP47833 | hCP44971 | chr7 | chr13 | ribosomal protein L21 | 4506611 |
| hCP48726 | hCP44971 | chr4 | chr13 | ribosomal protein L21 | 4506611 |
| hCP49814 | hCP44971 | chr10 | chr13 | ribosomal protein L21 | 4506611 |
| hCP50215 | hCP44971 | chr10 | chr13 | ribosomal protein L21 | 4506611 |
| hCP34220 | hCP45368 | chr3 | chr17 | ribosomal protein L23a | 4506615 |
| hCP43068 | hCP45090 | chr12 | chr17 | ribosomal protein L26 | 4506621 |
| hCP38068 | hCP41270 | chr11 | chr11 | ribosomal protein L27a | 4506625 |
| hCP38885 | hCP41270 | chr6 | chr11 | ribosomal protein L27a | 4506625 |
| hCP34480 | hCP51948 | chr3 | chr3 | ribosomal protein L29 | 4506629 |
| hCP36437 | hCP43819 | chr7 | chr3 | ribosomal protein L32 | 4506635 |
| hCP39494 | hCP43819 | chr6 | chr3 | ribosomal protein L32 | 4506635 |
| hCP201498 | hCP44392 | chr1 | chr9 | ribosomal protein L35 | 6005860 |
| hCP51162 | hCP44392 | chr7 | chr9 | ribosomal protein L35 | 6005860 |
| hCP39467 | hCP43627 | chr14 | chr9 | ribosomal protein L7a | 4506661 |
| hCP41685 | hCP48439 | chr15 | chr4 | ribosomal protein L9 | 4506665 |
| hCP201561 | hCP35286 | chr12 | chr19 | ribosomal protein S11 | 4506681 |
| hCP42984 | hCP39006 | chr11 | chr6 | ribosomal protein S12 | 4506683 |
| hCP42446 | hCP47279 | chr1 | chr16 | ribosomal protein S15a | 4506689 |
| hCP201365 | hCP52071 | chr19 | chr19 | ribosomal protein S16 | 4506691 |
| hCP42269 | hCP52071 | chr1 | chr19 | ribosomal protein S16 | 4506691 |
| hCP40118 | hCP42669 | chr5 | chr15 | ribosomal protein S17 | 4506693 |
| hCP51382 | hCP42669 | chr22 | chr15 | ribosomal protein S17 | 4506693 |
| hCP35240 | hCP42007 | chr8 | chr12 | ribosomal protein S26 | 4506709 |
| hCP35551 | hCP42007 | chr2 | chr12 | ribosomal protein S26 | 4506709 |
| hCP36459 | hCP42007 | chrX | chr12 | ribosomal protein S26 | 4506709 |
| hCP38950 | hCP42007 | chr4 | chr12 | ribosomal protein S26 | 4506709 |
| hCP39533 | hCP42007 | chr8 | chr12 | ribosomal protein S26 | 4506709 |
| hCP41719 | hCP42007 | chr13 | chr12 | ribosomal protein S26 | 4506709 |
| hCP43701 | hCP42007 | chr9 | chr12 | ribosomal protein S26 | 4506709 |
| hCP44708 | hCP42007 | chr9 | chr12 | ribosomal protein S26 | 4506709 |
| hCP45758 | hCP42007 | chr15 | chr12 | ribosomal protein S26 | 4506709 |
| hCP46725 | hCP42007 | chr7 | chr12 | ribosomal protein S26 | 4506709 |
| hCP50862 | hCP42007 | chr8 | chr12 | ribosomal protein S26 | 4506709 |
| hCP51315 | hCP42007 | chr10 | chr12 | ribosomal protein S26 | 4506709 |
| hCP38914 | hCP34424 | chr16 | chr2 | ribosomal protein S27a | 4506713 |
| hCP50884 | hCP34424 | chr1 | chr2 | ribosomal protein S27a | 4506713 |
| hCP50962 | hCP38574 | chr22 | chr19 | ribosomal protein S9 | 4506745 |
| hCP37989 | hCP35777 | chrX | chr3 | teratocarcinoma-derived growth factor 1 | 4507425 |
| hCP40506 | hCP35239 | chr10 | chr8 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation | 4507953 |
| hCP50191 | hCP35239 | chr2 | chr8 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation | 4507953 |

Supplemental Table 2. Examples of paralogs with disease associations on duplicated genome segments. (The two panels should be read side by side: each row of the first panel pairs with the same row of the second, highlighting the similarity between the diseases associated with each pair of paralogous genes.)
| Gene name | Protein | Chr | Disease | Category | OMIM |
|---|---|---|---|---|---|
| complement component 8 | hCP40779 | 1 | C8 deficiency, type II | | 120960 |
| nuclear receptor subfamily 3, group C | hCP48259 | 4 | Pseudohypoaldosteronism type I | | 600983 |
| complement component 9 | hCP47512 | 5 | C9 deficiency | Immune | 120940 |
| glucocorticoid receptor | hCP47558 | 5 | Cortisol resistance | Metabolic | 138040 |
| hexosaminidase B (beta polypeptide) | hCP48123 | 5 | Sandhoff disease | Neurological | 268800 |
| homeo box A13 | hCP48915 | 7 | Hand-foot-uterus syndrome | | 142959 |
| tyrosine hydroxylase | hCP35709 | 11 | Segawa syndrome | | 191290 |
| phenylalanine hydroxylase | hCP39374 | 12 | Hyperphenylalaninemia | | 261600 |
| connexin 26 | hCP38170 | 13 | Deafness, autosomal dominant | | 121011 |
| coagulation factor VII | hCP43441 | 13 | Factor VII deficiency | | 227500 |
| coagulation factor X | hCP43442 | 13 | Factor X deficiency | | 227600 |
| insulin promoter transcription factor 1 | hCP44966 | 13 | Pancreatic agenesis | | 600733 |
| hexosaminidase A | hCP50228 | 15 | Hex A pseudodeficiency | | 272800 |
| coagulation factor IX | hCP35448 | X | Hemophilia B | Hematological | 306900 |
| coagulation factor IX | hCP35448 | X | Hemophilia B | Hematological | 306900 |
| connexin 32 | hCP37674 | X | Charcot-Marie-Tooth neuropathy | | 304040 |

| Duplicated gene name | Homolog | Chr | Disease | Category | OMIM |
|---|---|---|---|---|---|
| complement component 9 | hCP47512 | 5 | C9 deficiency | | 120940 |
| glucocorticoid receptor | hCP47558 | 5 | Cortisol resistance | Metabolic | 138040 |
| complement component 8 | hCP40779 | 1 | C8 deficiency, type II | Immune | 120960 |
| nuclear receptor subfamily 3 | hCP48259 | 4 | Pseudohypoaldosteronism type I | | 600983 |
| hexosaminidase A | hCP50228 | 15 | Hex A pseudodeficiency | Neurological | 272800 |
| insulin promoter TF-1 | hCP44966 | 13 | Pancreatic agenesis | Gastrointestinal | 600733 |
| phenylalanine hydroxylase | hCP39374 | 12 | Hyperphenylalaninemia, mild | Neurological | 261600 |
| tyrosine hydroxylase | hCP35709 | 11 | Segawa syndrome, recessive | Neurological | 191290 |
| connexin 32 | hCP37674 | X | Charcot-Marie-Tooth neuropathy | | 304040 |
| coagulation factor IX | hCP35448 | X | Hemophilia B | Hematological | 306900 |
| coagulation factor IX | hCP35448 | X | Hemophilia B | Hematological | 306900 |
| homeo box A13 | hCP48915 | 7 | Hand-foot-uterus syndrome | Renal | 142959 |
| hexosaminidase B | hCP48123 | 5 | Sandhoff disease, infantile | | 268800 |
| coagulation factor VII | hCP43441 | 13 | Factor VII deficiency | | 227500 |
| coagulation factor X | hCP43442 | 13 | Factor X deficiency | | 227600 |
| connexin 26 | hCP38170 | 13 | Deafness, autosomal dominant 3 | | 121011 |


Supplemental Figure 2A. Karyotype analysis of donors.




Supplemental Figure 2B.




Supplemental Figure 2C.




Supplemental Figure 2D.




Supplemental Figure 2E.




Supplemental Figure 3: Chromosome 1. Comparison of the CSA (compartmentalized shotgun assembly) and the PFP (publicly funded project) assembly. To generate the figure, Celera fragment sequences were mapped onto each assembly. The PFP assembly is shown in the upper third of each panel and the Celera assembly in the lower third. In the center of each panel, green lines connect Celera sequences that are in the same order and orientation in both assemblies and that form the longest consistently ordered run of sequences. Yellow lines indicate sequence blocks that are in the same orientation but out of order; red lines indicate sequence blocks whose orientation differs between the two assemblies. For clarity, in these latter two cases lines are drawn only between segments of matching sequence at least 50 kbp long. The top and bottom thirds of each panel show Celera mate-pair violations (red, misoriented; yellow, incorrect distance between the mates) for each assembly, grouped by library size. Mate pairs whose separation is consistent with the mean insert size of their library are omitted for clarity. Predicted breakpoints, corresponding to stacks of violated mate pairs of the same type, are shown as blue ticks on each assembly axis. Runs of more than 10,000 Ns are shown as cyan bars. Plots for each of the 24 chromosomes are provided as separate figures.
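The color coding described in the caption can be reproduced in outline with a few small routines. The Python sketch below is not Celera's plotting code: the `Block` record, the function names, the "innie" (+/-) mate orientation, and the 3-standard-deviation distance tolerance are all hypothetical choices made for illustration. The longest consistently ordered run is computed here as a longest increasing subsequence over same-orientation blocks, which is one reasonable reading of the caption rather than the documented method.

```python
import re
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical record for one segment of sequence matched in two assemblies."""
    pos_a: int         # start in the first assembly (e.g. PFP)
    pos_b: int         # start in the second assembly (e.g. CSA)
    length: int        # length of the matching segment, in bp
    same_strand: bool  # True if the segment has the same orientation in both


def longest_consistent_run(blocks):
    """Indices of a longest run of same-orientation blocks whose order agrees
    in both assemblies: an LIS of pos_b after sorting by pos_a."""
    order = sorted((i for i, b in enumerate(blocks) if b.same_strand),
                   key=lambda i: blocks[i].pos_a)
    tails, tail_idx, prev = [], [], {}
    for i in order:
        # k = length of the longest run this block can extend
        k = bisect_left(tails, blocks[i].pos_b)
        prev[i] = tail_idx[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(blocks[i].pos_b)
            tail_idx.append(i)
        else:
            tails[k] = blocks[i].pos_b
            tail_idx[k] = i
    run = set()
    i = tail_idx[-1] if tail_idx else None
    while i is not None:  # follow predecessor links back from the run's end
        run.add(i)
        i = prev[i]
    return run


def classify_blocks(blocks, min_len=50_000):
    """Map each block index to 'green', 'yellow', 'red', or None (not drawn)."""
    run = longest_consistent_run(blocks)
    colors = {}
    for i, b in enumerate(blocks):
        if i in run:
            colors[i] = "green"   # same order and orientation
        elif b.length < min_len:
            colors[i] = None      # short discordant match: no line drawn
        elif b.same_strand:
            colors[i] = "yellow"  # same orientation, out of order
        else:
            colors[i] = "red"     # orientation differs
    return colors


def classify_mate_pair(pos1, strand1, pos2, strand2, mean, sd, n_sd=3.0):
    """Classify a mate pair placed on one assembly, assuming the leftmost read
    should be on '+' and its mate on '-' (assumed library orientation)."""
    if pos1 > pos2:  # put the leftmost read first
        pos1, strand1, pos2, strand2 = pos2, strand2, pos1, strand1
    if (strand1, strand2) != ("+", "-"):
        return "misoriented"
    # separation approximates insert size if positions are outer read ends
    if abs((pos2 - pos1) - mean) > n_sd * sd:
        return "bad_distance"
    return "ok"


def long_n_runs(seq, min_run=10_001):
    """(start, end) of every run of at least min_run Ns, i.e. more than 10,000."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]{%d,}" % min_run, seq)]


if __name__ == "__main__":
    demo = [Block(0, 0, 60_000, True), Block(100_000, 100_000, 60_000, True),
            Block(200_000, 600_000, 60_000, True), Block(300_000, 200_000, 60_000, True),
            Block(400_000, 300_000, 60_000, False), Block(500_000, 500_000, 10_000, True),
            Block(600_000, 50_000, 20_000, True)]
    print(classify_blocks(demo))
    # {0: 'green', 1: 'green', 2: 'yellow', 3: 'green', 4: 'red', 5: 'green', 6: None}
```

The LIS step runs in O(n log n) and maximizes the number of blocks in the run rather than the total matched length; weighting by length would require a weighted variant.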




Supplemental Figure 3: Chromosome 2.




Supplemental Figure 3: Chromosome 3.




Supplemental Figure 3: Chromosome 4.




Supplemental Figure 3: Chromosome 5.




Supplemental Figure 3: Chromosome 6.




Supplemental Figure 3: Chromosome 7.




Supplemental Figure 3: Chromosome 8.




Supplemental Figure 3: Chromosome 9.




Supplemental Figure 3: Chromosome 10.




Supplemental Figure 3: Chromosome 11.




Supplemental Figure 3: Chromosome 12.




Supplemental Figure 3: Chromosome 13.




Supplemental Figure 3: Chromosome 14.




Supplemental Figure 3: Chromosome 15.




Supplemental Figure 3: Chromosome 16.




Supplemental Figure 3: Chromosome 17.




Supplemental Figure 3: Chromosome 18.




Supplemental Figure 3: Chromosome 19.




Supplemental Figure 3: Chromosome 20.




Supplemental Figure 3: Chromosome 21.




Supplemental Figure 3: Chromosome 22.




Supplemental Figure 3: Chromosome 23 (X).




Supplemental Figure 3: Chromosome 24 (Y).

