Essays on Science and Society
Genome-Sequencing Anniversary

The Human Genome at 10: Successes and Challenges


Science  04 Feb 2011:
Vol. 331, Issue 6017, pp. 546-547
DOI: 10.1126/science.1202812

Fifteen years ago, my team published the first complete genome sequence of a living species, that of Haemophilus influenzae, using our newly developed technique of whole-genome shotgun sequencing. This genome sequence was only 1.8 million base pairs in length and required 4 months of sequencing to produce. Five years later, as a result of dramatic changes in automation and massively parallel DNA sequencing, it was possible to sequence the human genome at 3 billion base pairs in only 9 months, a >1000-fold improvement. We published the first individual, diploid human genome sequence in 2007, and now, with single DNA sequencing instruments producing 100 million base pairs per day, individual genome sequences are becoming commonplace.


Most of the new generations of sequencing technologies, although faster and considerably cheaper, produce much shorter sequences (50 to 200 bp) from smaller DNA fragments than did the strategies to produce the first human genome sequences. Long DNA sequence reads (800 bp) from the ends of long DNA clones (>100 kb) provide scaffolding and extensive DNA assembly by linking together subassemblies. The short sequences can only produce small clusters; these make sequence assemblies of substantial length improbable. Because of these technical issues, some investigators only layer their short sequences against a “reference” and do not try to assemble a sequence, which makes it problematic to define scientific standards for a “genome sequence.”

As important as sequence quality standards are, a much larger issue rests with the current state of our ability, or inability, to interpret human genome sequences. Among the many improvements that are needed in human genome research, the most important is the collection of human phenotypes (according to agreed-upon parameters and standards), in conjunction with tens of thousands of accurate human genome sequences. Such data sets will be the foundation for accurately predicting clinical outcomes from DNA sequence information. This is true not only for diagnosis but also for foreseeing and avoiding drug side effects, as well as for monitoring stem cell genome mutations and/or variations before cell therapies.

Although many “genome” companies and researchers are promoting personal genomics for medicine and/or life choices, regulation of data quality and standards is lacking, which has made deceptive marketing a reality in some instances. We have sequence and genetic data quality that is suitable for some scientific analyses but no standards adequate for clinical practice or even for informing individuals of results that exist. We have come a long way in genomics; however, for genome sequencing to reach its full potential we still have a long way to go.

