Policy Forum: Genetics

No Longer De-Identified

Science  21 Apr 2006:
Vol. 312, Issue 5772, pp. 370-371
DOI: 10.1126/science.1125339

As DNA sequencing becomes more affordable and less time-consuming, scientists are adding DNA banking and analysis to research protocols, resulting in new disease-specific DNA databases. A major ethical and policy question will be whether and how much information about a particular individual's DNA sequence ought to be publicly accessible.

Without privacy protection, public trust will be compromised, and the scientific and medical potential of the technology will not be realized. However, scientific utility grows with increased access to sequenced DNA. At present, ethical concerns about the privacy of subjects whose sequenced DNA is publicly released have largely been addressed by ensuring that the data are “deidentified” and that confidentiality is maintained (1, 2). There is a large literature on the various data-management models and computer algorithms that can be used to provide access to genetic data while purportedly protecting privacy (3–6). We believe that minimizing risks to subjects through new developments in data and database structures is crucial and should continue to be explored, but that additional safeguards are required.

Scientists have been aware for years of the possibility that coded or “anonymized” sequenced DNA may be more readily linked to an individual as genetic databases proliferate (1, 3, 7, 8). In 2004, Lin and colleagues demonstrated that an individual can be uniquely identified with access to just 75 single-nucleotide polymorphisms (SNPs) from that person (9). Genome-wide association studies routinely use more than 100,000 SNPs to genotype individuals. Although individual identification from the public release of these data would currently require a reference sample, the privacy risk associated with public data release is fueled by the extraordinary pace of technological developments and the rapid proliferation of electronic databases. If protective measures are not adopted now, public trust will be compromised, and genomic research will suffer.
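A rough calculation suggests why so few markers can suffice. The sketch below is an illustration only, not the method used by Lin and colleagues: it assumes biallelic SNPs that are statistically independent, genotype frequencies in Hardy-Weinberg proportions, and the same minor allele frequency at every SNP. The function names and the population figure are hypothetical choices made for this example.

```python
def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: genotype frequencies are
    p^2, 2pq and q^2 for minor allele frequency p and q = 1 - p.
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("minor allele frequency must lie strictly between 0 and 1")
    q = 1.0 - maf
    genotype_freqs = (maf * maf, 2.0 * maf * q, q * q)
    # Two people match if both carry the same genotype, so sum the squares.
    return sum(f * f for f in genotype_freqs)


def profile_match_probability(maf: float, n_snps: int) -> float:
    """Chance that two unrelated people match at every one of n_snps SNPs.

    Assumes the SNPs are independent, so per-SNP probabilities multiply.
    """
    return genotype_match_probability(maf) ** n_snps


def expected_chance_matches(maf: float, n_snps: int, population: float) -> float:
    """Expected number of other people who share a given profile by chance."""
    return population * profile_match_probability(maf, n_snps)


if __name__ == "__main__":
    # Illustrative inputs only: 75 SNPs, allele frequency 0.5,
    # and a population on the order of the world's (assumed 6.5e9).
    print(genotype_match_probability(0.5))        # 0.375
    print(profile_match_probability(0.5, 75))
    print(expected_chance_matches(0.5, 75, 6.5e9))
```

Under these simplifying assumptions, the expected number of chance matches across a population of billions is far below one, which is consistent with the claim that a 75-SNP profile can be unique to one person. Real SNPs are correlated and vary in allele frequency, so the true figures differ.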

Genetic sequencing typically involves three phases of investigation: (i) subject recruitment and sample collection (primary clinical investigation), (ii) DNA sequencing and data broadcast (genomic sequencing study), and (iii) data retrieval and analysis (secondary-use research) (see figure, above). Institutional Review Board (IRB) oversight and informed consent are unambiguously required for the first phase of sample collection, because it clearly involves human subjects research. There are also detailed consent requirements for some large-scale sequencing studies, such as the HapMap project, that cover the second and third phases. However, it is our experience that, in general, the consent process for most disease-specific genetic research is not protective for these phases and that the privacy risks associated with public data-sharing are not stated. Consent for these studies is highly variable, and in most cases, subjects are simply told that genetic analysis will be performed, without any explanation of what that means or with whom the resulting data will be shared. Further, participants are typically not offered the opportunity to participate in the research if they do not want their data publicly broadcast (10).

Figure. From subject to data analysis: a typical medical genomic sequencing study.

In the United States, there are now two federal regulations that could potentially apply to such studies: the Common Rule, which regulates all federally funded research and sets forth the federal policy for the protection of human research subjects (11), and the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, which restricts certain unauthorized uses and disclosures of patients' identifiable protected health information by covered entities (12). Neither one specifically mandates IRB oversight or subject consent for the public release of sequenced data. The Common Rule would not apply if genomic sequencing studies were not considered to constitute human subjects research. Human subjects research is defined under the Common Rule as research involving “an individual about whom the investigator … obtains data through intervention or interaction with the individual, or identifiable private information” (11). According to a guidance document published in 2004 by the Office for Human Research Protections (OHRP), because the data are collected and coded by the primary clinical investigator, and the sequencing investigator is prohibited from deciphering the code, the data are not considered identifiable, and the sequencing study is not subject to federal regulation (13). Brief IRB review may be necessary to confirm that the research does not involve human subjects, but once that determination is made, no IRB oversight or informed consent is mandated.

Similarly, HIPAA does not provide unambiguous protection because it has not been clear that genomic data constitute readily identifiable protected health information (14–16), and even if they do, institutions vary in whether the scientists who conduct sequencing studies are considered “covered entities” who must comply with HIPAA. Institutions may choose to impose more stringent requirements on investigators, but there is no federal mandate to do so now.

To resolve the tension between privacy protection and access to DNA data needed for progress in medical research, Lin and colleagues recommend a tiered data-access approach, with sensitive data layers masked from full public view (9). This approach has the advantage of minimizing privacy risks without unduly sacrificing progress, but suffers from a lack of flexibility to respond to individual preferences and judgments. It also threatens to slow the pace of research, because current policy calls for sequenced data to be released publicly within 24 hours of their generation (17), whereas obtaining approval to access restricted databases could take weeks, if not months. We believe that restricted access should be offered as an option to subjects, but that it should not be adopted as a general approach for all genomic sequencing studies.

Kohane and Altman propose that researchers specifically seek out volunteers who are most willing to have their health data publicly shared and that these subjects have explicit control over who has access to their data (18). Relying only on such “information altruists” to participate in genetic studies would potentially create subject bias, influencing the ability of investigators to identify disease alleles relevant to the population at large.

We propose that general safeguards be put in place to encourage understanding of and trust in genomic sequencing studies. As an essential first step, genomic sequencing studies should be recognized as human subjects research and brought unambiguously under the protection of existing federal regulation. This would have the effect of mandating informed consent for public release of potentially identifiable sequenced data and requiring IRB oversight to ensure that risks to subjects are minimized and that informed consent, or a waiver of the requirement to obtain informed consent, is obtained (11).

Specifically, we recommend a stratified consent process in which all subjects who participate in future genomic sequencing studies are fully informed about how their DNA data may be broadcast and have the authority to decide with whom they want their data shared (19). A number of options could be presented; we propose three levels of confidentiality (see table above). A more rigorous assessment of the risks and benefits associated with each of these options will be necessary.

Some of the practical challenges include providing adequate disclosure and education about a complex risk calculus, ensuring subject comprehension, coordinating a system of restricted access, and managing a complex database that accounts for subjects' informed disclosure preferences. Although they may represent a substantial departure from the traditional model of informed consent in research, stratified consent procedures are commonly used in clinical medicine, where patients frequently make informed choices about treatment options on the basis of individual values and judgments. Stratified consent procedures are also being considered in other areas of research where subjects have to make complicated decisions, such as what type of future research they are willing to participate in and to what extent they want research-related incidental findings reported back to them.

There may be concern about the added burdens on IRBs. McWilliams and colleagues have shown that there is currently considerable variability among local IRBs, particularly in how they deal with DNA banking, risk-benefit analysis, and consent for genetic research (20). They recommend centralized IRB oversight for multicenter research.

Although some might fear a negative impact on subject participation in genomic research, stratified consent merely restricts the ability to release sequenced data publicly. If anything, it may boost enrollment by providing an opportunity for even the most risk-averse members of society to participate in research, while ensuring optimal privacy protection.

Empirical study of these and other challenges associated with the implementation of a stratified consent model in research is essential for future policy development. In addition, federal legislation prohibiting genetic discrimination would significantly alter the risk-benefit calculus associated with public data release and should be enacted without delay (21). Although it would not obviate the ethical obligation to obtain subject consent, it may foster public trust and positively affect the willingness of subjects to participate in genomic research and to share their genetic data publicly.
