Identity inference of genomic data using long-range familial searches

Science  09 Nov 2018:
Vol. 362, Issue 6415, pp. 690-694
DOI: 10.1126/science.aau4832

  • RE: Identity inference of genomic data using long-range familial searches, Erlich et al.
    • Ellen McRae Greytak, Director of Bioinformatics, Parabon NanoLabs, Inc.
    • Other Contributors:
      • CeCe Moore, Lead Genetic Genealogist, Parabon NanoLabs, Inc.
      • Steven L. Armentrout, CEO, Parabon NanoLabs, Inc.

    Y. Erlich et al. (1) recently demonstrated the power of genetic genealogy to find distant relatives. However, several aspects of the authors’ conclusions (and the subsequent press coverage) are misleading.

    Although the paper does not say so explicitly, the authors’ press release states that “over half of US individuals – approximately 60% in the case of individuals of European descent – could be identified using open genetic genealogy databases.” This sensationalized announcement significantly overstates what can be concluded from the results of the paper. We concur that ~60% of US individuals of European descent can expect to find a third cousin or closer match in genetic genealogy databases of ~1M subjects; we have reported similar numbers based on our law enforcement casework (2). However, determining an individual’s identity from such a match is extraordinarily complex, and it is inappropriate to equate the probability of a match with the probability of identification.

    In reality, unlike in simulations, a third cousin match does not come with a list of all 855 possible relatives from which to filter. Every entry in a family tree must be identified through painstaking research of public records, which could take thousands of hours (3). In many cases, generating a complete list of relatives may be difficult or impossible due to immigration, misattributed paternity, unrecorded adoption, unknown parentage (highly overrepresented i...

    Competing Interests: Parabon NanoLabs, Inc. provides genetic genealogy services to law enforcement.