Identity inference of genomic data using long-range familial searches

Science 09 Nov 2018:
Vol. 362, Issue 6415, pp. 690-694
DOI: 10.1126/science.aau4832
  • Fig. 1 The performance of long-range familial searches for various database sizes.

    (A) The probability of finding at least one relative for various identity-by-descent (IBD) thresholds (top) with 1.28 million searches of direct-to-consumer (DTC)-tested individuals (red) and 30 random GEDmatch searches (gray). Light gray shading indicates the 95% CI for the GEDmatch estimates. The dashed line indicates the probability of a surname inference from Y chromosome data (17). The bottom panel shows the 95% CIs (circles) and average total IBD length (squares) for relationships ranging from first cousin once removed (1C1R) to fourth cousin once removed (4C1R) (20). (B) A population-genetic theoretical model for the probability of finding relatives up to a given degree of cousinship as a function of the database coverage of the population. 1C to 4C indicate first to fourth cousins.
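    Panel (B) ties database coverage to the chance of finding a relative. As a rough illustration only, the sketch below uses a deliberately simplified model that is an assumption, not the authors' model: every couple has the same number of children f, each relative is in the database independently with probability equal to the coverage c, and any relative in the database is detected (IBD detection thresholds are ignored). Under these assumptions a person has about 2^k (f - 1) f^k full k-th cousins, and the chance that at least one cousin up to degree K is in the database is 1 - (1 - c)^N, where N sums the cousin counts. The function names are hypothetical.

```python
def expected_cousins(k: int, children_per_couple: float) -> float:
    """Expected number of full k-th cousins of one person (toy model).

    A k-th cousin shares an ancestral couple k + 1 generations back.
    A person has 2**k such couples; each couple's other (f - 1) children
    each contribute f**k descendants in the person's own generation.
    """
    f = children_per_couple
    return (2 ** k) * (f - 1) * f ** k


def prob_at_least_one_relative(coverage: float, max_degree: int,
                               children_per_couple: float = 2.5) -> float:
    """P(at least one cousin of degree 1..max_degree is in the database).

    Assumption: each relative is present independently with probability
    `coverage`, and every relative in the database is detected. Plugging
    the expected count into the exponent is a further approximation.
    """
    n_relatives = sum(expected_cousins(k, children_per_couple)
                      for k in range(1, max_degree + 1))
    return 1.0 - (1.0 - coverage) ** n_relatives


if __name__ == "__main__":
    # Print the toy-model probability for cousins up to 1C..4C
    # at a few illustrative coverage values.
    for coverage in (0.001, 0.01, 0.02):
        row = [prob_at_least_one_relative(coverage, k) for k in range(1, 5)]
        print(f"coverage={coverage:.3f}", " ".join(f"{p:.3f}" for p in row))
```

    In this toy model each additional degree of cousinship multiplies the number of cousins by 2f, which is why even a small coverage can give a high probability once distant cousins are counted.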

  • Fig. 2 Tracing a person of interest from a distant match using demographic identifiers.

    (A) The possible relatives of a match (green) in a database. Each square represents a potential degree of relatedness. The range corresponds to the 5th to 95th percentiles of shared IBD in centimorgans from (16). Red indicates relatives that could fit a bona fide 3C match (~100 cM). The average number of relatives, assuming a fertility rate of 2.5 children per couple, is indicated in the top-left corner of each square; it is shown only for relationships whose IBD range includes 100 cM. Nie/Nep, Niece/Nephew; G, Great; G2, Great-great; G3, Great-great-great; A/U, Aunt/Uncle. (B) An example of the geographical dispersion of third cousins or second cousins once removed around the matched relative. Each circle marks 100 km. (C and D) The distribution of the expected age differences between matches and their potential relatives at the genetic distance of third cousins. The main text reports a conservative scenario in which the target's age estimate falls in the highest bin of each histogram (red arrow). The age distribution is shown at a 10-year resolution (C) and at a 1-year resolution (D). (E) The full pipeline for identifying a U.S. person by combining a long-range familial match with demographic identifiers (blue type indicates the average number of people remaining after each piece of information is incorporated).
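    Panel (E) narrows a pool of candidate relatives by applying demographic identifiers one after another. The bookkeeping can be sketched as multiplying the expected pool size by the fraction of remaining candidates consistent with each identifier, assuming the identifiers are independent. The function name and the fractions in the usage example are hypothetical placeholders, not values from the figure.

```python
from typing import Iterable, List, Tuple


def narrow_candidates(start: float,
                      filters: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return the expected candidate count after each filter.

    `filters` is a sequence of (label, fraction) pairs, where `fraction` is
    the share of remaining candidates consistent with that identifier.
    Assumption: filters are independent, so their fractions simply multiply.
    """
    counts = [("start", start)]
    current = start
    for label, fraction in filters:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for {label!r} must be in [0, 1]")
        current *= fraction
        counts.append((label, current))
    return counts


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not the paper's numbers.
    steps = narrow_candidates(
        start=1000.0,
        filters=[("geography", 0.2), ("age", 0.1), ("sex", 0.5)],
    )
    for label, n in steps:
        print(f"{label:>10}: {n:8.1f}")
```

    Treating the identifiers as independent is the key simplification; correlated identifiers would need a joint fraction rather than a product of separate ones.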

  • Fig. 3 Tracing a 1000 Genomes sample using a long-range familial search.

    The CEU pedigree is shown in black. To respect the privacy of the family, we omitted the sample identifiers and the exact pedigree structure. A GEDmatch search of the person of interest (black circle) returned two males (squares with gray dots) who shared a total of 180 cM and 171 cM of IBD with the target, respectively, and 62 cM with each other. Using public genealogical records, we identified the ancestral couple (asterisk) shared by the matches and the person of interest.
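    Identifying the ancestral couple shared by the matches and the person of interest amounts to a most-recent-common-ancestor query on a pedigree assembled from genealogical records. A minimal sketch, assuming the pedigree is stored as a hypothetical dictionary mapping each person to their known parents; the identifiers in the example are made up and unrelated to the CEU family.

```python
from typing import Dict, Iterable, Set, Tuple

Pedigree = Dict[str, Tuple[str, ...]]  # hypothetical format: person -> known parents


def ancestors(person: str, pedigree: Pedigree) -> Set[str]:
    """All ancestors of `person` (excluding the person), via iterative DFS."""
    seen: Set[str] = set()
    stack = list(pedigree.get(person, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(pedigree.get(p, ()))
    return seen


def most_recent_common_ancestors(people: Iterable[str],
                                 pedigree: Pedigree) -> Set[str]:
    """Common ancestors of all `people` that are not themselves ancestors
    of another common ancestor (i.e., the most recent ones)."""
    sets = [ancestors(p, pedigree) for p in people]
    common = set.intersection(*sets) if sets else set()
    # Drop any common ancestor that sits above another common ancestor.
    return {a for a in common
            if not any(a in ancestors(b, pedigree) for b in common if b != a)}


if __name__ == "__main__":
    # Hypothetical toy pedigree: m1, m2, and t are mutual second cousins
    # descending from the couple A and B.
    ped: Pedigree = {
        "m1": ("p1", "q1"), "p1": ("g1", "h1"), "g1": ("A", "B"),
        "m2": ("p2", "q2"), "p2": ("g2", "h2"), "g2": ("A", "B"),
        "t": ("p3", "q3"), "q3": ("g3", "h3"), "h3": ("A", "B"),
    }
    print(sorted(most_recent_common_ancestors(["m1", "m2", "t"], ped)))
```

    Real genealogical records are incomplete, so in practice each candidate couple would still need to be checked against the observed IBD sharing.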

  • Table 1 Public cases of long-range familial searches.

    A “−” indicates data not available.

    Case | Announcement | Solved by | Closest match | Comments
    Buckskin Girl | 9 April 2018 | DNA Doe Project | First cousin once removed |
    Golden State Killer | 24 April 2018 | Barbara Rae-Venter | Third cousin |
    Lyle Stevik | 8 May 2018 | DNA Doe Project | Second cousin | Inbreeding complicated the estimation of the match.
    William Earl Talbott II | 21 May 2018 | Parabon | Half–first cousin once removed | Second cousins were identified as well.
    Joseph Newton Chandler III | 21 June 2018 | DNA Doe Project | Second cousin once removed |
    Gary Hartman | 22 June 2018 | Parabon | Half–first cousin | Genealogists were able to overcome a nonpaternity event in the family tree of the suspect.
    Raymond “DJ Freez” Rowe | 25 June 2018 | Parabon | − |
    James Otto Earhart | 26 June 2018 | Parabon | Second cousin |
    John D. Miller | 15 July 2018 | Parabon | − |
    Matthew Dusseault and Tyler Grenon | 28 July 2018 | Parabon | − |
    Spencer Glen Monnett | 29 July 2018 | Parabon | − | This was an active case for a crime that occurred in April 2018.
    Darold Wayne Bowden | 23 August 2018 | Parabon | − |
    Michael F. Henslick | 29 August 2018 | Parabon | − |

Supplementary Materials

    Yaniv Erlich, Tal Shor, Itsik Pe’er, Shai Carmi

    • Materials and Methods
    • Figs. S1 to S6
    • Tables S1 to S4
    • References

