by Raehoon Jeong
figures by Jovana Andrejevic

Like fingerprints, each person’s DNA, or genetic code, is unique. Therefore, DNA evidence from traces like hair or blood found at crime scenes can be used to exonerate or incriminate suspects. However, DNA evidence is generally only helpful when it matches the DNA of a suspect or of someone in the FBI’s criminal database. Oftentimes, this is not the case, as with the Golden State Killer investigation. The success story behind his arrest demonstrates how improved DNA profiling technologies and an abundance of genetic data can be leveraged to capture violent criminals.

The Golden State Killer

The Golden State Killer is a rapist and a serial killer who operated between the late 1970s and the mid-1980s. The investigation was at a stalemate for years, but in the early 1990s, an investigator named Paul Holes decided to utilize the emerging DNA testing technology to solve the case. However, the Golden State Killer’s DNA evidence did not match any profile in the criminal database, so identifying a suspect proved impossible and the case hit a dead end.

In the decades since the Golden State Killer case was shelved, genetic profiling technologies have taken significant leaps forward. For example, it is now commonplace for people to obtain a glimpse of their own genetic profiles using direct-to-consumer (DTC) genetic testing services, like 23andMe and AncestryDNA. Many people use this information to find unknown relatives by comparing their genetic profiles to those contained in large online databases, like GEDmatch. Brilliantly, Detective Holes had the idea to create a profile using the Golden State Killer’s DNA and search for the killer’s relatives in the GEDmatch database. There, he found a distant relative who shared a great-great-great-grandparent with the serial killer (Figure 1).

Figure 1. The genetic basis of genealogy analyses. This is an example of a family tree in which each individual has two gray bars representing one of the twenty-three pairs of chromosomes. The horizontal lines represent marriage, and the vertical ones, offspring. In each generation, each pair of chromosomes is scrambled before one copy from each parent is passed down to the offspring. An example is shown by the red segments, which are portions of the grandfather’s chromosome that were passed down to his children and grandchildren. Tools like GEDmatch are able to find long-lost relatives by computing the degree of shared genetic code between users’ profiles.

Afterwards, Detective Holes spent about four months reconstructing a family tree of five generations in order to track down the criminal. Finally, the search narrowed to a retired police officer named Joseph James DeAngelo. To find definitive proof, investigators acquired DNA samples near his residence and compared them to the evidence from the case. Lo and behold, the DNA profile matched, and on April 24th, 2018, the suspect was arrested to be put on trial. Following in the footsteps of this victorious arrest, investigators across the nation began to apply the same method, and as of July 2018, at least five additional cold cases had been solved.

Why was the public genealogy database so effective?

Despite the existence of DNA profiling technology since 1984, the investigation of the Golden State Killer case remained stagnant until 2018 because of limitations in the ways that DNA profiling was traditionally performed and used. For example, although the FBI’s criminal database contains over 17 million entries, it does not contain entries from perpetrators who were never previously arrested, like the Golden State Killer. As such, even though Detective Holes had the killer’s DNA profile on file back in 1994, the year he began to investigate the case, he was left without a suspect due to gaps in the criminal database.

Another, subtler reason that DNA profiling initially failed to crack this case is that traditional forensic profiling does not survey the entirety of a person’s genetic code, but rather takes small snapshots of a person’s DNA. Specifically, the FBI’s standard was to probe twenty designated sites on the genome, called the 20 CODIS Core Loci, which were chosen due to their high variability between individuals. These sites contain short tandem repeats, or STRs, which are short DNA sequences repeated up to hundreds of times in a row (Figure 2). Different people have different numbers of repeats, and it is highly unlikely for two unrelated individuals to have matching numbers of repeats at all twenty sites.

Figure 2. Traditional forensic profiling. The 20 CODIS Core Loci are 20 sites in the genome that contain short tandem repeats (STRs), which are short pieces of the genetic code repeated back to back. Different individuals have variable STR lengths, but two DNA samples from the same person would have the same number of repeats at all these sites.
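To make the idea of an STR match concrete, here is a minimal sketch in Python. It assumes a profile can be written as a table from each site to the two repeat counts a person carries (one per chromosome copy). The site names are real CODIS core loci, but only three of the twenty are shown, and the repeat counts and the function name same_profile are made up for illustration.

```python
# Hypothetical STR profiles: site name -> repeat counts on the two chromosome copies.
# The repeat counts below are invented for illustration, not real case data.
evidence = {"D8S1179": (12, 14), "TH01": (6, 9), "FGA": (21, 24)}
suspect_a = {"D8S1179": (14, 12), "TH01": (9, 6), "FGA": (24, 21)}
suspect_b = {"D8S1179": (12, 14), "TH01": (7, 9), "FGA": (21, 24)}


def same_profile(p, q):
    """Return True if both profiles show the same repeat counts at every site."""
    if p.keys() != q.keys():
        return False
    # Sort each pair because the order of the two chromosome copies is arbitrary.
    return all(sorted(p[site]) == sorted(q[site]) for site in p)


print(same_profile(evidence, suspect_a))  # True: every site agrees
print(same_profile(evidence, suspect_b))  # False: TH01 differs
```

A single disagreement is enough to rule out a match, which is why agreement at all twenty sites is such strong evidence that two samples come from the same person.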

STR analysis is very effective at determining if two DNA samples are from the same individual while being significantly more affordable than reading the entirety of the genome. It simply involves selectively analyzing repeated sequences and differentiating the length of the repeats by their size. However, it is not extensive enough to be informative in a genealogy search for distant relatives. This is because the shared DNA segments (like the red segments in Figure 1) between two distantly related individuals might not happen to span these 20 locations in the genome, and thus their relationship would go undetected even if both had entered DNA profiles into the same database.
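A rough back-of-the-envelope calculation shows how unlikely it is for 20 sites to land in DNA shared by distant relatives. The sketch below rests on stated assumptions rather than on figures from the investigation: it takes two fourth cousins (descendants of the same pair of great-great-great-grandparents), who share about 1/512 of their DNA on average, and it treats the surveyed sites as if they were scattered independently and evenly across the genome. The function names are hypothetical.

```python
def expected_sites_in_shared_dna(shared_fraction, n_sites):
    """Average number of surveyed sites that fall inside shared DNA."""
    return shared_fraction * n_sites


def chance_any_site_in_shared_dna(shared_fraction, n_sites):
    """Chance that at least one site falls inside shared DNA,
    assuming sites land independently and evenly across the genome."""
    return 1 - (1 - shared_fraction) ** n_sites


# Assumption: fourth cousins share about 1/512 of their DNA on average.
SHARED_FRACTION = 1 / 512

for label, n_sites in [("20 STR sites", 20), ("600,000 SNP sites", 600_000)]:
    expected = expected_sites_in_shared_dna(SHARED_FRACTION, n_sites)
    print(f"{label}: about {expected:.2f} sites expected inside shared DNA")

chance = chance_any_site_in_shared_dna(SHARED_FRACTION, 20)
print(f"Chance that any of the 20 STR sites lands in shared DNA: {chance:.1%}")
```

Under these assumptions, a 20-site profile has only about a 4% chance of touching shared DNA at all, while a profile covering 600,000 sites would be expected to place more than a thousand sites inside it. Even when an STR site does fall in a shared segment, agreement at one site says very little, since unrelated people often match at a single site by chance.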

In contrast to STR analysis, DTC genetic tests take a much more comprehensive look at the genome. To achieve this increased genomic coverage, companies like 23andMe use a small chip called a DNA microarray that surveys 600,000 sites in the genome, rather than just 20. These sites are called single nucleotide polymorphisms or SNPs, and like STRs, they tend to vary from person to person. In order to survey these 600,000 sites, the DNA microarray chips have hundreds of thousands of tiny wells containing short DNA molecules, called probes. These probes are able to physically bind to specific SNPs in a person’s DNA. When a SNP binds to a corresponding probe, it gives off a detectable flash of light, thereby making it possible to track and record which SNPs are present in a particular person’s DNA. When this process occurs for 600,000 different sites, the result is a genetic profile that is unequivocally unique for each person.

Since a genealogy search is performed by determining how long the shared DNA segments are between genetic profiles, it requires a profile with sufficiently thorough coverage of the genome, like the one generated by a microarray. As such, the limited STR analysis obtained by Detective Holes back in 1994 could not yield informative results in a genealogy search, but re-analysis of the DNA evidence with microarray technology allowed him to successfully search the GEDmatch database. Further, because of the recent popularity of DTC genetic tests and genealogy search engines, GEDmatch contained an estimated one million genetic profiles not found in the FBI’s database. In combination, the microarray technology and the increased data availability allowed Holes to catch the killer, even though more than 20 years of previous efforts had failed.
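The idea of measuring shared segments can be sketched in a few lines of Python. This is a simplified illustration, not GEDmatch’s actual algorithm: it assumes each profile is a list of SNP readings in genome order, where each reading counts how many copies (0, 1 or 2) of one version of the SNP a person carries. Two people cannot share a chromosome copy at a site where one reads 0 and the other reads 2, so the sketch looks for the longest stretch without such a conflict. The data and the function name longest_shared_run are invented for the example.

```python
def longest_shared_run(a, b):
    """Length of the longest run of consecutive SNPs at which the two
    profiles could share a chromosome copy."""
    best = current = 0
    for reading_a, reading_b in zip(a, b):
        if abs(reading_a - reading_b) == 2:
            # 0 versus 2: no copy in common here, so the run is broken.
            current = 0
        else:
            current += 1
            best = max(best, current)
    return best


# Invented toy profiles of 12 SNPs each (real profiles have hundreds of thousands).
query = [0, 1, 2, 2, 1, 0, 0, 2, 1, 1, 2, 0]
relative = [2, 1, 2, 1, 1, 0, 1, 2, 0, 2, 2, 2]
stranger = [2, 0, 0, 2, 2, 2, 1, 0, 1, 0, 0, 2]

print(longest_shared_run(query, relative))  # 10
print(longest_shared_run(query, stranger))  # 2
```

With only a handful of sites, short runs like the one with the stranger arise by chance all the time. Real tools work with runs spanning thousands of SNPs and measure segment length in units of genetic distance (centimorgans), which is why dense microarray coverage is needed before long shared segments stand out.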

New technology brings new concerns

Although using a public genealogy database was a brilliant idea that led to the capture of a heinous criminal, some concerns over genetic privacy arose amongst enthusiasts and genealogists.

One concern is that the genetic information people upload for the purpose of finding relatives can be accessed without warrants for criminal investigations. Currently, the database is only used to transiently search for partial genetic matches. In the foreseeable future, however, investigators with warrants may be able to access the entire database and hold on to innocent people’s genetic information.

Secondly, as this case demonstrated, a person’s genetic data contains a lot of information on their relatives as well. Although websites like GEDmatch warn users that their uploaded data may be used in the investigation of violent crimes, the relatives of those users have part of their own genetic information made accessible to the public without their knowledge.

With such potential ethical issues surrounding this approach, there needs to be a discussion about how to keep this technology in check while also allowing criminal investigators to utilize it to bring violent criminals to justice.

Raehoon Jeong is a second-year Ph.D. student in the Department of Biomedical Informatics at Harvard University.

For more information:

  • To hear about the investigation of the Golden State Killer case from Paul Holes himself, listen to this podcast episode from The Daily
  • To learn more about privacy and ethical issues surrounding this practice, see this article from The New York Times
  • To learn more about crimes the Golden State Killer committed and a more detailed account of how he was captured, read this article from The Washington Post
