How the Golden State Killer’s DNA Search Is Like the Cambridge Analytica Scandal

Your data is never just yours.

Joseph DeAngelo, the suspected Golden State Killer, appears in court for his arraignment on April 27 in Sacramento, California.
Justin Sullivan/Getty Images

The DNA matching site that helped California authorities track down the Golden State Killer deserves credit for one thing, at least: Its privacy notice is more honest than most.

“If you require absolute privacy and security, we must ask that you do not upload your data to GEDmatch,” says GEDmatch, an open-source database that lets people upload their DNA data and conduct genealogical research. “If you already have it here, please delete it.” GEDmatch’s notice also warns that your information will be shared with other users of the site, and that measures to protect it from third parties “will never be perfect.”

Warning people in clear language about what data they’re giving up when they sign up for an online service is important. It’s something that most big internet platforms fail to do, opting instead to hide their privacy policies in layers of impenetrable legalese that they know almost no one will read. That’s a problem that U.S. legislators finally seemed concerned about—and potentially even interested in fixing—when they grilled Facebook CEO Mark Zuckerberg last month over the Cambridge Analytica data leak.

But law enforcement’s use of GEDmatch in the Golden State Killer search has something else in common with Cambridge Analytica’s harvesting of Facebook user data. And it reminds us of a second major problem with online privacy that might prove even harder to solve: When you sign up for an online service, it’s rarely just your own data that you’re handing over. In many cases, you’re also giving up the goods on people you know, often without their knowledge, let alone their consent.

Admirable as GEDmatch’s privacy notice may be, the man now suspected of being the Golden State Killer probably never saw it, because there’s no evidence that Joseph DeAngelo ever used GEDmatch. Rather, investigators uploaded a DNA sample from one of the killer’s crime scenes to the database and found a partial match with the DNA of some distant relatives of DeAngelo’s.

Those distant relatives may not have even known who DeAngelo was. They may never have considered that uploading their own DNA to GEDmatch would make it possible for police to identify a member of their (very) extended family as a suspect in a crime. But that’s exactly what happened, because the whole point of DNA analysis is that relatives share chunks of their genomes. So whenever you upload your DNA or allow it to be uploaded by a service like Ancestry or 23andMe, you’re essentially uploading identifiable portions of your whole family’s DNA.

The analogy to Cambridge Analytica is that most of the people whose Facebook data was swept up by researcher Aleksandr Kogan had not actually signed up for Kogan’s app, called This Is Your Digital Life. Like GEDmatch, Kogan’s app directly collected data from a relatively small number of people, who knowingly, if cavalierly, signed up for it and agreed to its terms of service. But those terms gave it access to the data of those users’ Facebook friends, too—even if those friends had never heard of it. That’s how Kogan collected data on tens of millions of Facebook users, even though only about 270,000 reportedly used his app.

Facebook has since clamped down on third-party app developers’ access to the data of people who don’t sign up for their apps. But collection of data on users’ friends and real-world contacts still happens on a mass scale on Facebook and many other platforms.

For instance, the “find friends” feature on many social apps asks you to upload data from your phone’s contacts list. The app then uses those phone numbers, email addresses, and other details to find people on its network who are also in your phone. But apps can also use this information to build so-called shadow profiles of people, drawing on details that those people may never have chosen to share themselves. This practice gained attention in 2013 when Facebook accidentally exposed the personal contact information of users who had never given Facebook that information. It turned out Facebook had collected it from their friends and stored it without their knowledge.

No one’s going to shed a tear for the privacy of the suspected Golden State Killer, assuming authorities have the right person this time. And the case dramatically illustrates the potential benefits of publicly available DNA information.

The problem is that correctly identifying a serial rapist and killer isn’t the only way this data can be used. For just one example, it can also be used to mistakenly identify someone as a suspect. Investigators looking for the Golden State Killer did just that in 2017, according to the Associated Press, when they obtained a court order to draw a DNA sample from a 73-year-old Oregon man in a nursing home. He turned out not to be a match, and experts in genealogy say such false positives are extremely common. And, of course, there’s always the possibility of DNA matching being used for more nefarious purposes by hackers and others working outside the legal system.

We’ve long known that the privacy policy system for online services was broken, in ways that encouraged people to give up control of their data without knowing how it could be used or misused. Requiring clearer notifications would help with that. But as U.S. legislators prepare to seriously tackle online privacy, Cambridge Analytica and GEDmatch are a stark reminder that the problems go deeper than just better informing users of what they’re giving up about themselves. They also need to understand what they’re telling online service providers about their friends, family, and contacts—and what all those people might be telling online service providers about them.