by Julian Segert
figures by Aparna Nathan

In Mountain View, California, near the headquarters of Facebook and Google, lies 23andMe, a company that set out to make genetic testing approachable and affordable for the general public. The company started with the goal of providing risk assessments for genetic diseases, but it has recently gained popularity by offering insights into geographical ancestry. Unlike most tech companies, 23andMe collects physical samples (saliva) from its customers in order to provide information that purely digital services cannot. This new age of collecting and sharing genetic data has contributed to our understanding and treatment of genetic diseases, but it also brings with it new concerns for privacy and security.

How do consumer genetic tests work?

Every cell in your body contains your complete genetic code, or genome, which comprises all of your DNA and thus all of your genes. Some portions of the genome are essential for life, so they are shared between all people. However, everybody also carries a number of “genetic variants” in areas of the genome that are not directly essential for life; these variants make people unique and can tell us a lot about a person’s traits and disease propensity.

For years, specially trained genetic counselors have helped patients test for disease-related variants and have provided emotional support to patients and families when the news is bad. Recently, though, more people have been turning to commercial kits like 23andMe or AncestryDNA. These kits are referred to as “direct to consumer” (DTC) because they are advertised and sold directly to consumers rather than ordered by a doctor. They tend to be relatively affordable, at about $100-200, in part because they look at only a fraction of a percent of the genome; for comparison, reading all of a customer’s genetic code would cost about $1000. The other reason companies can sell kits so cheaply is that they can sell their customers’ genetic data to pharmaceutical companies for a profit. 23andMe, for example, has a contract to license customer data to the biotech giant Genentech for its research into Parkinson’s disease.

What exactly do companies do with your data?

Companies share customer data only on an opt-in basis, and 80% of 23andMe customers agree to participate. 23andMe data have been useful in determining the disease risks associated with certain genetic variants; this utility stems from the sheer popularity of the test and the wealth of data it has generated. The breadth of genetic data becomes important when we consider that most genetic variants do not actually cause disease. So if someone with Alzheimer’s, for example, had their DNA tested, it would be impossible to tell which of their variants are dangerous and which are unrelated. However, if you tested many people with and without Alzheimer’s and looked for variants that are shared among people with the disease and largely absent among people without it, you would find variants that are likely disease-related.
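The comparison of people with and without a disease can be sketched for a single variant as a simple 2x2 table test. This is a minimal illustration under stated assumptions, not how 23andMe or any particular study analyzes its data: the helper name `association_test` and all counts are hypothetical, and real genome-wide studies typically use regression models that adjust for ancestry and other factors. Because they test hundreds of thousands of variants at once, they also demand very small p-values (a common genome-wide threshold is 5 × 10⁻⁸).

```python
import math


def association_test(case_carriers, n_cases, control_carriers, n_controls):
    """Compare how often one variant appears in cases vs. controls.

    Hypothetical helper. Builds a 2x2 table (carrier / non-carrier x
    case / control) and returns the odds ratio, the Pearson chi-square
    statistic (1 degree of freedom, no continuity correction) and its
    p-value. Assumes every cell of the table is non-zero.
    """
    a = case_carriers                  # people with the disease and the variant
    b = n_cases - case_carriers        # people with the disease, no variant
    c = control_carriers               # people without the disease, with the variant
    d = n_controls - control_carriers  # people without the disease or the variant
    n = a + b + c + d

    # Odds of carrying the variant among cases divided by the odds among controls.
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # A 1-df chi-square is a squared standard normal: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 1,000 people with a disease and 1,000 without.
    print(association_test(300, 1000, 100, 1000))  # variant enriched in cases
    print(association_test(100, 1000, 100, 1000))  # variant equally common in both groups
```

With these made-up numbers, the first variant has an odds ratio of about 3.9 and a chi-square statistic of 125, the pattern of a variant shared among people with the disease and largely absent among those without it. The second has an odds ratio of 1 and a chi-square of 0, the signature of an unrelated variant.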

In practice, these kinds of genetic disease studies often require hundreds of thousands of people and end up costing on the order of $250 million. 23andMe represents a valuable resource because it has already done the work of collecting genetic information and user-reported health information. This wealth of data has been particularly beneficial for identifying the genetic underpinnings of diseases that have eluded genetic analyses on smaller numbers of people, especially psychiatric conditions. 23andMe user data have recently been leveraged to investigate the genetics of attention deficit/hyperactivity disorder (ADHD), neuroticism, and depression.

Where else can genetic data end up?

Testing companies share data only with explicit consent and under conditions of anonymity. The same cannot be said for a number of public online services. Websites like GEDmatch allow anyone to upload genetic information to search for relatives. GEDmatch’s privacy policy is starkly different from 23andMe’s: its database is publicly searchable and can include real names. GEDmatch rose to prominence when law enforcement used it to solve the decades-old Golden State Killer cold case. Since that case was cracked in April 2018, a total of 25 cases have been solved using public genealogy databases, which can be queried without a warrant, a practice that GEDmatch actively encourages.

Although sharing one’s own data is opt-in, there are no systems in place to protect the genetic privacy of relatives. Because related individuals share portions of their genetic code, a person who chooses to publicize their own data implicitly publicizes data about their relatives. Data from relatives as distant as third cousins can be used to identify individuals (figure 1). At the time of writing, 60% of Americans of Northern European heritage can be identified from data a relative uploaded to a public database, and this number is expected to rise above 90% within a few years. There have even been calls, though nothing close to implementation, for national forensic DNA databases in the US that would include data for all citizens regardless of criminal history.

Figure 1: A hypothetical family tree showing the reach of public genetic databases like GEDmatch. In this example, one of your third cousins, who shares a pair of great-great-grandparents with you, posts their test results. This information makes you identifiable.
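Why a third cousin’s upload still says something about you comes down to simple arithmetic: each parent-to-child transmission passes on, on average, half of a person’s DNA. The sketch below computes the expected fraction of DNA shared by full nth cousins (the coefficient of relationship); the function name is hypothetical, and the result is only an average, since the real amount varies from pair to pair because recombination is random, and some distant cousins may share no detectable segments at all.

```python
from fractions import Fraction


def expected_shared_fraction(n):
    """Expected fraction of DNA shared by full nth cousins (n >= 1).

    Hypothetical helper. nth cousins share a couple of common ancestors
    n + 1 generations back. The path from you up to one ancestor and down
    to your cousin crosses 2 * (n + 1) parent-to-child transmissions, each
    passing on half of the DNA, so each ancestor contributes
    (1/2) ** (2n + 2); two shared ancestors double that.
    """
    if n < 1:
        raise ValueError("n must be at least 1 (first cousins)")
    per_ancestor = Fraction(1, 2) ** (2 * n + 2)
    return 2 * per_ancestor


if __name__ == "__main__":
    for n, label in [(1, "first"), (2, "second"), (3, "third")]:
        share = expected_shared_fraction(n)
        print(f"{label} cousins: {share} (about {float(share):.2%})")
```

First cousins come out at 1/8 and third cousins at 1/128, under 1% of the genome, yet spread across a few long shared stretches that matching services can still pick out.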

In addition to infringing on the privacy of related individuals, genetic testing raises a valid concern that public data may reveal a hidden disease risk to a relative who would rather not know. Many people who know that they may carry a genetic disease choose not to be tested so that they can live their lives without being defined by a diagnosis. Some also fear that their insurance provider will raise their rates if it learns of a disease risk (in the US, this is illegal for health insurance providers under the Genetic Information Nondiscrimination Act, but the law does not apply to life or disability insurance). This “right to not know” may be threatened when a close relative shares DNA test results that implicate a disease.

How effective is de-identification of genetic data?

All of the data sold by genetic testing services have been de-identified to remove names, but it is not clear whether this is entirely effective, because genetic data are intrinsically identifying: each person’s genome is unique and may be traced back to them, much like a fingerprint. Moreover, studies generally require some amount of health and demographic information in order to make use of the genetic data (e.g. researchers have to know whether you have the disease under study and what environmental factors you’ve been exposed to). In one study, researchers were able to infer the last names of anonymous subjects using only a small part of their genetic data along with information like date of birth and home state. This technique is not powerful enough to re-identify the majority of people, but it is a startling demonstration of the power of genetic inference. Re-identification is more likely to work when the subject has a very rare disease, because this narrows down the number of possible people. This could mean that fewer rare disease patients volunteer their data out of fear of being identified, which would be detrimental to research efforts.
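The way each extra clue shrinks the pool of possible matches can be shown with a toy example. This sketch assumes a surname has already been inferred from genetic data, as in the study described above, and filters an invented six-person population; the helper `candidates` and every record are hypothetical, and real re-identification works against databases of millions of records, so these numbers say nothing about real-world success rates.

```python
# Hypothetical toy population: none of these people or values are real.
POPULATION = [
    {"id": 1, "surname": "Smith", "birth_year": 1970, "state": "CA", "rare_disease": False},
    {"id": 2, "surname": "Smith", "birth_year": 1970, "state": "NY", "rare_disease": False},
    {"id": 3, "surname": "Smith", "birth_year": 1985, "state": "CA", "rare_disease": False},
    {"id": 4, "surname": "Jones", "birth_year": 1970, "state": "CA", "rare_disease": False},
    {"id": 5, "surname": "Smith", "birth_year": 1970, "state": "CA", "rare_disease": True},
    {"id": 6, "surname": "Lee", "birth_year": 1992, "state": "MA", "rare_disease": False},
]


def candidates(population, **known):
    """Return the ids of everyone consistent with every known attribute."""
    return [
        person["id"]
        for person in population
        if all(person[key] == value for key, value in known.items())
    ]


if __name__ == "__main__":
    clues = {}
    # Add one clue at a time: inferred surname, then demographics, then diagnosis.
    for key, value in [("surname", "Smith"), ("birth_year", 1970),
                       ("state", "CA"), ("rare_disease", True)]:
        clues[key] = value
        print(f"known: {sorted(clues)} -> candidates {candidates(POPULATION, **clues)}")
```

In this toy population the surname alone leaves four candidates and birth year plus state leave two, but adding the rare diagnosis singles out one person, which is why rare disease patients are especially exposed.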

The future of consumer testing

A few companies are striving to change the paradigm by giving participants sole ownership of their own data and letting them sell it anonymously, all with the prospect of financial gain. Nebula Genomics, founded by Harvard geneticist George Church, gives customers a complete readout of their genome, as opposed to the small sample read by 23andMe. Customers keep sole ownership of these data and can share them anonymously with companies of their choosing through a secure data transfer network (figure 2). Nebula aims to provide data at no cost to customers on the condition that they provide some information about their health. Similarly, the genomic database LunaDNA offers shares of the company to participants who anonymously license personal data through its network. Participants currently receive about $3.50 for a 23andMe-style test, while a complete genetic readout, which still costs just under $1000 to obtain, is worth about $21. The amount of money at stake may seem disappointingly small, but ownership of genetic data also gives consumers the power to decide exactly who can and cannot use their data. This encourages companies to be more responsible with customer data, because people will be less likely to volunteer their data to a company with a history of data breaches and more likely to send it to companies doing meaningful research.

Figure 2: DTC genetic tests vs. consumer-owned genetic data. This flowchart shows the differences between direct-to-consumer DNA test services like 23andMe (top) and services like Nebula that allow customers to own and license their own data (bottom).

Direct-to-consumer genetic testing is here to stay. 23andMe alone has tested over 5 million people and shows no signs of slowing down. The company is gaining FDA approval to test for a growing number of genetic diseases and is leveraging its data to uncover the genetics of conditions ranging from mundane to devastating. In the end, it should be up to the people with genetic information at stake to decide where to place their trust. This is a critical time, when the policies governing how genetic information is shared and used will be decided for years to come. As participants or relatives of participants in this new age of genetic information, we have the right and the responsibility to make sure personal data are used responsibly.

Julian Segert is a first-year graduate student in Biological and Biomedical Sciences at Harvard Medical School, where he studies genetics and genomics. You can follow him on Twitter @JulianSegert.

Aparna Nathan is a second-year PhD student in the Bioinformatics and Integrative Genomics PhD program at Harvard University. You can find her on Twitter as @aparnanathan.

For more information:

  • To learn more about privacy concerns in the era of genomics, check out this article from the National Institutes of Health
  • To learn about genetic variants and their association with disease, check out this article from Nature Education
