by Ryan L. Collins
figures by Brad Wierbowski
Thanks to modern genetics, “precision medicine” is slowly becoming a reality: doctors can perform genetic tests to determine your risk for dozens of diseases, like stroke or liver disease, and can prescribe treatments or therapies tailored to your individual genetic makeup. Yet before doctors can provide you with precision medicine in practice, they first need to understand the genes of tens of thousands of other people. Excitingly, recent breakthroughs in genetics research have made it possible to study the genes of whole populations at once, and the lessons we are learning from those studies are rapidly changing our approach to diagnosing and treating disease.
Learning to read your DNA
You can picture your genome as a book with 3.1 billion letters, known as nucleotides, that encode a list of the ~20,000 different molecular parts, or proteins, that make up every cell in your body. Much like any book with 3.1 billion letters, your genome isn’t exactly a light read: the first time a human genome was read (“sequenced”) in its entirety, the effort required more than 200 scientists, 13 years (finally finishing in 2003), and a staggering total cost of $2.7 billion.
In the subsequent 14 years, massive advances in sequencing technologies have transformed scientists into genetic speed-readers, with cutting-edge sequencing methods able to process your entire genome in under two days for a mere ~$1,500. Additional technologies have improved our efficiency by allowing targeted sequencing of the bits of your genome that spell out the blueprints for all human proteins, known as the exome. Surprisingly, the exome takes up just over 1% of the nucleotides in your whole genome, which means exome sequencing is both faster and cheaper than genome sequencing (see Figure 1 for a comparison of exome vs genome sequencing). While the other ~99% of your DNA serves various “helper” functions in your cells, it does not code for proteins. Since proteins are the most important cellular building blocks, and thus the most important determinants of disease, exome sequencing is a great way to home in on the most critical sections of your DNA.
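To put those numbers side by side, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above (3.1 billion nucleotides, an exome of roughly 1%, $2.7 billion for the first genome, and ~$1,500 today); the function and variable names are illustrative, and the 1% exome fraction is rounded.

```python
# Figures quoted in the article; the exome fraction is rounded to ~1%.
GENOME_NUCLEOTIDES = 3_100_000_000
EXOME_FRACTION = 0.01
FIRST_GENOME_COST_USD = 2_700_000_000
CURRENT_GENOME_COST_USD = 1_500


def exome_size(genome_nucleotides, exome_fraction):
    """Approximate number of nucleotides that fall inside the exome."""
    return round(genome_nucleotides * exome_fraction)


def cost_fold_reduction(old_cost, new_cost):
    """How many times cheaper sequencing has become."""
    return old_cost / new_cost


exome = exome_size(GENOME_NUCLEOTIDES, EXOME_FRACTION)
fold = cost_fold_reduction(FIRST_GENOME_COST_USD, CURRENT_GENOME_COST_USD)

print(f"Exome size: about {exome:,} nucleotides")
print(f"Genome sequencing is about {fold:,.0f} times cheaper than in 2003")
```

In other words, exome sequencing reads roughly 31 million letters instead of 3.1 billion, and the price of a whole genome has fallen by a factor of nearly two million.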
Scaling up genetic sequencing studies
The development of these groundbreaking sequencing technologies has opened countless promising research avenues. Not least among these is an effort known as “population sequencing,” or the process of sequencing the exomes or genomes of entire human communities, illustrated in Figure 2. Early examples of population sequencing, such as the 1000 Genomes Project and the Exome Sequencing Project, combined genetic data from thousands of volunteers into vast datasets bursting with new knowledge about human biology.
The initial successes of these population sequencing projects triggered a tidal wave of similar studies. Dozens of research groups rushed to apply similar methods, and the results came pouring in. For example, one 2014 study sequenced the genomes of several thousand people from Iceland, identifying specific genes that may predispose Icelanders to early-onset heart disease and liver disease. As a second example, multiple studies have used exome and genome sequencing in children to pinpoint over 60 genes strongly linked to autism. The list of these sequencing success stories is already lengthy, and continues to grow every month. Even more importantly, sequencing studies like these produce the information doctors and researchers need to screen your genome and provide a more complete picture of how your genes influence your individual health.
Data sharing: ExACtly what the doctor ordered
As of 2017, well over one million human exomes and genomes have been sequenced worldwide. In the realm of human genetics, bigger datasets are almost always more informative, so analyzing all of these data together seems like an obvious choice. These population-scale studies depend on volunteers like you to contribute their DNA, but unfortunately this process isn’t always straightforward. Genetic data sharing, even if performed strictly in a research context where no personal health information is transferred, is still fraught with ethical, legal, practical, and bureaucratic hurdles.
In 2014, a large international alliance of researchers, led by Daniel MacArthur at The Broad Institute of M.I.T. and Harvard, set out to tackle these obstacles. They formed a collaborative group known as the Exome Aggregation Consortium (abbreviated “ExAC”), and combined exome sequences from over 60,000 healthy individuals, drawn from more than two dozen independent studies conducted around the world, to build a dataset nearly ten times larger than any other ever assembled. Their results, reported in the journal Nature, outlined the most comprehensive atlas of human genetic diversity to date. The atlas included individuals from nearly all major global populations and uncovered almost five-and-a-half million genetic changes, known as mutations, never seen in any previous studies. This detailed mutation map immediately changed the landscape of human genetics research: in the short span since the team publicly released a draft of their results in late 2015, hundreds of scientific groups around the world have used the ExAC dataset, with over 600 peer-reviewed scientific publications citing ExAC in the last two years alone.
The effects of ExAC on human genetics research have been profound. For instance, specific mutations in important, disease-causing genes might not always result in disease for certain people; researchers have now used the ExAC dataset to decipher why this might happen for a peculiar gene, known as PRNP, that causes multiple neurological disorders, such as fatal familial insomnia. A different 2016 study performed exome sequencing on 14,133 individuals from northern Europe to identify “ultra-rare” mutations (genetic changes never seen in any of the 60,706 ExAC participants) and showed that the number of these ultra-rare mutations in genes important for brain development can partially predict how many years an otherwise healthy individual is likely to stay in school. These and similar discoveries are already having a palpable impact in translational research and clinical medicine, and these advances wouldn’t have been possible without population-scale resources like ExAC.
Combining genetics with medical records to make DiscovEHRies
Like ExAC, a recent collaboration between Regeneron Pharmaceuticals, Inc., and Geisinger Health System, dubbed DiscovEHR, combined the exome sequences and full medical records of over 50,000 volunteers recruited at one of Geisinger’s clinics in Pennsylvania. As depicted in Figure 3, this fusion of genetic and medical data proved to be an even more powerful approach than analyzing the genetic data alone. The DiscovEHR study, published in 2016 in the journal Science, compared medical data between patients with and without mutations in certain genes, and found that patients with mutations that disabled a small group of specific genes had lower cholesterol levels, which lowered their risk of serious heart disease. Current pharmaceutical strategies involve identifying such genes as “targets” for drug development. The aim is to design drugs that recreate the lower cholesterol levels caused by the gene-disabling mutation, with the ultimate goal of providing those drugs to individuals who are at high risk for heart disease but lack these rare, protective gene mutations.
Population sequencing lands a knockout punch
Except for genes on the sex chromosomes (X & Y), there are two copies of every gene in your genome, one inherited from your father and one from your mother. Gene-inactivating mutations are generally rare events, so it is extremely uncommon for a single individual to inherit disabled copies of the same gene from both parents. When this does occur, it’s called a “gene knockout,” and it means that the individual lacks the ability to produce any of the protein encoded by that gene. Not surprisingly, gene knockouts are the known cause of hundreds (if not thousands) of rare diseases.
In some cultures, marriage between first cousins is commonplace. Since first cousins are genetically closely related, their children are at a much higher risk of inheriting gene knockouts, making those children ideal individuals in whom to study the effects of gene knockouts in humans (such studies are usually conducted in mice or other “model organisms”). Last month, a team of researchers led by Sekar Kathiresan at the Massachusetts General Hospital reported in Nature on a population sequencing study, known as the PROMIS study, of over 10,000 individuals from Pakistan, where the rate of first-cousin marriages is particularly high. Remarkably, the team found that at least 7% of all known protein-coding genes were knocked out in at least one individual without resulting in any obvious medical issues, meaning these genes might represent safe drug targets with little risk of side effects, as shown in Figure 4. Conversely, the PROMIS study also reported on a subset of individuals who were knockouts for current drug target genes, a loss that was thought to protect against heart disease, yet those individuals developed heart disease at the same rate as the general population. Whatever the conclusion, this study drives home the point that population sequencing can inform (and, in some cases, correct) drug development and the prescription of clinical treatments.
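The reason first-cousin marriages raise the odds of a gene knockout so sharply can be sketched with a standard population-genetics formula. This is an illustration under stated assumptions rather than the PROMIS team's own method: it treats all disabled copies of one gene as a single class with population frequency q (the value 0.001 below is hypothetical, chosen only for illustration), and it uses the textbook inbreeding coefficient of 1/16 for the children of first cousins. The function name and parameters are illustrative.

```python
def knockout_probability(q, inbreeding_coefficient=0.0):
    """Chance that a child inherits two disabled copies of a single gene.

    q is the fraction of gene copies in the population that are disabled.
    inbreeding_coefficient (F) is 0 for unrelated parents and 1/16 for
    children of first cousins.
    Uses the standard relationship: P(knockout) = q^2 + F * q * (1 - q).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    f = inbreeding_coefficient
    return q * q + f * q * (1.0 - q)


# Hypothetical frequency of disabled copies, chosen only for illustration.
q = 0.001

unrelated = knockout_probability(q)
first_cousins = knockout_probability(q, inbreeding_coefficient=1 / 16)

print(f"Unrelated parents:    {unrelated:.2e}")
print(f"First-cousin parents: {first_cousins:.2e}")
print(f"Fold increase:        {first_cousins / unrelated:.1f}")
```

With this assumed frequency, a knockout is about one in a million for children of unrelated parents but roughly 60 times more likely for children of first cousins. The rarer the disabled copy, the larger that gap becomes, which is why a population with many first-cousin marriages is so informative for finding human knockouts.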
The future: population sequencing, precision medicine, and personalized therapies
Population sequencing is changing the way companies design drugs and doctors diagnose diseases and choose therapies. Landmark large-scale studies like ExAC and DiscovEHR have proven that pooling genetic data across tens of thousands of individuals dramatically improves researchers’ abilities to make new discoveries about the causes of disease. Cataloguing healthy individuals with rare gene knockouts, as in the PROMIS study, can advance our understanding of human physiology, produce new drug targets, and shed new light on why drugs might fail in certain patients. Yet numerous issues continue to hinder advances in medical genetics: we still know very little about the genetics of disease in people of African or Asian ancestry, genetic data remains difficult to share between research groups, and we have still sequenced less than 0.02% of all people on Earth.
Despite these challenges, our knowledge of the genome has already revolutionized medicine. Modern clinics can now perform dozens of genetic tests to evaluate your risk for cancer, Alzheimer’s disease, and heart attacks, or can tell you the odds your future children might have autism, epilepsy, or physical birth defects. For certain diseases, especially cancer, having your genome sequenced helps doctors select drugs designed specifically for your genetic makeup, making sure you get the most effective personalized treatment possible.
As the famous Greek philosopher Socrates is said to have quipped, “to know thyself is the beginning of wisdom.” Today’s geneticists and doctors would probably agree with him: genetic testing can teach you more about yourself, and in the process, help you live a longer, happier, healthier life.
Ryan L. Collins is a Ph.D. Candidate in Bioinformatics and Integrative Genomics at Harvard Medical School.
For more information:
Human Genome Project Completion FAQs
NIH Primer on Human Whole-Genome Sequencing
NIH Genetics Home Reference on Inheritance Patterns
1000 Genomes Project, “A global reference for human genetic variation,” Nature (2015)
Lek et al., “Analysis of protein-coding genetic variation in 60,706 humans,” Nature (2016)