The human genome is long: 3 billion letters in total. How can scientists tell which parts are important? One way is to look for spots with suspiciously few mutations. Mutations occur randomly throughout the genome, changing one letter to another. If a mutation hits an important spot, it is likely to cause problems, so it tends to die out over generations as part of natural selection. As a result, regions that stay unusually consistent are likely to be important. Scientists quantify this consistency with a metric called “constraint.” By looking for constraint across different species, researchers can home in on regions that are important for proper function, and these regions are more likely to harbor mutations involved in disease. A new study offers an unprecedented look at constraint across primates.
A team of international researchers collected the genomes of 239 primate species, 187 of which were newly generated by this team. Then, for each spot in the human genome, they found the corresponding spot in as many of the other species as possible and measured constraint at each location. Around 5% of the human genome was constrained across primates, and about 2.4% was exactly identical in every species they examined. Many of these regions matched regions that experimental data had already shown to be important, such as regions that affect gene activity levels or regions associated with disease. The constrained regions were also particularly associated with genes active in embryos, the heart, and the brain.
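To make the idea of a position being “exactly identical in every species” concrete, here is a toy sketch in Python. It assumes the genomes have already been lined up position by position (an alignment) and uses short, made-up sequences for four species; the function names `column_identity` and `fraction_identical` are hypothetical. The study itself relied on much more sophisticated alignment and statistical models of evolution, so this only illustrates the counting idea, not the researchers' method.

```python
from typing import Dict, List


def column_identity(alignment: Dict[str, str]) -> List[float]:
    """For each aligned position, return the fraction of species whose letter
    matches the human letter at that position (1.0 means every species agrees).

    Toy illustration only: assumes `alignment` maps species names to sequences
    that are already lined up, with no gaps or missing species.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    human = alignment["human"]
    n_species = len(alignment)
    scores = []
    for i, human_letter in enumerate(human):
        # Count how many species (including human) share the human letter here.
        matches = sum(1 for seq in alignment.values() if seq[i] == human_letter)
        scores.append(matches / n_species)
    return scores


def fraction_identical(alignment: Dict[str, str]) -> float:
    """Fraction of positions where every species carries exactly the same letter."""
    scores = column_identity(alignment)
    return sum(1 for s in scores if s == 1.0) / len(scores)


# Made-up sequences for illustration; these are not real primate DNA.
toy_alignment = {
    "human":   "ACGTACGTAC",
    "chimp":   "ACGTACGTAA",
    "gorilla": "ACGTTCGTAC",
    "macaque": "ACGTACGAAC",
}

print(column_identity(toy_alignment))
print(fraction_identical(toy_alignment))
```

In this toy example, 7 of the 10 positions are shared by all four species, so the fraction identical is 0.7. A real analysis must also deal with gaps, species that lack a matching spot, and how many differences would be expected by chance, none of which this sketch attempts.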
Genetic signals linked to disease are often sparse and scattered across many different locations in the human genome. Lab-generated data can label large swaths of the genome with specific biological properties, but it is hard to narrow things down further without other measurements. By using these newly identified constrained regions, which have mattered throughout the evolutionary journey that made us human, scientists can home in on the most significant spots.
The lead researchers, Lukas Kuderna, Jacob Ulirsch, and Sabrina Rashid, are scientists at the Illumina Artificial Intelligence Laboratory.
Corresponding Author: Alex Yenkin
Original Article: “Identification of constrained sequence elements across 239 primate genomes,” Nature
Press Article: “Comparison of 239 Primate Genomes Reveals Conserved Regulatory Sequences,” GenomeWeb