We cannot predict how long each of us will live, but can our genes? Long life has always been prized, yet it has never been evenly distributed across humanity, not even within families. How much heritable traits contribute to longevity is still debated. Previous genomic studies have reported low heritability for longevity. However, their sample sizes were too small to examine other influences, such as environmental factors, so their conclusions are often incomplete or inconclusive.
A new study led by computer scientists at Columbia University probes this question with a novel approach: crowdsourced data. Extracting 86 million profiles from the genealogy-driven social media site Geni.com, the researchers constructed a family tree of 13 million individuals, spanning 11 generations on average. A data set at this population scale is large enough for robust statistical methods to give trustworthy estimates. Its size also makes it possible to model how factors such as environment and war affect longevity. With these factors accounted for, the study finds that heredity explains only 16% of the differences in longevity between individuals, well below the literature value of 25%. By closing loopholes in previous work, the new study suggests that we should expect human lifespans to be even less genetically predictable than previously thought.
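To make the 16% figure concrete, here is a minimal sketch of one textbook way to estimate heritability: regressing offspring trait values on the average of their two parents. This is not the study's method, and it uses simulated data rather than the Geni.com family tree. The function names, the purely additive model, and the sample size are all assumptions chosen for illustration.

```python
import random

def simulate_families(n_families, h2, seed=0):
    """Simulate (midparent, offspring) trait pairs under a purely additive model.

    Hypothetical toy model: traits are scaled to variance 1, so the
    genetic variance equals h2 and the remaining 1 - h2 is non-genetic.
    """
    rng = random.Random(seed)
    g_sd = h2 ** 0.5            # SD of the additive genetic value
    e_sd = (1.0 - h2) ** 0.5    # SD of the non-genetic (environmental) part
    seg_sd = (h2 / 2.0) ** 0.5  # SD of the within-family segregation noise
    pairs = []
    for _ in range(n_families):
        g_mother = rng.gauss(0.0, g_sd)
        g_father = rng.gauss(0.0, g_sd)
        p_mother = g_mother + rng.gauss(0.0, e_sd)
        p_father = g_father + rng.gauss(0.0, e_sd)
        # A child inherits the parental average plus random segregation noise.
        g_child = (g_mother + g_father) / 2.0 + rng.gauss(0.0, seg_sd)
        p_child = g_child + rng.gauss(0.0, e_sd)
        pairs.append(((p_mother + p_father) / 2.0, p_child))
    return pairs

def regression_slope(pairs):
    """Least-squares slope of offspring trait on midparent trait."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov_xy / var_x

# Under this toy model the slope is an estimate of the heritability.
pairs = simulate_families(n_families=100_000, h2=0.16, seed=1)
h2_estimate = regression_slope(pairs)
print(f"estimated heritability: {h2_estimate:.3f}")
```

In this toy model the slope recovers the heritability because relatives share nothing but genes. Real families also share environments, which is why accounting for such factors can lower an estimate.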
As our ability to manipulate big data sets and extract meaningful insights grows, the sources of those data sets will matter more in future scientific pursuits. Crowdsourcing is a fast, low-cost way to gather vast amounts of information, but it is difficult to know how much to trust information available on the web. Before crowdsourcing becomes a mainstream way of gathering data, we will need to develop and enforce standards for how and where this information is collected.
Original Research Article: