by Dawn Chen
figures by Daniel Utter

Did you know that the divorce rate in Maine strongly correlates with the per capita consumption of margarine? Wow, maybe abstaining from margarine prevents divorce! I can definitely imagine a pop-media article with this eye-catching title. Before throwing out all margarine to save your marriage, an intelligent reader like you would probably think to yourself: “what absurdity, it’s probably a coincidence that the trends match, and there is no causal relationship between them after all.” 

Figure 1: Divorce rate in Maine is correlated with per capita consumption of margarine. Is this a relationship that occurred randomly, or is there something here worth digging into further? (Source: Spurious Correlations)

Such unverified, possibly coincidental associations can be found in scientific research too, especially in recent news coverage of the microbiome field, which studies the microbes (also known as microorganisms) that inhabit our bodies. “Good” microbes in our gut can help us absorb nutrients better and protect us against infections, while “bad” microbes can make us sick. As researchers dig deeper into our microbiome, they have found that microbes in our body are linked to a wide range of health outcomes and diseases, including obesity, diabetes, Alzheimer’s disease, depression, multiple sclerosis, ALS, and autism.

Ostensibly, these results suggest that a new series of therapeutics is on the horizon; if we change our diet, eat more probiotic foods like yogurt, or replace our microbiome with “good” microbes, we are on track to alleviate these diseases, right? However, the truth is more complicated than it seems. Most of these studies only suggest that a relationship exists between the microbiome and disease. We don’t yet know for certain exactly how the microbes caused the patients to be sick, or whether the microbes caused the illness at all. Very often, news coverage of the microbiome field falls into the “correlation does not imply causation” trap, treating a relationship between two variables as proof of direct cause and effect.

Correlation does not imply causation

To critically evaluate existing scientific findings, we must first understand the difference between correlation and causation. Correlation means that there is a relationship, or pattern, between two different variables, but it does not tell us the nature of the relationship between them. 

In contrast, causation implies that beyond there being a relationship between two events, one event causes another event to occur. For example, if we don’t sleep, we will feel sleepy. The former (not sleeping) directly causes the latter (feeling sleepy).

The distinction between correlation and causation seems straightforward, but it’s easy to wrongly assume causation from correlation, especially when there is a complex interplay of variables. Here are some common ways in which causation is wrongly inferred from correlation, or reasons why “correlation does not imply causation”:

Figure 2: Common ways correlation is mistaken for causation. (1) The relationship between 2 events may be coincidental. (2) The cause and effect between 2 events may be reversed. (3) There may be a third, unknown, variable that confounds the relationship.

  1. The relationship between the two variables is coincidental

The correlation between unrelated variables can occur by chance. One example is the “Redskins Rule”, where the result of the last home game of the Washington Football Team before the US presidential elections accurately predicted every election result from 1936 to 2000. Intuitively, we know that the outcome of a football game has nothing to do with presidential elections – this observation is merely a coincidence. The more variables we examine, the more likely we are to find unrelated variables that are correlated by random chance.

  2. Reverse causality

Reverse causality means that there is a causative relationship between events A and B, but not in the order that you would expect – the cause and effect are reversed. For example, if we observe that the faster a windmill rotates, the more wind there is, we might falsely conclude that the rotation of the windmill causes the wind. However, we know that it is the wind that causes the windmill to rotate.

  3. A common (third) confounding variable causes both events

In some cases, there may be a hidden, underlying variable that causes events that appear to be correlated. We might assume that event A causes event B when in reality, there is another event C that causes both events A and B. For example, many researchers have previously found that alcohol consumption is associated with an increased risk for lung cancer. However, smoking was later shown to be a confounding factor. Individuals who consume more alcohol also happen to smoke more, which increases their risk for lung cancer. 
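The first pitfall above can be made concrete with a small simulation. The following is a minimal Python sketch, not taken from any of the studies mentioned here: it draws one random “target” series, then checks how strongly a set of unrelated random series happens to correlate with it. The function names, the ten-point series length, and the seed are all illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_variables, n_points=10, seed=0):
    """Largest |r| between a random target and n_variables unrelated series.

    Every series is independent noise, so any correlation is coincidental.
    n_points and seed are arbitrary choices made for this illustration.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # The more unrelated variables we search, the stronger the best match.
    for n in (10, 100, 1000):
        print(n, "variables examined, strongest |r| =",
              round(strongest_chance_correlation(n), 2))
```

Because every series is pure noise, any strong correlation the search turns up is a coincidence, and searching through more candidates can only make the best match look stronger.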

Observational studies can’t prove causation

While correlation is easily observable, determining causation is much more complicated and requires an appropriate experimental design. Ideally, we would conduct experiments in the lab, where we tightly control all variables except for the one that we are interested in. However, this is nearly impossible in human studies. To conduct the most rigorous randomized controlled experiment, we would probably need participants to live in the same place, eat the same food, and exercise and sleep at the same times, just to name a few variables. As a result, human microbiome research has been largely observational.

Figure 3: Workflow in most microbiome population studies. Researchers collect stool samples from healthy and sick participants, find the composition of these samples by sequencing, then analyze the data to find pattern differences between samples from healthy and sick participants. 

In most large-scale human microbiome studies, like the Human Microbiome Project or the American Gut Project, researchers recruit a group of participants, collect and sequence their fecal samples, and simultaneously gather information on participants’ lifestyle, diet, and health status. By analyzing differences in the microbiome between individuals suffering from disease and healthy individuals, we can find correlations between microbiome composition and the disease of interest (Figure 3).

It’s worth noting that the direction of causality in these relationships is often ambiguous. Specifically, scientists have found that patients, such as those suffering from inflammatory bowel disease, have different gut bacteria compared to healthy individuals. Did differences in the gut microbiome make the patient sick, or did the patient’s disease state itself (e.g. more diarrhea or inflammation) lead to differences in the gut microbiome? We are often quick to assume the former, that the bacteria have caused the disease, though the direction of this causal relationship is not so easily determined. Researchers tend to call this the “chicken and egg” problem. Furthermore, lifestyle is a big confounding factor. Patients who suffer from diseases often change their diet upon diagnosis or take drugs for treatment, which can change their gut microbiome composition. 

Figure 4: Removing confounding variables to find a true relationship in population studies. A one-to-one matching method, where each diseased patient is matched to a healthy control with a similar lifestyle, can help us better understand relationships between the gut microbiome and human disease.

In an attempt to solve the problem of confounding variables, a recent publication in Nature by Ivan Vujkovic-Cvijin and co-workers looked for lifestyle differences that might be associated with microbiome composition. They found that gender, age, body mass index, and levels of alcohol consumption are the biggest confounders associated with both microbiome composition and disease status. To remove the effects of these confounders, the researchers used one-to-one matching, where each sick individual was matched with a healthy individual of the same age and gender and with similar lifestyle habits. This is a common technique in observational studies, where researchers cannot control for all variables under perfect experimental conditions (Figure 4). Using this technique, the researchers discovered that many previously reported associations between gut bacteria abundance and disease status were no longer statistically significant, suggesting that some gut microbiome changes attributed to disease might be a result of underlying confounders.
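The logic of matching can be illustrated with a toy simulation. The sketch below is not the method used by Vujkovic-Cvijin and co-workers. It assumes a single made-up binary confounder (heavy drinking) that raises both the abundance of a hypothetical microbe and the chance of being sick, while the microbe itself has no effect on disease. All names and numbers are invented for illustration.

```python
import random


def simulate_cohort(n=4000, seed=1):
    """Toy cohort in which drinking drives both abundance and sickness."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        drinker = rng.random() < 0.5  # hypothetical confounder
        # Abundance depends only on the confounder, never on disease.
        abundance = rng.gauss(2.0 if drinker else 0.0, 1.0)
        # Disease risk also depends only on the confounder.
        sick = rng.random() < (0.6 if drinker else 0.2)
        cohort.append({"drinker": drinker, "abundance": abundance, "sick": sick})
    return cohort


def mean(values):
    return sum(values) / len(values)


def naive_difference(cohort):
    """Mean abundance of sick minus healthy people, ignoring the confounder."""
    sick = [p["abundance"] for p in cohort if p["sick"]]
    healthy = [p["abundance"] for p in cohort if not p["sick"]]
    return mean(sick) - mean(healthy)


def matched_difference(cohort, seed=2):
    """Mean within-pair difference after one-to-one matching on the confounder.

    Each sick person is paired with an unused healthy person who has the
    same drinking status; sick people with no available match are skipped.
    """
    rng = random.Random(seed)
    pools = {True: [], False: []}
    for person in cohort:
        if not person["sick"]:
            pools[person["drinker"]].append(person)
    for pool in pools.values():
        rng.shuffle(pool)
    differences = []
    for person in cohort:
        pool = pools[person["drinker"]]
        if person["sick"] and pool:
            control = pool.pop()
            differences.append(person["abundance"] - control["abundance"])
    return mean(differences)


if __name__ == "__main__":
    cohort = simulate_cohort()
    print("naive difference:  ", round(naive_difference(cohort), 2))
    print("matched difference:", round(matched_difference(cohort), 2))
```

In this toy setup the naive comparison shows a clear gap in microbe abundance between sick and healthy people, even though the microbe does nothing. Once each sick person is compared only with a healthy person of the same drinking status, the gap shrinks toward zero. Real studies must match on many variables at once, and on confounders that are measured imperfectly, so this is only a sketch of the idea.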

Stay healthy, stay skeptical

Despite the ambiguity surrounding causation, a growing number of commercial companies like Viome, uBiome (which was raided by the FBI last year over its insurance billing practices), and DayTwo have started marketing interventions for the microbiome. Customers mail in a fecal sample for sequencing; then, based on the types of bacteria present in the sample, the companies offer personalized nutritional advice or provide customers with risk scores for different diseases. While these companies have good intentions of helping consumers understand their bodies, we need to critically evaluate their claims.

The microbiome is undoubtedly important for our health. However, despite the hype surrounding largely correlative studies, we are still not sure exactly how the microbiome affects our health or fits into disease progression. To determine whether the microbiome causes disease, some researchers are exploring the molecular mechanisms of individual bacterial strains. Other researchers are working on designing experimental studies with larger sample sizes and more rigorous methodology. With better analysis tools and datasets, we will soon be able to uncover the complex functions these tiny living organisms perform in our bodies. In the meantime, grab a cup of yogurt, just because it’s tasty.


Dawn Chen is a first-year Ph.D. student in Systems, Synthetic and Quantitative Biology at Harvard University. 

Daniel Utter is a sixth-year Ph.D. student in Organismic and Evolutionary Biology at Harvard University.

Cover Image: “silver bullet” by eschipul is licensed under CC BY-SA 2.0


One thought on “When Correlation Does Not Imply Causation: Why your gut microbes may not (yet) be a silver bullet to all your problems”

  1. In retrospect, doing scientific research and practicing clinical medicine may actually require irreconcilable mindsets. My intellectual start began in the arts and moved to philosophy (I do not mean in the scholastic sense, as that followed a strict protocol, but rather in the exchange of ideas in a European coffee-house setting). Academics, however, involved a potpourri of endless study of primary research, and I couldn’t say how much of it was attraction like that of a moth to fire and how much was just formalism to earn degrees; but all, to me, was just systematic giving in to fascination and curiosity. However, the older I get, the more “science” scares me. First of all, until advanced degrees, I was taught that “science” is nothing but “a method of inquiry used to falsify theories.” Hence, “scientific truth” is more a target to shoot for, resulting in a false sense of “progress” – generally with a life span these days of about a decade – rather than anything resembling “Eureka!” Thus, on the clinical side, most of the “chief complaints” of patients seemed to be treated based on who was the last pharma rep to get to physicians, giving them “mini-lectures” like late-night infomercials. But in the doing of science, where one devotes some 25 years just to obtain a PhD, one’s research funding can suddenly be cut off and, because of the highly specialized nature of research today, one can find oneself only qualified for working as an Uber driver once funding dries up. It’s so desperate out there that some guardians of the veracity of science claim that today about 1/3 of published papers are “crap” imaginatively pasted up so as to stay alive funds-wise in the game. In adolescence I was formally introduced to statistics. The first thing I was taught was the old saying: FIGURES DON’T LIE BUT LIARS FIGURE! Now, many decades later than I care to imagine, I find myself delving into primate evolution.
Here too, as a neuroscience researcher, I find myself annoying academics by asking them how they reconcile IMPRESSIONISTIC SCIENCES like paleoanthropology with MECHANISTIC SCIENCES like genetic evolutionary molecular biology. Systematics a la Darwin always kind of seemed “as you like it,” but the DNA/RNA studies do not seem consonant with the impressions of the bone-chip unearthers. I am still desperately trying to hang on to the notion that “ontogeny recapitulates phylogeny.” All those wonderful courses, for example, suggesting the impression that the Diploe Venous System of the skull keeps the brain 0.5 °C cooler than the body’s core, and the relation of the CNS embryologically to its phylogenetic lineage, seem as conjured up as the brilliant notion of Eccles that the brain is an ever-burning furnace of ionic fluxes, a metabolic fire that is sculpted by GABAergic inhibition into the behaviors and thoughts we observe. But is any of this “FACT”? Were the ancient Greeks right that the Gods are punishing us for the hubris of assuming that “scientifically” we can “discover” what is truth? It seems to me that in every subfield of bioscience, about every decade, the “advances” come undone and we all have to go back to the drawing board. To this day, only the DNA/RNA genetic molecular biology theories seem to be either under the dogmatic, doctrinaire control of Mephistopheles-like geniuses in DNA/RNA sequencing with Messiah-like insight, or maybe we are just being duped by a massive statistical subterfuge at the hands of 23 & Me or Ancestry Inc’s statistical analysis – not only backwards in the sense of genealogy, but also forward in the sense of evolutionary control. It’s all moving so fast, and sectarian journals are popping up like mushrooms, that the faiths of science seem to be cutthroat denominational catechisms. So thank you for reminding us that CORRELATION IS NOT DIVINATION OF TRUTH.
I personally am afraid that we’re falling for all the AI gods making Science more “take it or leave it” than analytic. After all, that is now truer than ever because of the stingy “publish or perish” mania, and no one is allowed to print the words “maybe” and “perhaps,” as was more common in the last century.
