Science produces knowledge about our world, while also forming the basis for technologies, medicines and applications that are instrumental to tackling the many problems facing our society. Because of its enormous impact, it is critical that science be built on firm foundations. One of the most crucial differences distinguishing science from pseudoscience (such as many forms of alternative medicine) or non-scientific forms of evidence (such as anecdotes) is reproducibility – the idea that different individuals following the same protocol, or perhaps even using different but comparable approaches, should be able to arrive at the same conclusion. But all may not be well with the edifice of science, especially in such popular, rapidly moving fields as biomedical research. A number of recent analyses have generated concerns that some of the research we rely on as the basis for our drugs and medical treatments may not be as reproducible as we would hope. At best, this could lead to a waste of time and resources as we try to build on previous research; at worst, it might be indicative of systemic problems in the scientific enterprise and could result in an erosion of public trust in science.
A crisis in reproducibility
The process of drug development often begins with academic researchers discovering biological pathways or molecules that might be involved in diseases, and these findings are published in scientific journals. Pharmaceutical and biotechnology companies scour these journals for pathways or molecules that have clinical (and market) potential, but before sinking millions of dollars into developing a particular drug, they will attempt to replicate and validate these published results. In August 2011 and March 2012, two groups of industry scientists, from German drug giant Bayer and California-based biotech company Amgen, reported their attempts to replicate experiments from dozens of biomedical publications, and the results raised major concerns. The Bayer scientists tried to validate published data for 67 projects in the fields of cancer biology, women’s health, and cardiovascular diseases. Of these, 43 (or 64%) produced results that were inconsistent with what was published (Figure 1). At the same time, the Amgen team looked at 53 “landmark” cancer-related preclinical studies (where a potential drug molecule is tested to determine correct dosage and whether it is toxic, usually in animal models). Many of these landmark studies were published in the most prestigious journals and have been cited by hundreds to more than a thousand follow-up articles. Unsettlingly, the Amgen scientists could reproduce the results of only 6 (or 11%) of these studies.
Figure 1 Reproducibility of published data for 67 projects analyzed by Bayer. Validation attempts produced results that were completely in line with published data, that reproduced the main data set of the published studies, that reproduced some of the central aspects of the published data, or that were inconsistent with the published data and led to termination of the project by the company. The number of projects in each category is indicated, with the percentage of the total number of projects in brackets. (Adapted from Prinz et al., 2011.)
A common objection raised by the original authors about these failed replication attempts was that the groups doing such replications might not have the necessary technical skills to do the experiments, or there might be subtle differences in the materials used by each group. The Amgen scientists tried to deal with this issue by contacting the original labs that conducted the studies, exchanging protocols and materials, and even, in some cases, going to the original lab to perform the experiments, but the results still could not be replicated (for legal reasons, they could not name the actual landmark papers and their original labs).
As both teams of industry scientists pointed out, the results of the replication attempts were surprising, and not in a good way. Irreproducibility of scientific results is not just an academic concern, but could have real-world consequences. The failure rate in drug development is alarmingly high and continues to rise every year. Almost 80% of potential drugs fail at the preclinical stage while an additional 85% fail during early-stage clinical trials, and over half of these clinical failures are due to previously unknown safety issues or lower efficacy than expected (Figure 2). Of course, a product can fail for many reasons, including the simple fact that testing the product in progressively larger numbers of people (as occurs during clinical trials) could uncover rare problems, but given the sheer number of failed trials, both teams suggest that questionable research results are likely to be at least part of the problem.
Figure 2 Reasons for failure of pharmaceutical projects during Phase II clinical trials. Projects fail due to low efficacy, safety issues, or strategic (business) reasons. Data from an analysis of 87 projects from 16 global pharmaceutical companies that reported failure from 2008-2011. Analysis performed by Thomson Reuters Centre for Medicines Research, as reported in Arrowsmith 2011, Nat Rev Drug Discov (http://www.nature.com/nrd/journal/v10/n5/full/nrd3439.html).
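To get a feel for what the attrition figures quoted above imply when taken together, here is a minimal sketch in Python. It assumes (and this is only one reading of the text) that the 80% and 85% figures are sequential per-stage failure rates, so that 85% applies to the candidates that survived the preclinical stage. The function name is made up for illustration.

```python
def surviving_fraction(failure_rates):
    """Fraction of candidates that pass every stage, given per-stage failure rates."""
    remaining = 1.0
    for rate in failure_rates:
        remaining *= 1.0 - rate
    return remaining


# Figures quoted in the text, read as sequential per-stage rates (an assumption).
preclinical_failure = 0.80
early_clinical_failure = 0.85

fraction = surviving_fraction([preclinical_failure, early_clinical_failure])
print(f"{fraction:.1%} of candidates survive both stages")
```

Under that reading, only about 3 in every 100 candidates make it through both stages, which is why even a modest share of unreliable starting points can be so costly.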
The creeping reach of bias
While these two reports focused widespread attention on the issue of irreproducibility, its possible causes and the effect it is having on the translation of basic research to clinical applications have been analyzed and discussed for many years. In 2005, public health researcher John Ioannidis published a theoretical statistical analysis showing that “most published research findings are false”. The reasons for this are multifarious, but studies that are small scale, that look for small effects, that have high flexibility in what is tested and how results are assessed, and that are in a “hot” field are all more likely to report positive results when they shouldn’t. This is not an implication of fraud or scientific misconduct; instead, a fundamental reason behind this is that such studies are more likely to be influenced by biases.
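The core of that statistical argument can be sketched in a few lines of Python. The function below computes the probability that a statistically significant finding is actually true, using the simplest form of the formula in Ioannidis's paper (the version that ignores bias). The function name and the two scenarios are illustrative assumptions, not figures from the paper or from this article.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8):
    """Probability that a statistically significant finding is true.

    prior_odds: ratio of true to false relationships among those tested.
    alpha: chance that a test of a false relationship comes out significant.
    power: chance that a test of a true relationship comes out significant.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios, chosen only to show the direction of the effect.
large_confirmatory = positive_predictive_value(prior_odds=1.0, power=0.8)
small_exploratory = positive_predictive_value(prior_odds=0.1, power=0.2)

print(f"Large, well-powered study of a plausible idea: {large_confirmatory:.2f}")
print(f"Small study in an exploratory field: {small_exploratory:.2f}")
```

With these made-up inputs, a significant result from the large study is very likely to be true, while one from the small exploratory study is more likely false than true, even with no misconduct and no bias at all. Adding bias or flexible analysis to the model pushes the number lower still.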
Confirmation bias, the phenomenon in which people tend to notice evidence that confirms their prior beliefs, or to interpret evidence in a way that does, is well supported by studies in social science, and scientists, being human, are not completely immune. The “gold standard” of clinical research, the double-blind, randomized clinical trial, is designed specifically to minimize the effect of investigator biases. These trials randomly assign subjects to receive either the experimental treatment (for example, a new drug being tested for efficacy) or a comparator (either an existing treatment or a placebo), such that even the investigators do not know until the very end which subject received what treatment, preventing their observations from being biased by prior expectations. Unfortunately, there are many ways that the integrity of trials can be compromised (e.g., by using a small number of subjects, stopping the trial early or late, or testing the treatment against improper comparators). Moreover, it is not common practice for basic academic research to be conducted with blinding or randomization.
Another issue could confound attempts at objectivity: publication bias. Ask yourself, how often do you see a news headline or a scientific paper that says “such-and-such novel treatment has no effect on this disease”? As early as 1959, statistician Theodore Sterling found that 97% of nearly 300 psychology studies published in major journals reported positive results, which could not possibly be a reflection of reality. This publication bias has since been reported over and over again in various scientific fields, where often only a few percent of the published studies report a negative result or a lack of significant result. The pharmaceutical industry, a major sponsor of clinical trials, has actually been one of the worst offenders. As documented in the recently published book Bad Pharma by British medical doctor Ben Goldacre, multiple surveys and systematic reviews have shown that clinical trials with negative or unfavourable results are overwhelmingly underreported compared to those with positive results. In some cases, unreported or, worse, intentionally withheld experimental data might have led to harm or even death for trial subjects and patients. Regulatory agencies and academic journals have tried to deal with this by requiring all trials to be registered before they begin, with their complete protocol clearly recorded, so that crucial data, especially negative data, would not go unreported. However, these registries are poorly enforced and their success so far is limited. And of course, no such requirements are currently in place for preclinical or basic academic research, although calls are growing for setting up similar registries for preclinical animal studies.
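A small calculation shows how selective publication alone can produce a literature that looks overwhelmingly positive. The sketch below is in Python; the function name and every number in it are hypothetical, chosen only to illustrate the mechanism, and none of them come from the studies cited above.

```python
def published_positive_share(positive_rate, publish_if_positive, publish_if_negative):
    """Share of published studies that report a positive result.

    positive_rate: fraction of all completed studies that find a positive result.
    publish_if_positive: chance that a positive study gets published.
    publish_if_negative: chance that a negative study gets published.
    """
    published_positive = positive_rate * publish_if_positive
    published_negative = (1.0 - positive_rate) * publish_if_negative
    return published_positive / (published_positive + published_negative)


# Hypothetical inputs: 30% of studies find an effect, but positive studies
# are far more likely to reach print than negative ones.
share = published_positive_share(
    positive_rate=0.30,
    publish_if_positive=0.90,
    publish_if_negative=0.05,
)
print(f"Share of published studies reporting a positive result: {share:.0%}")
```

In this made-up example, fewer than a third of the studies carried out found an effect, yet nearly nine in ten of the published ones did. If positive and negative studies were published at the same rate, the published share would simply match the true 30%.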
What can we do?
As many observers have pointed out, one of the possible causes of this reproducibility crisis is the increasing pressure on scientists to “publish or perish”. In an economic reality where the number of science PhDs continues to rise while funding and jobs remain stagnant, scientists feel the need to publish the most dramatic and interesting results as quickly as possible and in the most prestigious journals. Many have urged a change to the reward structure of science so that factors other than publication of novel, positive results, such as scientists’ work on mentoring students or replicating important scientific results, carry more weight in decisions about who gets tenure or grant funding [2,8]. But in the current, resource-strapped economic climate, this may be easier said than done.
There is also a need to actively improve the reproducibility of research and encourage publication of negative results. In August 2012, California-based startup Science Exchange, in partnership with open-access publisher Public Library of Science, established the Reproducibility Initiative, which allows investigators to submit their studies for independent validation by other labs. The long-term success of this initiative, however, will depend on stable sources of funding and uptake by the scientific community (as of March 1st, 2013, the Initiative reports that close to 2000 investigators have “opted in”). As for tackling publication bias, a few journals have been established in recent years to provide an avenue for publishing rigorous, peer-reviewed studies that report negative results. However, until it becomes standard for major journals to publish similar studies, the incentive for researchers to publish negative results will remain low.
Finally, published scientific papers must permit replication more readily. It may sound obvious that the “methods” section of papers should detail the exact protocol by which the experiments were conducted. However, partly because of word limits, these sections are often incomplete and inconsistent, and tell too simple a story of how the science was actually performed. Some journals are now moving to address this issue of incomplete or inaccurate protocols; in April 2013, Nature Publishing Group announced that all Nature journals will abolish word limits on methods sections, “encourage” authors to disclose as much of the raw data as possible, and “prompt” them to lay out statistical analyses and technical details of materials and reagents. Again, whether such measures will successfully address the issues at hand will depend on how diligently they are enforced, but this is at least a step in the right direction.
Despite the problems outlined here, science is still a search for truth about the world we live in, and many scientific theories and applications are still supported by robust evidence generated by many scientists working independently. What everyone has to keep in mind is that, despite what the media or university press releases might say in their attempts to generate hype, any single new study that comes fresh off the press should not be accepted wholesale, no matter how novel, interesting, or surprising the results might be, and that we should always wait for the process of science to work to replicate, validate and build on the results. To adapt a famous quotation by Sir Winston Churchill on democracy, science as it is currently practised may not be perfect, but it is still arguably the best tool we have to understand the world and produce tangible results that can improve human wellbeing. However, there is no doubt that problems do exist that have the potential to undermine science or, at the very least, public trust in it. Society cannot afford to waste further billions of dollars, and the goodwill of citizens and donors, to chase false leads in clinical development, and we certainly do not want to wait until deaths of patients or clinical trial subjects are directly attributed to faulty research results before we do something to address these problems.
Johnny Kung is a PhD student in molecular biology at Harvard Medical School. On the side, he also sub-specializes in programs on Human Biology and Translational Medicine, and on Science, Technology and Society.
 Prinz, F, T Schlange, and K Asadullah. (2011) Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov 10:712. http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html.
 Begley, CG, and EM Ellis. (2012) Drug development: Raise standards for preclinical cancer research. Nature 483:531-533. http://www.nature.com/nature/journal/v483/n7391/full/483531a.html.
 Editorial note on Begley and Ellis (2012). http://www.nature.com/nature/journal/v485/n7396/full/485041e.html.
 Ledford, H. (2011) Translational research: 4 ways to fix the clinical trial. Nature 477:526-528. http://www.nature.com/news/2011/110928/full/477526a.html.
 Ioannidis, JPA. (2005) Why most published research findings are false. PLoS Med 2:e124.
 MacCoun, RJ. (1998) Biases in the interpretation and use of research results. Annu Rev Psychol 49:259–87. http://socrates.berkeley.edu/~maccoun/MacCoun_AnnualReview98.pdf.
 Goldacre, B. (2013) Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients. New York: Faber and Faber, Inc.
 Sterling, TD. (1959) Publication decisions and their possible effects on inferences drawn from tests of significance – or vice versa. J Am Stat Assoc 54:30–34.
 Sena, ES, et al. (2010) Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol 8: e1000344. http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000344.
 Landis, SC, et al. (2012) A call for transparent reporting to optimize the predictive value of preclinical research. Nature 490:187-191. http://www.nature.com/nature/journal/v490/n7419/full/nature11556.html.
 Kimmelman, J. (Dec 19, 2012) In search of genomic incentives. The Globe and Mail. http://www.theglobeandmail.com/news/national/time-to-lead/in-search-of-genomic-incentives/article6534106/.
 Reproducibility Initiative: Updates, Opt-Ins, and Validations. http://blog.scienceexchange.com/2013/03/reproducibility-initiative-updates-optins-and-validations/.
 Editorial. (Apr 24, 2013) Announcement: Reducing our irreproducibility. Nature 496:398. http://www.nature.com/news/announcement-reducing-our-irreproducibility-1.12852.