Are Costly Experimental Failures Causing a Reproducibility Crisis?

How to Improve the Success and Reproducibility of Your Results

In life science research, the gold standard for experimentation is a protocol that is reproducible. Ideally, this protocol gives the same result each time it is performed and is sufficiently detailed that another person, even someone in a lab clear across the globe, could replicate its results precisely.

Scientific progress depends on reproducibility, which allows multiple scientists to build on a given thread of previous research over months, years, or even decades, and to trust that their findings are sound.

$28 Billion

per year spent on preclinical results that cannot be repeated in the U.S. alone.

Unfortunately, reproducibility is often more pie in the sky than reality. A 2015 study in PLoS Biology found that half or more of preclinical research in therapeutic drug development is not reproducible, and estimated that about $28 billion per year is spent on preclinical results that cannot be repeated in the U.S. alone (Freedman et al. 2015). Although many experts say that a failure to reproduce does not necessarily mean a result is wrong (Baker 2016), irreproducibility has big implications for therapeutic development, where future patients depend on consistent, reproducible results (Relias Media 2022). Researchers have coined the phrase "reproducibility crisis" to describe the persistent presence of nonrepeatable research across many different areas of science.

In the absence of major systemic changes, the cost of irreproducibility has probably risen since 2015. The American pharmaceutical industry spent an estimated $83 billion on research and development (R&D) in 2019 (Congressional Budget Office 2021). If half of that research were irreproducible, that would equate to more than $40 billion in excess costs. Because the U.S. accounts for roughly 45% of global early-stage R&D (IQVIA Institute 2021), and assuming reproducibility rates are similar around the world, a back-of-the-napkin calculation suggests that some $90 billion is spent on irreproducible research globally each year. Some variation is unavoidable, and some studies by nature cannot be repeated. But when a study cannot be reproduced because of poor design, execution, or reporting, real people can be put at risk without yielding any useful information (Relias Media 2022).
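The back-of-the-napkin arithmetic in the paragraph above can be written out explicitly. The following is a minimal Python sketch that uses only the figures quoted there. It makes the same simplifying assumptions as the estimate: total U.S. pharmaceutical R&D spending serves as the base, and a uniform 50% irreproducibility rate applies worldwide. The constant and function names are illustrative.

```python
# Back-of-the-napkin estimate of spending on irreproducible research.
# Assumptions (same as the text): total R&D spending is the base, and the
# 50% irreproducibility rate holds everywhere in the world.

US_RD_SPEND_2019 = 83e9        # U.S. pharmaceutical R&D spending, USD (CBO 2021)
IRREPRODUCIBLE_FRACTION = 0.5  # "half" of research (Freedman et al. 2015)
US_SHARE_OF_GLOBAL = 0.45      # U.S. share of global early-stage R&D (IQVIA 2021)


def napkin_estimate(us_spend: float, irrepro_frac: float, us_share: float) -> tuple[float, float]:
    """Return (U.S. excess cost, global excess cost) in USD."""
    us_excess = us_spend * irrepro_frac
    # Scale the U.S. figure up to the whole world using the U.S. share of R&D.
    global_excess = us_excess / us_share
    return us_excess, global_excess


us, world = napkin_estimate(US_RD_SPEND_2019, IRREPRODUCIBLE_FRACTION, US_SHARE_OF_GLOBAL)
print(f"U.S. excess cost:   ${us / 1e9:.1f} billion")     # $41.5 billion
print(f"Global excess cost: ${world / 1e9:.1f} billion")  # $92.2 billion
```

The global result, roughly $92 billion, is where the "some $90 billion" figure comes from. Because the model is a simple product and quotient, changing any single input shifts the estimate proportionally.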

Meanwhile, a 2019 report from Deloitte found that the biopharmaceutical industry is facing major productivity challenges and diminishing returns. Biopharma companies are taking longer than ever to develop new drugs, and as medicine becomes more and more personalized, these drugs are becoming more expensive while reaching smaller groups of patients (Deloitte 2019).

Most preclinical research is replicated before it moves on to a clinical context, and each of these preclinical studies takes anywhere from three months to two years to complete (Freedman et al. 2015). These timelines, as well as the clinical trials that follow, are lengthening as drug development and drugs themselves continue to become more complex (Deloitte 2019).

Experts agree that the current high-risk, high-cost R&D model for biopharma is unsustainable. Combined with the reproducibility problem, this has led many to argue that we are in a research crisis (Baker 2016; Stupple et al. 2019).

Reproducibility Solutions

The 2015 PLoS Biology study estimated that 27.6% of failures are due to flawed study design, 25.5% to data analysis and reporting, and 10.8% to poor laboratory protocols (Freedman et al. 2015). But poor tools were the biggest culprit. The study estimated that over a third of irreproducible research – 36.1% – fails because subpar biological reagents and reference materials are used.
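To give a rough sense of what these percentages mean in dollar terms, the Python sketch below splits the roughly $28 billion annual U.S. estimate across the four categories. Applying the category shares directly to the dollar figure is an assumption made here for illustration. It is not a breakdown reported in the text.

```python
# Illustrative proportional split of the ~$28 billion U.S. estimate across the
# failure categories from Freedman et al. (2015). Applying the shares directly
# to the dollar total is an assumption for illustration only.

TOTAL_USD_BILLIONS = 28.0

failure_shares = {  # percent of irreproducible studies
    "biological reagents and reference materials": 36.1,
    "study design": 27.6,
    "data analysis and reporting": 25.5,
    "laboratory protocols": 10.8,
}

# Sanity check: the four categories account for the whole estimate.
assert abs(sum(failure_shares.values()) - 100.0) < 1e-9

allocation = {cat: TOTAL_USD_BILLIONS * pct / 100 for cat, pct in failure_shares.items()}

# Print the categories from most to least costly.
for cat, usd in sorted(allocation.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:<45} ${usd:5.2f} billion")
```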

Improving research reproducibility, especially in the age of large-scale drug development, is an area of intense study. Several individual and systemic solutions have been proposed to combat the reproducibility crisis. These solutions serve both to prevent researchers from pursuing irreproducible studies and to help them improve the quality of their own work.

Combating Data Misuse

Some failure is inevitable, which is why researchers conventionally accept a 5% false-positive rate. In other words, a result is called statistically significant when its p-value falls below a threshold of 0.05 (Vidgen & Yasseri 2016).
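A small simulation shows what a 5% false-positive rate means in practice. The Python sketch below runs thousands of experiments in which there is no real effect and counts how many still come out "significant." It assumes normally distributed data with a known standard deviation, so that a simple two-sample z-test applies. The function names and parameters (group size, number of experiments, random seed) are illustrative choices, not part of any standard method.

```python
import math
import random


def z_test_p_value(a: list[float], b: list[float], sigma: float = 1.0) -> float:
    """Two-sided p-value for a difference in means, assuming a known population SD."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    # Two-sided standard normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_experiments: int = 10_000, n_per_group: int = 10,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of experiments with no real effect that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution, so any difference is pure noise.
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(control, treated) < alpha:
            hits += 1
    return hits / n_experiments


print(false_positive_rate())             # close to 0.05
print(false_positive_rate(alpha=0.005))  # close to 0.005
```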

But in the “publish or perish” culture of research (Everett & Earp 2015), scientists, especially early in their careers, feel pressure to squeeze out novel, groundbreaking results with low p-values. Noteworthy studies are more profitable, both in academia and in industry. And while outright scientific fraud is probably uncommon, other research behaviors fall in a gray area. These include data dredging, or p-hacking (Parry 2021); selective reporting (Department of Health & Human Services 2015); and hypothesizing after results are known, or HARKing (Wilson 2021).

Cherry-picking data, whether intentional or not, is especially prevalent in biopharmaceutical research. When investigators explore huge swathes of data, it is easy to find associations that appear statistically significant but are not actually meaningful.

Some experts think more stringent cutoffs are needed to decide whether a finding is noteworthy. For example, lowering the conventional significance level from 0.05 to 0.005 could make p-hacking more difficult (Benjamin et al. 2017). Others disagree. They argue that rather than redefining the threshold, researchers should justify their significance thresholds and other ways of calculating significance on a case-by-case basis (Lakens et al. 2018).
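Simple probability connects cherry-picking and the case for a stricter threshold. Suppose k independent comparisons are made and none reflects a real effect. The chance that at least one clears the threshold alpha is 1 - (1 - alpha)^k. The Python sketch below tabulates this for both thresholds. It assumes fully independent tests, which real data sets rarely satisfy exactly, so treat the numbers as an idealized illustration.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20, 100):
    print(f"{k:>3} comparisons:  alpha 0.05 -> {familywise_error(0.05, k):.3f}"
          f"   alpha 0.005 -> {familywise_error(0.005, k):.3f}")
```

With 20 comparisons, the chance of at least one spurious "finding" is about 64% at 0.05 but about 10% at 0.005. This is the intuition behind proposals to lower the threshold.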

Many experts and organizations have also proposed a paradigm shift to prioritize replication studies rather than just original ones. Concrete solutions include more funding specifically for replication studies (De Vrieze 2017) and repeat clinical trials (National Institutes of Health 2014), as well as requiring more research students to perform high-quality replication studies for their coursework (Frank & Saxe 2012), theses (Quintana 2021), and early-career publications (Everett & Earp 2015).

Improving Study Design

Experts are calling for more rigorous training programs at academic institutions that teach best practices in basic research skills and experimental planning. They also call for requiring principal investigators to complete continuing education and certification in order to receive funding. Some life science technology companies also offer teaching resources, webinars, tutorials, practical guides, and protocols to improve research literacy among science students and investigators alike.

Data misuse or manipulation can be a part of flawed study design. But a potentially irreproducible paper may also lack the appropriate controls, use the wrong statistical tests altogether, fail to repeat an experiment within a given study, or omit outlier experimental runs without justification (Begley 2013). Investigators should look carefully for these hallmarks of questionable studies on a case-by-case basis before they try to build on that research.

Eliminating bias is another important component. Studies have shown that when investigators tried to reproduce their own experiments, they often could not. In many cases, the only difference was that the second time around, they were blinded to which samples were test samples and which were controls (Begley & Ioannidis 2015). Experts think that making blinded experiments the standard would have a big impact on the reproducibility crisis.
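One simple way to implement blinding is to have someone who is not involved in the measurements relabel every sample with a random code and keep the key until the analysis is finished. The Python sketch below uses a hypothetical helper, `blind_samples`, to show that idea. The sample names and code format are made up for illustration.

```python
import random
from typing import Optional


def blind_samples(sample_ids: list[str],
                  seed: Optional[int] = None) -> tuple[dict[str, str], dict[str, str]]:
    """Give each sample a random, meaningless code (hypothetical helper).

    Returns (code_for_sample, sample_for_code). A colleague who is not running
    the experiment keeps these mappings; the experimenter sees only the codes.
    """
    rng = random.Random(seed)  # seed only for a reproducible demo; omit it in practice
    codes = [f"S{i:03d}" for i in range(1, len(sample_ids) + 1)]
    rng.shuffle(codes)  # code order no longer reveals which samples are controls
    code_for_sample = dict(zip(sample_ids, codes))
    sample_for_code = {code: sample for sample, code in code_for_sample.items()}
    return code_for_sample, sample_for_code


samples = ["control_1", "control_2", "control_3", "treated_1", "treated_2", "treated_3"]
key, unblind = blind_samples(samples, seed=2024)

# Tubes are labeled and measured by code only; the key is used to unblind the
# results only after the analysis has been locked.
print(sorted(unblind))  # the experimenter sees only the codes S001 to S006
```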

Bias toward positive results also affects peer review. Some experts support a “result-blind” peer evaluation process (Locascio 2017), in which reviewers are first given only the introduction and methodology of a paper, without the results. In theory, this would ensure that reviewer recommendations rest purely on the rigor of the experimental design and the importance of the research question, not on whether the results were positive or “newsworthy.”

A lack of standardization can also cause major inconsistencies in pharmacological research. Experts have estimated that more stringent use of standards and best practices could save billions of dollars per year (Freedman et al. 2015). This includes adopting standard practices not just for protocols and reagents but also for methods of analysis (Haibe-Kains et al. 2013).

Improving Transparency

The intense pressure to produce attention-grabbing, positive findings has resulted in what psychologists call “the file drawer effect”: the filing away of results that do not support researchers’ hypotheses (Apple 2017). More transparency from the very beginning, experts say, would make results more difficult to manipulate, experiments easier to replicate, and fruitless efforts less likely to be duplicated (Vidgen & Yasseri 2016).
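The file drawer effect distorts the size of published effects, not just their number. The Python sketch below assumes a small true effect (0.2 standard deviations), 20 samples per group, and that only positive, statistically significant studies get "published." All of these parameters are hypothetical. The sketch compares the average estimate across every study with the average across the published ones.

```python
import math
import random
import statistics


def file_drawer_simulation(true_effect: float = 0.2, n_per_group: int = 20,
                           n_studies: int = 2000, alpha: float = 0.05,
                           seed: int = 7) -> tuple[float, float, float]:
    """Simulate many small two-group studies of the same true effect.

    Only studies with a significant result in the hypothesized (positive)
    direction are 'published'. Returns (mean estimate over all studies,
    mean estimate over published studies, fraction of studies published).
    Data are assumed normal with a known standard deviation of 1.
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # standard error of a difference in two means
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    all_estimates, published = [], []
    for _ in range(n_studies):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        estimate = statistics.fmean(treated) - statistics.fmean(control)
        all_estimates.append(estimate)
        if estimate / se > z_crit:  # significant and supports the hypothesis
            published.append(estimate)
        # every other study goes into the file drawer
    return statistics.fmean(all_estimates), statistics.fmean(published), len(published) / n_studies


overall, in_print, share = file_drawer_simulation()
print(f"all studies: {overall:.2f}  published: {in_print:.2f}  fraction published: {share:.0%}")
```

Only estimates large enough to clear the significance threshold survive. With these settings, that means a difference of about 0.62, so the published average necessarily comes out more than three times the true effect. The average over all studies, by contrast, stays close to 0.2.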

These experts have suggested making proposed hypotheses, methods, and analyses openly available before studies or clinical trials begin, for instance by preregistering them in the Open Science Framework data repository. This change would hold researchers accountable if, for example, they preregistered a significance threshold of 0.01 but later reported results at 0.05, or if they failed to publish the results of a preregistered clinical trial simply because they didn’t find anything noteworthy (Dickersin & Rennie 2003).
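A preregistered plan makes both kinds of discrepancy described above mechanically checkable. The Python sketch below uses hypothetical record types (`Preregistration`, `Report`) and a hypothetical `audit` function. Real registries such as the Open Science Framework store far richer plans, and nothing here reflects their actual data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Preregistration:
    """Plan recorded before the study starts (hypothetical fields)."""
    study_id: str
    alpha: float  # significance threshold committed to in advance


@dataclass
class Report:
    """What was eventually published (hypothetical fields)."""
    alpha: float  # significance threshold actually used


def audit(plan: Preregistration, report: Optional[Report]) -> list[str]:
    """List discrepancies between a preregistered plan and its published report.

    report=None stands for a registered study whose results were never published.
    """
    if report is None:
        return [f"{plan.study_id}: preregistered but never reported"]
    issues = []
    if report.alpha > plan.alpha:
        issues.append(f"{plan.study_id}: threshold loosened from {plan.alpha} to {report.alpha}")
    return issues


plan = Preregistration(study_id="TRIAL-001", alpha=0.01)
print(audit(plan, Report(alpha=0.05)))  # ['TRIAL-001: threshold loosened from 0.01 to 0.05']
print(audit(plan, None))                # ['TRIAL-001: preregistered but never reported']
print(audit(plan, Report(alpha=0.01)))  # []
```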

Furthermore, when multiple teams of researchers can access the same data sets through data sharing platforms, their research becomes a collaborative effort rather than a competitive one. In fact, both the National Science Foundation and the National Institutes of Health have issued firm statements that investigators should disclose data sets, but these statements are not strictly enforced (Begley & Ioannidis 2015).

Downstream Effects of Experimental Failures

Even with meticulous methodology, a chef is only as good as their ingredients.

Antibodies, for example, are the foundation of a huge proportion of life sciences research, especially now that the market for therapeutic antibodies has boomed (Deloitte 2019). Experts have called antibodies a major driver of the reproducibility crisis (Baker 2015). One reason is that many poor-quality products on the market lead to inconsistent results. Another is that many labs do not validate antibodies after purchasing them.

Cell lines are another huge part of preclinical research, but they are often misidentified or cross-contaminated. Using a faulty or incorrect cell line can derail an entire study. Yet many labs still do not authenticate their cell lines, even though authentication costs only a few hundred dollars per assay: a low price for a potentially critical safeguard (Freedman et al. 2015).
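Human cell lines are commonly authenticated by short tandem repeat (STR) profiling, which compares the alleles at a set of STR loci with a reference profile. The Python sketch below computes the Tanabe percent-match score, 2 × shared alleles / (alleles in query + alleles in reference), over the loci typed in both profiles. The allele values are hypothetical. The 80% cutoff mentioned in the comment is a commonly used convention, not something stated in this article.

```python
def tanabe_match(query: dict[str, set[str]], reference: dict[str, set[str]]) -> float:
    """Percent match between two STR profiles, using the Tanabe formula.

    Only loci typed in both profiles are compared. Alleles are strings so that
    microvariants such as '9.3' are kept exact.
    """
    loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in loci)
    return 100 * 2 * shared / total if total else 0.0


# Hypothetical profiles; a homozygous locus is recorded as a single allele.
reference = {"TH01": {"6", "9.3"}, "TPOX": {"8", "11"}, "vWA": {"16", "18"}, "CSF1PO": {"10", "12"}}
query = {"TH01": {"6", "9.3"}, "TPOX": {"8"}, "vWA": {"16", "18"}, "CSF1PO": {"10", "12"}}

score = tanabe_match(query, reference)
print(f"{score:.1f}% match")  # 80% or more is commonly treated as the same origin
```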

Investigators should use only vendors that offer validated reagents, including antibodies, primers, and other assays and kits. They should also use validated equipment held to U.S. Food and Drug Administration standards, so that the instrument software is regulated and reliable.

Vendors should also provide quality control reports and certificates of analysis so that investigators can keep track of batches and lots. When producing biologics, which make up a fast-growing proportion of the drug development market, maintaining purity and minimizing batch-to-batch variability can be difficult (Deloitte 2019). When scaling up production of these complex therapies, consistent and reliable reagents are critical to avoid unnecessary costs.
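Certificates of analysis are most useful when each experiment's lots are recorded, so that they can be traced after the fact. The Python sketch below is a minimal, hypothetical lot log; the `LotLog` class, reagent names, and lot numbers are all illustrative. It can answer the question "which experiments used this lot?" if a batch later turns out to be faulty.

```python
from collections import defaultdict


class LotLog:
    """Minimal record linking experiments to the reagent lots they used (hypothetical schema)."""

    def __init__(self) -> None:
        self._lots_by_experiment = {}                 # experiment ID -> {reagent: lot}
        self._experiments_by_lot = defaultdict(list)  # (reagent, lot) -> [experiment IDs]

    def record(self, experiment_id: str, reagents: dict[str, str]) -> None:
        """Store the lot number of every reagent used in an experiment."""
        self._lots_by_experiment[experiment_id] = dict(reagents)
        for reagent, lot in reagents.items():
            self._experiments_by_lot[(reagent, lot)].append(experiment_id)

    def lots_for(self, experiment_id: str) -> dict[str, str]:
        """Which lots a given experiment used, e.g. when trying to reproduce it."""
        return dict(self._lots_by_experiment.get(experiment_id, {}))

    def experiments_using(self, reagent: str, lot: str) -> list[str]:
        """Every experiment that used a given lot, e.g. if its quality is later questioned."""
        return list(self._experiments_by_lot.get((reagent, lot), []))


log = LotLog()
log.record("EXP-01", {"anti-GAPDH antibody": "LOT-A", "PCR master mix": "LOT-7"})
log.record("EXP-02", {"anti-GAPDH antibody": "LOT-B"})
log.record("EXP-03", {"anti-GAPDH antibody": "LOT-A"})
print(log.experiments_using("anti-GAPDH antibody", "LOT-A"))  # ['EXP-01', 'EXP-03']
```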

Automation and Digital Technology

As technology advances, experts expect automation to become a bigger part of R&D. As pharmaceuticals – which make up some 60% of life sciences research spending in the U.S. – lean more towards huge data sets, automation and digital technologies can significantly reduce timelines (Deloitte 2019).

Take therapeutic antibodies, for example. Antibody development can be streamlined at several stages with the right equipment. Digital PCR can be used to develop stable cell lines more efficiently while using fewer supplies, and flow cytometers can be upgraded to allow automated, high-throughput antibody screening. Digital PCR and automation also have applications in developing other biologic stalwarts of the biopharmaceutical market, including cell and gene therapies and vaccines (Deloitte 2019).

Digital technology also applies to data curation and everyday laboratory life. Replacing or supplementing paper lab notebooks with digital ones allows investigators to find and reproduce data easily and to link their records to instruments through cloud-based software. It also lets them work remotely, consolidate resources, and share those resources with collaborators.

Using cloud-based instruments also reduces the need to keep upgrading computers’ operating systems to run cutting-edge software. Streamlining the lab and reducing human error through digital technology has big long-term implications for both research quality and efficient spending.

Bringing Reagents Up to Par

A reproducibility crisis affects more than the bottom line. Each time a promising life science paper is released, it gives hope to clinicians, patients, and their families who are waiting for disease cures. But flawed experimental designs, subpar reagents, and overinflated results send other scientists on a wild goose chase. The more time we spend pursuing irreproducible experiments, the more we delay the release of a lifesaving drug.

Experimental failures also undermine public trust in the value of research, the strength of peer review, and the soundness of the scientific process in general. Furthermore, even with the successful release of a product, the more time and resources are spent developing it, the more expensive it will be for hospitals and patients alike – adding to the already critical problem of healthcare affordability.

With reliable equipment, dedication to better study design, and careful selection of high-quality reagents, researchers can play a big part in improving life sciences research and, ultimately, deliver the best products to patients in need.

References

Apple, S. (2017, January 22). John Arnold Made a Fortune at Enron. Now He's Declared War on Bad Science. Retrieved from Wired: https://www.wired.com/2017/01/john-arnold-waging-war-on-bad-science/, accessed August 18, 2022.

Baker, M. (2015). Reproducibility crisis: Blame it on the antibodies. Nature.

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature.

Begley, C. G. (2013). Six red flags for suspect work. Nature.

Begley, C. G., & Ioannidis, J. P. (2015). Reproducibility in Science. Circulation Research.

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., et al. (2017). Redefine statistical significance. Nature Human Behaviour.

Congressional Budget Office. (2021, April). Research and Development in the Pharmaceutical Industry. Congressional Budget Office. Retrieved from Congressional Budget Office: https://www.cbo.gov/publication/57126, accessed August 18, 2022.

De Vrieze, J. (2017, July 11). 'Replication grants' will allow researchers to repeat nine influential studies that still raise questions. Retrieved from Science: https://www.science.org/content/article/replication-grants-will-allow-researchers-repeat-nine-influential-studies-still-raise, accessed August 18, 2022.

Deloitte. (2019). Ten years on: Measuring the return from pharmaceutical innovation 2019. London: 368.

Department of Health & Human Services. (2015). Selective Reporting of Results. Retrieved from Office of Research Integrity: https://ori.hhs.gov/selective-reporting-results, accessed August 18, 2022.

Dickersin, K., & Rennie, D. (2003). Registering clinical trials. JAMA.

Everett, J. A., & Earp, B. D. (2015). A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers. Frontiers in Psychology.

Frank, M. C., & Saxe, R. (2012). Teaching Replication. Perspectives on Psychological Science.

Freedman, L. P., Cockburn, I. M., & Simcoe, T. S. (2015). The Economics of Reproducibility in Preclinical Research. PLoS Biology.

Haibe-Kains, B., El-Hachem, N., Birkbak, N. J., Jin, A. C., Beck, A. H., & Aerts, H. J. (2013). Inconsistency in large pharmacogenomic studies. Nature.

IQVIA Institute. (2021, May 19). Global Trends in R&D. Retrieved from IQVIA: https://www.iqvia.com/insights/the-iqvia-institute/reports/global-trends-in-r-and-d

Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A., Argamon, S. E., et al. (2018). Justify your alpha. Nature Human Behaviour.

Locascio, J. J. (2017). Result Blind Science Publishing. Basic and Applied Social Psychology.

National Institutes of Health. (2014). Replication of Key Clinical Trials Initiative (U01). Retrieved from Department of Health and Human Services: https://grants.nih.gov/grants/guide/pa-files/PAR-13-383.html

Parry, C. (2021). Taking the P: why the founder of P-values would be turning in his grave. The Pharmaceutical Journal.

Quintana, D. S. (2021). Replication studies for undergraduate theses to improve science and education. Nature Human Behaviour.

Relias Media. (2022, June 1). The Reproducibility Crisis in Clinical Trial Research. Retrieved from https://www.reliasmedia.com/articles/149447-the-reproducibility-crisis-in-clinical-trial-research

Stupple, A., Singerman, D., & Celi, L. A. (2019). The reproducibility crisis in the age of digital medicine. NPJ Digital Medicine.

Vidgen, B., & Yasseri, T. (2016). P-Values: Misunderstood and Misused. Frontiers in Physics.

Wilson, M. (2021, Jan). HARKing: What is it and why is it bad? Retrieved from UC Davis Health: https://health.ucdavis.edu/ctsc/area/Resource_Library/documents/HARKing_0Jan2021.pdf

BIO-RAD is a trademark of Bio-Rad Laboratories, Inc. All trademarks used herein are the property of their respective owners. © Bio-Rad Laboratories, Inc.
