Major Study Shows the Need to Improve How Scientists Approach Early-Stage Cancer Research

Preclinical research — the kind that takes place before testing on humans — often guides decisions about which potential treatments should continue to clinical trials. But attempts to replicate 50 studies found the odds of getting the same results were only about 50-50. Photo: Sinology/Getty Images

Preclinical studies, the kind that scientists perform before testing in humans, don’t get as much attention as their clinical counterparts. But they are the vital first steps to eventual treatments and cures. It’s important to get preclinical findings right. When they are wrong, scientists waste resources pursuing false leads. Worse, false findings can trigger clinical studies with humans.

Last December, the Center for Open Science (COS) released the worrying results of its eight-year, US$1.5-million Reproducibility Project: Cancer Biology study. Working in collaboration with the research marketplace Science Exchange, independent scientists found that the odds of replicating the results of 50 preclinical experiments from 23 high-profile published studies were no better than a coin toss.

Praise and controversy have followed the project from the beginning. The journal Nature applauded the replication studies as “the practice of science at its best.” But the journal Science noted that reactions from some scientists whose studies were chosen ranged from “annoyance to anxiety to outrage,” impeding the replications. None of the original experiments was described in enough detail to allow scientists to repeat them, so the replicators had to ask the original authors for assistance. A third of those authors were uncooperative, and some were even hostile.

COS executive director Brian Nosek cautioned that the findings pose “challenges for the credibility of preclinical cancer biology.” In a tacit acknowledgement that biomedical research has not been universally rigorous or transparent, the U.S. National Institutes of Health (NIH), the largest funder of biomedical research in the world, has announced that it will raise requirements for both of these qualities.

I have taught classes and written about good scientific practice in psychology and biomedicine for over 30 years. I’ve reviewed more grant applications and journal manuscripts than I can count, and I’m not surprised.


Controlling for Bias


The twin pillars of trustworthy science — transparency and dispassionate rigour — have wobbled under the stress of incentives that enhance careers at the expense of reliable science. Too often, proposed preclinical studies — and surprisingly, published peer-reviewed ones — don’t follow the scientific method. Too often, scientists do not share their government-funded data, even when required by the publishing journal.

Many preclinical experiments lack the rudimentary controls against bias that are taught in the social sciences, though rarely in biomedical disciplines such as medicine, cell biology, biochemistry and physiology. Controlling for bias is a key element of the scientific method because it allows scientists to disentangle experimental signal from procedural noise.

Confirmation bias, the tendency to see what we want to see, is one type of bias that good science controls by “blinding.” Think of the “double-blind” procedures in clinical trials in which neither the patient nor the research team knows who is getting the placebo and who is getting the drug. In preclinical research, blinding experimenters to samples’ identities minimizes the chance that they will alter their behaviour, however subtly, in favour of their hypothesis.

Seemingly trivial differences, such as whether a sample is processed in the morning or afternoon or whether an animal is caged in the upper or lower row, can also change results. This is not as unlikely as you might think. Moment-to-moment changes in the micro-environment, such as exposure to light and air ventilation, for example, can change physiological responses.

If all animals who receive a drug are caged in one row and all animals who do not receive the drug are caged in another row, any difference between the two groups of animals may be due to the drug, to their housing location or to an interaction between the two. You can’t honestly choose between the alternative explanations, and neither can the scientists.

Randomizing sample selection and processing order minimizes these procedural biases, makes the interpretation of the results clearer, and makes them more likely to be replicated.
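To make the idea concrete, here is a minimal sketch of how randomization and blinding might be set up for the caging example above. It is an illustration only, not the procedure used in the replication project: the function name, the cage-row labels and the "S01"-style sample codes are all hypothetical. Animals are randomly split into drug and control groups, each group is spread evenly across cage rows so that row cannot stand in for treatment, and the experimenter works from coded labels in a random processing order while the unblinding key is kept separately.

```python
import random


def randomize_and_blind(animal_ids, cage_rows=("upper", "lower"), seed=None):
    """Hypothetical helper: random allocation, balanced caging, coded labels."""
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)  # random allocation to treatment groups
    half = len(ids) // 2
    groups = {"drug": ids[:half], "control": ids[half:]}

    # Unblinding key: held by someone who does not run the experiment.
    key = {}
    for treatment, members in groups.items():
        for i, animal in enumerate(members):
            # Alternate rows within each group so every row holds both
            # drug and control animals.
            key[animal] = {"treatment": treatment,
                           "row": cage_rows[i % len(cage_rows)]}

    # Random processing order; the experimenter sees only code and row.
    order = list(ids)
    rng.shuffle(order)
    blinded = []
    for n, animal in enumerate(order, start=1):
        code = f"S{n:02d}"
        key[animal]["code"] = code
        blinded.append({"code": code, "row": key[animal]["row"]})
    return blinded, key


if __name__ == "__main__":
    blinded_sheet, unblinding_key = randomize_and_blind(range(8), seed=1)
    for entry in blinded_sheet:
        print(entry)  # no treatment information appears here
```

With eight animals and two rows, each row ends up holding two drug and two control animals, so a difference between groups can no longer be explained by housing location alone.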

Many of the replication experiments were blinded and randomized, but it’s not known whether the original experiments were. All that is known is that, of the 15 animal experiments, only one of the original studies reported randomization and none reported blinding. It would not be surprising if many of the original studies neither randomized nor blinded.


Study Design and Statistics


According to one estimate, over half of the one million articles published each year have biased study designs, contributing to the waste of 85 per cent of the US$100 billion spent each year on (mostly preclinical) research.

In a widely reported commentary, industry scientist and former academic Glenn Begley reported being able to reproduce the results of only six of 53 academic studies (11 per cent). He listed six practices of reliable research, including blinding. All six of the studies that replicated followed all six practices. The 47 studies that failed to replicate followed few or, sometimes, none of the practices.

Another way to bias findings is by misusing statistics. As with blinding and randomization, it’s not known which, if any, of the original studies in the reproducibility project misused statistics, because of the studies’ lack of transparency. But that, too, is common practice.

A dictionary of terms describes a slew of poor data analysis practices that can manufacture statistically significant (but false) findings, such as HARKing (Hypothesizing After the Results are Known), p-hacking (repeating statistical tests until a desired result is produced) and following a series of data-dependent analysis decisions known as a “garden of forking paths” to publishable findings.
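A small simulation can show why such practices manufacture false findings. The sketch below is an illustration under stated assumptions, not an analysis of any study in the project: both groups are drawn from the same distribution, so there is no real effect to find, and the function names and settings (10 outcomes, 20 samples per group, a simple z-test with known variance) are chosen only for the example. An "honest" analyst tests one outcome chosen in advance; a "p-hacker" measures several outcomes and reports a finding if any of them reaches p < 0.05, which is one simple form of repeating tests until a desired result appears.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value for equal-sized groups with known sigma."""
    n = len(group_a)
    z = (mean(group_a) - mean(group_b)) / (sigma * (2 / n) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(trials=2000, outcomes=10, n=20, alpha=0.05, seed=0):
    """Return (honest_rate, hacked_rate) when no true effect exists."""
    rng = random.Random(seed)
    honest_hits = 0
    hacked_hits = 0
    for _ in range(trials):
        p_values = []
        for _ in range(outcomes):
            # Both groups come from the same distribution: no real effect.
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(two_sided_p(a, b))
        honest_hits += p_values[0] < alpha    # one pre-specified test
        hacked_hits += min(p_values) < alpha  # best of all tests tried
    return honest_hits / trials, hacked_hits / trials


if __name__ == "__main__":
    honest, hacked = false_positive_rates()
    print(f"honest false-positive rate: {honest:.3f}")
    print(f"p-hacked false-positive rate: {hacked:.3f}")
```

In theory the honest analyst is wrong about 5 per cent of the time, while trying 10 independent outcomes pushes the chance of at least one false "finding" to about 40 per cent (1 − 0.95^10). The simulated rates should land near those values.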

These practices are common in biomedical research. Decades of pleas from methodologists to change data analysis practices, along with an unprecedented statement from the American Statistical Association, have gone unheeded.


A Better Future


Those who are anti-science should not take heart in these findings. Preclinical science’s accomplishments are real and impressive. Decades of preclinical research led to the development of the COVID-19 mRNA vaccines, for example. And most scientists are doing the best they can within a system that rewards quick, flashy results over slower, reliable ones.

But science is done by humans, with all the strengths and weaknesses that entails. The trick is to reward practices that produce trustworthy science and to censure practices that do not, without killing innovation.

Changing incentives and enforcing standards are the most effective ways to improve scientific practice. The goal is to improve efficiency by ensuring scientists who value transparency and rigour over speed and flash are given a chance to thrive. It’s been tried before, with minimal success. This time may be different. The Reproducibility Project: Cancer Biology study and the NIH policy changes it prompted may be just the push needed to make it happen.

Robert Nadon, Associate Professor, Department of Human Genetics, Faculty of Medicine, McGill University