For example, one major issue is the problem of publication bias. This describes the tendency of journals to favour positive or "sexy" results over studies that find no effect, or that fail to find an effect where previous studies had reported one. So what ends up happening is that ten researchers might choose to research whether X causes Y, and eight of them find no effect while two find an effect. This can happen due to bad methodology, fraud, experimenter bias, or simply chance and statistical noise. But because journals favour positive findings, those two papers will get published and the eight negative findings will get rejected. Since they're never published, a review of the published literature will tell us that 100% of studies on the topic found an effect, and without access to the unpublished studies we'd be unaware that only 20% found one - which is much less impressive.
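To make the arithmetic concrete, here's a minimal simulation sketch of what happens when many labs study an effect that doesn't actually exist (Python, using scipy's `ttest_ind`; the sample sizes, seed and numbers are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N_STUDIES = 10_000   # many labs ask "does X cause Y?"
N_PER_GROUP = 20     # small samples, as is common
ALPHA = 0.05

false_positives = 0
for _ in range(N_STUDIES):
    # Both groups are drawn from the same distribution: the true effect is zero.
    treatment = rng.normal(size=N_PER_GROUP)
    control = rng.normal(size=N_PER_GROUP)
    if stats.ttest_ind(treatment, control).pvalue < ALPHA:
        false_positives += 1

print(f"{false_positives / N_STUDIES:.1%} of null studies 'find' an effect")
# Roughly 5% - about 1 lab in 20 - gets a positive result by chance
# alone. If only those studies are published, the literature shows a
# 100% hit rate for an effect that doesn't exist.
```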
THE QRPs ARE COMING FROM INSIDE THE HOUSE
So far we can see that there is a serious issue here that needs addressing, but it doesn't seem like all hope is lost. Things like publication bias, once identified, seem fairly easy to address - encourage journals to publish negative findings, start journals dedicated to negative findings, emphasise the importance of negative findings when teaching students so they don't assume such results are a waste of time, and so on. Most of these problems have relatively easy solutions because everyone agrees that the practice is bad science and weakens our confidence in published results.
But what do we do when the people doing bad science think that they're doing good science?
Unlike blatant scientific misconduct, Questionable Research Practices (QRPs) occupy the grey area of wrongdoing - it's not as if these researchers are faking data or inventing non-existent participants; rather, they're rationalising bad practices which could be justified under other conditions. For example, there are situations where some data points need to be excluded, like when there are equipment malfunctions or a suspicion that the participant has lied about their responses, but these judgements need to be specified before the research begins. Often what happens is that researchers will see that the data don't support their hypothesis, realise that excluding the outliers would change the result to a positive one, and attempt to justify the decision after the fact. In the moment it can seem reasonable, given that there can be valid reasons for excluding data, but if we're doing it because we know it'll give us the result we'd like, then we're unfairly stacking the deck in favour of a specific outcome.
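To see how innocent-looking flexibility stacks that deck, here's a hypothetical sketch of an analyst who also tries a couple of post-hoc outlier cut-offs and keeps whichever version of the data "works" (the cut-offs, trial counts and scipy helpers are invented for illustration, not taken from any particular study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ALPHA, N, TRIALS = 0.05, 30, 5000

def p_value(a, b):
    return stats.ttest_ind(a, b).pvalue

honest_hits = flexible_hits = 0
for _ in range(TRIALS):
    a = rng.normal(size=N)
    b = rng.normal(size=N)   # the true effect is zero
    p_full = p_value(a, b)
    # The "flexible" analyst also tries two post-hoc outlier rules and
    # keeps whichever version of the data gives the nicest result.
    candidates = [p_full]
    for cutoff in (2.0, 2.5):
        a_trim = a[np.abs(stats.zscore(a)) < cutoff]
        b_trim = b[np.abs(stats.zscore(b)) < cutoff]
        candidates.append(p_value(a_trim, b_trim))
    honest_hits += p_full < ALPHA
    flexible_hits += min(candidates) < ALPHA

print(f"pre-specified analysis: {honest_hits / TRIALS:.1%} false positives")
print(f"post-hoc exclusions:    {flexible_hits / TRIALS:.1%} false positives")
# The honest rate sits near the nominal 5%; choosing exclusions after
# seeing the results pushes it noticeably higher.
```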
You can probably see why some people refer to these QRPs as "researcher degrees of freedom": instead of fabricating data or acting in any directly improper way, it's more a stretching of the truth. Researchers push the boundary of what is good practice in a way that often isn't even a conscious decision. And therein lies the problem - because it's not conscious, they don't think they're doing bad science. Even when these problems are identified and explained to them, many will still defend the practices and happily admit to using them in their research.
DATA FISHING: THE ONE THAT GOT AWAY
This entire post came about because, in my discussions with other researchers about the nature of science and methodology, I often come across someone who seems completely baffled at my suggestion that there is a problem with changing your hypothesis to fit the data after you've collected it and reporting it as if that had been your hypothesis all along. Their usual justification is that the data stay the same, so it doesn't matter how you write it up. Intuitively this makes sense, but how valid is it? Not very. I think I'm supposed to ask questions like that and let you mull it over, but I guess I'm not very good at creating suspense. Also, Dumbledore dies.
To figure out why it's not valid, it is helpful to look at Kerr's article on the topic. It's here that he coins the term "HARKing", which stands for "Hypothesising After The Results Are Known". I suppose he figured "HATRAKing" wasn't quite as catchy, but the point of the concept is to describe the practice I mention above, where research is designed to test one hypothesis and, upon finding that the data don't support it, is written up as if it had been designed to test a new hypothesis. Essentially this practice is just a form of data fishing; that is, the data are sifted through in search of an explanation that fits them, rather than coming up with an explanation first and empirically testing that explanation.
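A quick sketch of why fishing works so reliably: measure enough outcomes in a single null experiment and something will look significant. (All numbers below are made up for illustration; the scipy t-test stands in for whatever analysis a real study would run.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
N_OUTCOMES = 20    # e.g. 20 questionnaire measures collected "while we're at it"
N_PER_GROUP = 25

# One experiment, two groups, and no true effect on ANY outcome.
group_a = rng.normal(size=(N_OUTCOMES, N_PER_GROUP))
group_b = rng.normal(size=(N_OUTCOMES, N_PER_GROUP))

p_values = [stats.ttest_ind(a, b).pvalue for a, b in zip(group_a, group_b)]
best = int(np.argmin(p_values))
print(f"smallest of {N_OUTCOMES} p-values: {p_values[best]:.3f} (outcome {best})")
# With 20 null outcomes tested at alpha = .05, the chance that at least
# one comes up significant is 1 - 0.95**20, or about 64%. Writing the
# paper around that outcome as if it were the planned hypothesis all
# along is HARKing.
```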
What is most disturbing about this practice is that not only do researchers report engaging in it, but (as Kerr notes) actual textbooks teaching students how to carry out scientific research describe it as if it were normal. Interestingly, the textbook chapter that Kerr discusses was written by Daryl Bem.
YOU DON'T NEED TO BE A PSYCHIC TO SEE WHERE THIS IS HEADING
Despite doing a lot of good research, Bem is best known for a 2011 paper in which he published a series of studies that supposedly supported the existence of psi phenomena - specifically precognition, the ability to predict the future. And when I say that he published this paper, I don't mean he uploaded it to his blog or personal website; it was published in one of the leading psychology journals.
There were a number of problems with his paper, including some unusual statistical analyses chosen for unstated reasons, but one of the main criticisms was that he was accused of data fishing. There was an excellent formal response to Bem in this paper here (with decent writeups on the topic here), where the authors point out Bem's interesting approach to conducting science. In this book chapter he teaches students how to perform and write up their experiments:
There are two possible articles you can write: (1) the article you planned to write when you designed your study or (2) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (2).
The conventional view of the research process is that we first derive a set of hypotheses from a theory, design and conduct a study to test these hypotheses, analyze the data to see if they were confirmed or disconfirmed, and then chronicle this sequence of events in the journal article. If this is how our enterprise actually proceeded, we could write most of the article before we collected the data. We could write the introduction and method sections completely, prepare the results section in skeleton form, leaving spaces to be filled in by the specific numerical results, and have two possible discussion sections ready to go, one for positive results, the other for negative results. But this is not how our enterprise actually proceeds. Psychology is more exciting than that, and the best journal articles are informed by the actual empirical findings from the opening sentence. Before writing your article, then, you need to Analyze Your Data. Herewith, a sermonette on the topic.
If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something—anything—interesting.
Personally, I agree with the Wagenmakers paper, whose position is essentially summarised as:
We agree with Bem in the sense that empirical research can benefit greatly from a careful exploration of the data; dry adherence to confirmatory studies stymies creativity and the development of new ideas. As such, there is nothing wrong with fishing expeditions. But it is vital to indicate clearly and unambiguously which results are obtained by fishing expeditions and which results are obtained by conventional confirmatory procedures. In particular, when results from fishing expeditions are analyzed and presented as if they had been obtained in a confirmatory fashion, the researcher is hiding the fact that the same data were used *twice*: first to discover a new hypothesis, and then to test that hypothesis. If the researcher fails to state that the data have been so used, this practice is at odds with the basic ideas that underlie scientific methodology (see Kerr, 1998, for a detailed discussion).
Instead of presenting exploratory findings as confirmatory, one should ideally use a two-step procedure: first, in the absence of strong theory, one can explore the data until one discovers an interesting new hypothesis. But this phase of exploration and discovery needs to be followed by a second phase, one in which the new hypothesis is tested against new data in a confirmatory fashion. This is particularly important if one wants to convince a skeptical audience of a controversial claim: after all, confirmatory studies are much more compelling than exploratory studies. Hence, explorative elements in the research program should be explicitly mentioned, and statistical results should be adjusted accordingly. In practice, this means that statistical tests should be corrected to be more conservative.
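That last line about adjusting statistical results deserves unpacking. One standard (if blunt) way to be "more conservative" is a Bonferroni correction: if you examined k hypotheses while exploring, test each one at alpha/k rather than alpha. A rough simulation sketch, with all parameters invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
ALPHA, K, TRIALS, N = 0.05, 20, 2000, 25

fished = corrected = 0
for _ in range(TRIALS):
    # One dataset, K null hypotheses examined during exploration.
    ps = [stats.ttest_ind(rng.normal(size=N), rng.normal(size=N)).pvalue
          for _ in range(K)]
    fished += min(ps) < ALPHA         # report the best-looking of 20 tests
    corrected += min(ps) < ALPHA / K  # same data, Bonferroni threshold

print(f"uncorrected family-wise error rate: {fished / TRIALS:.2f}")     # ~0.64
print(f"Bonferroni-corrected rate:          {corrected / TRIALS:.2f}")  # ~0.05
```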
The part about using the data twice is what concerns me most, as it's a case of double-dipping: you not only use the data to create an explanation, but then use that same data as evidence that your explanation is correct. The easiest way to understand why this is a problem is to view it as an example of the Texas sharpshooter fallacy, which is illustrated by imagining somebody who fires his gun at the side of a barn, then paints bullseyes around his bullet holes and declares himself a sharpshooter. It looks impressive when you see all those perfect shots, until you realise that the targets were painted on after the fact.
ABOVE: Olympic committee bans shooting sports after all competitors inexplicably achieve perfect scores, resulting in everyone having to share the gold.
The same issue applies to science in the sense that a successful test of an explanation carries weight because of the predictive power of that explanation. There's no real strength behind an explanation created simply to account for data we already have, because it's easy to create such explanations: there are an infinite number of explanations for any data set, and arbitrarily picking one doesn't get us any closer to the truth. This problem is of particular concern in evolutionary psychology, where "just-so stories" are regularly criticised for the reasons I discuss above.
BUT ALL THE COOL KIDS ARE HARKING, WHY CAN'T I?
If all the cool scientists decided to jump off a bridge, would you want to do that as well? Of course not, but if they happened to be connected to bungee cords then maybe you would. The devil is in the details: you will often see many good scientists doing something that looks close to HARKing - that is, they'll collect data and then create an explanation for it after the fact. This isn't necessarily a problem, though.
The issue with HARKing is that the data are collected to test a specific hypothesis, and then after the data are in, the paper is written as if the new hypothesis was what they were testing from the beginning. So there is a way to hypothesise after the fact which isn't a problem, and that is when no hypothesis is formed before data collection. This is an exploratory study, and it is explicitly described as a fishing expedition, with an explanation formed after the fact in order to generate new research to test that explanation. The problem of data fishing only occurs when a study is supposed to be confirmatory (where you're testing a predetermined hypothesis) but your approach is that of an exploratory study (where you change the hypothesis to explain the data you collected), as the sketch below illustrates.
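Here's a hedged sketch of that two-step procedure in action: fish freely in phase one, then test the single "discovered" hypothesis on fresh data. (Everything here - seeds, sample sizes, the scipy t-test - is illustrative rather than any study's actual method.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
K, N = 20, 25

# Phase 1 (exploratory): fish through 20 null outcomes, keep the "best".
explore_a = rng.normal(size=(K, N))
explore_b = rng.normal(size=(K, N))
ps = [stats.ttest_ind(a, b).pvalue for a, b in zip(explore_a, explore_b)]
best = int(np.argmin(ps))
print(f"exploratory phase: outcome {best} looks promising, p = {ps[best]:.3f}")

# Phase 2 (confirmatory): collect NEW data and test only that hypothesis.
p_confirm = stats.ttest_ind(rng.normal(size=N), rng.normal(size=N)).pvalue
print(f"confirmatory phase on fresh data: p = {p_confirm:.3f}")
# Since no real effect exists, the 'discovery' fails to replicate about
# 95% of the time - which is exactly the safeguard the second phase adds.
```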
If we want to address some of the problems with science, we need to acknowledge that a lot of the damage isn't done by men with curly moustaches, black top hats and capes - much of it is done by good scientists, who think they're doing good science, and who even describe such methods as good science in textbooks teaching students how to do good science. By being more aware of these problems, and of our own unconscious biases, we might be able to preempt problems with papers like Bem's: instead of entertaining the possibility that psi abilities exist, we can challenge the methodology in ways that traditional criticisms would overlook.