Friday, 1 April 2016

Questionable Research Practices: Is data fishing a part of the scientific method?

There's been a lot of discussion lately about the "replication crisis" in science, where researchers have been testing Ioannidis' argument that most published research findings are false. The basic idea is that a number of features inherent to scientific processes, along with a collection of bad scientific practices, combine to let false positive findings sneak into the literature, where they often aren't corrected until they've already been cited as evidence for a false claim throughout entire fields of science.

For example, one major issue is publication bias: journals favour positive or "sexy" results over studies that get negative results or that find no effect where previous studies had reported one. So what ends up happening is that ten researchers might choose to research whether X causes Y, and eight of them find no effect but two find an effect. This can happen due to bad methodology, fraud, experimenter bias, or simply chance and statistical noise. But because journals favour positive findings, those two papers get published and the eight negative findings get rejected. Since those eight are never published, a review of the published literature will tell us that 100% of studies on the topic find a result, and without access to the unpublished studies we'd be unaware that only 20% found a result, which is much less impressive.
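To make the file-drawer arithmetic concrete, here's a toy simulation in Python. It's only a sketch under assumptions I've picked for illustration, not anything taken from the studies discussed here: every study tests a true effect of exactly zero, scores are normally distributed with a known standard deviation of 1, each study has 30 participants, and "journals" accept a study only if a two-sided z-test gives p < .05. The function names are made up for the example.

```python
import math
import random


def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming a known SD (a simplification)."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_literature(n_studies=1000, n_per_study=30, true_effect=0.0,
                        alpha=0.05, seed=1):
    """Run many studies of the same question; 'publish' only the significant ones."""
    rng = random.Random(seed)
    all_effects, published_effects = [], []
    for _ in range(n_studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n_per_study)]
        effect = sum(sample) / n_per_study
        all_effects.append(effect)
        if z_test_p(sample) < alpha:  # the journal only accepts "positive" results
            published_effects.append(effect)
    return all_effects, published_effects


all_effects, published = simulate_literature()
share_significant = len(published) / len(all_effects)
print(f"studies run: {len(all_effects)}, studies published: {len(published)}")
print(f"share of ALL studies finding an effect: {share_significant:.1%}")
print(f"mean |effect| across all studies: "
      f"{sum(abs(e) for e in all_effects) / len(all_effects):.2f}")
print(f"mean |effect| across published studies: "
      f"{sum(abs(e) for e in published) / len(published):.2f}")
```

Roughly 5% of the null studies come out significant by chance, yet every study that makes it into print reports an effect, and because only the luckiest samples clear the bar, the published effect sizes all sit well away from the true value of zero.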

THE QRPs ARE COMING FROM INSIDE THE HOUSE

So far we can see that there is a serious issue here that needs addressing, but it doesn't seem like all hope is lost. Things like publication bias, once identified, seem fairly easy to address: encourage journals to publish negative findings, start journals dedicated to negative findings, emphasise the importance of negative findings when teaching students so they don't assume such results are a waste of time, and so on. Most of these problems have relatively easy solutions because everyone agrees that they're bad science and weaken confidence in our published results.

But what do we do when the people doing bad science think that they're doing good science?

Unlike blatant scientific misconduct, Questionable Research Practices (QRPs) occupy a gray area of wrongdoing. These researchers aren't faking data or inventing non-existent participants; instead, they're rationalising practices that could be justified under other conditions. For example, there are situations where some data points need to be excluded, such as when equipment malfunctions or when there is reason to suspect that a participant lied about their responses, but these exclusion criteria need to be specified before the research begins. What often happens instead is that researchers see that the data don't support their hypothesis, realise that excluding the outliers would turn the result into a positive one, and justify the decision after the fact. In the moment it can seem reasonable, given that there can be valid reasons for excluding data, but if we're doing it because we know it'll give us the result we'd like, then we're unfairly stacking the deck in favour of a specific outcome.

You can probably see why some people refer to these QRPs as "researcher degrees of freedom": instead of fabricating data or acting in any directly improper way, the researcher is just stretching the truth. They're pushing the boundary of good practice in a way that often isn't even a conscious decision. And there's the problem: because it's not conscious, they don't think they're doing bad science. Even when these problems are identified and explained to them, many will still defend the practices and happily admit to using them in their research.
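Here's a similar sketch of why "exclude the outliers, but only if the result isn't significant" matters. Again the assumptions are mine, chosen for illustration: there is no true effect, each study has 50 normally distributed scores, significance is judged with a normal approximation to a one-sample test (not an exact t-test), and the hypothetical exclusion rule drops anything more than 2 standard deviations from the mean. The pre-specified analysis tests the data once; the flexible analysis gets a second attempt after seeing a null result.

```python
import math
import random
import statistics


def approx_p(sample):
    """Two-sided p-value for H0: mean == 0 via a normal approximation with the sample SD."""
    n = len(sample)
    z = statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


def drop_outliers(sample, k=2.0):
    """Hypothetical exclusion rule: drop scores more than k SDs from the sample mean."""
    m, s = statistics.fmean(sample), statistics.stdev(sample)
    return [x for x in sample if abs(x - m) <= k * s]


def false_positive_rates(trials=5000, n=50, alpha=0.05, seed=2):
    rng = random.Random(seed)
    strict = flexible = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no true effect exists
        if approx_p(sample) < alpha:
            strict += 1    # significant under the pre-specified analysis
            flexible += 1  # the flexible analyst keeps this result too
        elif approx_p(drop_outliers(sample)) < alpha:
            flexible += 1  # exclusion decided only after seeing a null result
    return strict / trials, flexible / trials


strict_rate, flexible_rate = false_positive_rates()
print(f"pre-specified analysis, false positive rate: {strict_rate:.3f}")
print(f"'drop outliers if it helps', false positive rate: {flexible_rate:.3f}")
```

Because the flexible analyst keeps every result the strict analyst would have found and adds some extra ones, their false positive rate can only be equal or higher. The exclusion rule itself isn't the problem; deciding to use it after peeking at the result is.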

DATA FISHING: THE ONE THAT GOT AWAY

This entire post comes about because, in my discussions with other researchers about the nature of science and methodology, I often come across someone who seems completely baffled by my suggestion that there is a problem with changing your hypothesis to fit the data after you've collected it and reporting it as if that had been your hypothesis all along. The usual justification is that the data stay the same, so it doesn't matter how you write it up. Intuitively this makes sense, but how valid is it? Not very. I think I'm supposed to ask questions like that and let you mull it over, but I guess I'm not very good at creating suspense. Also, Dumbledore dies.

To figure out why it's not valid, it helps to look at Kerr's article on the topic. It's here that he coins the term "HARKing", which stands for "Hypothesising After The Results Are Known". I suppose he figured "HATRAKing" wasn't quite as catchy. The concept describes the practice I mention above: research is designed to test one hypothesis, and upon finding that the data don't support it, the researcher writes the paper as if the study had been designed to test a new hypothesis that the data do support. Essentially this practice is a form of data fishing; that is, the data are sifted through in search of an explanation that fits them, rather than an explanation being proposed first and then tested empirically.

What is most disturbing about this practice is that not only do researchers report engaging in it, but (as Kerr notes) actual textbooks teaching students how to carry out scientific research describe it as if it's normal. Interestingly, the textbook that Kerr discusses was written by Daryl Bem.

YOU DON'T NEED TO BE A PSYCHIC TO SEE WHERE THIS IS HEADING

Despite doing a lot of good research, Bem is best known for a 2010 paper in which he published a series of studies that supposedly supported the existence of psi phenomena, specifically precognition, the ability to predict the future. And when I say that he published this paper, I don't mean he uploaded it to his blog or personal website; it was published in one of the leading psychology journals.

There were a number of problems with his paper, including the unusual statistical analyses he chose for unstated reasons, but one of the main ones was that he was accused of data fishing. There was an excellent formal response to Bem from Wagenmakers and colleagues (along with some decent write-ups on the topic), where they point out Bem's interesting approach to conducting science. In a book chapter, he teaches students how to perform and write up their experiments:

There are two possible articles you can write: (1) the article you planned to write when you designed your study or (2) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (2). 

The conventional view of the research process is that we first derive a set of hypotheses from a theory, design and conduct a study to test these hypotheses, analyze the data to see if they were confirmed or disconfirmed, and then chronicle this sequence of events in the journal article. If this is how our enterprise actually proceeded, we could write most of the article before we collected the data. We could write the introduction and method sections completely, prepare the results section in skeleton form, leaving spaces to be filled in by the specific numerical results, and have two possible discussion sections ready to go, one for positive results, the other for negative results. But this is not how our enterprise actually proceeds. Psychology is more exciting than that, and the best journal articles are informed by the actual empirical findings from the opening sentence. Before writing your article, then, you need to Analyze Your Data. Herewith, a sermonette on the topic. 

 and

If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something—anything —interesting. 

Personally, I agree with the Wagenmakers paper, whose position can be summarised as:

We agree with Bem in the sense that empirical research can benefit greatly from a careful exploration of the data; dry adherence to confirmatory studies stymies creativity and the development of new ideas. As such, there is nothing wrong with fishing expeditions. But it is vital to indicate clearly and unambiguously which results are obtained by fishing expeditions and which results are obtained by conventional confirmatory procedures. In particular, when results from fishing expeditions are analyzed and presented as if they had been obtained in a confirmatory fashion, the researcher is hiding the fact that the same data were used *twice*: first to discover a new hypothesis, and then to test that hypothesis. If the researcher fails to state that the data have been so used, this practice is at odds with the basic ideas that underlie scientific methodology (see Kerr, 1998, for a detailed discussion). 

Instead of presenting exploratory findings as confirmatory, one should ideally use a two-step procedure: first, in the absence of strong theory, one can explore the data until one discovers an interesting new hypothesis. But this phase of exploration and discovery needs to be followed by a second phase, one in which the new hypothesis is tested against new data in a confirmatory fashion. This is particularly important if one wants to convince a skeptical audience of a controversial claim: after all, confirmatory studies are much more compelling than exploratory studies. Hence, explorative elements in the research program should be explicitly mentioned, and statistical results should be adjusted accordingly. In practice, this means that statistical tests should be corrected to be more conservative. 

The part about using the data twice is what concerns me, as it's a form of double-dipping: you not only use the data to create an explanation, but then use that same data as evidence that your explanation is correct. The easiest way to understand why this is a problem is to see it as an example of the Texas Sharpshooter fallacy. The fallacy is illustrated by imagining somebody who fires his gun at the side of a barn, then paints bullseyes around his bullet holes and declares himself a sharpshooter. It looks impressive when you see all those perfect shots, until you realise that the targets were painted on after the fact.

ABOVE: Olympic committee bans shooting sports after all competitors inexplicably achieve perfect scores, forcing everyone to share the gold.

The same issue applies to science: a successful test of an explanation carries weight because of the predictive power of that explanation. There's no real strength behind an explanation created simply to account for data we already have, because such explanations are easy to create. There are an infinite number of explanations for any data set, and arbitrarily picking one doesn't get us any closer to the truth. This problem is of particular concern in evolutionary psychology, where "just-so stories" are regularly criticised for the reasons I discuss above.
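The Texas Sharpshooter problem is easy to simulate as well. In this toy version (my own assumptions, not anything from Kerr or Wagenmakers), a study measures 20 unrelated outcomes that are all pure noise, and the researcher reports whichever one looks best. For the "more conservative" tests that Wagenmakers and colleagues call for, I've used a Bonferroni correction (dividing the significance threshold by the number of tests) simply as one common example; other corrections exist.

```python
import math
import random


def z_test_p(sample):
    """Two-sided p-value for H0: mean == 0, assuming a known SD of 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def fishing_rates(trials=2000, n_outcomes=20, n=30, alpha=0.05, seed=3):
    """Share of pure-noise studies in which the best of n_outcomes tests is 'significant'."""
    rng = random.Random(seed)
    hits_uncorrected = hits_bonferroni = 0
    for _ in range(trials):
        # Every outcome is pure noise: there is nothing real to find.
        p_values = [z_test_p([rng.gauss(0.0, 1.0) for _ in range(n)])
                    for _ in range(n_outcomes)]
        best = min(p_values)  # paint the bullseye around the best-looking hole
        if best < alpha:
            hits_uncorrected += 1
        if best < alpha / n_outcomes:  # Bonferroni: divide alpha by the number of tests
            hits_bonferroni += 1
    return hits_uncorrected / trials, hits_bonferroni / trials


uncorrected, bonferroni = fishing_rates()
print(f"analytic chance of at least one 'hit': {1 - 0.95 ** 20:.2f}")
print(f"simulated, uncorrected: {uncorrected:.2f}")
print(f"simulated, Bonferroni-corrected: {bonferroni:.2f}")
```

With 20 independent shots at p < .05, the chance of at least one hit is 1 - 0.95^20, or about 64%, even though there's nothing there. Dividing the threshold by 20 brings that back down to roughly 5%.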

BUT ALL THE COOL KIDS ARE HARKING, WHY CAN'T I?

If all the cool scientists decided to jump off a bridge, would you want to do that as well? Of course not, but if they happened to be connected to bungy cords then maybe you would. The Devil is in the details, in that you will often see many good scientists doing something that looks close to HARKing - that is, they'll collect data and then create an explanation for it after the fact. This isn't a problem though.

The issue with HARKing is that the data are collected to test a specific hypothesis, and then, after the data are in, the paper is written as if the new hypothesis was what the researchers were testing from the beginning. So there is a way to hypothesise after the fact that isn't a problem: when no hypothesis is formed before data collection. That is an exploratory study, and it is explicitly described as a fishing expedition, with an explanation formed after the fact in order to generate new research that tests that explanation. Data fishing only becomes a problem when a study is presented as confirmatory (where you're testing a hypothesis) but your approach is that of an exploratory study (where you change the hypothesis to explain the data you collected).
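The two-step procedure Wagenmakers and colleagues describe can also be sketched in code. In this simplified version (the split-half design and every name in it are my own illustrative assumptions), each simulated study has 20 noise-only outcomes measured on 60 participants. The first 30 participants are used to go fishing for the best-looking outcome, and the last 30 are held back for a single confirmatory test of whatever hypothesis the fishing produced.

```python
import math
import random

TRIALS = 2000


def z_test_p(sample):
    """Two-sided p-value for H0: mean == 0, assuming a known SD of 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def explore_then_confirm(trials=TRIALS, n_outcomes=20, n=60, alpha=0.05, seed=4):
    rng = random.Random(seed)
    discovered = confirmed = 0
    for _ in range(trials):
        # data[j] holds n pure-noise scores for (hypothetical) outcome j
        data = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_outcomes)]
        explore = [scores[: n // 2] for scores in data]
        confirm = [scores[n // 2:] for scores in data]
        # Phase 1 (exploratory): fish for the most promising outcome.
        best = min(range(n_outcomes), key=lambda j: z_test_p(explore[j]))
        if z_test_p(explore[best]) < alpha:
            discovered += 1
            # Phase 2 (confirmatory): test only that hypothesis, on untouched data.
            if z_test_p(confirm[best]) < alpha:
                confirmed += 1
    return discovered, confirmed


discovered, confirmed = explore_then_confirm()
print(f"'discoveries' in the exploratory half: {discovered} of {TRIALS}")
print(f"discoveries that survive the confirmatory half: {confirmed}")
```

Most runs "discover" something in the exploratory half, but because the held-out half was never used to choose the hypothesis, only about 5% of those discoveries survive the confirmatory test, which is exactly the false positive rate we signed up for.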

If we want to address some of the problems with science, we need to acknowledge that a lot of them aren't caused by men with curly mustaches, black top hats and capes. A lot of them are caused by good scientists who think they're doing good science, and who even describe such methods as good science in textbooks teaching students how to do good science. By being more aware of these problems, and of our own unconscious biases, we might be able to preempt problems with papers like Bem's: instead of entertaining the possibility that psi abilities exist, we can challenge the methodology in ways that traditional criticisms would ignore.

17 comments:

  1. Thanks for the article and links. Good to have you back!

    1. Thanks, Anon! Let me know if there are any topics that interest you; it might force me to put more effort into maintaining my blog if I know someone is waiting on a response from me.

    2. Yeah, no problem. As the title of your blog suggests, having some understanding of behaviorism can leave one feeling lonely, misunderstood and at a loss as to where to start (if it's even worth it or possible at all with the finite time available) in correcting others' misunderstandings, misinformation and dubious ways of explaining behavior. It's nice to read someone who gets this. Related to this post, perhaps something more on the strengths and limitations of hypothetico-deductive vs. inductive approaches, group vs. (multiple) single-subject designs, etc. What (the philosophy of science of) behaviorism cautions against in the search for explanations (or at least is clear on the need to be clear on explanation vs. description) and why. And what questions might behavioral beliefs and research methodologies not be best equipped to answer or investigate in the study of behavior (including cognition and language)? Also, do you have thoughts on RFT, functional contextualism, etc.?

    3. I’d be curious about your thoughts on RFT and FC as well. My thoughts on the issue have evolved since we last talked, from being skeptical of RFT to really embracing it, though not so much FC.

      However, I’d also be curious if you have heard of the other FC, the functional-cognitive framework that looks to combine radical behaviorism and some form of cognitivism led by Jan De Houwer.

      It’s an interesting framework that I’ve been chewing on. Here are some links describing and clarifying the perspective.

      http://pps.sagepub.com/content/6/2/202.short

      http://www.ncbi.nlm.nih.gov/pubmed/26616481

    4. Thanks for the suggestions, Anon! I'll have a think on them and if I can drum up some good thoughts then I'll try to construct an article around them.

    5. Imad, or Mike Samsa if this is true for you as well, would you be willing to break down exactly what it is about functional contextualism you find problematic or disagreeable as a philosophy of science? I'm honestly just trying to better understand its possible strengths and limitations and haven't come across a good straightforward critique.

    6. Anon,

      I should clarify that at some level, I consider myself a full adherent of functional contextualism as I see it as a refined and improved version of Skinner’s radical behaviorism. It’s in fact one of the most well laid out philosophical views in behavioral sciences and leads to great clarity of methods of science. Same can be said for the broader enterprise of Contextual Behavioral Science.

      What I disagree with are some of the claims made as part of FC regarding what science is and the philosophical implications of FC. These include things like FC’s “a-ontological” stance, or the notion that there is no “truth” or that science doesn’t discover real things about the world. These are not necessarily absurd positions to hold, but I don’t think FC argues for them (or argues for them well); rather, it assumes them.

      These types of views have also led proponents of FC to blur the position, sometimes going as far as embracing post-modernism in the sense that there is no “truth”, just what works for me. FC also sometimes mistakenly attributes these views to the classic pragmatists like Peirce, James and Dewey, who for the most part did not hold such views regarding truth and science. In fact, the pragmatists’ views hold the remedy to these problems, as they show how truth is relevant and essential to scientific inquiry without leading to wild speculative metaphysics.

      I don’t know if that helps or makes it more confusing. Basically, as a practice, FC is a great guide, but philosophy of science is, for the most part, not about informing local methods in a field of science. It’s much more about studying the metaphysics, epistemology, ethics and critique of science, and it’s in these latter areas that FC fails in my view. However, I should add that FC is a living document and there are some prominent members in FC who are looking to fix the problems I have identified here.

      The paper by James Herbert and Flavia Padovani, “Contextualism, Psychological Science, and the Question of Ontology”, does a great job of analyzing this problem and showing why FC is mistaken in its stance of ignoring ontology.

      Hope that helps!

    7. Thanks, Imad. That does help actually. I will read that article. I haven't read a lot on FC, but the a-ontological position is something in my limited and amateur reading I've interpreted to mean not that science doesn't discover a priori universal truths (under particular conditions) or to take a stance on whether it does, but that contextual behaviorists simply aren't interested in making such claims about the universe, rather they are simply seeking to achieve certain goals under particular conditions...though, if the outcomes of research are to be of use to others in their efforts to achieve similar goals, it would seem (to me as of now having not thought terribly deeply on the matter) to require that there are some laws being discovered...long story short, I should read more about this stuff to better understand it and I will start with the article you suggested. Thanks again.

    8. Anon,

      I’m glad it was helpful. I have no problem with views that scientists are working towards their own goals under particular conditions and not concerned with broader issues of truth, etc. In fact, most scientists probably hold a similar view, not just FC proponents.

      I think for scientists such as myself though, I am interested in the goals that FC researchers pursue, but I’m also interested in broader philosophical questions. For the latter, I find the FC folks sometimes vary their response from “I’m not interested in that question” to “That question cannot be answered”. If one is not interested, that’s fine, but I take issue with the second response because there are reasonable accounts of these issues that the FC folks don’t properly engage with.

      They also seem to deny the label of instrumentalism and anti-realism in favor of a-ontological. I’m confused by this tri-part distinction because to me FC is instrumentalism (or some form of methodological naturalism) but some in the community resist that label and say that a-ontology is based on learning that we cannot know anything about ontology. I think that is false, a misunderstanding of what science shows and what ontological implications it has.

      Let me know what you think when you read the article!

    9. Hi again, Imad. I enjoyed the article. Thanks for recommending it! Again, I'm not terribly well read in these issues (though I do find them interesting when I take the time to read and think about them), but I agree with what you're saying and the arguments of the authors. I may be missing something, but I cannot see how research could successfully "work", be replicated, help others predict and influence behavior, without there being some type of "independent textured substratum" or "reality" or whatever arbitrarily applicable terms we wish to use. I don't believe I really understand the basis upon which those who would say "That question cannot be answered" hold this position though, so I may have to do a bit more reading to at least understand why they argue this at some point in the future...in between reading about things I could actually talk to other people about. I might start with the reply article. Anyway, thanks again for the recommendation!

  2. Good article! (and welcome back)

    I agree that we need to do better at addressing those who think they are doing good science. I’d probably add gaps in conceptual knowledge on top of the research methods and statistics weaknesses, as even well-done studies often lead to absurd conclusions or are done within an incoherent framework.

    1. Hey Imad, it's been a while!

      Yes that's definitely a good point. I find a lot of people in science lack an understanding of what we're actually doing in science, so by streamlining the process and the philosophy underpinning our research we get outcomes like what you describe, absurd conclusions and incoherent frameworks.

      I think that's one of the advantages of working in fields like ours where there are debates over our validity and we have to defend ourselves, which means we have to dig deep and make sure we actually understand what we're trying to do. We don't have the privilege of historical streamlining, and we're constantly building our field every day.

      As for your other question, thanks for the links! I'd never actually heard of FC before so I'll have to read more into it but by the sounds of it they're just describing a current evolution in the field. Most researchers I know in behavioral psych are happy to adopt cognitive models, and the literature on the nature of reinforcement is currently very heavily influenced by that approach where they recognised that the old mechanistic, functional explanation isn't enough any more.

      I haven't looked at anything on RFT since the last time we spoke so I'm not sure I'd have anything to add, but I'll read up on FC and see if I have any new thoughts.

      Also, don't forget that I added you as a contributor a while ago, if you have anything you want to write then feel free to make a post!

    2. I noticed my name was up there! I will have to contribute, I’m finishing my fellowship right now which is kicking my butt time wise but when I get a bit more free time, I’ll write something up.

      And FC is fairly new (maybe last 2 or 3 years) and is a subset of the contextual behavioral science work (the organization Steve Hayes founded).

  3. This comment has been removed by the author.

  4. First off, as a man with a curly mustache, a top hat, and more than one cape, I object to your concluding paragraph. Secondly, I think this is a great analysis, and the distinction between exploratory and confirmatory study is crucial. In fact, one would WANT data mining in exploratory contexts. The problem is that when all publications are supposed to introduce something new, and our judgment of what is or isn't an "Important Paper" is unrelated to our judgments of what other researchers might want to replicate, journals and reviewers somehow lose their ability to distinguish exploratory from confirmatory work. It is a mess!

    1. I can't see what you'd object to with the villain description, I know you spend your nights thinking up ways to annoy the Batman.

      But thanks, I'm glad you liked my analysis!

  5. Another problem...

    "If all the cool scientists decided to jump off a bridge, would you want to do that as well? Of course not"

    Well.... what if they jumped off a bridge and landed in tenured, grant-funded positions at the bottom?!?
