It’s the incentive structure, people! Why science reform must come from the granting agencies.

Another day, another New York Times report on bad practice in biomedical science. The growing problems with scientific research are by now well known: Many results in the top journals are cherry picked, methodological weaknesses and other important caveats are often swept under the rug, and a large fraction of findings cannot be replicated. In some rare cases, there is even outright fraud. This waste of resources is unfair to the general public that pays for most of the research.

The Times article places the blame for this trend on the sharp competition for grant money and on the increasing pressure to publish in high impact journals. While both of these factors certainly play contributing roles, the Times article misses the root cause of the problem. The cause is not simply that the competition is too steep. The cause is that the competition is shaped to point scientists in the wrong direction.

As many other observers have already noted, scientific journals favor surprising, interesting, and statistically significant experimental results. When journal editors give preference to these types of results, it is not surprising that more false positives will be published through simple selection effects, and sadly it is not surprising that unscrupulous scientists will manipulate their data to produce them. These manipulations include selection from multiple analyses, selection from multiple experiments (the “file drawer” problem), and the formulation of ‘a priori’ hypotheses after the results are known. While the vast majority of scientists are honest individuals, these biases still emerge in subtle and often subconscious ways.
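
To get a sense of how strong this selection effect can be, here is a toy simulation (a rough sketch only; the number of analyses and the sample sizes are arbitrary, and nothing here is meant to model any particular field):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def smallest_p(n_analyses, n_per_group=20):
    """Simulate one null study: there is no true effect, but the researcher
    tries several analyses (here, several outcome measures) and reports
    only the smallest p-value."""
    p_values = []
    for _ in range(n_analyses):
        control = rng.normal(size=n_per_group)    # true effect is zero
        treatment = rng.normal(size=n_per_group)  # true effect is zero
        p_values.append(stats.ttest_ind(control, treatment).pvalue)
    return min(p_values)

n_studies = 5_000
for k in (1, 5, 20):
    rate = np.mean([smallest_p(k) < 0.05 for _ in range(n_studies)])
    print(f"{k:2d} analyses per study -> {rate:.1%} of null studies look 'significant'")
```

With one analysis per study the false positive rate sits near the nominal 5%, but it climbs rapidly as more analyses are tried and only the best one is reported.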

Scientists have known about these problems for decades, and there have been several well-intentioned efforts to fix them. The Journal of Articles in Support of the Null Hypothesis (JASNH) is specifically dedicated to null results. The Psych File Drawer is a nicely designed online archive for failed replications. PLoS ONE publishes papers based on the quality of the methods, and allows post-publication commenting so that readers may be alerted about study flaws. Finally, Simmons and colleagues (2011) have proposed lists of regulations for other journals to enforce, including minimum sample sizes and requirements for the disclosure of all variables and analyses.

As well-intentioned as these important (and necessary) initiatives may be, they have all failed to catch on. JASNH publishes a handful of papers a year, The Psych File Drawer has only nine submissions, and hardly anyone comments on PLoS ONE papers. To my knowledge, no journals have begun enforcing the lists of regulations proposed by Simmons et al.

What is most frustrating is that all of these outcomes were completely predictable. As any economist will tell you, it’s the incentive structure, people! Nobody publishes in JASNH because the rewards for publishing in high-impact journals are larger. Nobody puts their failed replications on Psych File Drawer or comments on PLoS ONE because online archive posts can’t be put on CVs. And no journal wants to impose burdensome regulations on scientists when the scientists could submit their papers elsewhere. Even if the journals did manage to impose the regulations, wouldn’t it be better if the career incentives of scientists were aligned with the interests of good science? Wouldn’t a more sensible incentive structure make the list of regulations unnecessary?

This is where the funding agencies need to come in. Or, more to the point, where we as scientists need to ask the funding agencies to come in. Granting agencies should reward scientists who publish in journals that have acceptance criteria that are aligned with good science. In particular, the agencies should favor journals that devote special sections to replications, including failures to replicate. More directly, the agencies should devote more grant money to submissions that specifically propose replications. Moreover — and this is a fairly radical step that many good scientists I know would disagree with — I would like to see some preference given to fully “outcome-unbiased” journals that make decisions based on the quality of the experimental design and the importance of the scientific question, not the outcome of the experiment. This type of policy naturally eliminates the temptation to manipulate data towards desired outcomes.

The mechanism could start with granting agencies making modest adjustments to grant scores for scientists who submit to good-practice journals. Over time, as scientists compete to submit to these journals, more of these journals will emerge through market forces. Journals that currently encourage bad practices could then adjust their policies if they wish; under the current system, they have no incentive to do so.

Will this transition be easy? No. Will the granting agencies manage this perfectly? Probably not. But it is obvious to me that scientists alone cannot solve the problem of publication bias, and that a push from the outside is needed. The proposed system may not be perfect, but it will be vastly better than the dysfunctional system we are working in now.

If you agree that the cause of bad science is a perverse incentive structure, and if you agree that reform attempts can only work if there is pressure from granting agencies, please pass this article around and contact your funding agency. Within each agency, reform will require coordination among several sub-agencies, so it might make most sense to contact the director.

Also, please see the FAQ, above, for continuously updated answers to questions.

29 thoughts on “It’s the incentive structure, people! Why science reform must come from the granting agencies.”

  1. I wonder if it would be going too far to have some mechanism whereby people could put in escrow (encrypted)
    – what experiment designs they are going to try
    – what analyses are going to be performed
    with the possibility of collecting addenda, and release the key to journal editors when submitting?

      • Yes, sort of like registered clinical trials, but with encryption. As with most good ideas, the journals aren’t going to do this on their own. But the NIH can use its power to apply some pressure.

      • You could have an authority dictate what is researched if that occurs. That is just as intolerable.

    • I agree, this is a good idea, but it wouldn’t solve the file drawer problem because if they decided not to submit, the encryption would never be broken. I would say we ought to also put a time limit on it, such that after X years, the encryption is broken and it becomes public, whether published or not.

      Actually I’m not even sure it needs to be encrypted at all; if it were public from day 1, yes it would “tip off” your rivals about what you’re doing, but they’d be in the same boat, so it would all even out without being unfair.

      However, that’s up for debate; an encrypted escrow would be much better than nothing. I have written about this here: http://neuroskeptic.blogspot.co.uk/2012/04/fixing-science-systems-and-politics.html
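
      For concreteness, here is a minimal sketch of the escrow idea using a salted-hash commitment rather than full encryption (the function names and the example plan below are hypothetical, not part of any existing registry):

      ```python
      import hashlib
      import secrets

      def commit(preregistration: str) -> tuple[str, str]:
          """Create a public commitment to a pre-registration document.

          Returns (salt, digest). The digest can be lodged publicly (e.g. with a
          journal or registry) on day 1 without revealing the plan; the salt stays
          private until the authors choose to open the commitment."""
          salt = secrets.token_hex(16)  # random nonce so the plan can't be guessed from the hash
          digest = hashlib.sha256((salt + preregistration).encode()).hexdigest()
          return salt, digest

      def verify(preregistration: str, salt: str, digest: str) -> bool:
          """Check that a revealed plan matches the digest that was published earlier."""
          return hashlib.sha256((salt + preregistration).encode()).hexdigest() == digest

      # Example: commit before data collection, open the commitment at submission.
      plan = "H1: drug X lowers marker Y; N = 60 per arm; primary analysis: two-sample t-test."
      salt, digest = commit(plan)
      print("public commitment:", digest)
      print("plan verified at submission:", verify(plan, salt, digest))
      ```

      Posting only the digest keeps rivals from reading the plan, while a time limit on revealing the salt would address the file-drawer concern above.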

  2. Hey Chris. FYI, the Simmons et al. paper is (2011), not (2001).

    More substantively: I think the principle of “follow the money” is well applied here. I do think the tenure process is important too — universities favor scientists who can pull in money, so that part works, but departments change slowly and it may be hard to socialize them out of rewarding high-impact papers rather than good-practice papers. (According to that NYT article, impact factor and retraction rate are correlated.) But tenure is much harder to approach in a top-down way, and in any case we should do what we can.

    I also think the sociology of this has to be managed somehow. As someone who’s been hosed on the job market for one year, one thing I’m constantly thinking about when I read papers like Joe’s or hear talks like the one Josh Carp gave at CNS this year is “If I do all that, how am I supposed to compete?” Because if I’m as unlucky as everyone else, I’m going to have problems with my registered data analysis or replication and, instead of burying them with dignity, everyone will hear about it. It really doesn’t work unless (a) everyone has to do it or (b) competition is somehow normalized between those who do and those who don’t. And it’s hard to hope for (b). I guess this amounts to saying that outcome-neutral publication venues may be surprisingly central to the success of any scheme like this. (However, I’m not sure that “market forces” apply to academic publishing at this point…)

  3. Typo fixed. Regarding the last paragraph: Yes, it’s a classic collective action problem. If all the agents took an action, the group as a whole would benefit. But the same action by an individual agent alone would not benefit that agent. I think this can only be solved if an external force (i.e. the NIH) adjusts the incentives so that the individual action alone is beneficial to the agent (i.e. by assigning bonus grant points to researchers who follow good practices or who submit to good-practice journals).

    Regarding tenure committees: I think you’re right that they’ll be a little sluggish but eventually follow the money.

  4. Great article, and it seems that it is mostly centered on the field of Psychology. Are similar problems occurring in other fields? I know that in Mathematics, this problem doesn’t really exist, because there are only two major journals and no one except mathematicians reads them. That, and in mathematics, the methods are the results, so to speak. Focusing on the first point, it could very well be that the popular appeal of Psychology has driven once-respectable journals to publish articles that show some “big” result, regardless of scientific accuracy. It’s a strange paradox: Psychology can use popular appeal to summon grant money, but that same popular appeal jeopardizes scientific integrity. It almost seems that Psychologists as a whole have to agree to pursue an evolving list of “important questions” and intentionally steer away from research that is merely titillating.

    • Josh – yes, my impression is that the problem is even worse in medical research. Part of that is because evil evil drug companies pay for much of the research and selectively publish results that show support for their products. But even outside of drug studies, it seems worse in medical research. I’m not sure why.

      • The trouble in biology in general, and biomedical research in particular, is that these are labor-intensive disciplines. Labs need lots and lots of grad students to function, and there are not nearly enough PI-level slots available. On my soccer team, 7 of 8 Ph.D.s and postdocs have firm plans to leave the lab bench because they just do not see a future in it. You must be extremely talented, extremely lucky, and/or willing to go to moral extremes in order to succeed.

        John Ioannidis is the most well-known guy exposing replication issues in biomedical sciences. An interesting summary of his work is here: http://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/8269/

        BTW, the evil drug companies are the very best scientists out there. They actually do the replications because they know they cannot trust the academic literature, and they need to know whether a drug target is worth investing $1 billion in before things go too far on a faulty premise. Interestingly, only 25% of results in the literature fully replicate in one pharma’s attempts: http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html But while they’ve got all the good data, good luck getting them to share it with the public when it’s their primary competitive advantage.

    • I agree with you that research that is praised mostly for its titillation value is not going to lead to cumulative progress. My impression though is that psychology doesn’t even know what its “important questions” are. The field is far too fragmented. Open a psychology textbook, and you find a mish-mash of findings, and one textbook is very likely to cover a significantly different set of findings from another (that is intended for the same type of class). But open a physics or chemistry textbook, and the curriculum is pretty standard. So I’d like to know how the incentive structure is going to solve this problem.

  5. Chris – interesting stuff. I have a couple of disagreements though:
    – I think that you are doing the field a great disservice by implying that dishonest conduct is simply the tail end of a distribution of bad practice, rather than its own separate thing. Even if the practices (e.g., searching for a significant result across many different analyses) might overlap, I think that the mindsets are completely different. Placing them on the same spectrum impugns a lot of honest (but maybe misguided) scientists in a way that will ultimately not help your cause.
    – Are you convinced that whatever system is put in place of the current system (e.g., “quality-based” metrics) won’t be gameable just like the current systems are? It will probably just be a different set of people who know how to game them well.

    • Russ – I think you’re right that even though there might be a spectrum of bad practice (ranging from mildly bad to really bad), there might be a difference in kind between dishonest scientists and scientists who are simply misguided, or well-intentioned scientists who occasionally fall victim to self-deception. I will make some edits in the post to reflect this.

      • Russ – regarding your second point: All systems are gameable, but some systems are better than others. My hunch is that there are lots of systems that would be better than the status quo.

  6. Pingback: Outcome Unbiased Journals — Marginal Revolution

  7. Another journal in the computational and biomedical sciences that aims to publish articles that might not pass traditional peer review because their results are not surprising, interesting, or significantly different in the conventional sense is JSUR, the Journal of Serendipitous and Unexpected Results. http://www.jsur.org/

  8. Pingback: The Incentive structure « Åse Fixes Science

  9. Pingback: we’re experimenting! | hlanthorn

  10. If people only pursue the predicted/desired outcomes, why bother doing science? “By manipulating, I make what I desired to be the truth, I publish it, and I persuade others to believe in such truth.” Isn’t that a schizophrenic logic, a world of hallucinations and delusions? But people call it science… Those behaviors stain good science and good scientists.

  11. Pingback: More proposals to reform the peer-review system « Statistical Modeling, Causal Inference, and Social Science

  12. Pingback: Links for April 20, 2012 | KevinBondelli.com: Youth Vote, Technology, Politics

  13. Reblogged this on General Musing and commented:
    I discussed this same issue in medicine some time ago: if a solution is thought to have been found, then the sampling rate should increase. This is a case of search satisfaction – you expected to find something, you found something, so you stop searching rather than finishing your search. In a larger sample, more regression to the mean takes place, which means the results come closer to the average.

  14. Pingback: Around the Web: Persistent myths about open access scientific publishing, Prepping grad students for jobs and more : Confessions of a Science Librarian

  15. Good post. I don’t think grant agencies are necessarily the only ones who can do it, though. In the case of clinical trial pre-registration, it was a consortium of journal editors (ICMJE) who originally made it happen.

    I see no reason that couldn’t happen again.

    I absolutely agree that grant agencies need to do something about this; but I disagree that reform ‘must’ come from them.

    • Neuroskeptic – The fact that we’ve seen reform happen in some other fields without pressure from the grant agencies is a very good point. At the same time, though, I am not aware of any similar movement among journals in my field to address publication bias, and I see no evidence of it happening anytime soon without outside pressure. I would be happy to be proven wrong, though.

  16. that was an interesting read. you’re bringing up quite a wide range of problems that might require different solutions. my feeling is that getting funding incentives for some of the changes might help, but it’s not the only way. for example, if an open evaluation system for scientific papers provided ratings not only of “overall significance” but also of “empirical justification of claims”, this criterion might gain the importance it should have. — niko kriegeskorte

  17. There’s another funding agency bias you forget, especially in medical research. In this field sample size is critical, as results have to be transferable to a population of 7 billion people (or a significant and definable subset thereof). However, if you say “I’ve got a drug or disease marker that I think might be useful,” the funding bodies will initially only provide money for testing in a small number of patients, due to the costs of sample collection, ethical considerations, and the cost of the experiment. You must then publish these data, for clinical trials perhaps even perform two or three ‘small cohort’ studies (hence the result must be positive), before you can get funding to do a decent population-based study on 1000s of patients where the real applicability to a general population can be assessed.

    This process inevitably leads to results which are positive in small studies failing to replicate in larger ones. Frequently, in a study of 2000 individuals you can select many combinations of 50 cases which, taken in isolation, ‘prove’ the effect – it’s simple statistics in that respect (on the flip side, in a 2000-patient experiment where there is an effect, you can select a subset of 50 where the change would NOT be statistically significant). The only way to remove this bias is to ask funders to commit to the 2000+ patient experiments in the first instance, but this is clearly impractical with limited budgets and would lead to fewer hypotheses being tested. A toy simulation of the subset point follows below.

    The question is, which allows greater progress: testing fewer hypotheses (new medicines) but getting more reliable information regarding utility to the general population, allowing an accurate go/no-go decision to be made, or testing more hypotheses, which gives a higher rate of false positive results but potentially increases the absolute number of true positives because of the number of hypotheses being tested?
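
    Here is that toy simulation (a rough sketch only; the group sizes and the number of subsamples are arbitrary, and the “measurements” are pure noise rather than real patient data):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # One large cohort in which the marker has no real effect:
    # 1000 patients and 1000 controls drawn from the same distribution.
    patients = rng.normal(size=1000)
    controls = rng.normal(size=1000)

    # The full-cohort analysis (correctly) finds nothing.
    print("full-cohort p-value:", stats.ttest_ind(patients, controls).pvalue)

    # Now draw many 'small studies' of 25 patients vs 25 controls from the same
    # data and count how often they appear to 'prove' the effect at p < 0.05.
    n_small_studies = 10_000
    significant = 0
    for _ in range(n_small_studies):
        p_sub = rng.choice(patients, size=25, replace=False)
        c_sub = rng.choice(controls, size=25, replace=False)
        if stats.ttest_ind(p_sub, c_sub).pvalue < 0.05:
            significant += 1

    print(f"{significant} of {n_small_studies} small subsamples look 'significant'")
    ```

    Even with no effect in the full cohort, hundreds of the small subsamples come out “significant” by chance, and if only those small positive studies are published and pursued, larger follow-ups will predictably fail to replicate them.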

  18. Pingback: Replication is the only solution to scientific fraud | Chris Chambers and Petroc Sumner | Old News

  19. Pingback: 8 Lessons from the Reproducibility Crisis | The File Drawer
