As of yesterday I thought the debate about replication in psychology was converging on consensus in at least one respect. While there was still some disagreement about tone, basically everyone agreed that there was value in failed replications. But then this morning, Jason Mitchell posted this essay, in which he describes his belief that failed replication attempts can contain errors and therefore “cannot contribute to a cumulative understanding of scientific phenomena”. It’s hard to know where to begin when someone comes from a worldview so different from one’s own. Since there’s clearly a communication problem here, I’ll just give two examples to illustrate how I think about science.
- Example 1. A rigorous lab conducts an experiment using a measurement device that requires special care. The effect size is d=0.5. Later, a different lab with no experience using the device tries to quickly replicate the experiment and computes an effect size of d=0.0.
- Example 2. A small sample experiment in a field with a history of p-hacking shows an effect size of d=0.5. Another lab tries to replicate the study with a much larger sample and computes an effect size of d=0.0.
In both cases, I’d have subjective beliefs about the true effect size. For the first example, my posterior distribution might peak around d=0.4. For the second example, my posterior distribution might peak around d=0.1. In both cases, the replication would influence my posterior, but to varying degrees. In the first example, it would cause a small shift. In the second, it would cause a big shift. Reasonable people can disagree on the exact positions of the posteriors, but basically everyone ought to agree that our posteriors should incrementally adjust as we acquire new information, and that the size of these shifts should depend on a variety of factors, including the possibility of errors in either the original experiment or in the replication attempt. Maybe it’s because I’m stuck in a worldview, but none of this even seems very hard to understand.
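To make the arithmetic behind those two posteriors concrete, here is a minimal sketch assuming a simple normal-normal update, where each study's estimate is weighted by its precision (one over its variance). The standard deviations are made-up numbers that stand in for how much I trust each study, including my allowance for errors; they are not estimates from any real experiment, and the function name `posterior_mean` is just illustrative.

```python
def posterior_mean(original_d, original_sd, replication_d, replication_sd):
    """Precision-weighted average of two effect size estimates.

    Each sd is a subjective uncertainty (an assumption): it grows with
    small samples, suspected p-hacking, or an inexperienced lab.
    """
    w_original = 1.0 / original_sd ** 2        # precision of the original study
    w_replication = 1.0 / replication_sd ** 2  # precision of the replication
    weighted_sum = w_original * original_d + w_replication * replication_d
    return weighted_sum / (w_original + w_replication)


# Example 1: rigorous original (sd 0.1), hasty replication (sd 0.2).
example_1 = posterior_mean(0.5, 0.1, 0.0, 0.2)

# Example 2: shaky small-sample original (sd 0.2), large replication (sd 0.1).
example_2 = posterior_mean(0.5, 0.2, 0.0, 0.1)

print(f"Example 1 posterior mean: {example_1:.2f}")
print(f"Example 2 posterior mean: {example_2:.2f}")
```

With these assumed uncertainties, the first posterior lands near d=0.4 and the second near d=0.1. The same replication result of d=0.0 moves the estimate a little or a lot depending on how much weight each study deserves, but in neither case does it move the estimate by nothing.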
Jason Mitchell sees things differently. For him, all failed replications have “no meaningful evidentiary value” and “do not constitute scientific output”. I don’t doubt the sincerity of his beliefs, but I suspect that most scientists and nonscientists alike will find these assertions to be pretty bizarre. Null hypothesis significance testing (NHST) isn’t the only thing causing the crisis in psychology, but it’s pretty clear that this is the kind of thinking that results when people get too immersed in it.