Replicate, replicate, replicate

By Eric Crampton 22/06/2015


Scott Alexander warned we should beware the man of one study. There’s a good reason for that: a lot of studies might not replicate. File drawer effects, p-hacking, honest errors and deliberate manipulation mean you ought to be somewhat sceptical of results from any one study.

My Canterbury colleague Bob Reed, along with Maren Duvendack and Richard Palmer-Jones, makes the case for replication in the latest issue of Econ Journal Watch.

And I love that they open by citing Tullock.

In the post-World War II period, several scholars raised concerns about the quality of data and the validity of social and economic statistical analysis (Morgenstern 1950; Tullock 1959). Gordon Tullock was one of the first to draw attention to what is now commonly referred to as “the file drawer problem” (Rosenthal 1979): inconclusive findings are likely to be filed, while results that are statistically significant get published. Tullock also advocated replication: “The moral of these considerations would appear to be clear. The tradition of independent repetition of experiments should be transferred from physics and chemistry to the areas where it is now a rarity” (Tullock 1959, 593).

File drawer problems nest within file drawer problems, though, since confirmatory replications may be less likely to be published:

What can we learn from our analysis of replication studies? Most importantly, and perhaps not too surprisingly, the main takeaway is that, conditional on the replication having been published, there is a high rate of disconfirmation. Over the full set of replication studies, approximately two out of every three studies were unable to confirm the original findings. Another 12 percent disconfirmed at least one major finding of the original study, while confirming others (Mixed?). In other words, nearly 80 percent of replication studies have found major flaws in the original research.

Could this be an overestimate of the true rate of Type I errors in original studies? While the question is impossible to answer conclusively with our sample, there is some indication that this rate overstates the unreliability of original studies. The JAE is noteworthy in that it publishes many replications that consist of little more than the statement “we are able to reproduce the results,” as in Drukker and Guan 2003. This suggests that the JAE does not discriminate on the basis of whether the replication study confirms or disconfirms the original study. This contrasts with the American Economic Review, which has never published a replication that merely confirmed the original study. One may be tempted to take the JAE’s record as representative, and we see that the JAE’s rate of replications that disconfirm at least one major finding (that is, Negative? + Mixed?) is 65 percent (0.452+0.194). By any account, this is still a large number. It raises serious concerns about the reliability of published empirical research in economics.

I wonder whether there’s a bigger problem: studies already thought suspect might be more likely to be chosen for replication. In that case, even the JAE’s 65% would overestimate the share of unreliable results among published studies as a whole.
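
Here is a rough sketch of that selection worry in Python. Apart from the JAE figure, every number is invented for illustration and none comes from the paper. It also assumes, unrealistically, that a replication always catches a flawed study and never wrongly rejects a sound one.

```python
def observed_disconfirmation_rate(flawed_share, p_replicate_flawed, p_replicate_sound):
    """Share of replications that disconfirm the original study.

    Hypothetical model: a fraction `flawed_share` of all studies is flawed,
    flawed and sound studies attract replications with different
    probabilities, and replications are assumed to be perfectly accurate.
    """
    flawed_replicated = flawed_share * p_replicate_flawed
    sound_replicated = (1 - flawed_share) * p_replicate_sound
    return flawed_replicated / (flawed_replicated + sound_replicated)


# The JAE figure quoted above: Negative? + Mixed?
jae_rate = 0.452 + 0.194

# Invented inputs: 30% of all studies are flawed. In the second case,
# flawed (suspect-looking) studies are five times as likely to be replicated.
no_selection = observed_disconfirmation_rate(0.30, 0.05, 0.05)
with_selection = observed_disconfirmation_rate(0.30, 0.10, 0.02)

print(f"JAE disconfirmation rate:      {jae_rate:.3f}")
print(f"Observed rate, no selection:   {no_selection:.3f}")
print(f"Observed rate, with selection: {with_selection:.3f}")
```

With those made-up inputs, an underlying flaw rate of 30% shows up as roughly 68% among the studies that get replicated. So a figure like the JAE’s 65% could sit alongside a much lower rate in the literature as a whole, if replicators mostly chase the papers that already look shaky.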