A few months ago, Sir Peter Gluckman made the observation in a discussion paper (“Which science to fund: is it time to review peer review?”) that
While scientists pride themselves on objectivity, there is surprisingly little in the way of objective assessment of the nature and quality of peer review processes for grant allocation.
Ironically, under-resourcing at the Ministry of Business, Innovation and Employment last year has provided us with an opportunity to put one aspect of the peer review of grant proposals to the test. In the midst of yet another restructuring, the Ministry was unable to run a complete peer review process for the 299 proposals it received, and the results of that incomplete process give us something to examine.
Sir Peter’s paper is particularly concerned that the peer review process used in allocating grants may lead to overly conservative decisions:
… the most innovative research tends to involve intellectual risk and thus can invite criticism, it is generally accepted that the general processes of grant awarding bias decisions towards conservatism …
Sir Peter suggests that bias can arise because:
… simple but positive reviews are often discounted as if the reviewer has not been serious in his/her evaluation. Conversely, simple but negative reviews carry extra weight in tight funding systems with low success rates.
Last week, Radio New Zealand’s William Ray received information from the Ministry concerning the distribution of the number of reviews received by each proposal and the corresponding success rates. This data is shown in the plot below. In total, 298 proposals were sent out for review (one was ruled ineligible for other reasons), with each being subject to at least one review. The Ministry had aimed to obtain five reviews per proposal (three scientific reviews and two end user reviews), but in the end it only managed an average of 2.7.
If Sir Peter is right that assessment panels weight negative reviews more heavily than positive ones, then the more reviews a proposal receives, the less likely it is to be funded (all else being equal). The data is consistent with this: nearly 35% of proposals that received one review were funded, as opposed to around 25% of those that received more than one. But is this difference statistically significant?
Because I am doing this at home on a Sunday, I will just use a one-sided Fisher exact test on the 2-by-2 table. My null hypothesis is that there is no difference in the chances of success between proposals that receive one review and those that receive two or more. Applying the Fisher test gives me a p-value of 0.125, telling me that in the absence of bias, roughly one in eight funding rounds would produce a skew in success rates at least as large as the one observed. The observed difference therefore cannot be regarded as statistically significant.
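For anyone who wants to reproduce this kind of calculation, here is a minimal sketch of the one-sided Fisher exact test in Python. The Ministry did not publish the raw counts, so the numbers in the example call below are hypothetical, chosen only to roughly match the quoted success rates (about 35% vs 25% of 298 proposals); they will not reproduce the 0.125 figure exactly.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Rows: one review vs. more than one; columns: funded vs. not funded.
    Returns the probability, with all margins fixed, of seeing a first
    cell of at least a, i.e. a success-rate skew at least as large as
    the one observed.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    # Sum hypergeometric probabilities over all tables at least as extreme.
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(row1, k) * comb(n - row1, col1 - k) / denom
    return p

# Hypothetical counts (the published figures are percentages only):
# 60 proposals with one review, 21 funded (35%);
# 238 with more than one review, 60 funded (~25%).
print(fisher_one_sided(21, 39, 60, 178))
```

The same table can be fed to `scipy.stats.fisher_exact` with `alternative='greater'`; the hand-rolled version above just makes the hypergeometric sum explicit.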
This does not allow us to rule out bias in peer review, of course; it just means that we cannot reject the hypothesis that it is absent. Thankfully, we are told that the Ministry is better prepared this year to deal with the peer review process. With such a spread in the distribution of peer reviews, the process last year was very vulnerable to any bias in the way that panels weight peer reviews. If a peer review process is to be run, then the Ministry should strive to achieve a consistent number of reviews per proposal.
However, this episode does illustrate the possibility of conducting deliberate experiments that might allow us to test further for bias. Although not without difficulty and expense, it would be very interesting, for instance, to compare the decision making of panels that received different numbers of peer reviews for the same set of proposals. Let’s just hope we are not subject to further unplanned experiments by our Ministry!
Disclosure: I was a successful applicant in the funding round last year.