An unfortunate experiment in peer review

By Shaun Hendy 10/03/2013


A few months ago, Sir Peter Gluckman made the observation in a discussion paper (“Which science to fund: is it time to review peer review?”) that

While scientists pride themselves on objectivity, there is surprisingly little in the way of objective assessment of the nature and quality of peer review processes for grant allocation.

Ironically, under-resourcing at the Ministry of Business, Innovation and Employment last year has provided us with an opportunity to put one aspect of the peer review of grant proposals to the test. In the midst of yet another restructuring, the Ministry was unable to run a complete peer review process for the 299 proposals it received. The results of this incomplete process give us a rare chance to examine how peer review shapes funding decisions.

Sir Peter’s paper is particularly concerned that the peer review process used in allocating grants may lead to overly conservative decisions:

… the most innovative research tends to involve intellectual risk and thus can invite criticism, it is generally accepted that the general processes of grant awarding bias decisions towards conservatism …

Sir Peter suggests that bias can arise because:

… simple but positive reviews are often discounted as if the reviewer has not been serious in his/her evaluation. Conversely, simple but negative reviews carry extra weight in tight funding systems with low success rates.

The numbers

Last week, Radio New Zealand’s William Ray received information from the Ministry concerning the distribution of the number of reviews received by each proposal and the corresponding success rates. This data is shown in the plot below. In total, 298 proposals were sent out for review (one was ruled ineligible for other reasons), with each being subject to at least one review. The Ministry had aimed to obtain five reviews per proposal (three scientific reviews and two end user reviews), but in the end it only managed an average of 2.7.

[Figure: distribution of the number of reviews received per proposal (one, two, or three or more) and the corresponding funding success rates]

If, as Sir Peter suggests, assessment panels weight negative reviews more heavily than positive ones, then the more reviews a proposal receives, the less likely it is to be funded (all else being equal). The data is consistent with this: nearly 35% of proposals that received one review were funded, compared with around 25% of those that received more than one review. But is this difference statistically significant?

Because I am doing this at home on a Sunday, I will just use a one-sided 2-by-2 Fisher exact test. My null hypothesis is that there is no difference in the chances of success between proposals that received one review and those that received two or more. Applying the Fisher test gives a p-value of 0.125: in the absence of bias, roughly one in eight funding rounds would produce a skew in success rates at least as large as the one observed. By the usual standards, then, the observed difference cannot be regarded as statistically significant.
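For anyone who wants to reproduce this sort of calculation, here is a minimal sketch using SciPy's fisher_exact. The counts below are hypothetical placeholders chosen only to roughly match the quoted success rates (the full per-category funding figures were not released with this post), so the p-value it prints will not be exactly 0.125.

```python
# Minimal sketch of a one-sided 2-by-2 Fisher exact test, as described above.
# NOTE: the counts are hypothetical placeholders, not the Ministry's actual
# per-category funding figures (which were not released in full).
from scipy.stats import fisher_exact

# Rows: proposals that received one review vs. two or more reviews
# Columns: [funded, not funded]
one_review = [21, 39]     # ~35% of 60 hypothetical one-review proposals funded
two_or_more = [60, 178]   # ~25% of 238 hypothetical multi-review proposals funded

table = [one_review, two_or_more]

# One-sided alternative: one-review proposals are *more* likely to be funded
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided p = {p_value:.3f}")
```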

The conclusion

This does not allow us to rule out bias in peer review, of course; it just means that we cannot reject the hypothesis that bias is absent. Thankfully we are told that the Ministry is better prepared this year to deal with the peer review process. With such a spread in the number of reviews per proposal, last year's process was very vulnerable to any bias in the way that panels weight reviews. If a peer review process is to be run, then the Ministry should strive to achieve a consistent number of reviews per proposal.

However, this does illustrate the possibility of conducting deliberate experiments to test further for bias. It would be very interesting, for instance, to compare the decision making of panels that received different numbers of peer reviews for the same set of proposals, although this would not be without difficulty and expense. Let’s just hope we are not subject to further unplanned experiments by our Ministry!

 

Disclosure: I was a successful applicant in the funding round last year.


11 Responses to “An unfortunate experiment in peer review”

  • Chi-squared test for trend: p = 0.19.

    I have the data too and will be speaking with William. What do you think of the very large number of reviewers approached compared with the number that actually did a review? Is this a case of reviewer fatigue?

  • Yes, apparently MSI approached more than 6000 reviewers but only 371 agreed to review. I was approached to be an end user reviewer for the Biological Industries Fund, and perhaps not surprisingly, I turned them down. My guess is that many of their invitations to review were poorly targeted.

  • Hi Shaun, is there a greater breakdown for the 3+ group, and by reviewer type? The logits are pretty suggestive.
    BTW: I was under the impression that the target was for five science reviewers, with the two end user reviews being additional… even if it was “only” five total, it’s pretty grim that 45% of props got two or less, ouch.

  • No, I don’t have a further breakdown of the data at the moment – the graph shows what was supplied to William. It may be worth requesting this breakdown. There were only 796 reviews conducted, so few proposals can have received more than four.

    And no, according to their documents, they were targeting 3 science reviewers and 2 end users.

  • Hi Felix,

    I did follow up with MoBI to get a breakdown of the reviews. They were only able to tell me that the figures they provided included both science and end user reviews, and that each proposal received at least one science review (i.e. there were no proposals which received only an end user review).

