In May last year, the New Zealand Herald ran an editorial in which it declared:

Science has been a black hole for taxpayers’ money. Governments of all stripes agree that science is something they should fund without knowing very much about it.

Ironically, the editorial went on to praise the virtues of the National Science Challenges (NSCs), which had been announced a few weeks earlier. Ironic, because the NSCs are shaping up to be one of the biggest black holes that science has sent the taxpayer’s way in a long time.

In this post I want to introduce the emerging science of “science policy”, which represents a new approach to evaluating the outcomes of science and innovation investment and offers hope for rescuing science funding from the Herald’s black hole.

The dark arts

Many governments invest considerable amounts in science and innovation, and it is generally agreed that this investment is of great benefit to society. But what types of investment work best? Would it be better to provide R&D tax credits for all firms, or should we fund specific projects in strategically important industries? Should we fund blue skies research at our universities or pour money into product development at Callaghan Innovation?

For the most part, we are not yet able to answer these questions with any rigor. Science and innovation policy as it is practised today is a dark art.

As with many empirical questions in social sciences, a good deal of the difficulty lies in distinguishing correlation from causation. When the government funds a science project, it will typically choose to fund the research groups with the best track records or the best ideas, preferably both – yet these are also the teams that are the most likely to succeed without government funding. Good research groups may press on regardless, or may find other means to support their work. Even if the projects that the government funds are successful, it can’t be sure whether the funding it provided was necessary for this success.

This prevents the government from putting a value on the research it funds, and makes it very hard to assess how good the decision-making processes it uses to allocate this funding might be.

Medicine was a similarly murky affair once upon a time, but the invention of the randomised double-blind controlled trial sixty years ago means that when your doctor prescribes a new drug, she can be confident of its effectiveness. In such a trial, some patients in a group are randomly selected to be given a treatment, while the remainder receive a placebo. Only at the end of the trial, once the data are in, is it revealed which patients received the treatment and which didn’t. Researchers can then learn whether the drug caused any effect, because they can compare the outcomes of the patients who received the drug with those of the patients who didn’t (the control group) without bias.
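The logic of a randomised trial can be sketched in a few lines of Python. This is a minimal illustration, not a clinical protocol: the function names `randomise` and `treatment_effect` and the outcome scores are hypothetical, and the sketch ignores blinding and significance testing entirely. Because assignment to the two groups is random, the difference in their mean outcomes estimates the effect of the treatment itself rather than of whatever made some patients different to begin with.

```python
import random
from statistics import mean


def randomise(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def treatment_effect(outcomes, treated, control):
    """Estimate the average effect as the difference in mean outcomes between groups."""
    return mean(outcomes[p] for p in treated) - mean(outcomes[p] for p in control)


# Hypothetical usage: outcomes are only recorded (and groups revealed) after the trial.
patients = ["p1", "p2", "p3", "p4", "p5", "p6"]
treated, control = randomise(patients, seed=42)
outcomes = {"p1": 7, "p2": 4, "p3": 6, "p4": 5, "p5": 8, "p6": 3}  # made-up scores
print(treatment_effect(outcomes, treated, control))
```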

Could this approach be used to evaluate the value of investments in science and innovation?

An unfortunate experiment

In mid-2012, the newly formed Ministry of Business, Innovation and Employment (MBIE) inadvertently conducted such a trial. When it assessed the quality of the funding proposals that it had received, the Ministry failed to ensure that each proposal received an equal number of external peer reviews. Some proposals received just a single peer review while others received up to four.

As I wrote in a post last year, this exposed its funding decisions to a potential bias. Even if two proposals were of equal merit, the proposal that by chance received more reviews would also be more likely to have at least one negative review. A cautious Ministry might be reluctant to fund proposals that received a negative review, even if all others were positive. Proposals that received more reviews would then be less likely to be funded than equally good proposals that, by chance, received fewer.
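The arithmetic behind this bias is simple. Assuming, hypothetically, that each review of a given proposal is independently negative with probability p, the chance that a proposal with n reviews collects at least one negative review is 1 - (1 - p)^n, which grows with every extra review even though the proposal itself is no worse. The function name and the value of p below are illustrative assumptions, not figures from MBIE.

```python
def prob_at_least_one_negative(p_negative, n_reviews):
    """Chance that at least one of n independent reviews is negative."""
    return 1 - (1 - p_negative) ** n_reviews


# Hypothetical 20% chance that any single review is negative.
for n in range(1, 5):
    print(n, round(prob_at_least_one_negative(0.2, n), 3))
```

With that hypothetical p of 0.2, a proposal with one review has a 20% chance of drawing a negative review, while an identical proposal with four reviews has about a 59% chance, so a Ministry that baulked at any negative review would turn down the more-reviewed proposal far more often.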

Indeed, more than a third of the proposals that received only one review were funded, while only one quarter of those that received two or more were successful. Was MBIE too conservative in its funding decisions?

To answer this question, we need to know how likely it is that a gap this large would arise by chance in the absence of bias. It turns out that without bias, one in every twelve funding rounds would produce such a skewed result, so while one might be suspicious, the data do not allow us to draw a solid statistical conclusion. Nevertheless, this example illustrates how we might use randomness to evaluate the effectiveness of our decision-making processes.
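A figure such as "one in every twelve" can be estimated with a permutation test: keep the total number of funded proposals fixed, shuffle the funding outcomes at random across all proposals, and count how often the single-review group comes out ahead by at least the observed margin. The sketch below assumes this one-sided test; the function name and the example counts are hypothetical, since the post does not give the underlying numbers, and this is not necessarily the calculation behind the figure quoted above.

```python
import random


def permutation_p_value(funded_single, n_single, funded_multi, n_multi,
                        trials=10_000, seed=0):
    """One-sided permutation test for 'single-review proposals were funded more often'.

    Under the no-bias hypothesis, funding outcomes are unrelated to the number of
    reviews, so every shuffling of the same outcomes across proposals is equally likely.
    """
    rng = random.Random(seed)
    observed_gap = funded_single / n_single - funded_multi / n_multi
    total_funded = funded_single + funded_multi
    outcomes = [1] * total_funded + [0] * (n_single + n_multi - total_funded)
    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(outcomes)
        single = sum(outcomes[:n_single])  # funded among the "single-review" slots
        gap = single / n_single - (total_funded - single) / n_multi
        if gap >= observed_gap - 1e-12:  # small tolerance for floating-point ties
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# Hypothetical round: 7 of 20 single-review proposals (35%) and
# 25 of 100 multi-review proposals (25%) funded.
print(permutation_p_value(7, 20, 25, 100))
```

A result near 0.08 would correspond to roughly one round in twelve: suggestive, but well short of the conventional 0.05 threshold.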

From an art to a science
While unintentional experiments such as this can reveal interesting information about the quality of decision-making by funding agencies, it would be better to undertake such studies purposefully, rather than by accident.

There are methods for studying the effectiveness of our investments in science that are fairer than randomly allocating the number of external reviews. It is these new approaches, which make use of the big data sets that are increasingly becoming available, that are driving the science of science policy.

Such an approach was recently used to test the quality of decision-making by the US National Institutes of Health (NIH), which invests billions of dollars every year in medical research. The conclusion? Projects rated poorly by the NIH, but funded nonetheless, produced just as much impact as those that were rated the best. This suggests that the NIH funding panels are choosing to support some proposals that turn out to have low impact, while rejecting other proposals that would have delivered higher impact. This is valuable information for an organisation that spends more than US$100 million each year on evaluating proposals.

Closer to home, Adam Jaffe, Director of Wellington-based Motu Economic and Public Policy Research, is currently undertaking a similar study of Marsden-funded projects. Using the regression discontinuity method, Jaffe is comparing the subsequent academic performance of those who just made it over the threshold for funding with that of those who just missed out*, on the assumption that the proposals and teams being compared will differ little in quality. Proposals that just missed out on funding are effectively being used as a control group for those that just made it.
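The core idea of a regression discontinuity comparison can be sketched as below. This is deliberately crude, in the spirit of the footnote: it simply compares mean outcomes (say, later citation counts) for proposals whose panel scores fall within a chosen bandwidth just above and just below the funding cutoff. Practical studies typically fit a regression on each side of the cutoff rather than taking raw means, and the function name, parameters and data here are hypothetical, not Jaffe's actual method.

```python
from statistics import mean


def rd_local_means(scores, outcomes, cutoff, bandwidth):
    """Crude regression-discontinuity estimate of the effect of funding.

    Compares the mean outcome of proposals scored just above the cutoff (funded)
    with that of proposals scored just below it (not funded), on the assumption
    that proposals this close to the line are otherwise similar.
    """
    above = [y for s, y in zip(scores, outcomes) if cutoff <= s < cutoff + bandwidth]
    below = [y for s, y in zip(scores, outcomes) if cutoff - bandwidth <= s < cutoff]
    if not above or not below:
        raise ValueError("no proposals within the bandwidth on one side of the cutoff")
    return mean(above) - mean(below)


# Hypothetical panel scores and later citation counts.
scores = [2.1, 2.9, 3.3, 3.6, 4.0, 4.8]
citations = [12, 15, 14, 22, 25, 30]
print(rd_local_means(scores, citations, cutoff=3.5, bandwidth=0.75))
```

A narrower bandwidth makes the two groups more comparable but leaves fewer proposals in each, which is the central trade-off in choosing it.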

Once the study is complete, Jaffe will be in a position to estimate the scientific impact that a Marsden grant generates. If he finds that the Marsden allocation process suffers from the same problems as that of the NIH, the fund may be able to take steps to improve this process and thereby increase its impact.

So far Jaffe’s study only considers publications and their citations, but with access to more data it should also be possible to assess some of the less tangible social and economic benefits that come from Marsden-funded research. The Marsden fund may eventually be able to determine whether the PhD students it supports go on to have more successful careers or found more companies than students funded by other scholarships. Evidence like this is the sort of thing that would persuade Treasury to put more money into blue skies research (or less, if the results are negative).

Keeping score
Jaffe is able to do this for the Marsden fund because it has been operating for 20 years. Over that time it has kept high-quality records of its decision-making processes: these records detail what was funded, what wasn’t funded, and why. Yet the Marsden fund represents less than 5% of New Zealand’s public spending on science and innovation, and unfortunately good records of the processes used to allocate the remaining 95% have not been, and are not being, kept.

It is even difficult to establish what it is that the government chose to fund, let alone what it chose not to.

This loss of information can be partly attributed to the volatility in the way science is funded in New Zealand, including the regular restructuring of funding agencies themselves (MoRST, FRST, MSI, MBIE, Callaghan Innovation, …) and the churn in the funding schemes they administer (PGSF, NERF, Research for Industry, Smart Ideas, …). In contrast, the Marsden fund has been managed continuously by the Royal Society of New Zealand using a relatively stable process for the last two decades.

There also seems to be a bureaucratic reluctance by the government agencies that administer these funds to collect and curate the sort of data that might be useful for evaluation. In response to a recent query from New Zealand Association of Scientists President, Dr Nicola Gaston, concerning possible gender bias in its grant allocation processes, MBIE responded that

Gender information is not necessary for the function of allocating research funding

Unfortunately, international evidence suggests that women researchers in many countries tend to receive less funding than men. By not collecting data on gender, MBIE cannot know whether similar biases exist here. It may well be missing an opportunity both to increase the impact of the research it funds** and to remove one of the barriers that impede the careers of women scientists.

Even if it has no immediate use for it, MBIE should be collecting data where reasonable and practicable to enable future studies of impacts and funding processes.

Escaping the black hole
With new methodologies such as regression discontinuity now available, and a better understanding of the need to collect data, one would hope that within a few years we will be in a position to rigorously evaluate the impact of our newest funding mechanism, the National Science Challenges.

Sadly, this is unlikely to be possible.

The problem lies in the difficulty of identifying a control group for the NSCs: the way that they have been selected and established makes it very difficult to establish what the world would look like without them. Would the science proposed have been carried out anyway? Was the panel that chose the NSCs subject to bias? We will never know, because the processes used to choose the ten challenges and assemble the challenge teams have not been transparent. We have no records of challenges that weren’t chosen or team members that weren’t named on the challenge proposals.

For each individual challenge, MBIE notes***:

Because of the focus on ‘best teams’ an effective outcome for each Challenge will be to generate a single proposal – there can only be one ‘best team’

In other words, the NSC process has made it impossible to establish a control group by design. And unless those that are putting together the NSCs can outperform the NIH, it is very unlikely that the teams for each challenge will be the ‘best’.

The NSCs do represent a significant increase in funding for science in New Zealand, and there is a school of thought that we should just get on and make them work as best we can. I have much sympathy for this point of view, and have indeed rolled up my sleeves, together with a number of my colleagues, to put together a proposal for the “Science for Technological Innovation” challenge.

Yet at the same time I am aware that the design and implementation of the NSCs represents a wasted opportunity. Sir Peter Gluckman, the Prime Minister’s Chief Science Advisor, has called for the greater use of scientific evidence in government policy-making. I agree; it’s well past time that we started using evidence in making science policy.


* OK, it’s a bit more complicated but this gives you the basic idea.

** It is worth noting that the Marsden fund collects gender information and finds no bias in its allocation process.

*** No, this is not from a Joseph Heller novel.