Evidence-based science policy

By Shaun Hendy 11/06/2014


In May last year, the New Zealand Herald ran an editorial in which it declared:

Science has been a black hole for taxpayers’ money. Governments of all stripes agree that science is something they should fund without knowing very much about it.

Ironically, the editorial went on to praise the virtues of the National Science Challenges (NSCs), which had been announced a few weeks earlier. Ironic, because the NSCs are shaping up to be one of the biggest black holes that science has sent the taxpayer’s way in a long time.

In this post I want to introduce the emerging science of “science policy”, which represents a new approach to evaluating the outcomes of science and innovation investment and offers hope for rescuing science funding from the Herald’s black hole.

The dark arts

Many governments invest considerable amounts in science and innovation, and it is generally agreed that this investment is of great benefit to society. But what types of investment work best? Would it be better to provide R&D tax credits for all firms, or should we fund specific projects in strategically important industries? Should we fund blue skies research at our universities or pour money into product development at Callaghan Innovation?

For the most part, we are not yet able to answer these questions with any rigour. Science and innovation policy as it is practised today is a dark art.

As with many empirical questions in social sciences, a good deal of the difficulty lies in distinguishing correlation from causation. When the government funds a science project, it will typically choose to fund the research groups with the best track records or the best ideas, preferably both – yet these are also the teams that are the most likely to succeed without government funding. Good research groups may press on regardless, or may find other means to support their work. Even if the projects that the government funds are successful, it can’t be sure whether the funding it provided was necessary for this success.
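
To make the selection problem concrete, here is a minimal Python sketch (the numbers are entirely invented): even if funding has no effect whatsoever, the funded groups will look more successful than the unfunded ones, simply because the funder picks the strongest groups to begin with.

```python
import numpy as np

rng = np.random.default_rng(0)

# A thousand hypothetical research groups of varying underlying quality
n_groups = 1000
quality = rng.normal(size=n_groups)

# The funder backs the top 20% of groups, judged on quality
funded = quality > np.quantile(quality, 0.8)

# Suppose the funding itself has NO effect: outcomes depend only on quality
outcomes = quality + rng.normal(scale=0.5, size=n_groups)

print("mean outcome of funded groups:  ", round(outcomes[funded].mean(), 2))
print("mean outcome of unfunded groups:", round(outcomes[~funded].mean(), 2))
# The funded groups look far more successful, even though the funding
# contributed nothing -- a naive comparison mistakes selection for impact.
```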

This prevents the government from putting a value on the research it funds, and makes it very hard to assess the quality of the decision-making processes it uses to allocate this funding.

Medicine was a similarly murky affair once upon a time, but the invention of the randomised double-blind controlled trial some sixty years ago means that when your doctor prescribes a new drug, she can be confident of its effectiveness. In such a trial, some patients in a group are randomly selected to receive the treatment, while the remainder receive a placebo. Only at the end of the trial, once the data is in, is it revealed which patients received the treatment and which did not. Because the two groups differ only by chance, researchers can compare their outcomes without bias and so learn whether the drug actually caused any effect.
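
For readers who like to see the logic spelt out, here is a minimal simulation of that idea (the effect size and patient numbers are invented): because treatment is assigned at random, a simple difference in average outcomes between the two groups is an unbiased estimate of the drug’s effect.

```python
import numpy as np

rng = np.random.default_rng(1)

n_patients = 500
true_effect = 0.4                 # invented effect size, for illustration only

# Random assignment: whether a patient gets the drug has nothing to do with
# how well they would have done anyway
treated = rng.random(n_patients) < 0.5

baseline = rng.normal(size=n_patients)                  # underlying health
noise = rng.normal(scale=0.5, size=n_patients)
outcomes = baseline + true_effect * treated + noise

estimate = outcomes[treated].mean() - outcomes[~treated].mean()
print(f"estimated treatment effect: {estimate:.2f} (true value {true_effect})")
```

Contrast this with the selection sketch above: here the comparison is fair because chance, not merit, decided who was treated.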

Could this approach be used to evaluate the value of investments in science and innovation?

An unfortunate experiment

In mid-2012, the newly formed Ministry of Business, Innovation and Employment (MBIE) inadvertently conducted just such a trial. When it assessed the quality of the funding proposals that it had received, the Ministry failed to ensure that each proposal received an equal number of external peer reviews. Some proposals received just a single peer review while others received up to four.

As I wrote in a post last year, this exposed its funding decisions to a potential bias. Even if two proposals were of equal merit, the proposal that by chance received more reviews would also be more likely to have at least one negative review. A cautious Ministry might be reluctant to fund proposals that received a negative review, even if all the others were positive. Proposals that received more reviews would then be less likely to be funded than equally good proposals that, by chance, received fewer.
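
The arithmetic behind this is simple. If we suppose, purely for illustration, that any single reviewer has a 20% chance of returning a negative review of a fundable proposal, then the chance of attracting at least one negative review grows quickly with the number of reviews:

```python
# If each review is independently negative with probability p_negative
# (a purely illustrative figure), the chance of at least one negative
# review rises quickly with the number of reviews.
p_negative = 0.2

for n_reviews in range(1, 5):
    p_at_least_one = 1 - (1 - p_negative) ** n_reviews
    print(f"{n_reviews} review(s): P(at least one negative) = {p_at_least_one:.2f}")
```

With four reviewers, the chance of picking up a negative review is roughly three times what it is with one.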

Indeed, more than a third of the proposals that only received one review were funded, while only one quarter of those that received two or more were successful. Was MBIE too conservative in its funding decisions?

To answer this question, we need to know how likely it is that such a result could have arisen by chance in the absence of bias. It turns out that without bias, one in every twelve funding rounds would produce a result this skewed, so while one might be suspicious, the data does not allow us to draw a solid statistical conclusion. Nevertheless, this example illustrates how we might use randomness to evaluate the effectiveness of our decision-making processes.
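
The sort of calculation involved looks something like the following sketch, which simulates many funding rounds in which the decisions ignore the number of reviews entirely. The group sizes and number of funded proposals here are invented, so this will not reproduce the one-in-twelve figure exactly; the real proposal counts would be needed for that.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented round: 40 proposals got one review, 120 got two or more,
# and 35 proposals were funded in total
n_single, n_multi = 40, 120
n_funded = 35
observed_gap = 1 / 3 - 1 / 4      # gap in success rates reported above

n_trials = 50_000
count = 0
for _ in range(n_trials):
    # Under the no-bias hypothesis, funding decisions ignore the number of
    # reviews, so the funded proposals are a random draw from the whole pool
    funded = rng.choice(n_single + n_multi, size=n_funded, replace=False)
    rate_single = np.count_nonzero(funded < n_single) / n_single
    rate_multi = np.count_nonzero(funded >= n_single) / n_multi
    if rate_single - rate_multi >= observed_gap:
        count += 1

print(f"fraction of unbiased rounds with a gap this large: {count / n_trials:.3f}")
```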

From an art to a science
While unintentional experiments such as this can reveal interesting information about the quality of decision-making by funding agencies, it would be better to undertake such studies purposefully, rather than by accident.

There are methods for studying the effectiveness of our investments in science that are fairer than randomly allocating the number of external reviews. It is these new approaches, which make use of the big data sets that are increasingly becoming available, that are driving the science of science policy.

Such an approach was recently used to test the quality of decision-making by the US National Institutes of Health (NIH), which invests billions of dollars every year in medical research. The conclusion? Projects rated poorly by the NIH, but funded nonetheless, produced just as much impact as those that were rated the best. This suggests that the NIH funding panels are choosing to support some proposals that turn out to have low impact, while rejecting other proposals that would have delivered higher impact. This is valuable information for an organisation that spends more than US$100 million each year on evaluating proposals.

Closer to home, Adam Jaffe, Director of Wellington-based Motu Economic and Public Policy Research, is currently undertaking a similar study of Marsden-funded projects. Using the regression discontinuity method, Jaffe is comparing the subsequent academic performance of those who just made it over the threshold for funding with that of those who just missed out*, on the assumption that differences in the quality of the proposals and teams being compared will be small. Proposals that just missed out on funding are effectively being used as a control group for those that just made it.
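
In spirit (this is only a sketch with invented data, not Jaffe’s actual model), a regression discontinuity estimate looks something like this: restrict attention to proposals scored close to the funding cut-off, fit a trend on each side, and read off the jump at the threshold.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented data: panel scores, a funding cut-off and later citation counts
n = 2000
score = rng.uniform(0, 100, n)
threshold = 70.0
funded = score >= threshold
true_grant_effect = 5.0                       # hypothetical citation boost
citations = 0.3 * score + true_grant_effect * funded + rng.normal(scale=5, size=n)

# Keep only proposals scored within a narrow band of the cut-off, where
# differences in underlying quality should be small
band = 5.0
just_missed = (score >= threshold - band) & ~funded
just_funded = (score < threshold + band) & funded

# Fit a line to each side and compare the two predictions at the threshold:
# the jump is the estimated effect of receiving a grant
fit_missed = np.polyfit(score[just_missed], citations[just_missed], 1)
fit_funded = np.polyfit(score[just_funded], citations[just_funded], 1)
estimate = np.polyval(fit_funded, threshold) - np.polyval(fit_missed, threshold)

print(f"estimated effect of a grant at the cut-off: {estimate:.1f} extra citations")
```

The width of the band is a judgement call: too narrow and there is too little data to fit the trends, too wide and the proposals on either side of the cut-off are no longer comparable.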

Once the study is complete, Jaffe will be in a position to estimate the scientific impact that a Marsden grant generates. If he finds that the Marsden allocation process suffers from the same problems as that of the NIH, the fund may be able to take steps to improve this process and thereby increase its impact.

So far Jaffe’s study only considers publications and their citations, but with access to more data it should also be possible to assess some of the less tangible social and economic benefits that come from Marsden-funded research. The Marsden fund may eventually be able to determine whether the PhD students it supports go on to have more successful careers or found more companies than students funded by other scholarships. Evidence like this is the sort of thing that would persuade Treasury to put more money into blue skies research (or less, if the results are negative).

Keeping score
Jaffe is able to do this for the Marsden fund because it has been operating for 20 years. Over that time it has kept high-quality records of its decision-making processes: these records detail what was funded, what wasn’t funded, and why. Yet the Marsden fund represents less than 5% of New Zealand’s public spending on science and innovation, and unfortunately good records of the processes used to allocate the remaining 95% have not been, and are not being, kept.

It is difficult even to establish what the government chose to fund, let alone what it chose not to.

This loss of information can be partly attributed to the volatility in the way science is funded in New Zealand, including the regular restructuring of funding agencies themselves (MoRST, FRST, MSI, MBIE, Callaghan Innovation, …) and the churn in the funding schemes they administer (PGSF, NERF, Research for Industry, Smart Ideas, …). In contrast, the Marsden fund has been managed continuously by the Royal Society of New Zealand using a relatively stable process for the last two decades.

There also seems to be a bureaucratic reluctance on the part of the government agencies that administer these funds to collect and curate the sort of data that might be useful for evaluation. In response to a recent query from the New Zealand Association of Scientists President, Dr Nicola Gaston, concerning possible gender bias in its grant allocation processes, MBIE responded that

Gender information is not necessary for the function of allocating research funding

Unfortunately, international evidence suggests that women researchers in many countries tend to receive less funding than men. By not collecting data on gender, MBIE cannot know whether similar biases exist here. It may well be missing an opportunity both to increase the impact of the research it funds** and to remove one of the barriers that impede the careers of women scientists.

Even if it has no immediate use for it, MBIE should be collecting data where reasonable and practicable to enable future studies of impacts and funding processes.

Escaping the black hole
With new methodologies such as regression discontinuity now available, and a better understanding of the need to collect data, one would hope that within a few years we will be in a position to rigorously evaluate the impact of our newest funding mechanism, the National Science Challenges.

Sadly, this is unlikely to be possible.

The problem lies in the difficulty of identifying a control group for the NSCs: the way that they have been selected and set up makes it very difficult to know what the world would look like without them. Would the science proposed have been carried out anyway? Was the panel that chose the NSCs subject to bias? We will never know, because the processes used to choose the ten challenges and assemble the challenge teams have not been transparent. We have no records of challenges that weren’t chosen or of team members that weren’t named on the challenge proposals.

For each individual challenge, MBIE notes***:

Because of the focus on ‘best teams’ an effective outcome for each Challenge will be to generate a single proposal – there can only be one ‘best team’

In other words, the NSC process has made it impossible to establish a control group by design. And unless those that are putting together the NSCs can outperform the NIH, it is very unlikely that the teams for each challenge will be the ‘best’.

The NSCs do represent a significant increase in funding for science in New Zealand, and there is a school of thought that we should just get on and make them work as best we can. I have much sympathy for this point of view, and have indeed rolled my sleeves up, together with a number of my colleagues, to put together a proposal for the “Science for Technological Innovation” challenge.

Yet at the same time I am aware that the design and implementation of the NSCs represents a wasted opportunity. Sir Peter Gluckman, the Prime Minister’s Chief Science Advisor, has called for the greater use of scientific evidence in government policy-making. I agree; it’s well past time that we started using evidence in making science policy.

 

* OK, it’s a bit more complicated but this gives you the basic idea.

** It is worth noting that the Marsden fund collects gender information and finds no bias in its allocation process.

*** No, this is not from a Joseph Heller novel.


9 Responses to “Evidence-based science policy”

  • My experience of the Marsden fund, having submitted proposals and seen what’s been funded in my field, disappoints me on so many levels. However, one of the most disappointing features of the application process is the lack of feedback. I was interested in the last word of your sentence re the Marsden fund: “Over that time it has kept high-quality records of its decision-making processes: these records detail what was funded, what wasn’t funded, and why.” If there are records of why proposals weren’t funded, then distributing this to unsuccessful applicants would be helpful. I for one would be very interested in accessing this information. How is it that some people are allowed access to this while others aren’t?

  • One trivial point is that for successful evaluation of science policy the government needs to be explicit about its policy intent, and with that comes an obligation to be explicit about its intervention logic. What is the mischief it is trying to overcome, and why will investing in science help?

    As evaluators well know, so often the politics dictate that this stuff isn’t made too explicit, so that multiple constituencies can be managed.

    I think the NSC 10 “Science for Technological Innovation” was a classic case of this. The stated target was “enhancing the capacity .. to use .. sciences for economic growth” which suggests an intervention targeting the users and their relationship with the science system.

    This plays well with the business community and IMHO is an important issue to address, particularly when it comes to ensuring sufficient longer-term science investment beyond the capacity of industry to fund (the mischief).

    But in practice the process set up involved the science system developing projects on the basis of what they felt were good ideas. This plays well with some sectors of the science sector.

    If this particular NSC gets funded it will be hard to evaluate. Should it be evaluated against the stated objective, or the one implicitly condoned in implementation?

    [I too should note my involvement in the NSC proposal]

  • Craig: Sorry – I was a little glib (alliteration FTW). The Marsden fund keeps the panel scores, which is what makes it possible to use the regression discontinuity method. From memory, the Marsden fund did try releasing more detailed information about the panel’s ranking of proposals (percentiles) to applicants a few years ago, but a number of people strongly objected to this because it was also available to their employer.

    Feedback is available for second round proposals but I do think that in many cases it is very difficult to provide useful feedback to unsuccessful applicants, as there are many excellent proposals that are not funded.

    And finally, Adam Jaffe has access to the panel data because he is using it for research purposes, the outcome of which will be to help the Marsden fund improve its impact and processes.

  • It’s a pity the “why it was rejected” info isn’t collected and provided to applicants. Being at an institute that could only form a “mock” panel of one person every 4 or 5 years really restricts access to the only advice mechanism available: having a chat with someone from the panel that rejected your application. I have to say feedback from applications I have submitted overseas has been far more detailed than anything from a NZ funding body.

  • Craig: In many cases the “why” will be that “the panel thought the proposal was very good but there were six (or more) proposals that they thought were better”. Whether the panels are right or not is what Adam Jaffe’s study can help establish.

  • Nice post Shaun – in lieu of substantive comments, I will ask you to fix the typo in the title!

  • […] Shaun Hendy reports on an accidental experiment in the effects of the number of referees on grant de…. In mid-2012, the newly formed Ministry of Business, Innovation and Employment (MBIE) inadvertently conducted such a trial, albeit by accident. When it assessed the quality of the funding proposals that it had received, the Ministry failed to ensure that each proposal received an equal number of external peer reviews. Some proposals received just a single peer review while others received up to four. […]
