[Open] Science Sunday — 18.07.10

By Fabiana Kubke 19/07/2010

Thursday saw a meeting on ‘Data Matters: Making the Most of Publicly-Funded Research Data’ organised by the Ministry of Research, Science and Technology. The event was tweeted under the #ResearchDataMatters hashtag on Twitter, and I wrote my notes on my FriendFeed page.

[Image: photo by Illustir on Flickr]

The day consisted of a number of topical talks (all of them great) and a couple of brainstorming sessions at the individual tables. Julian Carver, who moderated the event, did a wonderful job keeping us busy while sticking to the time schedule. It was indeed a great day filled with new ideas, and more importantly, new solutions.

It was clear that the room shared the sense that opening research data was not only important but also the direction in which we should move. The arguments in favour of this happening are quite compelling, and New Zealand can look inwards and abroad to find support for that position. There were also great examples of what New Zealand is already doing in that respect, and that is encouraging too.

The central theme that I think emerged from the day was that the question about sharing data has moved from the ‘if’ to the ‘how’ domain. And the ‘how’ is not an easy issue to solve; it is one that occupies the time and thoughts of many advocates of open data. I think that these issues can be grouped into three broad categories: ethical, cultural, and those related to archiving.

Ethical issues:

In a way these are the ones that are relatively simple to solve, and they probably encompass a narrow area of research primarily associated with health (or other human) data. One of the concerns that was raised was that ethical approval and consent around the gathering of health data are bound to specific studies that limit the ‘use’ of the data. A second concern is associated with privacy. I see these as relatively minor, since there are protocols in place for privacy, and ‘use’ can be redefined in the consent forms.

Cultural issues:

Cultural issues in the scientific community are a slightly higher hurdle to overcome, because they require two things: a ‘buy in’ from the research community and a (I think) rather profound behavioural change that makes data archiving the default. There are heaps of issues around this, and I will probably leave it at that and come back to it in another post.

Archiving issues:

There was a general consensus that data should be shared. As Penny Carnaby said, if we invest in something because we think it is important, then we should also be thinking about how to preserve that knowledge. Or, what is the point of creating stuff if you then go ahead and delete it?

I also had the feeling that there was a general consensus that ultimately, it is not just about putting data on the web. Data is only useful if it can be discovered, and only as useful as it is easy to re-purpose. But making data available in a meaningful, reusable way is hard to do. Here is where my brain explodes, and where most of the talk centred on the day.

There were a few things, however, that stuck with me and kept floating in my head as I took my flight back to Auckland.

One was a suggestion brought up by Andrew Treloar from the Australian National Data Service, about the need to make the data a ‘primary object’. We researchers tend to think of the ‘paper’ as the end product, but he suggests that the data should hold that same place in the hierarchy. He suggests that data sets should be given a DOI, in the same way that manuscripts are, and this has several advantages. Not only does the data itself become a primary object, but the mechanisms for linking DOIs are already in place, so they can be used to create relationships and track citations between objects. DOIs have a further advantage: attribution to the original source becomes inevitable. This idea solves, at least in the interim, some complex issues around data sharing.
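To make the ‘primary object’ idea a bit more concrete for myself, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how ANDS or any DOI registry actually works: the `ResearchObject` class, the helper functions and the `10.5555/...` identifiers are all made up for the example. The point it tries to show is that once a data set carries its own identifier, papers can cite it just like they cite other papers, and both citations and attribution can be looked up from the identifier alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchObject:
    """A paper or a data set, each treated as a citable object (hypothetical model)."""
    doi: str                 # made-up identifier for illustration, not a registered DOI
    kind: str                # "dataset" or "paper"
    creators: List[str]
    cites: List[str] = field(default_factory=list)  # DOIs this object cites


def cited_by(target_doi: str, objects: List[ResearchObject]) -> List[str]:
    """Return the DOIs of every object that cites target_doi."""
    return [obj.doi for obj in objects if target_doi in obj.cites]


def attribution(doi: str, objects: List[ResearchObject]) -> List[str]:
    """Look up who created the object with the given DOI."""
    by_doi = {obj.doi: obj for obj in objects}
    return by_doi[doi].creators


# A data set and two papers that reuse it (all names are invented).
dataset = ResearchObject("10.5555/example-dataset", "dataset", ["A. Researcher"])
paper_a = ResearchObject("10.5555/example-paper-a", "paper",
                         ["A. Researcher"], cites=[dataset.doi])
paper_b = ResearchObject("10.5555/example-paper-b", "paper",
                         ["B. Reuser"], cites=[dataset.doi])
objects = [dataset, paper_a, paper_b]

print(cited_by(dataset.doi, objects))     # papers that cite the data set
print(attribution(dataset.doi, objects))  # who gets the credit for the data
```

In this toy version the second paper, written by someone else, still points back to the original creators of the data through the identifier, which is the attribution advantage mentioned above.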

A second point, also brought up by Andrew Treloar, is that open data will probably be used to answer questions that are different from those for which the data was generated. This means that we researchers need to think of the description of the data beyond its original intention, in order to facilitate re-purposing. And this is difficult, because how can I know what details will be needed when the question has yet to be posed? The minimum requirement would be to ensure that the data is properly described at least in terms of its origins and the steps through which it was obtained.
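As a rough sketch of what that minimum requirement might look like in practice, here is a small Python example. The field names (`origin`, `steps`) and the sample records are my own assumptions for illustration, not part of any metadata standard discussed at the meeting. It simply checks whether a data set's description says where the data came from and how it was obtained.

```python
# Minimum description assumed for this sketch: where the data came from
# and the steps through which it was obtained.
REQUIRED_FIELDS = ("origin", "steps")


def missing_description(record: dict) -> list:
    """Return the required description fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# A hypothetical, reasonably described data set.
described = {
    "title": "Example recordings",
    "origin": "Collected in the lab for the original study",
    "steps": ["raw recording", "filtering", "export to CSV"],
}

# A hypothetical data set that would be hard for anyone else to re-purpose.
undescribed = {
    "title": "Undocumented spreadsheet",
    "origin": "",
}

print(missing_description(described))    # nothing missing
print(missing_description(undescribed))  # origin and steps are both missing
```

A check like this cannot tell whether the description is good enough for a question nobody has asked yet, but it would at least flag data sets that say nothing about their origins.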

One of the things I also really took back with me was Penny Carnaby’s description of the work that the National Library of New Zealand has been doing around archiving of digital objects. She described the work done for the National Digital Heritage Archive (you can read about it here and here). The way I understand it, this system could provide viable solutions to some of the issues surrounding data archiving.

There is obviously a lot of work to be done, but it was encouraging to be in a room filled with people willing to be honest about the challenges, yet still enthusiastic about the road ahead. I will be interested in hearing what the follow-ups of Thursday’s meeting are, in particular the position of the funding bodies that were present in the room.

Megathanks to Jonathan Hunt and Julian Carver, who made it possible for me to be there.