The incentive challenge - Big Data for Advancing Dementia Research

Recommendation: The incentives challenge needs to be addressed to make building and sharing resources worthwhile. Acknowledgement may be given, for example, through providing a citation for the dataset that counts for academic metrics, or recognising contributions to building resources in hiring and promotion. Moreover, a model of adding data and feeding it back to the main resource may help to establish a win-win situation for data collectors and users.

As shown throughout this report, the question of reward is fundamental to the question of sharing and opening up data, going beyond the technical and organisational aspects of data sharing:

In the end, if a principal investigator wants to share, and can find a website to support their data, they can share. But there is no reward for data sharing. It’s more work. Most investigators just want to get their papers out.

–Michael Weiner

Make building and sharing attractive

As academia is currently set up, there are few incentives to share data: recognition comes mostly from publishing papers. Incentives are a crucial underlying mechanism both for building resources of future value and for sharing data more generally:

So we are talking about very expensive studies. And when you get funding for a big study like a study of dementia, longitudinal with a huge amount of people, people are very concerned to publish as much as possible but you are not really willing to give away a dataset to make others do what they want to. [...] If you are the principal investigator you want to have the control at least for several years to do what you have planned to do.

–Linda Hassing

As discussed earlier, funders may play an important role in making sure that data are made available. In general, creating very clear arrangements upfront that data have to be shared may solve the problem:

I think the funders can have a really important role to play by putting pressure on researchers that they give money to share their data, and I don’t mean a clause 15.8.3 that says “You will share data”, I mean a discussion upfront in person, “You understand you only get this funding if you make your data open”. That – and funding to enable it to happen, as sharing data properly is not cost free – would help. I don’t see any reason why that shouldn’t happen.

–Simon Lovestone

Potentially, funders could even consider creating financial incentives for sharing:

You need to have a real lever to encourage academic data sharing. So, for example, some of the funding bodies at the moment withhold some (e.g. 10%) of the grants until you send in your final report, and you could use the same approach to encourage timely data sharing: academic researchers wouldn’t get a chunk of the cash from an academic grant unless they put the data into a trusted third party. I think it has to be that brutal.

–Derek Hill

However, one has to keep in mind that funds are often insufficient for cleaning the data properly before release, a problem that may be exacerbated by holding back part of the grant. Similarly, structures that create pressure will not help on their own.

At the core, researcher acknowledgement will need to be addressed:

Some of the pressure [of funders] making data from resources freely available

huge investment by the researchers who built the resource. Funders and various other people talk about how this work should be acknowledged but they don’t actually say how it will be acknowledged. [...] So I think that unless that is addressed you will get coercion but you won’t get collaboration generally in the concept of wide access.

–Sir Rory Collins

One further way of making sharing attractive may be to create a mutual value-add, a win-win situation, through enhancements that the data user makes to the overall resource. This has been the practice in several informal collaborations:

[We] struck up a relationship with the PI by introducing ourselves and saying, “We’d love to generate this data on your cohort and pay for that,” in order to buy our way into a collaboration. [...] So we offered to generate protein data on a subset [...] to increase their coverage in terms of the data that they have, and in return they allowed us to perform an analysis to get a first authorship paper and then to get leading or highly ranked authorships on follow up.

–Richard Dobson

As outlined earlier, UK Biobank is in the process of institutionalising this form of data enhancement in order to keep developing the resource with external funding, and then make the enhanced data available to others. These enhancements may be managed as a separate addition to the main dataset, with other researchers having the opportunity to recreate them if they suspect errors.

Manage acknowledgement

Many of the underlying issues in relation to data sharing are about incentives. In academia particularly, collected data are one of the key assets for a researcher:

People’s careers are tied up in this, I absolutely understand that, and I don’t think we as a community have paid enough attention to how to deal with that. We’re very good about saying, you know, “It’s not your data, give us your data, I want access to your data”, but actually how are we going to respect the time that these people have spent on acquiring this data?

–Simon Lovestone

The challenge here is that the academic reward system is based on publications. Many interviewees felt it would not be fair for those creating resources to suddenly have “300 or 400 publications overnight” by being co-author on publications for which they only provided the data. Authorship would also create an obligation to check the quality of each paper, which again may not be scalable:

For authorship, you can’t just be giving data, you also really should give some sort of an input, and you should also be helping with the writing up the data and things like that. So we generally don’t think it’s enough just to collect the data, but you should always be asked if you’re in, if you are willing to analyse and give comments and suggestions for the paper. And also that you agreed on the final version.

–Ingmar Skoog

Our Swedish case studies also showed that one of the reasons for relying on the collaborative approach of releasing data may be a lack of researcher acknowledgement in other modes of releasing the data:

PIs are putting much effort in establishing and maintaining high-quality cohorts. So how to acknowledge these efforts when data is shared is an important question. And I think that’s why so far we have had the tradition that it’s a collaborative approach if someone wants to use the data and there has been not so much open access data.

–Miia Kivipelto

Of course, acknowledgement is needed not only for sharing datasets that have already been collected, but especially for large cohort studies, which require substantial investments from the people involved without immediate rewards. Creating resources does bring indirect benefits, such as knowing the resource very well and shaping it, but these work in a much more implicit, long-term way:

I have had to think quite carefully how I ensure that [junior researchers] get personal recognition and career progression through doing this work. Part of that comes from being involved with a big resource like UK Biobank and being able to say that they have worked in the UK Biobank team, because that has a certain cachet. Part of it comes through ensuring that not only are they involved in resource building but that some of their time can be used for conducting research. And part of it comes because if you are involved in building a resource you tend to have a better idea of what it is valuable for, in terms of research.

–Cathie Sudlow

Beyond what is already being done (mentioning the data sharing initiative in the acknowledgement section, abstract or keywords), other ways may help to ensure better individual acknowledgement. For example, Rohlfing and Poline (2012) suggest publishing the dataset in the same way as papers are published, so that it can be cited in the papers using the data and count towards academic performance metrics, which often include a mixture of publications and citations (such as the h-index).

In addition, other research has recommended establishing distinct career paths for curating data and making them available to a wider community (Howe et al., 2008), treating data as equivalent to publications (Gardner et al., 2003), and creating additional incentives by recognising data sharing efforts in academic hiring and promotion decisions, beyond publications, journal impact factors and citations (Piwowar et al., 2008).

These measures may not transform the way academia works in the short term, but may gradually shift the balance for the academic community to acknowledge the effort that individuals have put into building large-scale resources.
