On Friday 22 January Cambridge University invited our two main charity funders to discuss their views on data management and sharing with Cambridge researchers. David Carr from the Wellcome Trust and Jamie Enoch from Cancer Research UK came to the University to talk to our researchers.
The related blog ‘Charities’ perspective on research data management and sharing’ summarises the presentations Jamie and David gave. After this event, a group of researchers from the School of Biological Sciences and the School of Clinical Medicine at the University of Cambridge were invited to put questions about the Wellcome Trust data management and sharing policy and the CRUK data sharing and preservation policy directly to David and Jamie.
This blog is a summary of the discussion, with questions thematically grouped. These questions will be added to the list of Frequently Asked Questions on the University’s Research Data Management Website.
- It is not recommended that researchers simply share a link and release the data when requested. Research data should be available, accessible and discoverable.
- The first responsibility is to protect the study participants. The funders provide guidance documents on sharing of patient data. Ethics committees also provide advice and guidance on what data can be shared. In principle, patient data should be safeguarded, but this should not preclude sharing. There are models for managed access to data that allow personal/sensitive data to be shared for legitimate purposes in a safe and secure manner.
- The funders do not want to prevent new collaborations. When sharing data, they recommend that data generators state in the description of the data that they are willing to collaborate.
- It is recognised that it is often appropriate for researchers to have a defined period of exclusive access to the data they generate, but this should be determined by disciplinary norms. Any exemptions or delays have to be justified on a case by case basis, ideally at the outset of the project.
- The funders expect research data that supports publications to be made accessible and publications should have a clear statement explaining how to access the underlying research data.
- However, researchers need to decide what is useful to share, considering the effort of preparing the data for deposit and of sharing it. If nobody is going to use the data, sharing is not a good use of a researcher’s time.
- Discipline-specific data repositories, where these exist, are recommended in preference to general purpose or institutional repositories.
- Biosharing is an excellent resource with references to discipline-specific metadata schemas.
- The salary of a staff member whose role is to manage data is an eligible cost on a grant, as long as it is justified.
- There are no funds for sharing data from old projects, although exceptions are considered on a case by case basis.
- The funders are considering monitoring data management plans, but their current primary goal is to encourage people to think about data management and sharing from the very start of the project.
Access to research data
Q: Are funders benefiting from the expertise of organisations such as UK Data Service when providing advice on data access? UK Data Service has been managing controlled access to research data for a long time and it would be advantageous to benefit from their expertise.
A: Yes, we are in discussion with the UK Data Service. We are also working with the UK Data Service to consider whether it might be appropriate for hosting data from other disciplines beyond social science. We also believe there is significant scope to share lessons and best practices for data sharing between the social and biomedical sciences.
Q: Could we just share research data only when asked for it?
A: This is not a recommended solution: research data should be available, accessible and discoverable. Data access controls and criteria for what needs to happen for the access to be granted have to be made clear in metadata description.
Q: I have patient data which has to be stored in a secure space. I always say in my data management plan that I cannot share my data. I would like to get ethical guidance which will explain to me how to share these data. It is very easy to say that data cannot be shared. I would like to share my data, but I would like to do it properly. With patient data it is extremely difficult, especially with genomics data, where there is a risk that patients can be identified.
A: Sharing of clinical data is not easy. Both Wellcome Trust and Cancer Research UK are helping to drive a great deal of work which is considering access and governance models through which sensitive patient data can be made available for research in a safe, secure and trusted manner. They provide guidance documents on sharing of patient data. Safety of patients and patients’ data is important. Ethics committees also provide advice and guidance on what data can be shared.
Q: What about sharing of physical materials? I have received a request to share a culture derived from patient material, but the Ethics Committee did not approve sharing of this material. What should I do?
A (Peter Hedges, Head of Research Office): If your ethical approval says that you cannot share that material, you cannot share it. Your first responsibility is to protect your study participants.
Q: If I share my data via a repository and people can simply download my data, I can no longer collaborate with them to work on the data and I have lost the possibility of getting credit for my data.
A: Nobody wants to prevent new collaborations from happening. A solution might be to add a statement that you are willing to collaborate in the description of your data. Your data requestor might be interested in collaborating, simply because you know your data the best. Funders also expect that the data re-used by others is appropriately acknowledged/cited, and they want to ensure that due credit results from the secondary use of data.
Quality control of research data
Q: If researchers start sharing unpublished research data via data repositories there is a risk that these data will not be of good quality as they will not be peer-reviewed.
A: Authors of unpublished data can simply state in the data description that the item was not peer-reviewed. If applicable, funders also encourage reciprocal links between publications and supporting research data.
What data needs to be shared and when?
Q: If researchers start to share everything there will be a lot of useless data available in data repositories. How can we prevent a flood of useless data on the internet?
A: We would like researchers to decide what data is useful to be shared. If nobody is likely to use the data, sharing is not a good use of a researcher’s time. Repositories also need to make decisions over what is worth keeping over time.
Comment (Peter Hedges, Head of Research Office): Research Councils UK focuses on research data supporting publications and this is what we recommend to researchers: share research data which underpins publications.
Q: Are we expected to share large datasets resulting from bigger projects (databases, long-term datasets) or data supporting individual publications?
A: We expect research data that supports individual publications to be made available with a hyperlink to the data. We also want researchers to consider and plan more broadly how they can make data assets of value resulting from our funded research available to others in a timely and appropriate manner.
Q: What about images? Is it useful to share them? It involves a lot of time to organise images. Besides, a single confocal picture with multiple layers is 1 GB. In theory it is possible to share all raw data and all raw images, but who would want to look at them? 10 figures of 10 images is already 100 GB of data. Where would I store all these images, who is going to use these data and how am I going to pay for this?
A: The effort of preparing the data for deposit and of sharing the data should be proportionate to the potential benefits of data sharing. Researchers need to decide what is useful to be shared, following disciplinary best practices and norms (recognising that disciplines are in very different places in terms of defining these).
Q: Is there a set amount of time for exclusive use of research data?
A: Researchers should adhere to disciplinary norms. For example, in genomics, research data is frequently shared before publication (sometimes under a publication moratorium which protects the data generator’s right to first publication). Any exemptions or delays have to be justified on a case by case basis.
Comment (Peter Hedges, Head of Research Office): Research is competitive. Sometimes it might be useful for researchers to know who wants access to the data and what they need it for.
Cost of data sharing
Q: Can I ask in my grant for a staff member to help me with data management?
A: Yes, this is an eligible cost on grant applications: you can request a salary to support a research data manager for your research project, as long as it is justified.
Q: According to CRUK policy, costs for data sharing can be budgeted in grant applications only from August 2015. What about research data from older projects, when these costs were not eligible in grant applications? Is there any transition fund available to pay for this?
A: Unfortunately, there are no additional funds to pay for these costs. Researchers who have older datasets that might be of significant value to the community should contact CRUK – all requests for support will be considered on a case by case basis.
Q: Wellcome Trust encourages data sharing and data re-use, but does not allow for costs of long-term data preservation to be budgeted in grant applications. This does not make sense to me.
A: We are still reviewing our policy on costs of data management and sharing and we might be revisiting this issue – however, it is problematic for us to consider estimated costs for preservation that extend beyond the lifetime of the grant. Our understanding is that costs of long-term data preservation are often less significant than costs of initial data ingestion by the repository (and we will cover ingestion costs).
Q: Who is then going to pay for the long-term data storage?
A: Wellcome Trust funds some discipline-specific repositories, but this is done jointly with other funders. We support bigger undertakings and we are also working with partners to develop platforms for data sharing and discoverability in some priority areas (notably clinical trials). Cancer Research UK pays for some long-term storage options, if these are justified for particular needs of the project. These decisions are made on a case by case basis, depending on how the costs are justified and whether these are directly related to the scientific value of the project.
Q: At the moment there are many general purpose and institutional repositories, which are not well structured. To support efficient re-use of data it is important to use structured data repositories and adhere to metadata standards. What are funders’ opinions about this?
A: Wherever possible, discipline-specific data repositories should be used in preference to general purpose or institutional repositories. Adherence to discipline-specific metadata standards is also encouraged. It has to be acknowledged that development of well-structured data repositories is very resource-intensive and not all disciplines have good quality repositories to support them. For example, it took over 30 years to adopt unified metadata standards at the Cambridge Crystallographic Data Centre. The time needed to properly solve problems should never be underestimated.
Q: Are funders planning to provide researchers with a list of recommended schemas for metadata?
A: Biosharing is an excellent resource with references to discipline-specific metadata schemas. It is a useful suggestion to include a reference to Biosharing on our website.
Q: Are you planning to monitor researchers’ adherence to data management plans? For example, the BBSRC does not have the manpower to check all data management plans manually, but they are planning to create a system to check automatically whether data has been uploaded.
A: We are considering this. At the moment we require data management plans with the primary goal to encourage people to think about data management and sharing from the very start of the project.