
In Conversation with the Wellcome Trust – sharing & managing research outputs

In July 2017, the Wellcome Trust updated their policy on the management and sharing of research outputs.  This policy helps deliver Wellcome’s mission – to improve health for everyone by enabling great ideas to thrive.  The University of Cambridge’s Research Data Management Facility invited Wellcome Trust to Cambridge to talk with their funded research community (and potential researchers) about what this updated policy means for them.  On 5th December in the Gurdon Institute Tea Room, the Deputy Head of Scholarly Communication Dr Lauren Cadwallader, welcomed Robert Kiley, Head of Open Research, and David Carr, Open Research Programme Manager, from the Wellcome’s Open Research Team. 

This blog summarises David's and Robert's presentations about the research outputs policy and how it has been working, and the questions raised by the audience.

Maximising the value of research outputs: Wellcome’s approach

David Carr outlined key points about the new policy, which now, in addition to sharing openly publications and data, includes sharing software and materials as other valued outputs of research.

An outputs management plan is required to show how the outputs of the project will be managed and the value of the outputs maximised (whilst taking into consideration that not all outputs can be shared openly).  Updated guidance on outputs management plans has been published and can be found on Wellcome’s website.

Researchers are also to note that:

  • Outputs should be made available with as few restrictions as possible.
  • Data and software underlying publications must be made available at the time of publication at the latest.
  • Data relevant to a public health emergency should be shared as soon as it has been quality assured regardless of publication timelines.
  • Outputs should be placed in community repositories, have persistent identifiers and be discoverable.
  • A check at the final report stage, to ensure outputs have been shared according to the policy, has been introduced (recognising that parameters change during the research and management plans can change accordingly).
  • Of course, management and sharing of research outputs comes with a cost and Wellcome Trust commit to reviewing and supporting associated costs as part of the grant.

Wellcome have periodically reviewed take-up and implementation of their research outputs sharing and management policy and have observed some key responses:

  • Researchers are producing better quality plans; however, the formats and level of detail included in the plans do remain variable.
  • There is uncertainty amongst stakeholders (researchers, reviewers and institutions) in how to fulfil the policy.
  • Resources required to deliver plans are often not fully considered or requested.
  • Follow-up and reporting about compliance has been patchy.

In response to these findings, Wellcome will continue to update their guidance and work with their communities to advise, educate and define best practice.  They will encourage researchers to work more closely with their institutions, particularly over resource planning.  They will also develop a proportionate mechanism to monitor compliance.

Developing Open Research

Robert Kiley then described the three areas which the dedicated Open Research Team at Wellcome lead and coordinate: funder-led activities; community-led activities and policy leadership.

Funder-led activities include:

  • Wellcome Open Research, the publishing platform launched in partnership with F1000 around a year ago; here Wellcome-funded researchers can rapidly and transparently publish any results they think are worth sharing. Average submission to publication time for the first 100 papers published was 72 days – much faster than other publication venues.
  • Wellcome Trust is working with ASAP-Bio and other funders to support pre-prints and continues to support e-Life as an innovative Open Access journal.
  • Wellcome Trust will review their Open Access policy during 2018 and will consult their funded researchers and institutions as part of this process.
  • Wellcome provides the secretariat for the independent review panel of the ClinicalStudyDataRequest.com (CSDR) platform, which provides access to anonymised clinical trial data from 13 pharmaceutical companies. From January 2018, they will extend the resource to allow listing of academic clinical trials supported by Wellcome, MRC, CRUK and the Gates Foundation.  Note that CSDR is not a repository but provides a common discoverability and access portal.

Community-led activities

Wellcome are inviting the community to develop and test innovative ideas in Open Research.  Some exciting initiatives include:

  • The Open Science Prize: this initiative was run last year in partnership with US National Institutes of Health and Howard Hughes Medical Institute. It supported prototyping and development of tools and services to build on data and content.  New prizes and challenges currently being developed will build on this model.
  • Research Enrichment – Open Research: this was launched in November 2017. Awards of up to £50K are available for Wellcome grant-holders to develop Open Research ideas that increase the impact of their funded research.
  • Forthcoming: more awards and themed challenges aimed at Open Research – including a funding competition for pioneering experiments in open research, and a prize for innovative data re-use.
  • The Open Research Pilot Project: whereby four Wellcome-funded groups are being supported at the University of Cambridge to make their research open.

Policy Leadership

In this area, Wellcome Trust engage in policy discussions in key policy groups at the national, European and international level.  They also convene international Open Research funders' webinars.  They are working towards reform of rewards and incentives for researchers, by:

  • Policy development and declarations
  • Reviewing grant assessment procedures: for example, providing guidance to staff, reviewers and panel members so that there is a more holistic approach on the value and impact of research outputs.
  • Engagement: for example, by being clear on how grant applicants are being evaluated and committing to celebrate grantees who are practicing Open Research. 

Questions & Answers

Policy questions

I am an administrator of two Wellcome Trust programmes; how is this information about the new policy being disseminated to students? Has it been done?

When the Wellcome Open Research platform was announced last year, there was a lot of communication, for example, in grants newsletters and working with the centres.

Further dissemination of information about the updated policy on outputs management could be realised through attending events, asking questions to our teams, or inviting us to present to specific groups.  In general, we are available and want to help.

Following this, the Office of Scholarly Communication added that they usually put information about things like funder policy changes in the Research Operations Office Bulletin.

Regarding your new updated policy, have you been in communication with the Government?

We work closely with HEFCE and RCUK. They are all very aware of what we aim to do.

One of the big challenges is to answer the question from researchers: “If we are not using a particular ‘big journal’ name, what are we using to help us show the quality of the research?”.

We have been working with other funders (including Research Councils) to look at issues around this.  Once we have other funders on board, we need to work with institutions on staff promotion and tenure criteria.  We are working with others to support a dedicated person charged with implementing the San Francisco Declaration on Research Assessment (DORA) and identify best practice.

How do you see Open Outputs going forward?

There is a growing consensus over the importance of making research outputs available, and a strong commitment from funders to overcome the challenges. Our policy is geared to openness in ways that maximise the health benefits that flow from the research we fund.

Is there a licence that you encourage researchers to use?

No. We encourage researchers to utilise existing sources of expertise (e.g. The Software Sustainability Institute) and select the licence most appropriate for them.

Some researchers could just do data collection instead of publishing papers. Will we have a future where people just generate data and publish it on its own, without doing the analysis?

It could happen. Encouraging adoption of the CRediT taxonomy's contributor roles in publication authorship is one thing that can help.

Outputs Management Plans

How will you approach checking outputs against the outputs management plan?

We will check the information submitted at the end of grants – what outputs were reported and how these were shared – and refer back to the plan submitted at application. We will not rule out sanctions in the future once things are in place. At the moment there are no sanctions as it is premature to do this.  We need to get the data first, monitor the situation and make any changes later in the process.

What are your thoughts on providing training for reviewers regarding the data management plans as well as for the people who will do the final checks? Are you going to provide any training and identify gaps on research for this?

We have provided guidance on what plans should contain; this is something we can look at going forwards.

One of the key elements to the outputs management plan is commenting on how outputs will be preserved. Does the Wellcome Trust define what it means by long term preservation anywhere?

Long term preservation is tricky. We have common best practice guidelines for data retention – 10 years for regular data and 20 years for clinical research. We encourage people to use community repositories where these exist.

What happens to the output if 10 years have passed since the last time of access?

This is a huge problem. There need to be criteria to determine what outputs are worth keeping which take into account whether the data can be regenerated. Software availability is also a consideration.

Research enrichment awards

You said that there will be prizes for data re-use, and dialogue on infrastructure is still in the early stages. What is the timeline? It would be good to push to get the timeline going worldwide.

Research enrichment awards are already live and Wellcome will assess them on an ongoing basis. Please apply if you have a Wellcome grant. Other funding opportunities will be launched in 2018. The Pioneer awards will be open to everyone in the spring and are aimed at those who have worked out ways to make their work more FAIR.  The same applies to our themed challenges for innovative data re-use, which will also launch in the spring – we will identify a data set and get people to look at it.  For illustration, a similar example is The NEJM SPRINT Data Analysis Challenge.

Publishing Open Access

What proportion of people are updating their articles on Wellcome Open Research?

Quite a few – around 15% – are revising their articles to Version 2 following review. We have one article at Version 3.

Has the Wellcome Trust any plans for overlay journals, and if so, in which repository will they be based?

Not at the moment. There will be a lot of content being published on platforms such as Open Research, the Gates platform and others. In the future, one could imagine a model where content is openly published on these platforms, and the role of journals is to identify particular articles with interesting content or high impact (rather than to manage peer review).  Learned societies have the expertise in their subjects; they potentially have a role here, for example in identifying lead publications in their field from a review of the research.

Can you give us any hints about the outcome of your review of the Wellcome Trust Open Access policy? Are you going to consider not paying for hybrid journals when you review your policy?

We are about to start this review of the policy. Hybrid journals are on the agenda. We will try to simplify the process for the researcher.  We are nervous about banning hybrid journals.  Data from the last analysis showed that 70% of papers from Wellcome Trust grants, for which Wellcome Trust paid an article processing charge, were in hybrid journals.  So if we banned hybrid journals it would not be popular.  Researchers would need to know which are hybrid journals.  Possibly with public health emergencies we could consider a different approach.  So there is a lot to consider and a balance to keep.  We will consult both researchers and institutions as part of the exercise.  There is also another problem in that there is a big gap in choice between hybrid and other journals.

If researchers can publish in hybrid journals, would Wellcome Trust consider making rules regarding offsetting?

That would be interesting. However, more rules could complicate things as researchers would then also need to check both the journal’s Open Access policy and find out if they have an approved offset deal in place.

Open Data & other research outputs

What is your opinion on medical data? For example, when we write an article, we can’t publish the genetics data as there is a risk that a person could be identified.

Wellcome Trust recognise that some data cannot be made available. Our approach is to support managed access. Once the data access committee has decided that the request is valid, then access can be provided. The author will need to be clear on how the researcher can get hold of the data.  Wellcome Trust has done work around best practice in this area.

Does Open Access mean free access? There is a cost for processing.

Yes, there is usually a cost. For some resources, those requesting data do have to pay a fee. For example, major cohort studies such as ALSPAC and UK Biobank have a fee which covers the cost of considering the request and preparing the data.

ALSPAC is developing a pilot with Wellcome Open Research to encourage those who access data and return derived datasets to the resource, to publish data papers describing data they are depositing.  Because the cost of access has already been met, such data will be available at no cost.

Does the software that is used in the analysis need to be included?

Yes, the policy is that if the data is published, the software should be made available too. It is a requirement, so that everybody can reproduce the study.

Is there a limit to volume of data that can be uploaded?

Wellcome Open Research uses third party data resources (e.g. Figshare). The normal dataset limit there is 5GB, but both Figshare and subject repositories can store much higher volumes of data where required.

What can be done about misuse of data?

In the survey that we did, researchers expressed fears of data misuse. How do we address such a fear? Demonstrating the value of data will play a great role in this.  It is also hard to know the extent to which these fears play out in reality – only a very small proportion of respondents indicated that they had actually experienced data being used inappropriately.  We need to gather more evidence of the relative benefits and risks, and it could be argued that publishing via preprints and getting a DOI is your proof that you got there first.

Published 26 January 2018
Written by Dr Debbie Hansen
Creative Commons License

Making the connection: research data network workshop

During International Data Week 2016, the Office of Scholarly Communication is celebrating with a series of blog posts about data. The first post was a summary of an event we held in July. This post reports on the second Jisc research data network workshop.

Following the success of hosting the Data Dialogue: Barriers to Sharing event in July, we were delighted to welcome the Research Data Management (RDM) community to Cambridge for the second Jisc research data network workshop. The event was held in Corpus Christi College with meals held in the historic dining room. (Image: Corpus Christi)

RDM services in the UK are maturing and efforts are increasingly focused on connecting disparate systems, standardising practices and making platforms more usable for researchers. This is also reflected in the recent Concordat on Research Data which links the existing statements from funders and government, providing a more unified message for researchers.

The practical work of connecting the different systems involved in RDM is being led by the Jisc Research Data Shared Services project which aims to share the cost of developing services across the UK Higher Education sector. As one of the pilot institutions we were keen to see what progress has been made and find out how the first test systems will work. On a personal note it was great to see that the pilot will attempt to address much of the functionality researchers request but that we are currently unable to fully provide, including detailed reporting on research data, links between the repository and other systems, and a more dynamic data display.

Context for these attempts to link, standardise and improve RDM systems was provided in the excellent keynote by Dr Danny Kingsley, head of the Office of Scholarly Communication at Cambridge, reminding us about the broader need to overhaul the reward systems in scholarly communications. Danny drew on the Open Research blogposts published over the summer to highlight some of the key problems in scholarly communications: hyperauthorship, peer review, flawed reward systems, and, most relevantly for data, replication and retraction. Sharing data will alleviate some of these issues but, as Danny pointed out, this will frequently not be possible unless data has been appropriately managed across the research lifecycle. So whilst trying to standardise metadata profiles may seem irrelevant to many researchers it is all part of this wider movement to reform scholarly communication.

Making metadata work

Metadata models will underpin any attempts to connect repositories, preservation systems, Current Research Information Systems (CRIS), and any other systems dealing with research data. Metadata presents a major challenge both in terms of capturing the wide variety of disciplinary models and needs, and in persuading researchers to provide enough metadata to make preservation possible without putting them off sharing their research data. Dom Fripp and Nicky Ferguson are working on developing a core metadata profile for the UK Research Data Discovery Service. They spoke about their work on developing a community-driven metadata standard to address these problems. For those interested (and GitHub literate) the project is available here.

They are drawing on national and international standards, such as the Portland Common Data Model, trying to build on existing work to create a standard which will work for the Shared Services model. The proposed standard will have gold, silver and bronze levels of metadata and will attempt to reward researchers for providing more metadata. This is particularly important as the evidence from Dom and Nicky’s discussion with researchers is that many researchers want others to provide lots of metadata but are reluctant to do the same themselves.
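The tiered approach can be illustrated with a small sketch. The field names and tier contents below are hypothetical assumptions for illustration, not the actual Jisc/UK Research Data Discovery Service profile:

```python
# Illustrative sketch of gold/silver/bronze metadata levels. The field names
# and tier contents are hypothetical, not the actual Jisc profile.

TIERS = {
    "bronze": {"title", "creator", "publication_year"},
    "silver": {"title", "creator", "publication_year", "description", "keywords"},
    "gold": {"title", "creator", "publication_year", "description", "keywords",
             "methodology", "related_identifiers", "licence"},
}

def metadata_tier(record):
    """Return the highest tier whose required fields are all present and non-empty."""
    present = {field for field, value in record.items() if value}
    best = None
    for tier in ("bronze", "silver", "gold"):  # ordered lowest to highest
        if TIERS[tier] <= present:
            best = tier
    return best

record = {
    "title": "Survey responses 2016",
    "creator": "J. Smith",
    "publication_year": 2016,
    "description": "Raw responses from a wellbeing survey",
    "keywords": ["survey", "wellbeing"],
}
print(metadata_tier(record))  # silver
```

The reward mechanism falls out naturally: a record's tier is visible at a glance, nudging depositors to add the few extra fields that lift it to the next level.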

We have had some success with researchers filling in voluntary metadata fields for our repository, Apollo, but this seems to depend to a large extent on how aware researchers are of the role of metadata, something which chimes with Dom and Nicky’s findings. Those creating metadata are often unaware of the implications of how they fill in fields, so creating consistency across teams, let alone disciplines and institutions can be a struggle. Any Cambridge researchers who wish to contribute to this metadata standard can sign up to a workshop with Jisc in Cambridge on 3rd October.

Planning for the long-term

A shared metadata standard will assist with connecting systems and reducing researchers' workload, but if replicability, a key problem in scholarly communications, is going to be possible, digital preservation of research data needs to be addressed. Jenny Mitcham from the University of York presented the work she has been undertaking alongside colleagues from the University of Hull on using Archivematica for preserving research data and linking it to pre-existing systems (more information can be found on their blog).

Jenny highlighted the difficulties they encountered getting timely engagement from both internal stakeholders and external contractors, as well as linking multiple systems with different data models, again underlining the need for high quality and interoperable metadata. Despite these difficulties they have made progress on linking these systems and in the process have been able to look into the wide variety of file formats currently in use at York. This has led to conversations with the National Archive about improving the coverage of research file formats in PRONOM (a registry of file formats for preservation purposes), work which will be extremely useful for the Shared Services pilot.

In many ways the project at York and Hull felt like a precursor to the Shared Services pilot; highlighting both the potential problems in working with a wide range of stakeholders and systems, as well as the massive benefits possible from pooling our collective knowledge and resources to tackle the technical challenges which remain in RDM.

Published 14 September 2016
Written by Rosie Higman

Beyond compliance – dialogue on barriers to data sharing

Welcome to International Data Week. The Office of Scholarly Communication is celebrating with a series of blog posts about data, starting with a summary of an event we held in July.

On 29 July 2016 the Cambridge Research Data Team joined forces with the Science and Engineering South Consortium to organise a one-day conference at Murray Edwards College, gathering researchers and practitioners for a discussion about the existing barriers to data sharing. The whole aim of the event was to move beyond compliance with funders' policies. We hoped that the community was ready to shift the focus of data sharing discussions from whether it is worth sharing towards more mature discussions about the benefits and limitations of data sharing.

What are the barriers?

So what are the barriers to effective sharing of research data? Three main barriers were identified, all somewhat related to each other: poorly described data, insufficient data discoverability, and difficulties with sharing personal/sensitive data. All of these problems arise from the fact that research data is not always shared in accordance with the FAIR principles: that data should be Findable, Accessible, Interoperable and Re-usable.

Poorly described data

The event started with an inspiring keynote talk from Dr Nicole Janz from the Department of Sociology at the University of Cambridge: “Transparency in Social Science Research & Teaching”. Nicole regularly runs replication workshops at Cambridge, where students select published research papers and work hard for several weeks to reproduce the published findings. The purpose of these workshops is to let students learn by experience what is important in making their own work transparent and reproducible to others.

Very often students fail to reproduce the results. Frequently, the reasons for failure are insufficiently described methodology, or simply the fact that key datasets were not made available. Students learn that in order to make research reproducible, one not only needs to make the raw data files available; the data needs to be shared together with the source code used to transform it and a written description of the methodology, ideally in a README file. While doing replication studies, students also learn about the five selfish benefits of good data management and sharing: data disasters are avoided, it is easier to write up papers from well-managed data, a transparent approach to sharing makes the work more convincing to reviewers, the continuity of research is possible, and researchers can build a reputation for being transparent. As a tip for researchers, Nicole suggested always asking a colleague to try to reproduce the findings before submitting a paper for peer review.
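The bundle of raw data, transformation code and README can be sketched in miniature. Everything below is hypothetical (the data, the published value, the README reference) but it shows the shape of a replication-friendly share: the reader reruns the exact transformation and compares against the reported result.

```python
# A minimal sketch of a replication-friendly bundle: the raw data and the
# exact transformation are shared together, so a reader can recompute the
# published figure. The data and the published value are hypothetical.
import csv
import io
import statistics

RAW_DATA = """respondent,score
1,4
2,5
3,3
4,5
"""

def reproduce_mean(raw_csv):
    """Recompute the summary statistic as described in the (hypothetical) README."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    return statistics.mean(int(row["score"]) for row in rows)

PUBLISHED_MEAN = 4.25  # the value reported in the (hypothetical) paper
assert reproduce_mean(RAW_DATA) == PUBLISHED_MEAN  # the study replicates
```

When the assertion fails, the replicator knows exactly which step diverges – which is precisely the learning experience the workshops aim for.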

The problem of insufficient data description/availability was also discussed during the first case study talk by Dr Kai Ruggeri from the Department of Psychology, University of Cambridge. Kai reflected on his work on the assessment of happiness and wellbeing across many European countries, which was part of the ESRC Secondary Data Analysis Initiative. Kai reiterated that missing data makes analysis complicated and sometimes prevents one from making effective policy recommendations. Kai also stressed that frequently the choice of baseline for data analysis can affect the final results. Therefore, proper description of the methodology and approaches taken is key to making research reproducible.

Insufficient data discoverability

We also heard several speakers describing problems with data discoverability. Fiona Nielsen founded Repositive – a platform for finding human genomic data – out of frustration that genomic data was so difficult to find and access. The proliferation of data repositories has made it very hard for researchers to actually find what they need.

Fiona started by doing a quick poll among the audience: how do researchers look for data? It turned out that most researchers find data by doing a literature review or by googling for it. This is not surprising – there is no search engine that looks for information simultaneously across the multiple repositories where data is available. To make it even more complicated, Fiona reported that in 2015 80PB of human genomic data was generated. Unfortunately, only 0.5PB of human genomic data was made available in a data repository.

So how can researchers find the other datasets, which are not made available in public repositories? Repositive is a platform harvesting metadata from several repositories hosting human genomic data and providing a search engine allowing researchers to simultaneously look for datasets shared in all of them. Additionally, researchers who cannot share their research data via a public repository (for example, due to lack of participants’ consent for sharing), can at least create a metadata record about the data – to let others know that the data exist and to provide them with information on data access procedure.
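The harvest-and-search idea can be sketched in a few lines. The repository names, records and field names below are hypothetical illustrations of the approach, not Repositive's actual data model:

```python
# Sketch of cross-repository metadata search in the spirit of Repositive:
# each repository exposes metadata records; a combined index lets a single
# query cover all of them. Repository names and records are hypothetical.

REPO_A = [{"id": "a1", "title": "Human genome cohort study", "access": "open"}]
REPO_B = [
    # A metadata-only record: the data itself cannot be shared publicly,
    # but others can still discover it and learn the access procedure.
    {"id": "b1", "title": "Exome sequencing study", "access": "managed"},
]

def build_index(*repos):
    """Merge records from several repositories, tagging each with its source."""
    return [dict(rec, source=name) for name, records in repos for rec in records]

def search(index, term):
    """Case-insensitive keyword search across titles in the combined index."""
    term = term.lower()
    return [rec for rec in index if term in rec["title"].lower()]

index = build_index(("repo_a", REPO_A), ("repo_b", REPO_B))
print([rec["id"] for rec in search(index, "genome")])  # ['a1']
```

Note that the managed-access record is discoverable like any other; only the data behind it requires the extra access step.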

The problem of data discoverability is, however, not only about people's awareness that datasets exist. Sometimes, especially in the case of complex biological data with a vast number of variables, it can be difficult to find the right information inside the dataset. In an excellent lightning talk, Julie Sullivan from the University of Cambridge described InterMine – a platform for making biological data easily searchable ('mineable'). Anyone can simply upload their data onto the platform to make it searchable and discoverable. One example of the platform's use is FlyMine – a database where researchers looking for the results of experiments conducted on the fruit fly can easily find and share information.

Difficulties with sharing personal/sensitive data

The last barrier to sharing that we discussed was related to sharing personal/sensitive research data. This barrier is perhaps the most difficult one to overcome, but here again the conference participants came up with some excellent solutions. The first came from the keynote speech by Louise Corti, with the very uplifting title: “Personal not painful: Practical and Motivating Experiences in Data Sharing”.

Louise based her talk on the UK Data Service's long experience of providing managed access to data containing some form of confidential/restricted information. Apart from hosting datasets which can be made openly available, the UKDS can also provide two other types of access: safeguarded access, where data requestors need to register before downloading the data, and controlled access, where requests for data are considered on a case-by-case basis.

At the outset of the research project, researchers discuss their research proposals with the UKDS, including any potential limitations to data sharing. It is at this stage – at the outset of the research project – that the decision is made on the type of access that will be required for the data to be successfully shared. All processes of project management and data handling, such as data anonymisation and the collection of informed consent forms from study participants, are then carried out in adherence to that decision. The UKDS also offers protocols clarifying what will happen to research data once deposited with the repository. The use of standard licences for sharing makes the governance of data access much more transparent and easy to understand, both from the perspective of data depositors and data re-users.
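The three access routes can be pictured as a simple decision rule. The inputs and thresholds below are illustrative assumptions, not the UKDS's actual criteria:

```python
# The three UKDS access routes (open, safeguarded, controlled) expressed as
# a simple decision rule. The inputs and the rule itself are illustrative
# assumptions, not the UKDS's actual criteria.

def access_route(contains_personal_data, fully_anonymised):
    """Pick one of the three access types for a dataset (illustrative only)."""
    if not contains_personal_data:
        return "open"         # downloadable by anyone
    if fully_anonymised:
        return "safeguarded"  # download after registering as a user
    return "controlled"       # each request reviewed case by case

print(access_route(contains_personal_data=True, fully_anonymised=True))  # safeguarded
```

The point of deciding this at the outset is that anonymisation and consent procedures can then be designed around the chosen route, rather than retrofitted at deposit time.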

Louise stressed that transparency and willingness to discuss problems is key for mutual respect and understanding between data producers, data re-users and data curators. Sometimes unnecessary misunderstandings make data sharing difficult, when it does not need to be. Louise mentioned that researchers often confuse ‘sensitive topic’ with ‘sensitive data’ and referred to a success case study where, by working directly with researchers, UKDS managed to share a dataset about sedation at the end of life. The subject of study was sensitive, but because the data was collected and managed with the view of sharing at the end of the project, the dataset itself was not sensitive and was suitable for sharing.

As Louise said, “data sharing relies on trust that data curators will treat it ethically and with respect”, and open communication is key to building and maintaining this trust.

So did it work?

The purpose of this event was to engage the community in discussions about the existing limitations to data sharing. Did we succeed? Did we manage to engage the community? Judging by the fact that we received twenty high-quality abstract applications from researchers across various disciplines for only five available case study speaking slots (it was so difficult to shortlist just five!), and that the venue was full – with around eighty attendees from Cambridge and other institutions – I think the objective was pretty well met.

Additionally, the panel discussion was led by researchers and involved fifty-eight active users on the Sli.do platform putting questions to the panellists. There were also questions asked outside the Sli.do platform. So overall I feel the event was a great success, and it was truly fantastic to be part of it and to see the degree of participant involvement in data sharing.

Another observation is the great progress of the research community in Cambridge in the area of sharing: we have successfully moved away from discussing whether research data is worth sharing to how to make data sharing more FAIR.

It seems that our intense advocacy, and the effort of speaking with over 1,800 academics from across the campus since January 2015, have paid off: we have indeed managed to build an engaged research data management community.


Published 12 September 2016
Written by Dr Marta Teperek