Category Archives: Open Research at Cambridge Conference

In conversation with Ben Ryan from EPSRC

Cambridge University hosted Ben Ryan and Amanda Chmura from the Engineering and Physical Sciences Research Council (EPSRC) on Friday 15 May for a discussion about how the University is meeting the EPSRC expectations for sharing research data.

We started the conversation with a demonstration of the services we offer our researchers including our Research Data Management website, and talked about the open data sessions and other training events we have been holding. So far we have managed to speak to 764 researchers about data sharing requirements (the numbers continue to grow).

Managing expectations

In 2011 EPSRC published nine key expectations on research data management. The expectations are directed principally at research organisations and highlight their role in supporting researchers to ensure research data is properly managed. EPSRC set a deadline, 1 May 2015, for research organisation compliance with their expectations.

One of the expectations is that data supporting publications arising from funded research is openly available – this reflects the Common Principles on Data Policy published by RCUK (2011) and the Royal Society’s subsequent (2012) report ‘Science as a Public Enterprise’. To monitor compliance with this expectation EPSRC have said that this autumn they will conduct checks of papers published after 1 May 2015 to ensure these provide appropriate directions to the supporting data.

Ben clarified that the checks will help to determine the level of awareness of the policy and expectations. He noted that there is a balance in what the EPSRC is trying to do. They are trying to create a new research culture, and they are primarily focused on what the institution should be doing to support that.

According to the EPSRC policy, in situations where research arises from collaborations, or from work partially funded by commercial partners, any potential problems with research data sharing should be addressed before the start of the project, in a data management plan. We therefore asked Ben why the EPSRC – alone among the RCUK funding bodies – don’t require researchers to create a data management plan. Ben indicated that the main value in data management planning is to the researcher and the research organisation. Adding plans to EPSRC’s funding submission process would simply add to the admin and peer review burden, and it is not clear how peer reviewers could properly judge them, because they don’t know the infrastructure available where the research is being conducted.

The question arose of whether a single RCUK policy on research data might be possible. Ben noted that the different councils fund different types of work, which informs their individual policies, and explained that although a single policy might be achievable it would require every council to change their existing policy and would be very disruptive of current processes across the whole system. As such he felt it would need a ‘very strong steer externally’ to drive such a change.

However, the research councils recognise the need for more guidance and are about to publish cross-council guidelines presenting a collective position on what should be done with particular types of data.

Clarification

A question that often arises from researchers is ‘what data are we expected to keep and make available’? We were able to get confirmation that it is:

  • the data that underpins publications
  • the data that validates research findings
  • the data that is worth keeping

All questions should be answered by considering the principles behind the policy. The default position is data should be open – in a way that does not damage the research process. The important thing is that the validity of the published research findings is testable.

An example of the way this principle can be used is when considering another common question – what to do in the situation where several papers are expected to come out of a single set of data. Researchers are concerned that if they release the data on the first publication it jeopardises their subsequent publications as they may be scooped. Ben acknowledged this is a concern but asked whether it is reasonable to sit on data for, say, five years so that other people end up being funded to generate the same data again.

He pointed out that the RCUK Common Principles state that those who undertake Research Council funded work may be entitled to a limited period of privileged use of the data they have collected to enable them to publish the results of their research. However, the length of this period varies by research discipline.

There is also the consideration of the way another user can access the data and reproduce results. The question is – how far do we go to enable a user to reproduce the work? The minimum is that we should provide the information that someone would need to be able to validate published work – this is also critical to maximise the impact of publicly funded research and to maintain public trust in science and research.

The software situation

We had representatives from Cambridge Enterprise and from the School of Technology at the meeting who had specific questions about sharing software. While Ben indicated he might need to reflect on some of the questions, we did come to some clarification on others.

Although software is different from other forms of intellectual property, the same basic question arises: “is the institution best served by making it freely available or by commercialising it?” Both approaches can lead to the creation of jobs and economic impact. EPSRC is clear that the choice of exploitation strategy rests with the research organisation.

The EPSRC does not have an expectation about the licence under which software should be released.

It was agreed that if there is material that is potentially commercial, then we should take the steps to make it available and commercialise the software. It was confirmed that we are able to make software arising from a research project available free for non-commercial re-use by other researchers (within the academic community) while at the same time making it available to others under a commercial licence.

One can argue that since the taxpayer funded the work in the first place the taxpayer should not have to pay for it again, but this position, taken to its natural conclusion, would of course mean that no commercialisation of funded research should ever occur.

There is also the situation where a researcher has put their ‘life and soul’ into generating outputs and naturally feels they have some ownership of the work. Ben agreed that many of these questions are ‘very challenging’, but noted that researchers seldom ‘own’ their outputs – under RCUK grant conditions the research organisation owns all the intellectual assets arising from the funded research and is responsible for seeing that they are used to the benefit of society and the economy. Some of these questions stem from a mindset that insufficiently recognises the importance of ensuring that the economy and society as a whole benefit from publicly funded research, and a culture change is needed in addition to new processes.

The EPSRC do wish to avoid people sitting on data indefinitely because they don’t want to release their software. Ben said that in principle it is permissible for people to make software available through GitHub, but he would need to investigate how sustainable it is and how it is governed before being able to say whether GitHub is a reasonable option in terms of meeting EPSRC expectations.

Addressing (some) concerns

Time prevented us from covering all of the topics we wished to raise. Many Cambridge researchers have raised questions about sharing data from collaborations – with concern that non-UK partners who do not have a data sharing requirement may find the UK requirements onerous and that this could decrease the number of international collaborations in which UK institutions are involved.

There was also no magic bullet for the challenge of paying the not insignificant cost of storing research data safely for 10 years+. The problem is that where researchers were unaware of this expectation at the time they applied for their grant there is no allowance for it in their budget. This will not be an issue in the future as current grants are approved, but we are in a transition period now as the research from existing grants is published and the supporting data is being made available and stored. When we discussed this, Ben explained that the EPSRC does not have any additional funds to support this transition period, and that the costs need to be found within existing resources.

There have been some challenges with communication of the EPSRC policy. Many researchers at the University of Cambridge have said they would have liked to be informed about it directly by EPSRC (as they would expect to have been by, for example, the Wellcome Trust). Ben explained that the approach had deliberately been to communicate the policy through research organisation senior managers (e.g. ProVCs Research), and that this was because the expectations are addressed principally to research institutions, which have primary responsibility for ensuring that researchers manage their data effectively and have access to appropriate facilities to do so. However, he acknowledged that EPSRC could have communicated more with researchers and undertook to explore how more information could be made available directly to researchers.

It was therefore helpful to be able to express some of the concerns and fears amongst the research community. We have been collating the questions that people have asked during our sessions and will compile a FAQ from these that will appear on our Research Data Management website. Ben indicated that there might be a possibility of a selection of these FAQs also appearing on the RCUK website to help address the universal questions about sharing research data. This step would be welcomed by the University.

Published 21 May 2015
Written by Dr Danny Kingsley
Creative Commons License

Data management – one size does not fit all

As the Research Data Facilitator at the University of Cambridge, I am part of the team establishing a Research Data Management (RDM) Facility at the University. This blog is a note of my impressions from the Digital Curation Centre (DCC) meeting held in London on the 28th April 2015: Preparing Data for Deposit.

As always, the DCC meeting was extremely useful for networking. I met people in similar roles at other institutions. And again, the breakout sessions were invaluable – they allowed us to exchange precious experience, feedback gained and lessons learnt while developing RDM services.

What could have been better, though, is the appreciation of differences between universities.

Unrealistic staffing

The talk from the keynote speaker, Louise Corti, the Associate Director at the UK Data Service, was very inspirational. I loved the uplifting expression that RDM supporters are like artists evangelising researchers. It was great to hear about RDM solutions available at the UK Data Service, and the professional approach to research data, with every aspect of data curation addressed by the excellent team of 70 dedicated people, with precise workflows for data processing.

However, how realistic is it for a university to develop similar solutions locally? Which university would be able to dedicate a similar amount of resources to the development of an RDM facility?

At the University of Cambridge, I am the only full-time employee dedicated to the establishment and provision of RDM services to our researchers. There is a team of people supporting the facility but these staff are shared with other projects. I would have very much appreciated hearing what scalable solution the UK Data Service would recommend universities develop, given that the resources available are nowhere near what a 70-person team could offer.

Scalability

On the other hand, we had a presentation from the University of Loughborough. The University, represented by Gary Brewerton, teamed up with Figshare and Arkivum (Mark Hahnel and Matthew Addis, respectively). The three of them explained to us the infrastructure developed to support RDM at the University of Loughborough. The University data repository, DSpace, has been equipped with archival storage provided by Arkivum, which guarantees 100% data integrity. Additionally, researchers at the University of Loughborough can benefit from the use of Figshare, which provides them with a user-friendly research data sharing platform.

These systems seemed to offer excellent solutions to researchers, but somehow I could not help having the impression of listening to sales pitches. Are there any disadvantages of these solutions? Are there any alternatives?

Figshare charges for file transfer (downloading of openly accessible data is actually not free for institutions). How substantial would these charges be for bigger institutions, producing huge amounts of valuable research data, frequently sought after and downloaded by others? Would institutions be able to sustain the cost of data access to their most valuable research datasets?

Risk management

The Loughborough solutions do not appear to take into account risks associated with implementation of services from third party providers at bigger, research-intensive universities. At the University of Cambridge we have almost 300 EPSRC-funded research grants. In April this year alone our data repository received 40GB of research data deposits coming from EPSRC-funded projects. Producing valuable research outputs is business-critical for universities.

What would be the costs associated with the data transfer of supposedly open-access datasets if these were available via Figshare? Is there any upper limit on possible transfer charges?

What is the long-term risk of handing over a university’s research data holdings to a third party service provider? Note that some UK research funders expect data to be stored long-term, and in some cases in perpetuity (10 years from the last access). What will be the conditions for research data storage offered by these external providers in 10, 20, 30 years’ time? How will the cost change? Will it be easy, or even possible, to transfer all research data somewhere else?

Figshare has recently entered into a legal partnership with Macmillan (you can read more about it in a blog post from Dr Peter Murray-Rust) – how will this partnership evolve in the future?

Suggestion

It would be extremely valuable if RDM solutions proposed at DCC meetings could be discussed taking into account the size of the institution, the amount of research conducted at the University, and the size of the RDM team locally available to work on the implementation of the solution.

One size does not and will not fit all, and a better recognition of differences between organisations would greatly help in developing optimal solutions for each individual institution. Additionally, it seems to me of key importance to talk openly about the drawbacks of each solution so that universities can efficiently mitigate future risks.

Published 14 May 2015
Written by Dr Marta Teperek

Benchmarking the Cambridge RDM program

Cambridge University released its Research Data Management Policy Framework today.

This is a good opportunity to assess whether Cambridge is fulfilling the 10 recommendations for libraries on how to get started in data management presented in the final report of the LIBER working group on E-Science / Research Data Management. Since its publication in July 2012, this report has been the most downloaded item from the Association of European Research Libraries (LIBER) website. We list below the 10 recommendations and what Cambridge is doing to meet them.

Benchmarking against RDM recommendations

  1. Offer research data management support, including data management plans for grant applications, intellectual property rights advice and information materials. Assist faculty with data management plans and the integration of data management into the curriculum.

The Open Data team at the University of Cambridge has created a comprehensive dedicated website for research data management. The website provides researchers with guidance on various aspects of research data management from project design and data management planning, through data collection and maintenance, to data curation and sharing.

The University also offers numerous workshops and training on research data management. On-demand assistance with all aspects of research data management is available to researchers via a simple website support request form.

  2. Engage in the development of metadata and data standards and provide metadata services for research data.

The University of Cambridge is actively involved in developing metadata standards. All research data depositions to the University data repository occur via a simple website form. This form collects information on metadata descriptions and provides guidance on what should be included in each description field. All research data and metadata descriptions submitted to the University repository are carefully curated by our repository managers.

  3. Create Data Librarian posts and develop professional staff skills for data librarianship.

Cambridge Library has a dedicated research data management working group composed of librarians across various University departments who are actively involved in Open Access. The research data management working group is designing and delivering a series of training sessions and workshops for the broader library community to equip them with professional research data management support skills.

  4. Actively participate in institutional research data policy development, including resource plans. Encourage and adopt open data policies where appropriate in the research data life cycle.

The newly released Research Data Management Policy Framework builds on policy frameworks in place since 2013. The policy framework encourages the University researchers and research students to share their research data as widely and openly as possible, and provides guidance on best practice for data sharing.

  5. Liaise and partner with researchers, research groups, data archives and data centers [sic] to foster an interoperable infrastructure for data access, discovery and data sharing.

The Open Data Project Working Group at the University of Cambridge consists of members from several independent operational units at the University. These include the Cambridge University Library, the Research Operations Office, the Research Strategy Office and the University Information Services. This ensures a deep integration and engagement within the broader University structure.

Additionally, members of the Open Data team are conducting daily consultations with researchers and with research support staff across all departments at the University, to ensure that the developed research data management services are tailored to meet their needs.

  6. Support the lifecycle for research data by providing services for storage, discovery and permanent access.

At the University of Cambridge the University Information Services provide researchers with day to day research data management solutions, such as platforms for file sharing, data storage and backup. The Open Data team ensures that shareable research data is deposited into a suitable data repository (guaranteeing long term data sustainability) and shared as widely and openly as possible.

  7. Promote research data citation by applying persistent identifiers to research data.

The University of Cambridge data repository mints persistent links for each deposited research dataset. Additionally, the repository is currently being upgraded to enable minting of DOIs (digital object identifiers). These are all persistent links, and their use ensures access to data over the long-term preservation period and facilitates data citation.

  8. Provide an institutional Data Catalogue or Data Repository, depending on available infrastructure.

The University of Cambridge provides both a data repository and a data registry. Our institutional repository has accepted research datasets since 2005. The University of Cambridge aims to ultimately be able to streamline and record in an automated way information about metadata descriptions from all repositories used by our researchers.

  9. Get involved in subject specific data management practice.

Respect for subject-specific differences in data management practice is recognised and affirmed throughout the University of Cambridge Research Data Management Policy Framework. The University recognises that research data management solutions need to be tailored to researchers working in different disciplines. Therefore, the Open Data team conducts daily consultations with researchers across all different fields of study – to better understand individual needs and to tailor research data management support appropriately.

  10. Offer or mediate secure storage for dynamic and static research data in co-operation with institutional IT units and/or seek exploitation of appropriate cloud services.

The University Information Services (members of which are part of the Open Data Team) are currently developing a cloud-based, Dropbox-like storage solution to facilitate easy and secure data storage and sharing between collaborators.

Published 28 April 2015
Written by Dr Marta Teperek and Dr Danny Kingsley