Monthly Archives: May 2015

In conversation with Ben Ryan from EPSRC

Cambridge University hosted Ben Ryan and Amanda Chmura from the Engineering and Physical Sciences Research Council (EPSRC) on Friday 15 May for a discussion about how the University is meeting the EPSRC expectations for sharing research data.

We started the conversation with a demonstration of the services we offer our researchers including our Research Data Management website, and talked about the open data sessions and other training events we have been holding. So far we have managed to speak to 764 researchers about data sharing requirements (the numbers continue to grow).

Managing expectations

In 2011 EPSRC published nine key expectations on research data management. The expectations are directed principally at research organisations and highlight their role in supporting researchers to ensure research data is properly managed. EPSRC set a deadline, 1 May 2015, for research organisation compliance with their expectations.

One of the expectations is that data supporting publications arising from funded research is openly available – this reflects the Common Principles on Data Policy published by RCUK (2011) and the Royal Society’s subsequent (2012) report ‘Science as a Public Enterprise’. To monitor compliance with this expectation EPSRC have said that this autumn they will conduct checks of papers published after 1 May 2015 to ensure these provide appropriate directions to the supporting data.

Ben clarified that the checks will help to determine the level of awareness of the policy and expectations. He noted that there is a balance in what the EPSRC is trying to do. They are trying to create a new research culture, and they are primarily focused on what the institution should be doing to support that.

According to the EPSRC policy, in situations where research arises from collaborations, or from work partially funded by commercial partners, any potential problems with research data sharing should be addressed before the start of the project, in a data management plan. We therefore asked Ben why the EPSRC – of all the RCUK funding bodies – doesn’t require researchers to create a data management plan. Ben indicated that the main value in data management planning is to the researcher and the research organisation. Adding data management plans to EPSRC’s funding submission process would simply add to the admin and peer review burden, and it is not clear how peer reviewers could properly judge the plans, because they don’t know the infrastructure available where the research is being conducted.

The question arose of whether a single RCUK policy on research data might be possible. Ben noted that the different councils fund different types of work, which informs their individual policies, and explained that although a single policy might be achievable it would require every council to change their existing policy and would be very disruptive of current processes across the whole system. As such he felt it would need a ‘very strong steer externally’ to drive such a change.

However, the research councils recognise the need for more guidance and are about to publish cross-council guidelines presenting a collective position on what should be done with particular types of data.


A question that often arises from researchers is ‘what data are we expected to keep and make available’? We were able to get confirmation that it is:

  • the data that underpins publications
  • the data that validates research findings
  • the data that is worth keeping

All questions should be answered by considering the principles behind the policy. The default position is data should be open – in a way that does not damage the research process. The important thing is that the validity of the published research findings is testable.

An example of the way this principle can be used is when considering another common question – what to do in the situation where several papers are expected to come out of the one set of data. Researchers are concerned that if they release the data on the first publication it jeopardises their subsequent publications as they may be scooped. Ben acknowledged this is a concern but asked whether it is reasonable to sit on data for, say, five years so that other people end up being funded to generate the same data again.

He pointed out that the RCUK Common Principles state that those who undertake Research Council funded work may be entitled to a limited period of privileged use of the data they have collected to enable them to publish the results of their research. However, the length of this period varies by research discipline.

There is also the consideration of the way another user can access the data and reproduce results. The question is – how far do we go to enable a user to reproduce the work? The minimum is that we should provide the information that someone would need to be able to validate published work – this is also critical to maximise the impact of publicly funded research and to maintain public trust in science and research.

The software situation

We had representatives from Cambridge Enterprise and from the School of Technology at the meeting who had specific questions about sharing software. While Ben indicated he might need to reflect on some of the questions, we did come to some clarification on others.

Although software is different from other forms of intellectual property the same basic question arises: “is the institution best served by making it freely available or by commercialising it?” Both approaches can lead to the creation of jobs and economic impact. EPSRC is clear that the choice of exploitation strategy rests with the research organisation.

The EPSRC does not have an expectation about the licence under which software should be released.

It was agreed that if there is material that is potentially commercial, then we should take the steps to make it available and commercialise the software. It was confirmed we are able to make software arising from a research project available free for non-commercial re-use by other researchers (within the academic community) while at the same time making it available to others under a commercial licence.

One can argue that since the taxpayer funded the work in the first place the taxpayer should not have to pay for it again, but this position, taken to its natural conclusion, of course would mean that no commercialisation of funded research should ever occur.

There is also the situation where a researcher has put their ‘life and soul’ into generating outputs and naturally feels they have some ownership of the work. Ben agreed that many of these questions are ‘very challenging’, but noted that researchers seldom ‘own’ their outputs – under RCUK grant conditions the research organisation owns all the intellectual assets arising from the funded research and is responsible for seeing that they are used to the benefit of society and the economy. Some of these questions stem from a mindset that insufficiently recognises the importance of ensuring that the economy and society as a whole benefits from publicly funded research, and a culture change is needed in addition to new processes.

The EPSRC do wish to avoid people sitting on data indefinitely because they don’t want to release their software. Ben said that in principle it is permissible for people to make software available through GitHub, but he would need to investigate how sustainable it is and how it is governed before being able to say whether GitHub is a reasonable option in terms of meeting EPSRC expectations.

Addressing (some) concerns

Time prevented us covering all of the topics we wished to raise. Many Cambridge researchers have raised questions about sharing data from collaborations – with concern that non-UK partners who do not have a data sharing requirement may find the UK requirements onerous and that this could decrease the amount of international collaborations in which UK institutions are involved.

There was also no magic bullet for the challenge of paying the not insignificant cost of storing research data safely for 10 years+. The problem is that where researchers were unaware of this expectation at the time they applied for their grant there is no allowance for it in their budget. This will not be an issue in the future as current grants are approved, but we are in a transition period now as the research from existing grants is published and the supporting data is being made available and stored. When we discussed this, Ben explained that the EPSRC does not have any additional funds to support this transition period, and that the costs need to be found within existing resources.

There have been some challenges with communication of the EPSRC policy. Many researchers at the University of Cambridge have said they would have liked to be informed about it directly by EPSRC (as they would expect to have been by, for example, the Wellcome Trust). Ben explained that the approach had deliberately been to communicate the policy through research organisation senior managers (e.g. ProVCs Research), and that this was because the expectations are addressed principally to research institutions, which have primary responsibility for ensuring that researchers manage their data effectively and have access to appropriate facilities to do so. However, he acknowledged that EPSRC could have communicated more with researchers and undertook to explore how more information could be made available directly to researchers.

Therefore it was helpful to be able to express some of the concerns and fears amongst the research community. We have been collating the questions that people have asked during our sessions and will compile a FAQ from these that will appear on our Research Data Management website. Ben indicated that there might be a possibility of a selection of these FAQs also appearing on the RCUK website to help address the universal questions about sharing research data. This step would be welcomed by the University.

Published 21 May 2015
Written by Dr Danny Kingsley
Creative Commons License

Data management – one size does not fit all

As the Research Data Facilitator at the University of Cambridge, I am part of the team establishing a Research Data Management (RDM) Facility at the University. This blog is a note of my impressions from the Digital Curation Centre (DCC) meeting held in London on the 28th April 2015: Preparing Data for Deposit.

As always, the DCC meeting was extremely useful for networking. I met with people in similar roles at other institutions. And again, the breakout sessions were invaluable – they allowed us to exchange precious experience, feedback gained and lessons learnt while developing RDM services.

What could have been better, though, is the appreciation of differences between universities.

Unrealistic staffing

The talk from the keynote speaker, Louise Corti, the Associate Director at the UK Data Service, was very inspirational. I loved the uplifting expression that RDM supporters are like artists evangelising researchers. It was great to hear about RDM solutions available at the UK Data Service, and the professional approach to research data, with every aspect of data curation addressed by the excellent team of 70 dedicated people, with precise workflows for data processing.

However, how realistic is it for a university to develop similar solutions locally? Which university would be able to dedicate a similar amount of resources to the development of an RDM facility?

At the University of Cambridge, I am the only full-time employee dedicated to establishing and providing RDM services to our researchers. There is a team of people supporting the facility but these staff are shared with other projects. I would have very much appreciated hearing what scalable solution the UK Data Service could recommend that universities develop, knowing that the resources available are nowhere near what a team of 70 people could offer.


On the other hand, we had a presentation from the University of Loughborough. The University, represented by Gary Brewerton, teamed up with Figshare and Arkivum (Mark Hahnel and Matthew Addis, respectively). The three of them explained to us the infrastructure developed to support RDM at the University of Loughborough. The University data repository, DSpace, has been equipped with archival storage provided by Arkivum, which guarantees 100% data integrity. Additionally, researchers at the University of Loughborough can benefit from the use of Figshare, which provides them with a user-friendly research data sharing platform.

These systems seemed to offer excellent solutions to researchers, but somehow I could not help having the impression of listening to sales pitches. Are there any disadvantages of these solutions? Are there any alternatives?

Figshare charges for file transfer (downloading of openly accessible data is actually not free for institutions). How substantial would these charges be for bigger institutions, producing huge amounts of valuable research data, frequently sought after and downloaded by others? Would institutions be able to sustain the cost of data access to their most valuable research datasets?

Risk management

The Loughborough solutions do not appear to take into account the risks associated with implementing services from third party providers at bigger, research-intensive universities. At the University of Cambridge we have almost 300 EPSRC-funded research grants. In April this year alone our data repository received 40GB of research data deposits coming from EPSRC-funded projects. Producing valuable research outputs is business-critical for universities.

What would be the costs associated with the data transfer of supposedly open-access datasets if these were available via Figshare? Is there any upper limit on possible transfer charges?

What is the long-term risk of handing over a university’s research data holdings to a third party service provider? Note that some UK research funders expect data to be stored long-term, and in some cases in perpetuity (10 years from the last access). What will be the conditions for research data storage offered by these external providers in 10, 20, 30 years’ time? How will the cost change? Will it be easy, or even possible, to transfer all research data somewhere else?

Figshare has recently entered into a legal partnership with Macmillan (you can read more about it in a blog post from Dr Peter Murray-Rust) – how will this partnership evolve in the future?


It would be extremely valuable if RDM solutions proposed at DCC meetings could be discussed taking into account the size of the institution, the amount of research conducted at the University, and the size of the RDM team locally available to work on the implementation of the solution.

One size does not and will not fit all, and a better recognition of differences between organisations would greatly help in developing optimal solutions for each individual institution. Additionally, it seems to me of key importance to talk openly about the drawbacks of each solution, so that universities can efficiently mitigate future risks.

Published 14 May 2015
Written by Dr Marta Teperek
Creative Commons License