Tag Archives: Arkivum

Data management – one size does not fit all

As the Research Data Facilitator at the University of Cambridge, I am part of the team establishing a Research Data Management (RDM) Facility at the University. This blog is a note of my impressions from the Digital Curation Centre (DCC) meeting held in London on the 28th April 2015: Preparing Data for Deposit.

As always, the DCC meeting was extremely useful for networking. I met with people at similar roles at other institutions. And again, the breakout sessions were invaluable – they allowed us to exchange precious experience, feedback gained and lessons learnt while developing RDM services.

What could have been done better though is more appreciation for differences between universities.

Unrealistic staffing

The talk from the keynote speaker, Louise Corti, the Associate Director at the UK Data Service, was very inspirational. I loved the uplifting expression that RDM supporters are like artists evangelising researchers. It was great to hear about RDM solutions available at the UK Data Service, and the professional approach to research data, with every aspect of data curation addressed by the excellent team of 70 dedicated people, with precise workflows for data processing.

However, how realistic it is for a university to develop similar solutions locally? Which University would be able to dedicate similar amount of resources for the development of an RDM facility?

At the University of Cambridge, I am the only full-time employee dedicated to work on establishment and provision of RDM services to our researchers. There is a team of people supporting the facility but these staff are shared with other projects. I would have very much appreciated what would be the scalable solution that the UK Data Service could recommend universities to develop, knowing that resources available are nowhere near what a 70 people team could offer.

Scalability

On the other hand, we had a presentation from the University of Loughborough. The University, represented by Gary Brewerton, teamed up with Figshare and Arkivum (Mark Hahnel and Matthew Addis, respectively). The three of them explained to us the infrastructure developed to support RDM management at the University of Loughborough. The University data repository, DSpace, has been equipped with archival storage provided by Arkivum, which guarantees 100% data integrity. Additionally, researchers at the University of Loughborough can benefit from the use of Figshare, which provides them with a user-friendly research data sharing platform.

These systems seemed to offer excellent solutions to researchers, but somehow I could not help having the impression of listening to sales pitches. Are there any disadvantages of these solutions? Are there any alternatives?

Figshare charges for the file transfer (downloading of openly accessible data is actually not free for institutions). How substantial would be these charges for bigger institutions, producing huge amounts of valuable research data, frequently sought after and downloaded by others? Would institutions be able to sustain the cost of data access to their most valuable research datasets?

Risk management

The Loughborough solutions do not appear to take into account risks associated with implementation of services from third party providers at bigger, research-intense universities. At the University of Cambridge we have almost 300 EPSRC-funded research grants. In April this year alone our data repository received 40GB of research data deposits coming from EPSRC-funded projects. Producing valuable research outputs is business-critical for universities.

What would be the costs associated with the data transfer of supposedly open-access datasets if these were available via Figshare? Is there any upper limit on possible transfer charges?

What is the long-term risk of handing over university’s research data holdings to a third party service provider? Note that some UK research funders expect data to be stored long-term, and in some cases in perpetuity (10 years from the last access). What will be the conditions for research data storage offered by these external providers in 10, 20, 30 years time? How will the cost change? Will it be easy/possible to transfer all research data somewhere else?

Figshare has recently entered into a legal partnership with Macmillan (you can read more about it in a blog post from Dr Peter Murray-Rust) – how will this partnership evolve in the future?

Suggestion

It would be extremely valuable if RDM solutions proposed at DCC meetings could be discussed taking into account the size of the institution, the amount of research conducted at the University, and the size of the RDM team locally available to work on the implementation of the solution.

One size does not and will not fit all, and a better recognition of differences between organisations would greatly help developing optimal solutions for each individual institution. Additionally, it seems to me of key importance to openly talk about drawbacks of each solution for universities to efficiently mitigate future risks.

Published 14 May 2015
Written by Dr Marta Teperek
Creative Commons License