Monthly Archives: April 2015

Benchmarking the Cambridge RDM program

Cambridge University released its Research Data Management Policy Framework today.

This is a good opportunity to assess whether Cambridge is fulfilling the 10 recommendations for libraries on how to get started in data management, presented in the final report of the LIBER working group on E-Science / Research Data Management. Since its publication in July 2012, this report has been the most downloaded item on the website of the Association of European Research Libraries (LIBER). Below we list the 10 recommendations and what Cambridge is doing to meet each of them.

Benchmarking against RDM recommendations

  1. Offer research data management support, including data management plans for grant applications, intellectual property rights advice and information materials. Assist faculty with data management plans and the integration of data management into the curriculum.

The Open Data team at the University of Cambridge has created a comprehensive, dedicated website for research data management. The website provides researchers with guidance on various aspects of research data management, from project design and data management planning, through data collection and maintenance, to data curation and sharing.

The University also offers numerous workshops and training sessions on research data management. On-demand assistance with all aspects of research data management is available to researchers via a simple online support request form.

  2. Engage in the development of metadata and data standards and provide metadata services for research data.

The University of Cambridge is actively involved in developing metadata standards. All research data is deposited in the University data repository via a simple web form. This form collects metadata descriptions and provides guidance on what should be included in each description field. All research data and metadata descriptions submitted to the University repository are carefully curated by our repository managers.

  3. Create Data Librarian posts and develop professional staff skills for data librarianship.

Cambridge Library has a dedicated research data management working group, composed of librarians from various University departments who are actively involved in Open Access. The working group is designing and delivering a series of training sessions and workshops for the wider library community, to equip librarians with the skills to provide professional research data management support.

  4. Actively participate in institutional research data policy development, including resource plans. Encourage and adopt open data policies where appropriate in the research data life cycle.

The newly released Research Data Management Policy Framework builds on policy frameworks that have been in place since 2013. It encourages University researchers and research students to share their research data as widely and openly as possible, and provides guidance on best practice for data sharing.

  5. Liaise and partner with researchers, research groups, data archives and data centers [sic] to foster an interoperable infrastructure for data access, discovery and data sharing.

The Open Data Project Working Group at the University of Cambridge consists of members from several independent operational units of the University: the Cambridge University Library, the Research Operations Office, the Research Strategy Office and the University Information Services. This ensures deep integration and engagement across the wider University structure.

In addition, members of the Open Data team hold daily consultations with researchers and research support staff across all University departments, to ensure that the research data management services being developed are tailored to their needs.

  6. Support the lifecycle for research data by providing services for storage, discovery and permanent access.

At the University of Cambridge, the University Information Services provide researchers with day-to-day research data management solutions, such as platforms for file sharing, data storage and backup. The Open Data team ensures that shareable research data is deposited in a suitable data repository (guaranteeing long-term data sustainability) and shared as widely and openly as possible.

  7. Promote research data citation by applying persistent identifiers to research data.

The University of Cambridge data repository mints a persistent link for each deposited research dataset. In addition, the repository is currently being upgraded to enable the minting of DOIs (digital object identifiers). Both kinds of identifier are persistent: they ensure access to the data throughout the long-term preservation period and facilitate data citation.

  8. Provide an institutional Data Catalogue or Data Repository, depending on available infrastructure.

The University of Cambridge provides both a data repository and a data registry. Our institutional repository has accepted research datasets since 2005. Ultimately, the University aims to record, in a streamlined and automated way, the metadata descriptions of data held in all repositories used by our researchers.

  9. Get involved in subject specific data management practice.

Respect for subject-specific differences in data management practice is recognised and affirmed throughout the University of Cambridge Research Data Management Policy Framework. The University recognises that research data management solutions need to be tailored to researchers working in different disciplines. The Open Data team therefore holds daily consultations with researchers from all fields of study, to better understand their individual needs and tailor research data management support accordingly.

  10. Offer or mediate secure storage for dynamic and static research data in co-operation with institutional IT units and/or seek exploitation of appropriate cloud services.

The University Information Services (some of whose staff are members of the Open Data team) are currently developing a cloud-based, Dropbox-like storage solution to make secure data storage and sharing between collaborators easy.

Published 28 April 2015
Written by Dr Marta Teperek and Dr Danny Kingsley
Creative Commons License

Good news stories about data sharing?

We have recently been speaking to researchers around the University about the expectations of their funders in relation to data management. This has raised the question of how best to convince people that this process benefits society, rather than being a waste of time or yet another thing they are being ‘forced to do’, which is the view of some of the researchers we have spoken with.

Policy requirements

In general, most funders require a Research Data Management Plan to be developed at the beginning of a project and then adhered to. But the Engineering and Physical Sciences Research Council (EPSRC) have upped the ante by introducing a policy requiring that papers resulting from funded research and published from May 2015 onwards include a statement about where the supporting research data may be accessed. The data must be held in a secure storage facility with a persistent URL, and must remain available for 10 years from the last time it was accessed.

Carrot or stick?

While a funder policy does make researchers sit up and listen, there is a perception in the UK research community that this is yet another impost on time-poor researchers. This is not surprising: there has recently been a rapid succession of new rules about sharing and assessing research.

The Research Excellence Framework (REF) took place last year, and many researchers are still ‘recuperating’. Now the Higher Education Funding Council for England (HEFCE) is introducing a policy from April 2016 under which any peer-reviewed article or conference paper to be included in the post-2014 REF must have been deposited in the author’s institutional repository within three months of acceptance; otherwise it cannot be counted. This is a ‘green’ open access policy.

Research Councils UK (RCUK) have had an open access policy in place for two years. Introduced on 1 April 2013 as a result of the 2012 Finch Report, the RCUK policy states that funded research outputs must be available open access, and permits them to be made available through deposit in a repository. At first glance this seems to align with the HEFCE policy. However, restrictions on the allowed embargo periods mean that in practice most articles must be made available gold open access, usually with payment of an accompanying article processing charge. Although these charges are supported by a block grant fund, managing them places a considerable burden on institutions.

There is also considerable confusion amongst researchers about what all these policies mean and how they relate to each other.

Data as a system

We are trying to find some examples of how making research data available can help research and society. It is unrealistic to hope for something along the lines of Jack Andraka’s breakthrough in developing a diagnostic test for pancreatic cancer using only open access research.

That’s why I was pleased when Nicholas Gruen pointed me to a report he co-authored: Open for Business: How Open Data Can Help Achieve the G20 Growth Target, a Lateral Economics report commissioned by Omidyar Network and published in June 2014.

The report looks primarily at government data, but it also considers access to data generated in publicly funded research. It makes some interesting observations about what can happen when data is made available. Its key consideration is that data can have properties at the system level, not just at the level of an individual data set.

The point is that, if data does behave in this way, then once a collection of data becomes sufficiently large, the addition of one more data set could cause the “entire network to jump to a new state in which the connections and the payoffs change dramatically, perhaps by several orders of magnitude”.

Benefits of sharing data

The report also refers to a 2014 study, The Value and Impact of Data Sharing and Curation: A synthesis of three recent studies of UK research data centres. This work explored the value and impact of curating and sharing research data through three well-established UK research data centres: the Archaeology Data Service, the Economic and Social Data Service, and the British Atmospheric Data Centre.

In summarising the results, Beagrie and Houghton noted that their economic analysis indicated that:

  • Very significant increases in research, teaching and studying efficiency were realised by the users as a result of their use of the data centres;
  • The value to users exceeds the investment made in data sharing and curation via the centres in all three cases; and
  • By facilitating additional use, the data centres significantly increase the measurable returns on investment in the creation/collection of the data hosted.

So clearly there are good stories out there.

If you know of any good news stories that have arisen from sharing UK research output data we would love to hear them. Email us or leave a comment!

Interview with Nigel Shadbolt on The Life Scientific

Sir Nigel Shadbolt was interviewed about open data on ‘The Life Scientific’ on BBC Radio 4 this morning.

The discussion ranged from his background to what got him interested in this area. The data discussed was mainly public government data (such as medical information or cyclist black spots) rather than data generated in research projects, but it was an interesting conversation nonetheless. A couple of points jumped out at me:

16:50 – When we talk about data, really we are talking about information … Data, information and knowledge are kind of different, and mostly when we talk about open data we are talking about information. Data (such as a number) only becomes information when it is placed in context. If you can do something with the information, it becomes knowledge: ‘actionable information’. These are different strains of stuff that the computer holds. We need open information to build knowledge. The semantic web.

16:00 – Do the risks of making data available outweigh the benefits? And do we ask the general public’s opinion, or just tell them that this is what we do? People want some sort of empowerment in this, but often there is none.

29:00 – We are barely scratching the surface in terms of the insights we can gain as we analyse and look for patterns in the information. We are living in a world that is increasingly emitting data: people are increasingly able to collect data onto and off their phones (or supercomputers, depending on how you look at it). This richness of data demands applications we haven’t yet thought of and new ways of analysing the information.

Listen to the half-hour interview here.

Blurb from the BBC webpage:

Sir Nigel Shadbolt, Professor of Artificial Intelligence at Southampton University, believes in the power of open data. With Sir Tim Berners-Lee he persuaded two UK Prime Ministers of the importance of letting us all get our hands on information that’s been collected about us by the government and other organisations. But, this has brought him into conflict with people who think there’s money to be made from this data. And open data raises issues of privacy.

Nigel Shadbolt talks to Jim al-Khalili about how a degree in psychology and philosophy led to a career researching artificial intelligence and a passion for open data.

Published 14 April 2015
Written by Dr Danny Kingsley
Creative Commons License