Good news stories about data sharing?

We have been speaking to researchers around the University recently to discuss the expectations of their funders in relation to data management. This has raised the issue of how best to convince people this is a process that benefits society rather than a waste of time or just yet another thing they are being ‘forced to do’ – which is the perspective of some that we have spoken with.

Policy requirements

In general most funders require a Research Data Management Plan to be developed at the beginning of the project – and then adhered to. But the Engineering and Physical Sciences Research Council (EPSRC) have upped the ante by introducing a policy requiring that papers published from May 2015 onwards resulting from funded research include a statement about where the supporting research data may be accessed. The data must be held in a secure storage facility with a persistent URL, and must remain available for 10 years from the last time it was accessed.

Carrot or stick?

While having a policy from funders does make researchers sit up and listen, there is a perception in the UK research community that this is yet another impost on time-poor researchers. This is not surprising. There has recently been an acceleration of new rules about sharing and assessing research.

The Research Excellence Framework (REF) occurred last year, and many researchers are still ‘recuperating’. Now the Higher Education Funding Council for England (HEFCE) is introducing a policy in April 2016 that any peer reviewed article or conference paper that is to be included in the post-2014 REF must have been deposited in the author’s institutional repository within three months of acceptance or it cannot be counted. This policy is a ‘green’ open access policy.

The Research Councils UK (RCUK) have had an open access policy in place for two years, introduced on 1 April 2013 as a result of the 2012 Finch Report. The RCUK policy states that funded research outputs must be available open access, and it is permitted to make them available through deposit into a repository. At first glance this seems to align with the HEFCE policy. However, restrictions on the allowed embargo periods mean that in practice most articles must be made available gold open access – usually with the payment of an accompanying article processing charge. While these charges are supported by a block grant fund, managing them places a considerable impost on institutions.

There is also considerable confusion amongst researchers about what all these policies mean and how they relate to each other.

Data as a system

We are trying to find some examples of how making research data available can help research and society. It is unrealistic to hope for something along the lines of Jack Andraka’s breakthrough in developing a diagnostic test for pancreatic cancer using only open access research.

That’s why I was pleased when Nicholas Gruen pointed me to a report he co-authored: Open for Business: How Open Data Can Help Achieve the G20 Growth Target – A Lateral Economics report commissioned by Omidyar Network – published in June 2014.

This report looks primarily at government data but does consider access to data generated in publicly funded research. It makes some interesting observations about what can happen when data is made available. The consideration is that data can have properties at the system level, not just at the individual level of a particular data set.

The point is that if data does behave in this way, once a collection of data becomes sufficiently large then the addition of one more set of data could cause the “entire network to jump to a new state in which the connections and the payoffs change dramatically, perhaps by several orders of magnitude”.

Benefits of sharing data

The report also refers to a 2014 report, The Value and Impact of Data Sharing and Curation: A synthesis of three recent studies of UK research data centres. This work explored the value and impact of curating and sharing research data through three well-established UK research data centres – the Archaeology Data Service, the Economic and Social Data Service, and the British Atmospheric Data Centre.

In summarising the results, Beagrie and Houghton noted that their economic analysis indicated that:

  • Very significant increases in research, teaching and studying efficiency were realised by the users as a result of their use of the data centres;
  • The value to users exceeds the investment made in data sharing and curation via the centres in all three cases; and
  • By facilitating additional use, the data centres significantly increase the measurable returns on investment in the creation/collection of the data hosted.

So clearly there are good stories out there.

If you know of any good news stories that have arisen from sharing UK research output data we would love to hear them. Email us or leave a comment!

Interview with Nigel Shadbolt on The Life Scientific

Sir Nigel Shadbolt was interviewed on ‘The Life Scientific’ this morning on BBC Radio 4 about open data.

The general discussion covered his background and what got him interested in this area. The data being discussed is more government public data (such as medical information or cyclist black spots) than data generated in research projects, but it was an interesting conversation nonetheless. A couple of items that jumped out at me:

16:50 – When we talk about data, really we are talking about information … Data and information and knowledge are kinda different and mostly when we talk about open data we are talking about information. Data (such as a number) only becomes information if it is placed in context. If you can do something with the information then it becomes knowledge – ‘actionable information’. These are different strains of stuff that the computer holds. We need open information to build knowledge. The semantic web.

16:00 – Do the risks of making data available outweigh the benefits? And do we ask the general public’s opinion or just tell them that this is what we do? They want some sort of empowerment in this but often there is no empowerment.

29:00 – We are barely scratching the surface in terms of the insights as we analyse and look for patterns in the information. We are living in a world that is increasingly emitting data – people are increasingly able to collect data onto and off their phones (or supercomputers, depending on how you look at it). This data richness demands a new world for applications we haven’t thought of and ways of analysing the information.

Listen to the half hour interview here.

Blurb from the BBC webpage:

Sir Nigel Shadbolt, Professor of Artificial Intelligence at Southampton University, believes in the power of open data. With Sir Tim Berners-Lee he persuaded two UK Prime Ministers of the importance of letting us all get our hands on information that’s been collected about us by the government and other organisations. But, this has brought him into conflict with people who think there’s money to be made from this data. And open data raises issues of privacy.

Nigel Shadbolt talks to Jim al-Khalili about how a degree in psychology and philosophy led to a career researching artificial intelligence and a passion for open data.

Published 14 April 2015
Written by Dr Danny Kingsley
Creative Commons License

A review of the RCUK review of implementation of its OA policy

The RCUK released its ‘Review of the implementation of the RCUK Policy on Open Access’ today and it makes interesting reading. First I should state that I think this is a good report: it seems well researched and balanced in tone, and it is well written and laid out. Jisc also welcomes the report.

Overall findings

It seems that a ‘common factor’ amongst all of the people and groups interviewed was ‘a general acceptance and welcome given to the concept of open access’. However, the administrative effort to implement the policy and distribute the funds is significant. This is not helped by a level of confusion about different funding policies, particularly relating to embargo length, licence usage and expectations of data collection for compliance monitoring.

Not only is this an administrative problem but it is ‘leading to researchers ultimately not engaging with open access at all as it was perceived as being ‘too difficult’.’ (p16) Certainly there have been instances of this view expressed by researchers at Cambridge University.

This blog will concentrate on a few aspects of the review I thought interesting – support or otherwise of hybrid, reporting issues, non-compliance amongst publishers, lack of awareness amongst researchers and licenses. It finishes with an observation that the review validates some of the decisions Cambridge has made in relation to implementing the RCUK policy.

I should note the review includes some interesting information about learned societies, embargo periods and monographs but these are big issues that need teasing out on their own.

Supporting hybrid

As the Wellcome Trust found in their recent analysis of open access spend in 2013/2014, the RCUK reported that APCs for hybrid open access continue to be ‘consistently more expensive’ than those for fully OA journals, ‘despite the fact that hybrid journals still enjoyed a revenue stream through subscriptions’.

The review recommended that this should be monitored and ‘if these costs show no sign of being responsive to market forces, then a future review should explore what steps RCUK could take to make this market more effective’ (p25).

The reported amounts being spent on APCs are also interesting. The average APC paid during the first year, at £1,600 inc VAT, was £472 less than the average APC assumed by the Finch Group, which was used as a proxy when calculating the size of the RCUK block grant (£1,727 + VAT = £2,072) (p11). This in itself is not surprising, as the amount quoted in the Finch report was seen to be high by open access advocates at the time. It is interesting to note, however, that the average APC paid by Cambridge in 2014, at £1,891.63, was higher than the average quoted in the review.
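For anyone wanting to check the APC comparison above, the arithmetic can be reproduced in a few lines. This is only a sketch using the figures quoted in this post. The 20% VAT rate is the standard UK rate and is assumed to be what the review applied, and it is assumed (the post does not say) that the Cambridge average is directly comparable to the VAT-inclusive review average. All variable names are illustrative.

```python
# Figures quoted in the post; the VAT rate is an assumption (standard UK rate).
VAT_RATE = 0.20

finch_apc_ex_vat = 1727.00        # Finch Group proxy APC, excluding VAT
review_avg_apc_inc_vat = 1600.00  # average APC paid in the first year, inc VAT
cambridge_avg_apc = 1891.63       # Cambridge average for 2014 (VAT basis assumed)

# Gross up the Finch figure and round to whole pounds, as the review does.
finch_apc_inc_vat = round(finch_apc_ex_vat * (1 + VAT_RATE))

# How far below the Finch assumption the first-year average fell.
gap_to_finch = finch_apc_inc_vat - review_avg_apc_inc_vat

# How far above the review average the Cambridge average sat.
cambridge_gap = round(cambridge_avg_apc - review_avg_apc_inc_vat, 2)

print(f"Finch proxy inc VAT: £{finch_apc_inc_vat:,}")
print(f"Review average below Finch proxy by: £{gap_to_finch:,.0f}")
print(f"Cambridge average above review average by: £{cambridge_gap:,.2f}")
```

On these assumptions the Cambridge average sits between the review average and the Finch proxy.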

Despite this large amount of money being spent on APCs, publishers offering hybrid – not the fully open access publishers, it should be noted – ‘questioned’ the level of the block grant currently offered by RCUK. These publishers expressed the view that the block grant ‘was too low to properly fund the transition to gold. Publishers felt that the transition to full gold open access publishing would be successful only if it was fully funded’ (pp15-16). It does raise the question of what ‘fully funded’ means in this context.

Researcher awareness

Researchers appear to remain unaware of the tsunami that is occurring in scholarly communication. By centralising the payment of APCs we once again have a situation where researchers are divorced from the economic realities of publishing, in the same way libraries have traditionally been the foil between the economics of subscriptions and the access to the materials.

This concern is supported by the review’s observation that: ‘There is little evidence to suggest that the introduction of the RCUK policy had much of an impact on author behaviour, with publishers reporting that authors did not seem to be changing their choices on where to publish.’ (p15)

If anything it has had a negative effect where ‘RCUK’s preference for gold has therefore been, at times, seen as a barrier to implementation and ‘buy-in’ from various communities across the disciplines’ (p26). Anecdotally we are seeing this happening at Cambridge.

The review did note that ‘further transparency on what is being paid in APCs by institutions to publishers will be crucial in helping to change behaviours and ease the transition towards open access’.

Reporting issues

The review noted at several stages that there have been difficulties with collecting data and that they ‘have been more reliant on opinion than perhaps we might have liked to at the outset of the review’ (p4). They acknowledge the process would have been assisted greatly if there had been some standardisation in what the RCUK was asking for as the ‘template was, understandably, interpreted in a variety of ways’ (p9). I should note that Jisc is attempting to standardise the reporting.

When Cambridge was asked to report on compliance levels for the RCUK we were hampered by our inability to determine the total number of published articles that had been funded by RCUK. The review recognises that this was a widespread problem, particularly in ‘larger, distributed institutions (such as the research intensive universities)’ (p9). Many institutions provided estimates for the compliance reporting.

The review also looked at the (substantial) costs associated with collecting this data and noted that publishers could help given that the sources of data held by publishers ‘would be administratively simpler to collect’ (p10).

Not only could publishers reduce the costs of compliance by providing data, but the review also noted that ‘complexities in working with publishers [was] one of the areas that had generated considerable administrative effort’ (p21). The problems include initial negotiations and ensuring that licences and invoicing were correct. The cost for this is borne by authors, library and administrative staff and the finance team.

Non-compliant publishers

This then moves the focus to the compliance of publishers – which can be taken in a couple of ways. First, the review panel looked at how the publishers had helped institutions and researchers to comply with the policy by ensuring that their journals were ‘compliant’ (p11).

It seems that a considerable amount of funded research where an APC has been paid is not compliant with the RCUK policy because the license is not a CC-BY license. For example Elsevier stated that around 40% of the articles from RCUK funding that they had published gold were not under the CC-BY licence and are therefore not compliant with the policy. The American Society of Plant Biologists noted that its journal was not compliant as it did not offer the CC-BY licence and that was unlikely to change in the near future (p19).

Other publishers offer more than one type of licence, which makes it confusing for the authors. Indeed, there was clear evidence that some publishers were offering a choice of licences, even when they knew that the author was RCUK-funded.

The question of publishers not making articles available even after an APC was paid was not singled out in the report but is implied in a few of the statements in the review, particularly those about institutions having to double check whether work is available post publication. This is an area which needs further analysis.

Licensing

The issue of the CC-BY licenses was a recurrent theme in the review. Many arts, humanities and social science disciplines hold ‘principled and practical objections to the use of CC-BY licences’ (p18). This is partly because work under a CC-BY license ‘could be both used commercially in ways of which the author does not approve and also might not be properly acknowledged as their work’ (pp19-20).

This does demonstrate a lack of full understanding of what a CC-BY license allows, but this is not surprising as ‘Many publishers … reported a significant number of researchers were signing licence agreements without understanding what they were signing’ (p19).

Also highlighted in evidence was an issue with third party copyright in that some rights owners (for example, image libraries) are reluctant to license material for digital reproduction, let alone for reproduction in an article that is published under a CC-BY licence.

Support for the University of Cambridge approach

It was heartening to read of a couple of areas that support the position that Cambridge University has taken towards the implementation of the RCUK and HEFCE policies.

The review mentioned visits to institutions and noted how long it takes for researchers to learn about open access including the requirements, expectations and processes they need to follow. ‘One senior researcher commented that it had taken a full half a day to learn about open access.’ At Cambridge University we have taken a very soft touch approach: the researcher simply has to fill in a few fields and upload a file through a simple interface, and the Open Access team takes care of the rest.

Cambridge University has also taken a ‘first in, best dressed’ approach to expenditure of the block grant. This seems to have been a good decision, as the review noted that concerns were raised in both written and oral evidence that, where institutions had distributed the block grant by department or faculty, this had a detrimental impact on some disciplines.

About the review

The review covered the period from April 2013 to July 2014. When the RCUK policy was announced they did say that there would be a review within a year. However, a full year of implementation was needed before they could collect the data, hence the delay.

Chaired by an independent researcher, Professor Sir Robert Burgess, the review panel consisted of ‘knowledgeable members of the various communities and sectors with an interest in the policy and open access’. The evidence was collected through more than 80 submissions, some verbal evidence and a small number of visits to institutions to talk informally with researchers, librarians and institutional administrative staff about their experiences of implementing the policy.

The report mentions on no fewer than three occasions that it is a review of the policy implementation not a debate on the merits of open access.

The next planned review will be in 2016.

Published 26 March 2015
Written by Dr Danny Kingsley