Open Data – moving science forward or a waste of money & time?

On 4 November the Research Data Facility at Cambridge University invited some inspirational leaders in the area of research data management and asked them to address the question: “is open data moving science forward or a waste of money & time?” Below are Dr Marta Teperek’s impressions from the event.

Great discussion

Want to initiate a thought-provoking discussion on a controversial subject? The recipe is simple: invite inspirational leaders, bright people with curious minds and have an excellent chair. The outcome is guaranteed.

We asked some truly inspirational leaders in data management and sharing to come to Cambridge to talk to the community about the pros and cons of data sharing. We were honoured to have with us:

  • Rafael Carazo-Salas, Group Leader, Department of Genetics, University of Cambridge
  • Sarah Jones, Senior Institutional Support Officer from the Digital Curation Centre; @sjDCC
  • Frances Rawle, Head of Corporate Governance and Policy, Medical Research Council; @The_MRC
  • Tim Smith, Group Leader, Collaboration and Information Services, CERN/Zenodo; @TimSmithCH
  • Peter Murray-Rust, Molecular Informatics, Dept. of Chemistry, University of Cambridge, ContentMine; @petermurrayrust

The discussion was chaired by Dr Danny Kingsley, the Head of Scholarly Communication at the University of Cambridge (@dannykay68).

What is the definition of Open Data?

The discussion started off with a request for a definition of what “open” meant. Both Peter and Sarah explained that ‘open’ in science was not simply a piece of paper saying ‘this is open’. Peter said that ‘open’ meant free to use, free to re-use, and free to re-distribute without permission. Open data needs to be usable, it needs to be described, and to be interpretable. Finally, if data is not discoverable, it is of no use to anyone. Sarah added that sharing is about making data useful. Making it useful also involves the use of open formats, and implies describing the data. Context is necessary for the data to be of any value to others.

What are the benefits of Open Data?

Next came a quick question from Danny: “What are the benefits of Open Data?”, followed by an immediate riposte from Rafael: “What aren’t the benefits of Open Data?”. Rafael explained that open data led to transparency in research, re-usability of data, benchmarking, integration, new discoveries and, most importantly, sharing data kept it alive. If data was not shared and instead simply kept on the computer’s hard drive, no one would remember it months after the initial publication. Sharing is the only way in which data can be used, cited, and built upon years after the publication. Frances added that research data originating from publicly funded research was funded by tax payers. Therefore, the value of research data should be maximised. Data sharing is important for research integrity and reproducibility and for ensuring better quality of science. Sarah said that the biggest benefit of sharing data was the wealth of re-uses of research data, which often could not be imagined at the time of creation.

Finally, Tim concluded that sharing of research is what made the wheels of science turn. He inspired further discussions by strong statements: “Sharing is not an if, it is a must – science is about sharing, science is about collectively coming to truths that you can then build on. If you don’t share enough information so that people can validate and build up on your findings, then it basically isn’t science – it’s just beliefs and opinions.”

Tim also stressed that if open science became institutionalised, and mandated through policies and rules, it would take a very long time before individual researchers would fully embrace it and start sharing their research as the default position.

I personally strongly agree with Tim’s statement. Mandating sharing without providing the support for it will lead to a perception that sharing is yet another administrative burden, and researchers will adopt the ‘minimal compliance’ approach towards sharing. We often observe this attitude amongst EPSRC-funded researchers (EPSRC is one of the UK funders with the strictest policy for sharing of research data). Instead, institutions should provide infrastructure, services, support and encouragement for sharing.

Big data

Data sharing is not without problems. One of the biggest issues nowadays is the problem of sharing big data. Rafael stressed that with big data, it was extremely expensive not only to share, but even to store the data long-term. He stated that the biggest bottleneck in progress was to bridge the gap between the capacity to generate the data, and the capacity to make it useful. Tim admitted that sharing of big data was indeed difficult at the moment, but that the need would certainly drive innovation. He recalled that in the past people did not think that one day it would be possible just to stream videos instead of buying DVDs. Nowadays technologies exist which allow millions of people to watch the webcast of a live match at the same time – the need developed the tools. More and more people are looking at new ways of chunking and parallelisation of data downloads. Additionally, there is a change in the way in which the analysis is done – more and more of it is done remotely on central servers, and this eliminates the technical barriers of access to data.

Personal/sensitive data

Frances mentioned that in the case of personal and sensitive data, sharing was not as simple as in basic science disciplines. Especially in medical research, it often required provision of controlled access to data. It was not only important who would get the data, but also what they would do with it. Frances agreed with Tim that perhaps what was needed was a paradigm shift – that questions should be sent to the data, and not the data sent to the questions.

Shades of grey: in-between “open” and “closed”

Both the audience and the panellists agreed that almost no data was completely “open” and almost no data was completely “shut”. Tim explained that anything that gets research data off the laptop to a shared environment, even if it was shared only with a certain group, was already a massive step forward. Tim said: “Open Data does not mean immediately open to the entire world – anything that makes it off from where it is now is an important step forward and people should not be discouraged from doing so, just because it does not tick all the other checkboxes.” And this is yet another point where I personally agreed with Tim that institutionalising data sharing and policing the process is not the way forward. On the contrary, researchers should be encouraged to make small steps at a time, with the hope that the collective move forward will help achieve a cultural change embraced by the community.

Open Data and the future of publishing

Another interesting topic of the discussion was the future of publishing. Rafael started by explaining that the way traditional publishing works had to change, as data was not two-dimensional anymore and in the digital era it could no longer be shared on a piece of paper. Ideally, researchers should be allowed to continue re-analysing data underpinning figures in publications. Research data underpinning figures should be clickable, re-formattable and interoperable – alive.

Danny mentioned that the traditional way of rewarding researchers was based on publishing and on journal impact factors. She asked whether publishing data could help to start rewarding the process of generating data and making it available. Sarah suggested that rather than having the formal peer review of data, it would be better to have an evaluation structure based on the re-use of data – for example, valuing data which was downloadable, well-labelled, re-usable.

Incentives for sharing research data

The final discussion was around incentives for data sharing. Sarah was the first one to suggest that the most persuasive incentive for data sharing is seeing the data being re-used and getting credit for it. She also stated that there was an important role for funders and institutions to incentivise data sharing. If funders/institutions wished to mandate sharing, they also needed to reward it. Funders could do so when assessing grant proposals; institutions could do it when looking at academic promotions.

Conclusions and outlooks on the future

This was an extremely thought-provoking and well-coordinated discussion. And maybe due to the fact that many of the questions asked remained unanswered, both the panellists and the attendees enjoyed a long networking session with wine and nibbles after the discussion.

From my personal perspective, as an ex-researcher in life sciences, the greatest benefit of open data is the potential to drive a cultural change in academia. The current academic career progression is almost solely based on the impact factor of publications. The ‘prestige’ of your publications determines whether you will get funding, whether you will get a position, whether you will be able to continue your career as a researcher. This, connected with a frequently broken peer-review process, leads to a lot of frustration among researchers. What if you are not from the world’s top university or from a famous research group? Will you be able to still publish your work in a high impact factor journal? What if somebody scooped you when you were about to publish results of your five-year-long study? Will you be able to find a new position? As Danny suggested during the discussion, if researchers start publishing their data in the ‘open’, there is a chance that the whole process of doing valuable research, making it useful and available to others will be rewarded and recognised. This fits well with Sarah’s ideas about evaluation structure based on the re-use of research data. In fact, more and more researchers go to the ‘open’ and use blog posts and social media to talk about their research and to discuss the work of their peers. With the use of persistent links research data can be now easily cited, and impact can be built directly on data citation and re-use, but one could also imagine some sort of badges for sharing good research data, awarded directly by the users. Perhaps in 10 or 20 years’ time the whole evaluation process will be done online, directly by peers, and researchers will be valued for their true contributions to science.

And perhaps the most important message for me, this time as a person who supports research data management services at the University of Cambridge, is to help researchers to really embrace the open data agenda. At the moment, open data is too frequently perceived as a burden, which, as Tim suggested, is most likely due to imposed policies and institutionalisation of the agenda. Instead of a stick, which results in the minimal compliance attitude, researchers need to see the opportunities and benefits of open data to sign up for the agenda. Therefore, the Institution needs to provide support services to make data sharing easy, but it is the community itself that needs to drive the change to “open”. And the community needs to be willing and convinced to do so.

Further resources

  • Click here to see the full recording of the Open Data Panel Discussion.
  • And here you can find a storified version of the event prepared by Kennedy Ikpe from the Open Data Team.

Thank you

We also wanted to express a special ‘thank you’ note to Dan Crane from the Library at the Department of Engineering, who helped us with all the logistics for the event and who made it happen.

Published 27 November 2015
Written by Dr Marta Teperek
Creative Commons License

Where to from here? Open Access in Five Years

As part of the Office of Scholarly Communication Open Access Week celebrations, we are uploading a blog a day written by members of the team. Thursday is a piece by Dr Arthur Smith looking to the future.


Academic publishing is not what it used to be. Open access has exploded on the scene and challenged the established publishing model that has remained largely unchanged for 350 years. However, for those of us working in scholarly communications, the pace of change feels at times frustratingly slow, with constant roadblocks along the way. Navigating the policy landscape provided by universities, funders and publishers can be maddening, yet we need to remain mindful of how far we have come in a relatively short time. There is no sign that open access is losing momentum, so it’s perhaps instructive to consider the direction we want open access to take over the next five years, based upon the experiences of the past.

So how much is the University of Cambridge publishing and is it open access? Since 1980, according to Web of Science, the University’s publications increased from 3000 articles per year to more than 11,000 in 2014 (Fig. 1). Over the same period the proportion of gold open access articles has risen steadily since first appearing on the scene in the late 1990s. Thus far in 2015 nearly one in ten articles is available gold open access, although this ignores the many articles available via green routes.


Fig. 1. Publications at the University of Cambridge since 1980 according to WoS (accessed 14/10/2015).


The HEFCE policy

By far the most important development for open access in the UK has been the introduction of HEFCE’s open access policy. As the policy applies to all higher education institutions it affects every university researcher in the UK. While the policy doesn’t formally start until April 2016, so far progress has been slow (Fig. 2). We believe that less than a third of all the University’s articles that are published today are currently compliant with the HEFCE policy, and despite a strong information campaign, our article submission rate has stagnated at around 250 articles per month, well off the monthly target of 930.

Fig. 2. Publications received by the University of Cambridge open access service. The target number of articles per month is 930.
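
As a rough check on the submission figures quoted above, the gap between what we receive and the target can be worked through in a few lines of Python. This is only a sketch: the function name is illustrative, and it assumes that the monthly target of 930 approximates the number of articles that would need to be deposited each month.

```python
def share_of_target(received: int, target: int) -> float:
    """Return the fraction of the monthly target actually received."""
    if target <= 0:
        raise ValueError("target must be positive")
    return received / target


# Figures quoted in the text: around 250 submissions against a target of 930.
received_per_month = 250
target_per_month = 930

share = share_of_target(received_per_month, target_per_month)
shortfall = target_per_month - received_per_month

print(f"Share of target received: {share:.0%}")
print(f"Monthly shortfall: {shortfall} articles")
```

On these numbers roughly 27% of the target is being met, which is in line with the estimate that less than a third of articles are compliant.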

It’s understandable that some papers will fall through the cracks, but even for high impact journals many papers still don’t comply with the policy. But let’s be clear, aside from any policy compliance issues and future REF eligibility, these numbers reveal that fully two thirds of research papers produced at the University cannot be read without a journal subscription. And if readers can’t afford to pay for access then they’ll happily find other means of obtaining research papers.

What about inviting authors to make their research papers open access? Since June I have tracked five high impact journals and monitored the papers published by University of Cambridge authors (Fig. 3). Upon first discovery of a published paper, only 29% of articles were compliant with the HEFCE policy, which is consistent with our overall experience in receiving AAMs. But even after inviting authors to submit their accepted manuscripts to the University’s open access repository, the number of compliant articles rose to only 42%. Less than a third of authors who were directly contacted and asked to make their work open access eventually submitted their manuscripts. Clearly, the merits of open access are not enough to convince authors to act and distribute their manuscripts.


Fig. 3. Compliant articles published in five high impact journals. Even after direct intervention less than half of all articles are HEFCE compliant.
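
The ‘less than a third’ figure can be approximated from the two percentages quoted above. The sketch below assumes, purely for illustration, that the set of tracked papers stayed fixed, that every initially non-compliant paper led to exactly one invitation, and that every newly compliant paper resulted from an invitation. None of this is stated in the text, and the function name is hypothetical.

```python
def conversion_rate(initial_pct: float, final_pct: float) -> float:
    """Fraction of initially non-compliant papers that later became compliant.

    Both arguments are percentages of the same fixed set of papers.
    """
    if not 0 <= initial_pct <= final_pct <= 100:
        raise ValueError("expected 0 <= initial_pct <= final_pct <= 100")
    non_compliant_pct = 100 - initial_pct
    if non_compliant_pct == 0:
        raise ValueError("no non-compliant papers to convert")
    return (final_pct - initial_pct) / non_compliant_pct


# Percentages quoted in the text: 29% compliant on discovery, 42% after invitations.
rate = conversion_rate(29, 42)
print(f"Estimated share of invited authors who deposited: {rate:.0%}")
```

Under these assumptions about 18% of invited authors responded by depositing a manuscript, comfortably below a third.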


The SCOAP3 initiative is a publishing partnership that makes journals in the field of particle physics open access. This innovative scheme brings together multiple universities, funders and publishers and turns traditional journals, that are already widely respected by the physics community, into purely open access journals. No intervention is required by either authors or university administrators, making the process of publishing open access as simple as possible. The great advantage of this scheme is that authors don’t need to worry about choosing an open access option from the publisher, nor deal with messy invoices or copyright issues. All of these problems have been swept away.

Jisc Springer Compact

Like SCOAP3 the recently announced Jisc Springer Compact is a coalition of universities in the UK that have agreed a publishing model with Springer that makes ~1600 journals open access. Following a similar Dutch agreement, this publishing model means that any authors with qualifying institutional affiliations will have their publications made open access automatically. We’ve already started receiving our first requests under this scheme. However, unlike the SCOAP3 initiative which ‘flips’ entire journals to gold OA, the journals under the UK Jisc Springer Compact are still hybrid and only content produced by qualifying authors is open access. While this is great for those universities signed up to the deal, it still leaves a great many papers languishing under the subscription model.

Affiliation vs. Community

So which of these strategies will prove to be the most successful? Will universities take ownership of open access publishing or will subject based communities come together in publishing coalitions?

The advantage of subject based initiatives is they flip entire journals for the benefit of a whole research community, making all the work within a specific discipline open access. However, without sufficient cohesion and drive within an academic community it’s likely that adoption will be fragmented across the myriad of disciplines. It’s no surprise that SCOAP3 emerged out of the particle physics community, given this scholarly community’s involvement in the development of arXiv, but it’s unrealistic to expect this will be the case everywhere.

Publishing agreements based around institutional affiliations will undoubtedly become more common, but until all universities have agreements in place with all the major publishers (Elsevier, Wiley, Springer, etc.) then a large fraction of scholarly outputs will still remain locked down.

What does the future hold?

Ultimately I want to do myself out of a job. As odd as that sounds, the current system of paying publishers for individual papers to be made open access is a laborious and time consuming process for authors, publishers and universities. Similarly the process of making accepted manuscripts available under the green model is equally ridiculous. Publishers should be automatically depositing AAMs on behalf of authors. There is no evidence that making AAMs available has ever killed a journal, and besides, the sooner we can reach agreements with all the major publishers and research funders that result in change on a global scale the better it will be for everyone.

Published 22 October 2015
Written by Dr Arthur Smith
Creative Commons License

A Day in the Life of an Open Access Research Adviser

As part of the Office of Scholarly Communication Open Access Week celebrations, we are uploading a blog a day written by members of the team. Monday is a piece by Dr Philip Boyes reflecting on the variety of challenges of working in the Open Access team.

As anyone working in it knows all too well, Open Access can be a complicated field, with multiple policies from funders, institutions and publishers which can be complex, sometimes obscure and sometimes mutually contradictory. While we’re keen to raise awareness of and engagement with Open Access issues, the University of Cambridge’s view is that expecting academics to get to grips with all this themselves would represent an unreasonable demand on their time and likely lead to errors and resentment.

Instead, Cambridge’s policy is that authors should simply send us their Accepted Manuscript at acceptance through our simple upload system and our team of Research Advisers will check out exactly what they need to do to comply with all the relevant funder and journal policies and get back to them with individually-tailored advice. The same system also allows us to take care of deposit into the repository for HEFCE and to manage payments from the block grants we’ve received from the UK Research Councils (RCUK) and the Charities Open Access Fund (COAF – seven biomedical charities, including the Wellcome Trust).

The idea is that from the academic’s point of view the process feels smooth and seamless. But the reality is that very little of the process is automated. Behind the scenes there’s a lot of (thankfully metaphorical) running around by our team of three Open Access Research Advisers to provide this service, as well as working on broader issues of communication, processing APCs and improving our systems.

So what does a Cambridge Open Access Research Adviser do all day? Here’s a typical day in the life…

8.45am – Getting started

Arriving in the office, I check my emails and look at the Open Access Helpdesk. Overnight we’ve received around 15 new tickets, as well as some further correspondence on existing ones. Fairly typical. It’s split between manuscript uploads that need advice, general queries and invoicing correspondence from publishers. I start working through these on a first-come, first-served basis.

They’re a real mixed bag. If a submitted article is straightforward we can deal with it in a few minutes – we check the journal site for their green and gold options and then advise the author on which is appropriate in each case. We also flag the manuscript for deposit into our repository – at the moment that’s a manual process and is mostly handled by temps.

Today things aren’t straightforward. A lot of the submissions are conference proceedings and there’s very little information on the conference websites. It’s not even clear whether some of these are being formally published (does private distribution on memory stick count? Do they have ISBNs or ISSNs?) It’s going to be a slow morning of chasing up authors and conference organisers for any information they have.

10.00am – Complexity

I’m more or less through the conference proceedings, but we’re not through with complex cases. One of the invoices we’ve received is for an article we’ve not heard about before. It’s from a senior professor but he’s never submitted it to the open access service so we weren’t able to advise him on policy or eligibility for block grant funds. He selected the gold option for a Wellcome-funded correspondence article and now wants us to pay the $5000 + VAT bill. The trouble is, letters aren’t covered by the Wellcome policy so technically it isn’t eligible. I contact the author and break the news that he might have to pay this large bill himself and that this is why we like people to contact us first.

11.00am – Clarity

The professor has got back to us. Although the journal’s classed it as a letter, the paper’s actually a very short research article, he says. I decide to contact Wellcome for guidance and let them decide whether they want this to be paid for from the COAF block grant.

11.30am – Déjà vu

For the moment the backlog on the helpdesk has been cleared and our temps are busy adding manuscripts to the repository and updating previously-added articles with citation details and embargo end-dates. I have a bit of free time to move on to something else so begin to tackle the stack of publisher APC invoices that need processing.

They’re mostly correct, but some publishers and invoicing companies are better than others. Inevitably there are a few errors that need chasing up or publishers who have invoiced us repeatedly for the same thing. Among the stack is an overdue notice from a major publisher for a familiar article. It’s one we’ve repeatedly confirmed was paid fully almost two years ago but every few months ever since the publisher has told us it’s outstanding. I send them back the payment reference and details yet again and ask them to mark the issue as resolved. I somehow suspect we’ll be seeing it again.

2.00pm – Presentation

Today offers a welcome opportunity to get out of the office. We’re holding a joint Open Access/Open Data presentation to researchers in one of the University’s departments to try and increase awareness of the policies. Our stats show that this department has particularly low engagement with the Open Access service so we’re keen to work out why. It’s a fractious crowd. One or two people are keen Open Access advocates and speak up to say how simple the system is, but some others are vocal about their view that it’s an unwarranted burden and tell us they don’t see why they should bother.

We try to explain the benefits and funder mandates, as well as how we’ve tried to make the system as simple as possible. When we get back to the office we find that one of those present has sent us their back-catalogue of thirty articles stretching back to 2007 to put into the repository.

4.00pm – Compliance

While my colleagues work on the helpdesk I need to turn my attention to compliance and reporting. All too often when we’ve paid an APC the publisher hasn’t delivered Open Access with the correct licence, or in some cases at all. I generally try to do a weekly check of the articles for which we’d paid APCs to see whether they’ve been published correctly but it’s time-consuming and things have been busy lately. It’s been around three weeks since the last check so it really needs doing.

But the deadline is also fast approaching for annual reports to RCUK and COAF. These are both large and complex, and cover slightly different periods (and different again from the Jisc report a couple of months ago). It’s proving a major challenge to get the information together from our various systems and to match it to the relevant figures from the University Finance System. I decide to let the compliance checking wait a bit longer and work on trying to move things along on the reports. I make a bit of progress, but there’s still a huge amount left to do – information on thousands of articles that needs to be manually collated. With luck in the future we’ll have integrated systems that can do much of this automatically, but for now each report represents weeks of work.

Wrap up

There is, then, a huge variety and amount of work that goes into the Open Access service. The Helpdesk and the reporting alone would be more than enough to keep us busy, but we also have to make time for outreach and communications, managing the finances, improving our systems and more. We’re finding that as our team grows, we’re starting to specialise more into particular areas, but we’re still basically all generalists, working on all areas of the job. This balance between specialisation for the purposes of efficiency and the need for individuals to be able to move effectively from one task to another – not least to keep our jobs interesting and varied – is one that’s likely to become ever more challenging as the volume of articles we handle increases.

Published 19 October 2015
Written by Dr Philip Boyes
Creative Commons License