Monthly Archives: September 2017

Milestone – 1000 datasets in Cambridge’s repository

September 21, 2017 | Uncategorized | Apollo, data, digital curation, repository, research data management | Office of Scholarly Communication

Last week, Cambridge celebrated a huge milestone – the deposit of the 1000th dataset to our repository Apollo since the launch of the Research Data Facility in early 2015. This is the culmination of a huge amount of work by the team in the Office of Scholarly Communication in developing systems, workflows and policies, and in running an extensive advocacy campaign. The Research Data team have run 118 events over the past couple of years and published 39 blogs.

In the past 12 months alone there have been 26000 downloads of the data in Apollo. In some cases the dataset has been downloaded many times – 170 – and the data has featured in news, blogs and Twitter.

An event was held at Cambridge University Library last week to celebrate this milestone.

Opening remarks

The Director of Library Services, Dr Jess Gardner, opened proceedings with a speech in which she noted “the Research Data Services and all who sail in her are at the core of our mission in our research library”.

Dr Gardner referred to the library’s long and proud history of collecting and managing research data that “began on vellum, paper, stone and bone”. The research data of luminaries such as Isaac Newton and Charles Darwin was on paper and, she noted, “we have preserved that with great care and share it openly on line through our digital library.”

Turning to the future, Dr Gardner observed: “But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.”

“In the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible,” she noted. “It is about sharing freely wherever it is appropriate to do so”. [Dr Gardner’s speech is in full at the end of this post.]

Perspectives from a researcher

The second speaker was Zoe Adams, a PhD student at Cambridge who talked about the work she has done with Professor Simon Deakin on the Labour Regulation Index in association with the Centre for Business Research.

Ms Adams noted it was only in retrospect she could “appreciate the benefit of working in a collaborative project and open research generally”. She discussed how helpful it had been as an early career researcher to be “associated with something that was freely available”. She observed that few of her peers had many citations, and the reason she did was that “the dataset is online, people use the data, they cite the data, and cite me”.

Working openly has also improved the way she works, she explained, saying “It has given me a new perspective on what research should be about. … It gives me a sense that people are relying on this data to be accurate and that does change the way you approach it.”

View from the team

The final speaker was Dr Lauren Cadwallader, Joint Deputy Head of the OSC with responsibility for the Research Data Facility, who discussed the “showcase dataset of the data that we can produce in the OSC” which is taken from usage of our Request a Copy service.

Dr Cadwallader noted there has been an increase in the requests for theses over time. “This is a really exciting observation because the Board of Graduate Studies have agreed that all students should deposit a digital copy of their thesis in our repository,” she said. “So it is really nice evidence that we can show our PhD students that by putting a copy in the repository people can read it and people do want to read theses in our repository.”

One observation was that several of the theses that were requested were written 60 years ago, so the repository is sharing older research as well. The topics of these theses included algebra and Yorkshire evangelists, and one of the oldest requested theses, written in 1927, was about the Falkland Islands. “So there is a longevity in research and we have a duty to provide access to that research,” she said.

Thanks go to…

The dataset itself is one created by the OSC team looking at the usage of our Request a Copy service. The analysis was undertaken by Peter Sutton Long, and we recently published a blog post about the findings.

The music played at the event was compiled by Tony Malone and covers almost 1000 years of music, from Laura Cannell’s reworking of Hildegard of Bingen, to Jane Weaver’s Modern Cosmology. There are acknowledgments to Apollo, and Cambridge too. The soundtrack is available for those interested in listening.

This achievement is entirely due to the incredible work of the team in the Research Data Facility and their ability to engage with colleagues across the institution, the nation and the world. In particular the vision and dedication of Dr Marta Teperek cannot be overstated.

In the words of Dr Gardner: “They have made our mission different, they have made our mission better, through the work they have achieved and the commitment they have.”

The event was supported by the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

Published 21 September 2017
Written by Dr Danny Kingsley

Speech by Dr Jess Gardner

First let us begin with some headline numbers. One thousand datasets. This is hugely significant and a very high level when looking at research repositories around the country. There is every reason to be proud of that achievement and what it means for open research.

There have been 26000 downloads of that data in the past 12 months alone – that is about use and reuse of our research data and is changing the face of how we do research. Some of these datasets have been downloaded 117 times and used in news, blogs and Twitter. The Research Data team have written 39 blogs about research data and have run 118 events, most of these have been with researchers.

While the headline numbers give us a sense of volume, perhaps let’s talk about the underlying rationale and philosophy behind this, which is core.

Cambridge University Library has a 600 year old history we are very proud of. In that time we have had an abiding responsibility to collect, care for and make available for use and reuse, information and research objects that form part of the intrinsic international scholarly record of which Cambridge has been such a strong part. And the ability for those ideas to inspire new ideas. The collection began on vellum, paper, and stone and bone.

And today much of that of course is digital. You can’t see that in the same way you can see the manuscripts and collections. It is sometimes hard to grasp when we are in this grand old dame of a building that I dare you not to love. It is home to the physical papers of such greats as Isaac Newton and Charles Darwin. Their research data was on paper and we have preserved that with great care and share it openly on line through our digital library. But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.

And the people in this room have changed that. They have made our mission different, they have made our mission better through the work they have achieved and the commitment they have.

Philosophically this is a very natural extension of what we have done in the Library and the open library and its great research community for which this very building is designed. Some of you may know there is a philosophy behind this building and the famous ‘open library Cambridge’. In the 19th century and 20th century that was mostly about our open stack of books and we have quite a few of them, we are a little weighed down by them.

Our research data weighs less but it is just as significant and in the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible. It is about sharing freely wherever it is appropriate to do so and there are many reasons why data isn’t open sometimes, and that is fine. What we are looking for is managing so we can make those choices appropriately, just as we have with the archive for many, many years.

So whilst there is a fantastic achievement to mark tonight with those 1000 datasets, and it really is significant, we are really celebrating a deeper milestone with our research partners, our data champions, our colleagues in the research office and in the libraries across Cambridge, and that is about the changing role in research support and library research support in the digital age, and I think that is something we should be very proud of in terms of what we have achieved at Cambridge. I certainly am.

I am relatively new here at Cambridge. One of the things that was said to me when I was first appointed to the job was how lucky I was to be working at this University but also with the Office of Scholarly Communication in particular and that has proved to be absolutely true. I would like to take this opportunity to note that achievement of 1000 datasets and to state very publicly that the Research Data Services and all who sail in her are at the core of our mission in our research library. But also to thank you and the teams involved for your superb achievements. It really is something to be very proud of and I thank you.

Biting the hand that feeds – the obfuscation of publishers

September 18, 2017 | Uncategorized | Office of Scholarly Communication

Let’s not pull any punches here. We are unimpressed. Late last week HEFCE published a blog: Are UK universities on track to meet open access requirements? In the blog HEFCE identified the key issues in meeting OA requirements as:

The complexity of the OA environment
Resource constraints
Cultural resistance to OA
Inadequate technical infrastructure.

Right. So the deliberate obstruction to Open Access by the academic publishing industry doesn’t factor at all?

Policy confusion

We also note that the fact that the funders have different compliance requirements in terms of the means by which we make work available, the timing in the publication process and the financial support of their policies is not articulated clearly in this list. The euphemism used is ‘complexity’.

Well, yes. To give some idea of how ‘complex’ this situation is, the sister blog to this one describes the decision making process the Cambridge Open Access Team follows to ensure compliance with our multiple policies.

But we are hopeful the impending creation of UK Research and Innovation bringing HEFCE into the same regulatory body as the Research Councils will result in something being done about the conflicting policy problem. Indeed, the survey HEFCE is running may feed into that process.

Publisher obfuscation

However there are no such positive outlooks for the challenges publishers continually throw at us in relation to Open Access.

Elsevier has a long and complicated list of embargoes. There is a different list for embargoes imposed in the UK to those for the rest of the world. The complications of a range of embargo periods and some journals with non-standard arrangements are apparent on both Wiley’s and Taylor & Francis’ pages. BMJ has a non-compliant special embargo of 12 months for funders that require archiving of articles. There is no embargo at all for non-funded papers.

An exemplar is Springer with a standard embargo of 12 months for everything. However, because we are signed up to the Springer Compact most of our publications are published Open Access anyway.

We are not alone in our irritation. In the last couple of months there have been two publications identifying the amount of work libraries do to manage embargoes for Open Access compliance.

The University of St Andrews published a UKCORR blog on 22 August. Requesting permission: reflections and perspectives from the University of St Andrews discussed the processes they have to manage to ensure compliance with publishers which don’t have a public Open Access or author self-archiving policy. This is a challenge because 60% of their permissions requests are for outputs potentially in scope for the REF open access policy. St Andrews notes that “having an effective permissions policy can potentially affect an institution’s approach to their REF return and level of exceptions required.”

Management of poor publisher practices in relation to Open Access is not a UK specific problem. In July, Leila Sterman, scholarly communication librarian at Montana State University, published an article in College and Research Libraries News – The enemy of the good: How specifics in publisher’s green OA policies are bogging down IR deposits. In the article she argued that there is no consistency in policies and embargoes, which creates unnecessary work. She states that publishers, “who often claim they are supportive of green open access, work to impose restrictions on digital works as if they were physical items being placed in physical locations.”

Sterman also refers to the same challenges identified by St Andrews, noting that: “Green open access policies are often buried on publisher’s websites or only mentioned in contracts. This practice obfuscates important information, increasing both the time library staff spend searching for that information and author’s obliviousness to the opportunities and restrictions of green open access.”

Indeed this is not a new issue. Over four years ago in a previous role and different country, I published a post: Walking in quicksand – keeping up with copyright agreements, which notes issues similar to those in these two recent papers, but also identifies the issue of publishers changing their policies without notice.

Do we need embargoes?

Publishers argue that they need embargoes to remain ‘sustainable’. The claim is that making an author’s copy of the work (not copyedited or formatted) available in a repository on a relatively piecemeal basis will cause libraries to cancel subscriptions en masse. Despite repeated attempts, to date there has been no evidence released to support this claim.

The UK produces 6% of the world’s research output. And yet when the RCUK policy was announced some publishers (see here and here) changed their policies across the globe to take advantage of the huge amounts of UK government funds being added into the system.

As an aside, the green = cancellation argument does raise the question of the value publishers themselves place on the work they do between an Author’s Accepted Manuscript and the final Version of Record. If access to the AAM is apparently good enough for libraries to cancel subscriptions then why bother doing the extra work?

Getting some perspective

But let’s think about the bigger picture. Researchers share their publications in multiple ways. ResearchGate and Academia.edu are academic sharing sites that do not monitor the copyright status of the work that is uploaded, and which have highly aggressive content recruitment strategies.

In 2015 Universities UK published a paper, Monitoring the transition to open access.

This report contained a table identifying where research was available to download.

Institutional repositories are the red section. The really small red section. Globally, institutional repositories hold 4.8% of all of the AAMs available. In the UK, probably due to the strongest Open Access mandates in the world, the percentage of AAMs available in institutional repositories proportionally is slightly higher at 7.9%.

These are tiny numbers. The research material that research institutions are making available in their repositories is not the big threat to publishers’ ‘sustainability’.

In contrast, the incredible coverage of SciHub, which provides (illegal) access to two thirds of the world’s research as the final published version, poses a real threat.

Who loses out here?

Of all the different sharing platforms, academic libraries are the only ones curating deposits and navigating the embargo labyrinth. Author deposits to commercial sharing sites and PubMed Central primarily rely on authors’ instructions relating to embargoes.

Academic institutions (and by proxy the taxpayer) are paying multiple times – for the creation of the work, for the editing and peer review of the work and for the subscription to the work, or the Article Processing Charge to make the work available (or both, in the case of hybrid journals). We are also paying a huge levy on green open access through staffing costs to meet embargo requirements.

The subscriptions paid by academic libraries worldwide hold up the publishing industry. Talk about biting the hand that feeds you.

Published 18 September 2017
Written by Dr Danny Kingsley

Open Access policy, procedure & process at Cambridge

September 18, 2017 | Uncategorized | APC, COAF, gold open access, HEFCE, open access, RCUK | Arthur Smith

The Open Access policies developed and applied by the UK’s major research funders (HEFCE, RCUK and COAF) all aim to achieve one thing: freedom of knowledge for all. However, the specific mechanisms these funders have adopted to achieve this goal vary considerably and require careful implementation from higher education institutions (HEIs).

In this blog post, I’ll describe the different workflows required to meet each funder’s expectations and then look at how these policies intersect with each other to form a tangled web of policy nightmare. Some of the decisions and processes will be peculiar to the University of Cambridge, especially when it comes to decisions around funding for article processing charges (APCs), but the general approach will be true of most UK HEIs.

First up, HEFCE’s Open Access policy:

At the outset, let’s be clear: the HEFCE Open Access policy applies to all researchers working at all UK HEIs. If an HEI wants to submit a journal article for consideration in REF 2021 the article must appear in an Open Access repository (although there is a long list of exceptions). Keen observers will note that in the above flowchart HEFCE’s policy is enforced based on deposit within three months of acceptance. This requirement has caused significant consternation amongst researchers and administrators alike; however, during the first two years of the policy (i.e. until 31 March 2018) publications deposited within three months of publication will still be eligible for the REF. At Cambridge, we have been recording manuscript deposits that meet this criterion as exceptions to the policy[1].
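As a rough illustration of the two deposit windows described above, here is a minimal Python sketch. The function and constant names are hypothetical, and it rests on two assumptions not confirmed in this post: that "three months" means calendar months, and that the transitional rule is keyed on the deposit date. It also ignores the long list of other exceptions.

```python
import calendar
from datetime import date

# End of the transitional period, as stated in the post.
TRANSITION_END = date(2018, 3, 31)


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def hefce_deposit_status(accepted: date, published: date, deposited: date) -> str:
    """Classify a deposit against the three-month windows described above."""
    # Main rule: deposit within three months of acceptance.
    if deposited <= add_months(accepted, 3):
        return "compliant"
    # Assumption: the transitional rule applies to deposits made on or
    # before TRANSITION_END, within three months of publication.
    if deposited <= TRANSITION_END and deposited <= add_months(published, 3):
        return "transitional exception"
    # Other exceptions in the policy are not modelled here.
    return "outside both windows"
```

For example, a paper accepted on 10 January 2017 and published on 1 June 2017 would be classed as a transitional exception if deposited on 15 August 2017, since that is more than three months after acceptance but within three months of publication.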

Next up, the RCUK Open Access policy. This policy is straightforward to implement, the only complication being payment of APCs, which is contingent on sufficient block grant funding. Otherwise, the choice for authors is usually quite obvious: does the journal have a compliant embargo? No? Then pay for immediate open access.
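The author's choice described above can be sketched as a very small decision function. This is an illustration only: the function name and return strings are hypothetical, the "green" branch is my reading of what a compliant embargo implies, and what happens when there is neither a compliant embargo nor block grant funding is not covered in this post, so the sketch simply flags that case.

```python
def rcuk_route(embargo_is_compliant: bool, block_grant_funds_available: bool) -> str:
    """Pick a route for an RCUK-funded paper, following the choice above."""
    if embargo_is_compliant:
        # Assumption: a compliant embargo means the repository deposit suffices.
        return "green: deposit the accepted manuscript"
    if block_grant_funds_available:
        # No compliant embargo, so pay for immediate open access.
        return "gold: pay the APC for immediate open access"
    # Not covered by the description above; flagged rather than guessed.
    return "unresolved: no compliant embargo and no APC funding"
```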

One extra feature of the RCUK Open Access policy not captured here is the Europe PMC deposit requirement for MRC and BBSRC funded papers. Helpfully, the policy document makes no mention of this requirement; rather, this feature of the policy appears in the accompanying FAQs. I’m no expert, but this seems like the wrong way to write policies.

Finally, we have the COAF policy, possibly the single most complicated OA policy to enforce anywhere in the world. The most challenging part of the COAF policy is the Europe PMC deposit requirement. It is often difficult to know whether a journal will indeed deposit the paper in Europe PMC, and if, for whatever reason, the publisher doesn’t immediately deposit the paper, it can take months of back-and-forth with editors, journal managers and publishing assistants to complete the deposit. This is an extremely burdensome process, though the blame should be laid squarely at the publishers’ door. How hard is it to update a PMC record? Does it really take two months to update the Creative Commons licence?

This leads us to one of the more unusual parts of the COAF policy: publications are considered journals if they are indexed in Medline. That means we will occasionally receive book chapters that need to meet the journal OA policy. Most publishers are unwilling to make such publications OA in line with COAF’s journal requirements so they are usually non-compliant.

What happens if you should be foolish enough to try to combine these policies into one process? Well, as you might expect, you get something very complicated:

This flowchart, despite its length, still doesn’t capture every possible policy outcome and is missing several nuances related to the payment of APCs, but nonetheless, it gives an idea of the enormous complexity that underlies the decision making process behind every article deposited in Apollo and in other repositories across the UK.

[1] Within the University’s CRIS, Symplectic Elements, only one date range is possible so we have chosen to monitor compliance from the acceptance date. Publications deposited within the ‘transitional’ three months from publication window receive an ‘Other’ exception within Elements that contains a short note to this effect.

Published 18 September 2017
Written by Dr Arthur Smith

Unlocking Research

Open Research at Cambridge