
Thoth Archiving Network goes live at Cambridge 

Dr Agustina Martínez-García, Head of Open Research Systems, Digital Initiatives

Cambridge University Library (CUL) is piloting participation in the Thoth Archiving Network, which gives small presses a simple deposit option for archiving their publications in multiple repository locations, safeguarding their open access book catalogues against complete loss should they cease to operate.

Participation in the pilot has allowed us to explore the implementation of suitable infrastructure, built on interoperable, open, and widely adopted platforms to support discovery, access, and long-term availability of open scholarly works. 

Work done so far 

We are pleased to share that the Cambridge repository platform participating in the Thoth network is now live at https://thoth-arch.lib.cam.ac.uk/home and already holds the full back catalogues of two open monograph publishers. The repository is built on the open-source DSpace software.

During the implementation phase, we worked very closely with the Thoth technical team to support the implementation and testing of standard, automated deposit mechanisms into DSpace-based repositories. This work has extended our knowledge and expertise in scholarly and research platforms by applying a widely adopted repository platform (DSpace) to a new area: open access books and monographs. It has also given us the opportunity to test additional infrastructure to support the discovery, access, and dissemination of such open access content, and potentially to experiment with other types of scholarly work.

What’s next 

Now that the repository platform is live, we would like to gather insights into the volume of content and the storage and staff resources required (for both infrastructure and user support). This will help us estimate the costs of providing such a service, as well as longer-term preservation costs, during the 3-year pilot.

In terms of long-term preservation, we will explore several options, including preserving the content in-house as part of the Libraries’ wider Digital Preservation Programme. The material hosted on this platform offers an exemplary use case of scholarly content that is “preservation ready”: it uses open, standard file formats (PDF and EPUB) and is accompanied by rich, high-quality descriptive metadata.

See this post by the Open Book Futures Team for more details about the pilot:  

https://copim.pubpub.org/pub/thoth-archiving-network-goes-live-at-university-of-cambridge/release/1

Open access: fringe or mainstream?

When I was just settling into the world of open access and scholarly communication, I wrote about the need for open access to stop being a fringe activity and enter the mainstream of researcher behaviour:

“Open access needs to stop being a ‘fringe’ activity and become part of the mainstream. It shouldn’t be an afterthought to the publication process. Whether the solution to academic inaction is better systems or, as I believe, greater engagement and reward, I feel that the scholarly communications and repository community can look forward to many interesting developments over the coming months and years.”

While much has changed in the five years since I (somewhat naïvely) wrote those concluding thoughts, there are still significant barriers to the complete opening of scholarly discourse. However, should open access be an afterthought for researchers? I’ve changed my mind. Open access should be something researchers don’t even need to think about, and I think that future is already here, though I fear it will ultimately sideline institutional repositories.

According to the 2020 Leiden Ranking, the median rate at which UK institutions make their research outputs open access is over 80%, which is far higher than that of any other nation (Figure 1). Indeed, the UK is the only country that has ‘levelled up’ over the last five years, while institutions in the rest of the world have plodded along, making slow but steady progress.

Figure 1. The median institutional open access percentage for each country according to the Leiden Ranking. Note, these figures are medians of all institutions within a country. This does not mean that 80% of the UK’s publications are open access, but that the median rate of open access at UK institutions is 80%.

The main driver of this increase in open access content in the UK is green open access (Figure 2), due in large part to the REF 2021 open access policy (announced in 2014 and effective from 2016). This is a dramatic demonstration of the influence that policy can have on researcher behaviour: it has made open access a mainstream activity in the UK.

Figure 2. The median institutional green open access percentage for each country according to the Leiden Ranking.

Cambridge has followed the same trends as the rest of the UK across all forms of open access (Figure 3), with rising use of green open access and steadily increasing adoption of gold and hybrid. Yet despite all the money poured into gold and (more controversially) hybrid open access, the net effect of all this other activity is a measly 3 percentage points of additional open access content (82% vs 79%). This raises the question: was it worth it? If open access can be achieved so successfully through green routes, what is the inherent benefit of gold/hybrid open access?

Figure 3. Open access trends in Cambridge according to the Leiden Ranking. In the 2020 ranking, 79% was delivered through green open access. This means that despite all the work to facilitate other forms of open access, this activity contributed only an additional 3 percentage points to the total (82%).

Of course, Plan S has now emerged as the most significant attempt to coordinate a clear and coherent international strategy for open access. While it is not without its detractors, I am nonetheless supportive of cOAlition S’s overall aims. However, as the UK scholarly communication community has experienced, policy implementation is messy and can lead to unintended consequences. While Plan S provides options for complying through green open access routes, the discussions that institutions and publishers (traditional and fully open access alike) have engaged in are almost entirely focussed on gold open access through transformative deals. This is not because we, as institutions, want to spend more on publishing; rather, it is the pragmatic approach to creating open access content at the source and giving authors easy and palatable routes to open access. It is also a recognition that flipping journals requires give and take from institutions and publishers alike.

We are now very close to reaching a point where open access can be an afterthought for researchers, particularly in the UK. In large part, it will be done for them through direct agreements between institutions and publishers. Cambridge already has open access publishing arrangements with over 5000 journals, and this figure will continue to grow as we sign more transformative agreements. However, this will ultimately be to the detriment of green open access. Rather than being the only open access source for a journal article, institutional repositories will become secondary storehouses of content that is already gold open access. The heyday of institutional repositories, if one ever existed, is now over.

For me, that is a sad thought. We have poured enormous resource and effort into maintaining Apollo, but we must recognise the burden that green open access places on researchers. They have better things to do. I expect that the next five years will see a dramatic increase in gold and hybrid open access content produced in the UK. Green open access won’t go away, but we will have entered a time where open access is no longer fringe, nor indeed mainstream, but rather de facto for all research.

Published 23 October 2020

Written by Dr Arthur Smith

The content of this blog is licensed under CC BY 4.0.

A Fast-Track Route to Open Access

In the last two years, since the REF 2021 open access policy came into force, the Open Access Team has received an ever-increasing number of manuscript submissions for archiving in Apollo, Cambridge’s institutional open access repository.

We have been thinking long and hard about ways to cope with the workload, by scrutinising existing practices and streamlining workflows, because we want to provide the best possible service to our researchers, commensurate with the University’s world-leading research.

This blog introduces what is perhaps the greatest overhaul of our workflows since the service began: a new ‘Fast Track’ deposit system.

Work it harder

Before the start of the REF OA policy (2014-2016), the Open Access Team would process and manually curate every manuscript submission we received. Authors could expect an initial response within 1-2 working days, after which (usually within a month) we would archive their manuscript in Apollo.

A simplified workflow for a typical manuscript was:

  1. Manuscript uploaded by submitter in Symplectic Elements.
  2. Item created in Apollo (DSpace) workflow.
  3. Helpdesk ticket created (Zendesk).
  4. Open Access Team reviews manuscript, advises submitter and makes a decision.
  5. Open Access Team archives the manuscript in Apollo and informs submitter.

Both the decision (4) and archive (5) steps took time. For each manuscript we would need to decide whether the files we received could be archived, which funder open access policies were at play and what open access options were available from the publisher. We could then advise authors about their open access choices.

To archive a manuscript, the process was broadly as follows:

  1. Review the helpdesk ticket (Zendesk) for the open access decision.
  2. Enter as many publication details as possible in Symplectic Elements.
  3. Retrieve the submission from the Apollo (DSpace) deposit workflow.
  4. Add licence and metadata to the record.
  5. Review the submission and approve for archiving.
  6. Move the item to the relevant departmental collection and apply an appropriate embargo (if required).
  7. Finally, update the helpdesk ticket and send the original submitter a link to their Apollo record.

Archiving each manuscript took 18 minutes on average, which, besides being tedious and prone to error, was extremely time-consuming. Add the time required to make the initial decision, and each manuscript submission could easily take the Open Access Team 30 minutes to process from start to finish, especially if an open access fee had to be paid.

Fast-forward two years and with the rate of new manuscript submissions now peaking at over 1,300 per month, simply processing manuscripts for the REF would require more than four full-time staff members. Whilst these manual processes were viable for a handful of submissions a day, they became unwieldy at scale.
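The staffing estimate above follows from simple arithmetic, reproduced below as a minimal Python sketch. The submission rate and the 30-minute end-to-end time come from this post; the 150 working hours per full-time staff member per month (7.5-hour days, about 20 working days) is an assumption, not a figure from the Open Access Team.

```python
# Back-of-envelope estimate of the staff time needed to process REF
# manuscript submissions by hand. Submission rate and minutes per item are
# taken from the post; monthly working hours per full-time staff member
# are an ASSUMPTION (7.5-hour days, about 20 working days a month).

SUBMISSIONS_PER_MONTH = 1300      # peak rate reported in the post
MINUTES_PER_SUBMISSION = 30       # decision plus archiving, start to finish
HOURS_PER_FTE_MONTH = 7.5 * 20    # assumed: 150 working hours per month


def staff_needed(submissions: int, minutes_each: float,
                 hours_per_fte: float = HOURS_PER_FTE_MONTH) -> float:
    """Return the number of full-time staff needed to keep pace each month."""
    hours_of_work = submissions * minutes_each / 60
    return hours_of_work / hours_per_fte


if __name__ == "__main__":
    fte = staff_needed(SUBMISSIONS_PER_MONTH, MINUTES_PER_SUBMISSION)
    print(f"{fte:.1f} full-time staff")  # 650 hours / 150 hours ≈ 4.3
```

Under the same assumption, plugging in the 1-2 minute Fast Track processing time reported later in this post (for example, `staff_needed(1300, 1.5)`) gives roughly a fifth of one full-time post.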

Make it better

Our first attempt at speeding up our open access system began in August 2017. To start, we made a number of operational changes to reduce the time spent processing manuscript submissions:

  • We would rely entirely on the metadata present in Symplectic Elements to populate the Apollo records (i.e. we would not curate manual records).
  • The Open Access Team would no longer update the helpdesk records; instead, internal record keeping would be automated as much as possible.

Unfortunately, the number of steps in the Apollo workflow was still roughly the same as in the previous process, but with one key difference: a new field to record what we call the ‘Fast Track’ decision. There were seven Fast Track options:

  • Submitted
  • Proof
  • Published (not open access)
  • Published (open access)
  • Accepted (published)
  • Accepted (not published)
  • Other

The first six options cover the vast bulk of all manuscripts received by the Open Access Team, and the ‘Other’ option acts as a catch-all for anything else. Simply by knowing what sort of manuscript has been uploaded, much of the decision and archiving process can be automated. However, the agent still needed to retrieve the item from the Apollo workflow, check the file version and the publication status of the paper, add some metadata fields, approve the item, and move it to an appropriate collection.

Figure 1. The Apollo workflow page of a typical manuscript submission, with the addition of the new ‘Fast Track’ field.

The choice of Fast Track decision leads to four possible outcomes which would ‘trigger’ actions in our Zendesk helpdesk:

  • Submitted, proof, published (not open access)
    • Email submitter, ask for accepted manuscript
  • Published (open access)
    • Archive in Apollo (no embargo) ⇒ Email submitter Apollo link
  • Accepted (published), accepted (not published)
    • Archive in Apollo (embargoed) ⇒ Email submitter Apollo link
  • Other
    • Refer to Open Access Team
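The decision-to-outcome mapping in the list above can be written as a small lookup table. This is a minimal sketch, assuming Python; the enum and the action labels are hypothetical names chosen for illustration, not the names used in Apollo or in the Zendesk trigger configuration.

```python
from enum import Enum


class FastTrack(Enum):
    """The seven Fast Track decision values described in the post."""
    SUBMITTED = "Submitted"
    PROOF = "Proof"
    PUBLISHED_NOT_OA = "Published (not open access)"
    PUBLISHED_OA = "Published (open access)"
    ACCEPTED_PUBLISHED = "Accepted (published)"
    ACCEPTED_NOT_PUBLISHED = "Accepted (not published)"
    OTHER = "Other"


# Each decision maps to one of four helpdesk outcomes. The action strings
# are HYPOTHETICAL labels, not real Zendesk trigger names.
ACTIONS = {
    FastTrack.SUBMITTED: "request_accepted_manuscript",
    FastTrack.PROOF: "request_accepted_manuscript",
    FastTrack.PUBLISHED_NOT_OA: "request_accepted_manuscript",
    FastTrack.PUBLISHED_OA: "archive_no_embargo_then_email_link",
    FastTrack.ACCEPTED_PUBLISHED: "archive_embargoed_then_email_link",
    FastTrack.ACCEPTED_NOT_PUBLISHED: "archive_embargoed_then_email_link",
    FastTrack.OTHER: "refer_to_open_access_team",
}


def trigger_for(decision: FastTrack) -> str:
    """Return the helpdesk action triggered by a Fast Track decision."""
    return ACTIONS[decision]


if __name__ == "__main__":
    print(trigger_for(FastTrack.PROOF))  # request_accepted_manuscript
```

Keeping the mapping in one table means that adding or regrouping a decision is a data change rather than a change to the branching logic.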

Although much faster, this process was still tedious: it could require up to 33 actions from agents (29 of them mouse clicks) and 14 web page loads, which was still not very user-friendly. Even so, the time to archive had fallen from 18 to 9 minutes, a 50% reduction compared with the previous fully manual system.

Do it faster

So what if all the steps involved in processing a manuscript submission could be reduced to the absolute minimum and carried out on a single webpage? After a short development sprint, the Open Access Team launched the ‘Fast Track Deposits’ interface in September 2018. A snapshot of the user interface is shown below.

Figure 2. The Fast Track interface. Choosing one of the options in blue is enough to fully archive a manuscript, or process it for further action by the submitter or the Open Access Team.

At the top of the page, the agent can see a ‘publication summary’ including the item title, the journal title and, if available, the publisher DOI. Both the item title and publisher DOI are hyperlinked, so that the agent can Google-search the item or land on the publisher’s webpage with a single mouse click.

The agent must first inspect the file and check that it is a suitable version (i.e. either the accepted version or the open access published version). If the file is wrongly labelled, they must relabel it via a dropdown menu, and add or delete files as appropriate. The agent then ‘describes’ the manuscript (i.e. decides whether it is the accepted, published, submitted or proof version) and submits their decision. The decision determines the trigger behaviour in the automatically populated helpdesk ticket. The agent is then free to move on to the next item.

If the decision is ‘accepted’ or ‘published open access’, the item is deposited and the submitter is automatically notified via email. For submitted, proof, and non-OA published versions, the author receives an automatic email asking for the accepted manuscript. Items are archived in the repository under a generic collection, and any forthcoming publication details are added to the record via external source information in Elements.

To show just how efficient Fast Track is, we’ve prepared a short demonstration video which captures some of the key features:

Video 1. Real-time demonstration of the Fast Track system.

Makes us stronger

Agents therefore need only make one decision: identify the file version. But the real ingenuity of the Fast Track system is that embargoes can be set automatically by:

  1. Taking into account the decision made by the agent (e.g. no embargo if published open access);
  2. Detecting publication status and publication dates from Elements; and
  3. Retrieving journals’ embargo policies via Orpheus (you can learn more about Orpheus in our previous blog post).

In some cases, usually because we don’t know the publication date, we can’t determine the embargo length of an accepted manuscript. In such cases we apply a 36-month embargo from the date of the Fast Track decision. We know that this embargo won’t always be correct; however, we routinely check manuscripts in Apollo and update embargoes accordingly.

Figure 3. Simplified overview of the Fast Track process. The key decision is to determine the type of manuscript that has been submitted. Everything else is handled automatically.
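The automatic embargo rules described above can be sketched as a single function. This is an illustration under stated assumptions, not the Fast Track code: the function and parameter names are hypothetical, `publication_date` stands in for the data Fast Track reads from Elements, `journal_embargo_months` stands in for the policy it retrieves from Orpheus, and we assume the journal embargo runs from the publication date.

```python
import calendar
from datetime import date
from typing import Optional

DEFAULT_EMBARGO_MONTHS = 36  # fallback described in the post
ARCHIVED_DECISIONS = {
    "Published (open access)",
    "Accepted (published)",
    "Accepted (not published)",
}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_end(decision: str,
                decision_date: date,
                publication_date: Optional[date] = None,
                journal_embargo_months: Optional[int] = None) -> Optional[date]:
    """Return when an archived manuscript's embargo lifts (None = no embargo).

    HYPOTHETICAL sketch: publication_date stands in for Elements data and
    journal_embargo_months for the policy looked up in Orpheus.
    """
    if decision not in ARCHIVED_DECISIONS:
        raise ValueError(f"{decision!r} does not lead to archiving")
    # 1. The agent's decision: published open access needs no embargo.
    if decision == "Published (open access)":
        return None
    # 2 and 3. Publication date from Elements plus the journal's policy.
    if publication_date is not None and journal_embargo_months is not None:
        return add_months(publication_date, journal_embargo_months)
    # Not enough information: apply the default from the decision date.
    return add_months(decision_date, DEFAULT_EMBARGO_MONTHS)


if __name__ == "__main__":
    # Accepted manuscript published 15 Jan 2019 in a journal with a
    # (hypothetical) 12-month embargo.
    print(embargo_end("Accepted (published)", date(2019, 4, 23),
                      date(2019, 1, 15), 12))  # 2020-01-15
    # Accepted but not yet published: 36 months from the decision date.
    print(embargo_end("Accepted (not published)", date(2019, 4, 23)))  # 2022-04-23
```

The fallback deliberately errs on the long side; as the post notes, embargoes set this way are checked and corrected later.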

Since the launch of Fast Track, the average time to process a manuscript has fallen to 1-2 minutes. More than 8,000 items have been processed through the phase two Fast Track interface; including items processed under the phase one effort, the total rises to just over 14,000. And since a picture is worth a thousand words, Figure 4 below shows the effect that the new interface, launched in September, had on our backlog of unprocessed submissions.

Figure 4. Historical change in the number of unprocessed open access manuscript submissions. The total number of outstanding manuscript submissions peaked at nearly 2,400 in September 2018. Immediately after launching the Fast Track website the backlog dropped dramatically and was completely eliminated by March 2019.

We will continue to develop Fast Track to further streamline our processing of manuscripts. We have already started to partner with librarians and administrators across the University to leverage the collective knowledge about open access which now exists within the University’s professional academic services.

Get in contact: If you are running a DSpace repository and would like to implement Fast Track to work alongside your existing workflows, email us at support@repository.cam.ac.uk

Published 23 April 2019
Written by Dr Mélodie Garnier and Dr Arthur Smith

Where are we now? Cambridge theses deposits one year in

As the nights draw in and the academic year 2018/19 begins, we are preparing to enter our second year of compulsory e-theses deposits. Our university repository, Apollo, is close to holding 6000 digital PhD theses and it is the intention of the University that this valuable research asset continues to grow into the future. The Apollo repository will play a large part in making this happen. Until recently only hardbound copies of theses were collected and catalogued by the University Library. Users could read theses on-site in Cambridge or order a digitisation of the thesis, but the introduction of e-thesis deposit to Apollo has meant that University of Cambridge theses are more accessible than ever before. It’s been an incredibly busy year and we have made some great steps forward in our management of theses in Cambridge.

e-theses at Cambridge – the background

The e-theses deposit story at Cambridge started in October 2016, when the Office of Scholarly Communication upgraded Apollo to allow the deposit of theses and began a digital thesis pilot for the academic year 2016/17. Eleven departments in the University participated in the pilot, asking their PhD students to deposit an e-thesis alongside a hardcopy thesis. Theses deposited in Apollo during the pilot could either be made open access at the author’s request or be treated as historical theses had been until then: the hardbound copy was held in the University Library, and requestors could sign a declaration stating that they wished to consult the thesis for private study or non-commercial research. Following the success of the pilot, the Board of Graduate Studies, at its meeting on 4 July 2017, decided that from 1 October 2017 all PhD students would be required to deposit both a hard copy and an electronic copy of their thesis with the University Library.

What we learnt during the academic year 2017/18

The experience of depositing theses during the pilot had highlighted some issues that needed addressing. We had to make decisions on how to deal with third party copyright, sensitive material, library copy and supply rules, and the alignment of access levels for hardbound and electronic theses. In response to this, we decided that we should think through each of the different ways in which a thesis could be deposited in the repository, and consider the range of contentious material that could be contained within a thesis.

How do theses enter the repository?

Whilst students depositing in order to graduate do so directly, we can also scan theses on request here in the library and subsequently deposit the scans in Apollo. In addition, we led a drive to digitise University of Cambridge theses held by the British Library on microfilm, and gave alumni the option to have their thesis digitised and made open access at no cost to them.

British Library theses

This year the OSC has made a bulk deposit of theses scanned by the British Library, which significantly augments the number of theses stored in the repository. In the culmination of a two-year project, nearly 1300 additional Cambridge PhD theses are now available on request in the Apollo repository.

Prior to being made available in the repository, these Cambridge theses were held on microfilm at the British Library. They date from the 1960s through to 2008, when digitisation took over from microfilm as a means of document storage. The British Library holds 14,000 Cambridge PhD theses on microfilm; in 2016 they embarked on a project with the OSC to digitise ten percent of the collection at low cost – read more about this in an earlier post, Choosing from a cornucopia: a digitisation project.

You can explore the collection in Apollo: Historical Digital Theses: British Library collection.  The theses are under controlled access, which means they are available on request for non-commercial research purposes, subject to a £15 admin fee.

Establishing access levels

We established that the level of access we could allow to the thesis could be determined by the route a thesis entered the repository, its content, or in some cases the author’s wish to publish. To address all of the potential issues, we decided to define a set of access levels which would determine what we, as managers of the repository, were able to do with a thesis and the way in which it could be accessed by a requestor.

The access levels were put into action in spring 2018, and this was followed by a survey of Degree Committees conducted by the e-theses working group, consisting of members of the University Library and Student Registry. The survey asked for feedback on the suitability of the access levels for research outputs for all departments in the University; the outcome confirmed that the access levels were working and covered the options well, although a few tweaks were needed. In light of the feedback, a set of recommendations was put to the Board of Graduate Studies by the e-theses working group, and these recommendations were considered and accepted at their meeting on 3 July 2018, ready to be put in place for the 2018/19 academic year.

eSales for theses under controlled access

At the same time as we were establishing our access levels, we were also working on devising an eSales process to facilitate the supply of theses under controlled access. Controlled access replicates the way that historical, hardbound theses were managed in the library, with the addition of an electronic version of the thesis being held in the repository, and follows the library copy and supply rules for unpublished works under copyright law. A thesis scanned by the library would be deposited under controlled access so it remains unpublished, but this access level is also available to students depositing their thesis directly. The eSales process we devised went live in July 2018 and this meant a large number of theses held in the repository were made more accessible, including those digitised by the British Library. As of 18 October, we have supplied 14 theses via the eSales route and the requests keep coming in at a steady pace.

Looking forward to the 2018/19 academic year

As we begin the 2018/19 academic year, our theses management is looking in good shape but we will continue to improve and refine our internal and external services. In consultation with the University’s Student Registry we are making the final changes to our deposit forms, access levels and communications and we endeavour to make this academic year the smoothest yet for e-theses management. University of Cambridge theses are more accessible than they have ever been. The collection will grow as more students deposit each year, and the valuable research of PhD students will continue to be disseminated.

Published 25 October 2018
Written by Zoë Walker-Fagg

How open is Cambridge? 2017 edition

Welcome to Open Access Week 2017. The Office of Scholarly Communication at Cambridge is celebrating with a series of blog posts, announcements and events. In today’s blog post we revisit the question about the openness of Cambridge. 

For Open Access Week last year I looked at how open Cambridge was using the extremely useful Lantern tool, developed by Cottage Labs, which is the basis of the Wellcome Trust’s compliance tool. If you haven’t used it before, Lantern takes a list of DOIs, PMIDs, or PMCIDs and runs these through a variety of sources to try to determine the Open Access status of each publication. I found that 51.8% of all of Cambridge’s research publications from 2015 were available in at least one ‘Open Access’ source. How did Cambridge’s 2016 publications fare? Read on to find out.

Using the same method as last year, I first obtained a list of DOIs from Web of Science (n=9416) and Scopus (n=9124) for articles, proceedings papers and reviews published in 2016. Combining and deduplicating these lists returned 10,674 unique DOIs (~29 publications/day). I also refreshed the 2015 publication data using the latest Web of Science and Scopus information, which returned 10,090 unique DOIs. Year-on-year, this represents a 5.8% increase in the total number of publications attributable to Cambridge – more than inflation!
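The post does not say how the two DOI exports were combined. Below is a minimal sketch of one way to do it, assuming plain lists of DOI strings: because DOIs are case-insensitive and exports sometimes include a resolver prefix, each DOI is normalised before being added to a set. The function names, the prefix list and the sample DOIs are hypothetical.

```python
from typing import Iterable, Set

# Resolver prefixes that may appear in exported DOI fields (assumed list).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Lower-case a DOI and strip whitespace and any resolver prefix.

    DOIs are case-insensitive, so '10.1000/ABC' and '10.1000/abc' match.
    """
    doi = raw.strip().lower()
    for prefix in _PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def combine_dois(*sources: Iterable[str]) -> Set[str]:
    """Merge DOI lists from several databases into one set of unique DOIs."""
    unique: Set[str] = set()
    for source in sources:
        for raw in source:
            doi = normalise_doi(raw)
            if doi:  # skip empty cells in an export
                unique.add(doi)
    return unique


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two exports.
    web_of_science = ["10.1000/ABC123", "10.1000/xyz789"]
    scopus = ["https://doi.org/10.1000/abc123", "doi:10.1000/new001"]
    print(len(combine_dois(web_of_science, scopus)))  # 3
```

If each export was itself free of duplicates, the reported counts imply that 9,416 + 9,124 - 10,674 = 7,866 DOIs were indexed by both Web of Science and Scopus.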

The deduplicated DOI lists for 2015 and 2016 (20,764 DOIs in total) were fed into Lantern and analysed in combination with information from Web of Science and the University’s institutional repository Apollo.

Figure 1. Distribution of papers published in 2015 and 2016 that have a DOI, by the Open Access sources in which they can be found. 57.5% of 2016’s articles appear in at least one Open Access source, an increase of about 4 percentage points over 2015. One third of all papers published in 2016 are available in Apollo.

Very pleasingly the percentage of publications available in at least one Open Access source increased to 57.5% in 2016 compared to only 53.4% for 2015 publications. Given that the total number of publications also increased during this period this result is doubly exciting. In raw numbers, this means that while 5384 publications were Open Access in 2015, an impressive 6135 publications were made Open Access in 2016.
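The percentages in this section can be recomputed from the raw counts. The short Python check below uses only figures reported in this post. It also separates the two kinds of change: the Open Access share rose by about 4 percentage points, while the number of Open Access publications grew by about 14%.

```python
# Recompute the Open Access figures from the counts reported in this post.
COUNTS = {
    2015: {"total": 10_090, "open": 5_384},
    2016: {"total": 10_674, "open": 6_135},
}


def oa_share(year: int) -> float:
    """Percentage of a year's unique DOIs found in at least one OA source."""
    c = COUNTS[year]
    return 100 * c["open"] / c["total"]


def growth(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return 100 * (after / before - 1)


share_2015 = oa_share(2015)
share_2016 = oa_share(2016)
point_change = share_2016 - share_2015  # difference in percentage points
output_growth = growth(COUNTS[2015]["total"], COUNTS[2016]["total"])
oa_growth = growth(COUNTS[2015]["open"], COUNTS[2016]["open"])

print(f"OA share 2015: {share_2015:.1f}%")               # 53.4%
print(f"OA share 2016: {share_2016:.1f}%")               # 57.5%
print(f"Change: {point_change:.1f} percentage points")   # 4.1
print(f"Growth in publications: {output_growth:.1f}%")   # 5.8%
print(f"Growth in OA publications: {oa_growth:.1f}%")    # 13.9%
```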

Most of this increase can be attributed to the much larger share of publications that appear in Apollo, which is now the largest source of Open Access material for the University of Cambridge. An additional 822 publications were deposited in Apollo in 2016 compared to 2015, which is a 30% increase in one year alone.

You can now find more of the University’s research outputs in Apollo than in any other Open Access source. And because we operate an extremely popular Request a Copy service, potentially all of the publications held in Apollo, even those that are restricted and under embargo, are available to anyone in the world. You just need to ask.

Published 23 October 2017
Written by Dr Arthur Smith

Milestone: 1000 datasets in Cambridge’s repository

Last week, Cambridge celebrated a huge milestone – the deposit of the 1000th dataset to our repository Apollo since the launch of the Research Data Facility in early 2015. This is the culmination of a huge amount of work by the team in the Office of Scholarly Communication, in terms of developing systems, workflows, policies and through an extensive advocacy campaign. The Research Data team have run 118 events over the past couple of years and published 39 blogs.

In the past 12 months alone there have been 26000 downloads of the data in Apollo. Some datasets have been downloaded many times (one as many as 170 times), and the data has featured in news stories, blogs and on Twitter.

An event was held at Cambridge University Library last week to celebrate this milestone.

   

Opening remarks

The Director of Library Services, Dr Jess Gardner, opened proceedings with a speech in which she noted “the Research Data Services and all who sail in her are at the core of our mission in our research library”.

Dr Gardner referred to the library’s long and proud history of collecting and managing research data that “began on vellum, paper, stone and bone”. The research data of luminaries such as Isaac Newton and Charles Darwin was recorded on paper and, she noted, “we have preserved that with great care and share it openly on line through our digital library.”

Turning to the future, Dr Gardner observed: “But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.”

“In the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible,” she noted.  “It is about sharing freely wherever it is appropriate to do so”. [Dr Gardner’s speech is in full at the end of this post.]

Perspectives from a researcher

The second speaker was Zoe Adams, a PhD student at Cambridge, who talked about the work she has done with Professor Simon Deakin on the Labour Regulation Index in association with the Centre for Business Research.

Ms Adams noted it was only in retrospect she could “appreciate the benefit of working in a collaborative project and open research generally”. She discussed how helpful it had been as an early career researcher to be “associated with something that was freely available”. She observed that few of her peers had many citations, and the reason she did was because “the dataset is online, people use the data, they cite the data, and cite me”.

Working openly has also improved the way she works, she explained, saying “It has given me a new perspective on what research should be about. …  It gives me a sense that people are relying on this data to be accurate and that does change the way you approach it.”

View from the team

The final speaker was Dr Lauren Cadwallader, Joint Deputy Head of the OSC with responsibility for the Research Data Facility, who discussed the “showcase dataset of the data that we can produce in the OSC”, which is drawn from usage of our Request a Copy service.

Dr Cadwallader noted that requests for theses have increased over time. “This is a really exciting observation because the Board of Graduate Studies have agreed that all students should deposit a digital copy of their thesis in our repository,” she said. “So it is really nice evidence that we can show our PhD students that by putting a copy in the repository people can read it and people do want to read theses in our repository.”

One observation was that several of the requested theses were written 60 years ago, so the repository is sharing older research as well. Their topics included algebra and Yorkshire evangelists, and one of the oldest requested theses, written in 1927, was about the Falkland Islands. “So there is a longevity in research and we have a duty to provide access to that research,” she said.

Thanks go to…

The dataset itself was created by the OSC team to examine the usage of our Request a Copy service. The analysis was undertaken by Peter Sutton Long, and we recently published a blog post about the findings.

The music played at the event was compiled by Tony Malone and covers almost 1000 years of music, from Laura Cannell’s reworking of Hildegard of Bingen to Jane Weaver’s Modern Cosmology. There are acknowledgments to Apollo, and to Cambridge too. The soundtrack is available for those interested in listening.

This achievement is entirely due to the incredible work of the team in the Research Data Facility and their ability to engage with colleagues across the institution, the nation and the world. In particular, the vision and dedication of Dr Marta Teperek cannot be overstated.

In the words of Dr Gardner: “They have made our mission different, they have made our mission better, through the work they have achieved and the commitment they have.”

The event was supported by the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.


Published 21 September 2017
Written by Dr Danny Kingsley
Creative Commons License

Speech by Dr Jess Gardner

First let us begin with some headline numbers. One thousand datasets. This is hugely significant and a very high level when looking at research repositories around the country. There is every reason to be proud of that achievement and what it means for open research.

There have been 26,000 downloads of that data in the past 12 months alone. That is about the use and reuse of our research data, and it is changing the face of how we do research. Some of these datasets have been downloaded 117 times and used in news, blogs and Twitter. The Research Data team have written 39 blogs about research data and have run 118 events, most of them with researchers.

While the headline numbers give us a sense of volume, perhaps let’s talk about the underlying rationale and philosophy behind this, which is core.

Cambridge University Library has a 600-year history we are very proud of. In that time we have had an abiding responsibility to collect, care for and make available for use and reuse the information and research objects that form part of the intrinsic international scholarly record of which Cambridge has been such a strong part, along with the ability for those ideas to inspire new ideas. The collection began on vellum, paper, and stone and bone.

And today much of that, of course, is digital. You can’t see it in the same way you can see the manuscripts and collections. It is sometimes hard to grasp when we are in this grand old dame of a building that I dare you not to love. It is home to the physical papers of such greats as Isaac Newton and Charles Darwin. Their research data was on paper, and we have preserved that with great care and share it openly online through our digital library. But our responsibility now extends to today’s researchers and scientists and to people working across all disciplines in our great university. Our preservation and stewardship of research data, from the digital humanities to the biomedical sciences, is a core part of what we now do.

And the people in this room have changed that. They have made our mission different, they have made our mission better through the work they have achieved and the commitment they have.

Philosophically this is a very natural extension of what we have done in the Library, the open library and its great research community for which this very building was designed. Some of you may know there is a philosophy behind this building and the famous ‘open library Cambridge’. In the 19th and 20th centuries that was mostly about our open stacks of books, and we have quite a few of them; we are a little weighed down by them.

Our research data weighs less but it is just as significant, and in the 21st century our support and our overriding philosophy are all about supporting open research and opening data as widely as possible. It is about sharing freely wherever it is appropriate to do so. There are many reasons why data sometimes isn’t open, and that is fine. What we are looking for is managing data so that we can make those choices appropriately, just as we have with the archive for many, many years.

So whilst there is a fantastic achievement to mark tonight with those 1000 datasets, and it really is significant, we are really celebrating a deeper milestone with our research partners, our data champions, and our colleagues in the research office and in the libraries across Cambridge. That milestone is the changing role of research support, and of library research support, in the digital age, and I think that is something we should be very proud of in terms of what we have achieved at Cambridge. I certainly am.

I am relatively new here at Cambridge. One of the things that was said to me when I was first appointed to the job was how lucky I was to be working at this University, and with the Office of Scholarly Communication in particular, and that has proved to be absolutely true. I would like to take this opportunity to note that achievement of 1000 datasets and to state very publicly that the Research Data Services and all who sail in her are at the core of our mission in our research library. But also to thank you and the teams involved for your superb achievements. It really is something to be very proud of and I thank you.


Who is requesting what through Cambridge’s Request a Copy service?

In October last year we reported on the first four months of our Request a Copy service. Now, 15 months in, we have had over 3000 requests and this provides us with a rich source of information to mine about the users of our repository. The dataset underpinning the findings described here is available in the repository.

What are people requesting?

We have had 3240 requests through the system since its inception in June 2016. Of those, the vast majority have been for articles (1878, or 58%) and theses (1276, or 39%). The remaining requests are for book chapters, conference objects, datasets, images and manuscripts. It should be noted that most datasets are available open access, which means there is little need for them to be requested.

Of the 23 requests for book chapters, it is perhaps not surprising that the greatest number, 9 (39%), came for chapters held in the collections from the School of Humanities and Social Sciences. It is perhaps more interesting that the second highest number, 7 (30%), came for chapters held in the School of Technology.

The School of Technology is home to the Department of Engineering, which is the University’s largest department. It is therefore perhaps not surprising that the greatest number of articles requested were from Engineering, with 311 of the 1878 requests (17%). The next most requested areas for articles were, in order, the Department of Public Health and Primary Care, the Department of Psychiatry, the Faculty of Law and the Judge Business School.
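The percentages quoted above follow directly from the raw counts. As a quick check, the short Python sketch below recomputes them, rounding to the nearest whole percent. The `share` helper and the `OTHER` bucket (the 86 requests left once articles and theses are subtracted from 3240) are our own illustrative additions, not part of the published dataset.

```python
# Raw counts reported in this post (June 2016 to September 2017).
TOTAL_REQUESTS = 3240
ARTICLES = 1878
THESES = 1276
# Everything else: book chapters, conference objects, datasets, images, manuscripts.
OTHER = TOTAL_REQUESTS - ARTICLES - THESES


def share(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print("articles:", share(ARTICLES, TOTAL_REQUESTS), "%")
print("theses:", share(THESES, TOTAL_REQUESTS), "%")
print("other:", OTHER, "requests,", share(OTHER, TOTAL_REQUESTS), "%")

# Book chapters: 23 requests in total.
print("HSS chapters:", share(9, 23), "%")
print("Technology chapters:", share(7, 23), "%")

# Engineering's share of all article requests.
print("Engineering articles:", share(311, ARTICLES), "%")
```

Note that the shares are computed against different bases (all requests, chapter requests, article requests), which is why they should not be added together.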

What’s hot?

Over this period we have seen a proportional increase in the number of requests for theses compared to articles. When the service started, requests for articles outnumbered those for theses by 71% to 29%. More recently, however, requests for theses have overtaken requests for articles, at a ratio of 54% to 46%.

The most requested thesis, by a considerable amount, over this period was for Professor Stephen Hawking’s thesis with double the number of requests of the following ten most requested theses. The remaining top 10 requested theses are heavily engineering focused, with a nod to history and social research. These theses were:

The top 10 requested articles have a distinctly health and behavioural focus, with the exception of one legal paper authored by Cambridge University’s Pro Vice Chancellor for Education, Professor Graham Virgo.

When are people requesting?

Looking at the day of the week people are requesting items, there is a distinct preference for early in the week. This reflects the observations we have made about the use of our helpdesk and deposits to our service – both of which are heaviest on Tuesdays.

When in the publication cycle are the requests happening?

In our October 2016 blog we noted that of the articles requested in the four months from when the service started in June 2016 to the end of September 2016, 45% were yet to be published, and 55% were published but not yet available to those without a subscription to the journal. The method we used for working this out involved identifying those articles which had been requested and determining if the publication date was after the request.

Now, 15 months after the service began it is slightly more difficult to establish this number. We can identify items that were deposited on acceptance because we place these items on a very long embargo (until 2100) until we can establish the publication date and set the embargo period. So in theory we could compare the number of articles with this embargo period against those that have a different date.

However, false positives (articles that appear to have been requested before publication) would arise where an article had been published but we had not yet identified this. To give an indication of how big an issue this is for us, as of the end of last week there were 1768 articles in our ‘to be checked’ pile. False negatives (articles that appear to have been requested after publication) would arise where an article was published between the request and the time of the report, and its embargo had been changed as a result. That said, after some analysis of the requests for articles and conference proceedings, 19% were made before publication. This is a slightly fuzzy number but does give an indication.
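The classification described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the OSC's actual tooling: the `Request` record, its field names and the two helper functions are hypothetical. Where a publication date is known it uses the 2016 approach (compare it with the request date); otherwise it falls back on the embargo heuristic, treating any item still embargoed until 2100 as unpublished, and so inherits the false positives and false negatives described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Year of the long placeholder embargo applied to items deposited on acceptance.
PLACEHOLDER_EMBARGO_YEAR = 2100


@dataclass
class Request:
    """Hypothetical record of one Request a Copy request."""
    item_id: str
    requested_on: date
    embargo_until: date                  # embargo lift date at the time of the report
    published_on: Optional[date] = None  # None if the publication date is not recorded


def requested_before_publication(req: Request) -> bool:
    if req.published_on is not None:
        # 2016 method: compare the known publication date with the request date.
        return req.requested_on < req.published_on
    # Fallback heuristic: a placeholder embargo means publication has not yet been
    # recorded (a false positive if the item was in fact already published).
    # Any other embargo date is taken to mean the item was already published.
    return req.embargo_until.year >= PLACEHOLDER_EMBARGO_YEAR


def pre_publication_share(requests: List[Request]) -> float:
    """Fraction of requests classified as made before publication."""
    if not requests:
        return 0.0
    flagged = sum(requested_before_publication(r) for r in requests)
    return flagged / len(requests)
```

Recording the publication date explicitly, rather than inferring it from the embargo, removes most of the ambiguity; the fallback branch is only as good as the ‘to be checked’ pile is short.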

How many requests are fulfilled?

The vast majority of the decisions recorded (35% of the total requests for articles, but 92% of the instances where we had a decision) indicate that the author shared their article with the requestor. The small number (3%) of ‘no’ recordings indicate the request was actively rejected.

We do not have a decision recorded from the author in 62% of the requests. We suspect that in the majority of these the request simply expires because the author does not respond. In some cases the author may have been in direct correspondence with the requestor. We note that the email that is sent to authors does look like spam, and our review of this service needs to address this issue.
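The 35% and 92% figures describe the same decisions from different bases: all article requests, and only those with a recorded decision. The Python sketch below shows the calculation. Because the post reports only rounded percentages, it uses them as stand-in counts per 100 requests, which is an approximation; the function name is illustrative.

```python
def fulfilment_summary(shared: int, refused: int, undecided: int) -> dict:
    """Summarise author decisions on Request a Copy requests as percentages."""
    total = shared + refused + undecided
    decided = shared + refused
    return {
        "shared_of_total": round(100 * shared / total),
        # This base excludes requests with no recorded decision.
        "shared_of_decided": round(100 * shared / decided) if decided else 0,
        "no_decision": round(100 * undecided / total),
    }


# Stand-in counts per 100 article requests, taken from the rounded percentages above.
print(fulfilment_summary(shared=35, refused=3, undecided=62))
# prints {'shared_of_total': 35, 'shared_of_decided': 92, 'no_decision': 62}
```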

Next steps

As we explained in October, the process for managing the requests is still manual. As the volume of requests increases, the time taken is becoming problematic: we estimate it is the equivalent of 1 person day per week. We are scoping the technical requirements for automating these processes. A new requirement at Cambridge for the deposit of digital theses means there will be three different processes. Requests for these digital theses will be sent to the author for their decision, and these authors will, in most cases, no longer be affiliated with Cambridge. Requests for digitised theses where we do not have the author’s permission are processed within the Library, and requests for articles are sent to the Cambridge authors.
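Once automated, the three request routes described above amount to a simple dispatch on item type. A hypothetical Python sketch follows: the `ItemType` values and the returned routing labels are illustrative names, not part of any existing Cambridge or DSpace system, and a digitised thesis for which the author's permission is on file is assumed to be openly available already, so it never enters this flow.

```python
from enum import Enum, auto


class ItemType(Enum):
    ARTICLE = auto()
    DIGITAL_THESIS = auto()    # deposited under the new digital thesis requirement
    DIGITISED_THESIS = auto()  # historic thesis without the author's permission on file


def route_request(item_type: ItemType) -> str:
    """Return who decides on a request (hypothetical routing labels)."""
    routes = {
        ItemType.ARTICLE: "cambridge_author",      # sent to the Cambridge author
        ItemType.DIGITAL_THESIS: "thesis_author",  # author often no longer at Cambridge
        ItemType.DIGITISED_THESIS: "library",      # processed within the Library
    }
    return routes[item_type]
```

A lookup table keeps each route in one place, so adding a fourth item type later means adding one enum member and one dictionary entry.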

Given the challenges with identifying when in the publication process the request has been made, we need to look at automating the system in a manner that allows us to clearly extract this information. The percentage of requests that occur before publication is a telling number because it indicates the value or otherwise of having a policy of collecting articles at the acceptance point rather than at publication.

Published 12 September 2017
Written by Dr Danny Kingsley

2016 – that was the year that was

In January last year we published a blog post, ‘2015 that was the year that was’, which not only helped us take stock of what we had achieved but was also very well received. So we have decided to do it again. For those who are more visually oriented, the slides ‘The OSC a lightning Tour’ might be useful.

Now starting its third year of operation, the Office of Scholarly Communication (OSC) has expanded to a team of 15, managing a wide variety of projects. The OSC has developed a set of strategic goals to support its mission: “The OSC works in a transparent and rigorous manner to provide recognised leadership and innovation in the open conduct and dissemination of research at Cambridge University through collaborative engagement with the research community and relevant stakeholders.”

1. Working transparently

The OSC maintains an active outreach programme, in keeping with the transparent manner in which it works, and this includes the active documentation of workflows.

One of the ways we work transparently is to share many of our experiences and ideas through this blog, which receives over 2,000 visits a month. During 2016 the OSC published 41 blogs: eight each on Scholarly Communication and Open Research, 14 on Open Access, nine on Research Data Management and two on Library and training matters. The blogs we published in Open Access Week were accessed 1630 times that week alone.

In addition to our websites for Scholarly Communication and Open Access, our Research Data Management website has been identified internationally as best practice and receives nearly 3,000 visitors a month.

We also run Twitter feeds for both Open Access, with 1100 followers, and Open Data, with close to 1200 followers. Many of the OSC staff also run their own Twitter feeds which share professional observations.

We also publish monthly newsletters, including one on scholarly communication matters. Our research data management newsletter has close to 2,000 recipients. Our shining achievement for the year however has to be the hugely successful scholarly communication Advent Calendar (which people are still accessing…)

We practise what we preach and share information about our work practices such as our reports to funders on APC spend and so on, through our repository Apollo and also by blogging about it – see Cambridge University spend on Open Access 2009-2016. We also share our presentations through Apollo and in Slideshare.

2. Disseminating research

The OSC has a strong focus on research support in all aspects of the scholarly communication ecosystem, from concept, through study design, preparation of research data management plans, decisions about publishing options and support with the dissemination of research outputs beyond the formal literature. The OSC runs an intense programme of advocacy relating to Open Access and Research Data Management, and has spoken to nearly 3,000 researchers and administrators since January 2015.

2.1 Open Access compliance

In April 2016, the HEFCE policy requiring that journal articles and conference proceedings intended to be claimed for the REF be made open access came into force. As a result, there has been an increased uptake of the Open Access Service, with the 10,000th article submitted to the system in October. Our infographics on Repository use and Open Access demonstrate the level of engagement with our services clearly.

Currently half of the entire research output of the University is being deposited to the Open Access Service each month (see the blog: How open is Cambridge?). While this is good from a compliance perspective, it has caused some processing issues due to the manual nature of the workflows and insufficient staff numbers. At the time of writing, there is a deposit backlog of over 600 items to put into the repository and a backlog of over 2,300 items to be checked if they have been published so we can update the records.

The OA team made over 15,000 ticket replies in 2016, or nearly 60 per work day!

2.2 Managing theses

Work on theses continues, with the OSC driving a collaboration with Student Services to pilot the deposit of digital theses in addition to printed bound ones with a select group of departments from January 2017. The Unlocking Theses project in 2015-2016 has seen an increase in the number of historic theses in the repository from 700 to over 2,200 with half openly available. An upcoming digitisation project will add a further 1,400 theses. The upgrade of the repository and associated policies means all theses (not just PhDs) can be deposited and the OSC is in negotiation with several departments to bulk upload their MPhils and other sets of theses which are currently held in closed collections and are undiscoverable. This is an example of the work we are doing to unearth and disseminate research held all over the institution.

As a result of these activities it has become obvious that the disjointed nature of thesis management across the Library is inefficient. There is considerable effort being placed on developing workflows for managing theses centrally within the Library which the OSC will be overseeing into the future.

3. Research Support

3.1 Research Data Support

The number of data submissions received by the University repository is continuously growing, with Cambridge hosting more datasets in the institutional repository than any other UK university. Our ‘Data Sharing at Cambridge’ infographic summarises our work in this area.

A recent Primary Research Group report recognised Cambridge as having ‘particularly admirable data curation services’.

3.2 Policy development

The OSC is heavily involved in policy development in the scholarly communication space and participates in several activities external to the University. In July 2016 the UK Concordat on Open Research Data was published, with considerable input from the university sector, coordinated by the OSC.

We have representatives on the RCUK Open Access Practitioners Group, the UK Scholarly Communication License and Model Policy Steering Committee and the CASRAI Open Access Glossary Working Group, plus several other committees external to Cambridge. The OSC has contributed to discussions at the Wellcome Trust about ensuring better publisher compliance with their Open Access policy.

We are also updating and writing policies for aspects of research management across the University.

3.3 Collaborations with the research community

The OSC collaborates directly with the research community to ensure that the funding policy landscape reflects their needs and concerns. To that end we have held several town-hall meetings with researchers to discuss issues such as the mandating of CC-BY licensing, peer review and options relating to moving towards an Open Research landscape. We have also provided opportunities for researchers to meet directly with funders to discuss concerns and articulate amendments to the policies. The OSC has led discussions with the sector and arXiv.org, including visiting Cornell University, to ensure that researchers using this service to make their work openly available can be compliant under the HEFCE policy.

A new Research Data Management Project Group brings researchers and administrators together to work on specific issues relating to the retention and preservation of data and the management of sensitive data. We have also recruited over 40 Data Champions from across the University. Data Champions are researchers, PhD students or support staff who have agreed to advocate for data within their department: providing local training, briefing staff members at departmental meetings, and raising awareness of the need for data sharing and management.

The initiative began as an attempt to meet the growing need for RDM training, provide more subject-specific RDM support and begin more conversations about the benefits of RDM beyond meeting funders’ mandates. There has been a lot of interest in our Data Champions from other universities in the UK and abroad, with applications for our scheme coming from around the world. In response to this we have proposed a Birds of a Feather session at the 9th RDA plenary meeting in April to discuss similar initiatives elsewhere and the creation of RDM advocacy communities.

3.4 Professional development for the research community

The OSC provides the research community with a variety of advocacy, training and workshops relating to research data management, sharing research effectively, bibliometrics and other aspects of scholarly communication. The OSC held over 80 sessions for researchers in 2016, including the extremely successful ‘Helping researchers publish’ event which we are repeating in February.

Our work with the Early Career Researcher (ECR) community has resulted in the development of a series of sessions about the publishing process for the PhD community. These have been enthusiastically embraced and there are negotiations with departments about making some courses compulsory. While this underlines the value of these offerings, it does raise issues about staffing and how this will be financed.

The OSC is increasingly managing and hosting conferences at the University. Cambridge is participating in the Jisc Shared Repositories pilot and the OSC hosted an associated Research Data Network conference in September. In July 2016, the OSC organised a conference on research data sharing in collaboration with the Science and Engineering South Consortium, which was extremely well received and attracted over 80 attendees from all over the UK.

In November, the OpenCon Cambridge group, with which the OSC is heavily involved, held an OpenConCam satellite event which was very well attended and received very positive feedback. The storify of tweets is available, as is this blog about the event. The OSC was happy both to be a sponsor of the event and to be able to support the travel of a Cambridge researcher to attend the main OpenCon event in Washington and bring back her experiences.

Increasingly we are livestreaming our events and then making them available online as a resource for later.

3.5 Developing Library capacity for support

We have published a related post which details the training programmes run for library staff members in 2016. In total 500 people attended sessions offered in the Supporting Researchers in the 21st century programme, and we successfully ‘graduated’ the second tranche of the Research Support Ambassador Programme.

Conference session proposals on both the Supporting Researchers and the Research Ambassador programmes have been submitted to various national and international conferences. Dr Danny Kingsley and Claire Sewell have also had an abstract accepted for an article to appear in the 2017 themed issue of The New Review of Academic Librarianship.

4. Updating and integrating systems

The University repository, Apollo, has been upgraded and was launched during Open Access Week. The upgrade has incorporated new services, including the ability to mint DOIs, which has been enthusiastically adopted. A new Request a Copy service for users wishing to obtain access to embargoed material is being heavily used without any promotion, with around 300 requests a month flowing through. This has been particularly important given that we are depositing works prior to publication, so we have to put them under an infinite embargo until we know the publication date (at which time we can set the embargo lift date). The more than 2,000 items needing to be checked for publication date mean that a large percentage of the contents of the repository is discoverable but closed under embargo.

In order to reduce the heavy manual workload associated with the deposit and processing of over 4,000 papers annually, the OSC is working with the Research Information Office on a systems integration programme between the University’s CRIS system – Symplectic – and Apollo, and retaining our integrated helpdesk system which uses a programme called ZenDesk. This should allow better compliance reporting for the research community, and reduce manual uploading of articles.

But this process involves a great deal more than just metadata matching and coding, and touches on the extremely siloed nature of the support services being offered to our researchers across the institution. We are trying to work through these issues by instigating and participating in several initiatives with multiple administrative areas of the University. The OSC is taking the lead with a ‘Getting it Together’ project to align the communication sent to researchers through the research lifecycle and across the range of administrative departments including Communication, Research Operations, Research Strategy and University Information Systems, termed the ‘Joined up Communications’ group. In addition we are heavily involved in the Coordinated and Functional Research Systems Group (CoFRS), the University Research Administration Systems Committee and the Cambridge Big Data Steering Group.

5. Pursuing a research agenda

Many staff members of the OSC originate from the research community and the team have a huge conference presence. The OSC team attended over 80 events in 2016 both within the UK and major conferences worldwide, including Open Scholarship Initiative, FORCE2016, Open Repositories, International Digital Curation Conference, Electronic Thesis & Dissertations, Special Libraries Association, RLUK2016, IFLA, CILIP and Scientific Data Conference.

Increasingly the OSC team is being asked to share their knowledge and experience. In 2016 the team gave four keynote speeches, presented 18 sessions and ran one Master Class. The team has also acted as session chair for two conferences and convened two sessions.

5.1 Research projects

The OSC is undertaking several research projects. In relation to the changing nature of scholarly communication services within libraries, we are in the process of analysing job advertisements in the area of scholarly communication. We have also conducted a survey, which received over 500 responses, on the educational and training background of people working in the area of scholarly communication. The findings of these studies will be shared and published during 2017.

Dr Lauren Cadwallader was the first recipient of the Altmetrics Research Grant which she used to explore the types and timings of online attention that journal articles received before they were incorporated into a policy document, to see if there was some way to help research administrators make an educated guess rather than a best guess at which papers will have high impact for the next REF exercise in the UK. Her findings were widely shared internationally, and there is interest in taking this work further.

The team is currently pursuing several research grant proposals. Other research includes an analysis of the data needs of the research community, undertaken in conjunction with Jisc.

5.2 Engaging with the research literature

Members of the OSC hold several editorial board positions, including two on the Data Science Journal and one on the Journal of Librarianship and Scholarly Communication. We also hold positions on the Advisory Board for PeerJ Preprints, and one staff member is Associate Editor of the New Review of Academic Librarianship. The OSC team also act as peer reviewers for scholarly communication papers.

The OSC is working towards developing a culture of research and publishing amongst the library community at Cambridge, and is one of the founding members of the Centre for Evidence Based Librarianship and Information Practice (C-EBLIP) Research Network.

6. Staffing

The organisational layout remained relatively stable between 2015 and 2016, but this belies the perilous nature of the funding of the Office of Scholarly Communication. Of the 15 staff members, fewer than half are funded from ‘Chest’ (central University) funding. The remainder are paid from a combination of non-recurrent grants, RCUK funding and endowment funds.

The process of applying for funding, creating reports, meeting with key members of the University administration, working out budgets and, frankly, lobbying just to keep the team employed has taken a huge toll on the team. One result of the financial situation is many staff – including some crucial roles – are on short-term contracts and several positions have turned over during the year. This means that a disproportionate amount of time is spent on recruitment. The systems for recruiting staff in the University are, shall we say, reflective of the age of the institution.

In 2016 alone, as the Head of the OSC, I personally wrote five job descriptions and progressed them through the (convoluted) HR review process. I conducted 32 interviews for OSC staff and participated in 10 interviews for staff elsewhere in the University where I have assisted with the recruitment. This has involved the assessment of 143 applications. Because each new contract has a probation period, I have undertaken 27 probationary interviews. Given that each of these activities involves one (or, mostly, more) other staff members, the impact of this issue on staff time becomes apparent.

We also conducted some experiments with staffing last year. We have had a volunteer working with us on a research project and run a ‘hotdesk’ arrangement with colleagues from the Research Information Office, the Research Operations Office and Cambridge University Press. We also conducted a successful ‘work from home’ pilot (a first for the University Library).

7. Plans for 2017

This year will herald some significant changes for the University – with a new Librarian starting in April and a new Vice Chancellor in September. This may determine where the OSC goes into the future, but plans are already underway for a big year in 2017.

As always, the OSC is considering both a practical and a political agenda. On the ‘political’ side of the fence we are pursuing an Open Research agenda for the University. We are about to kick off the two-year Open Research Pilot Project, which is a collaboration between the Office of Scholarly Communication and the Wellcome Trust Open Research team. The Project will look at gaining an understanding of what is needed for researchers to share and get credit for all outputs of the research process. These include non-positive results, protocols, source code, presentations and other research outputs beyond the remit of traditional publications. The Project aims to understand the barriers preventing researchers from sharing (including resource and time implications), as well as what incentivises the process.

We are also now at a stage where we need to look holistically at the way we access literature across the institution. This will be a big project incorporating many facets of the University community. It will also require substantial analysis of existing library data and the presentation of this information in an understandable graphic manner.

In terms of practical activities, our headline task is to completely integrate our open access workflows into University systems. In addition we are actively investigating how we can support our researchers with text and data mining (TDM). We are beginning to develop and roll out a ‘continuum’ of publishing options for the significant amount of grey literature produced within Cambridge. We are also expanding our range of teaching programmes – videos, online tools, and new types of workshops. On a technical level we are likely to be looking at the potential implementation of options offered by the Shared Repository Pilot, and developing solutions for managed access to data. We are also hoping to explore a data visualisation service for researchers.

Published 17 January 2017
Written by Dr Danny Kingsley


Mission Open Access: the Apollo repository launches

To celebrate Open Access Week 2016, the Office of Scholarly Communication (OSC) officially launched ‘Apollo’, the University of Cambridge’s upgraded open access repository.

Researchers, University research staff and librarians gathered at the University’s Engineering Department to see a demonstration of the new features of Apollo, speak to some of the University’s Open Access Champions and raise a glass to launch the service.

The repository stores a range of content and provides different levels of access, but its primary focus is on providing open access to the University’s research publications.  Apollo forms an important part of the University’s provision for meeting research funder requirements for open access, enabling ‘Green’ access to publications.  The launch of the upgrade comes at an exciting time for the Office of Scholarly Communication, as the repository has recently received its 10,000th upload.

The Cambridge University Office of Scholarly Communication looks after all aspects of scholarly communication within the University. This ranges across the entire research lifecycle, from searching for information and collaborators, through authoring and copyright issues, to the publication and dissemination process and, finally, assessment. The OSC is responsible for the University’s open access and open data programmes in terms of compliance with funders’ policies, and delivers and manages the University’s digital repository, Apollo.

Cambridge University was one of a handful of ‘testbed’ institutions that participated in the early deployment and development of DSpace, and has been running a DSpace repository for over a decade. Over that time, Apollo has participated in a number of externally funded projects intended to better understand researcher requirements or improve the services it offers. These include Incremental, DataTrain and PrePARe, which developed resources to support research data management; and EPIC and Keeping Research Data Safe (KRDS), which focused on the repository’s preservation services.

Upgraded features

With the support of RCUK, the OSC have spent £43,000 to upgrade the repository. Cambridge is now leading the country by running DSpace Version 5.4, the most recent and most stable version of the application. This has given Apollo a modern, more user-friendly interface.

Since the upgrade in May 2016, the repository has had close to 2 million views from actual people (not machines!).

The upgrade means we can now expand the services offered by the repository. Digital Object Identifiers (DOIs) can now be minted in-house: the Open Access team has minted over 6,000 DOIs since May for articles, theses, datasets and other research outputs.

In addition, author identifiers (ORCID iDs) are now displayed in the repository. The repository is interoperable with other systems and sends ORCID iDs to DataCite, which might in future allow repository items to be added automatically to authors’ ORCID profiles.

Perhaps the most exciting integration is with the University’s publication management system Symplectic, allowing for easier reporting of Open Access compliance.

Request a Copy

Part of the upgrade involved the introduction of a new feature called ‘Request a Copy’, designed to open up the University’s most current research to a wider audience. ‘Request a Copy’ operates on the principle of peer-to-peer sharing: if an item in Apollo is not yet available to the public, a repository user can ask the author for a copy of the item. Authors sharing copies of their work on an individual basis falls outside the publisher’s copyright restrictions; here, the repository is acting as a facilitator to a process which happens anyway.

The ‘Request a Copy’ button has been much more successful than we anticipated, particularly given that there is no actual ‘button’. By the end of September 2016 (four months after the introduction of ‘Request a Copy’), we had received 1,120 requests (approximately 280 per month), two thirds of which were for articles. Apart from a small number of requests for datasets, the remaining third were for theses.

Of the requests for articles during this period, 38% were fulfilled by the author sending a copy via the repository, and 4% were rejected by clicking the ‘Don’t send a copy’ button.

Of the articles requested during this period, 45% were yet to be published. The large number of requests made before publication shows the value of a policy whereby articles are deposited in the repository on acceptance rather than on publication: there is clearly interest in accessing this research quickly, rather than waiting for publication.

Open Access Week

The Apollo launch was the closing event of Open Access Week at the OSC.  Established by SPARC and partners in the student community in 2008, International Open Access Week is an opportunity to take action in making openness the default for research—to raise the visibility of scholarship, accelerate research, and turn breakthroughs into better lives.  The OSC also released a daily programme of announcements, blog posts and live-streamed events, which are spotlighted on the OA Week webpage, and celebrated this year’s theme of ‘Open in Action’.

Stay in touch with news from the OSC through the monthly newsletter.

Published 28 October 2016
Written by Hannah Haines

Creative Commons License

Request a copy: process and implementation

This blog post looks at a recent feature implemented in our repository called ‘Request a copy’ and discusses the process and management of the service. A related blog post discusses the uptake of, and reaction to, the facility.

As part of our recent upgrade to the University’s institutional repository (now renamed ‘Apollo’), we implemented a new feature called ‘Request a copy’. ‘Request a copy’ operates on the principle of peer-to-peer sharing: if an item in Apollo is not yet available to the public, a repository user can ask the author for a copy of the item. Authors sharing copies of their work on an individual basis falls outside the publisher’s copyright restrictions; here, the repository is simply facilitating peer-to-peer sharing, a process which happens anyway.

The main advantage of the ‘Request a copy’ feature is to open up the University’s most current research to a wider audience. Many of our users do not necessarily come from an academic background, or may be based within another discipline, or an institution where journal subscriptions are more limited. The repository is often their first port of call to find new research as it ranks highly in Google search results. We hope that these users will benefit from ‘Request a copy’ by being able to access new outputs early, at researchers’ discretion. Additionally, this may provide an added benefit to researchers by introducing new contacts and potential collaborations.

How it works

Items in Apollo that are not yet accessible to the wider public are indicated by a padlock symbol on the thumbnail image and on the filename link that users would normally click to download the file.

Reasons why the file may not yet be publicly available include:

  • Some publishers require that articles in repositories are not made available until they are published, or until a specified time after publication
  • We hold a number of digitised theses in the repository, and for some we have been unable to contact the author to secure permission to make their thesis available
  • Authors may choose to make their dataset available only once the related article is published

When a user clicks on a thumbnail or filename link containing a padlock, they are directed to the ‘Request a copy’ form. Here, they provide their name, email address and a message to the author. On clicking ‘Request copy’, an email containing the user’s details is sent to the person who submitted the item. The recipient of this email can then approve or deny the user’s request, contact the user for more information, or (if they are not the author) forward the request to the author.
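The request flow just described can be modelled as a small record of what the form captures, plus the set of responses open to whoever receives the email. The sketch below is a hypothetical Python model, not DSpace’s actual implementation; the names `CopyRequest`, `Response` and `handle`, and the example addresses, are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Response(Enum):
    """Options open to whoever receives a 'Request a copy' email (hypothetical)."""
    APPROVE = auto()            # send the file to the requester
    DENY = auto()               # decline the request
    ASK_FOR_INFO = auto()       # contact the requester for more information
    FORWARD_TO_AUTHOR = auto()  # recipient is not the author, so pass it on


@dataclass
class CopyRequest:
    """Details captured by the form (field names are illustrative)."""
    item_id: str
    requester_name: str
    requester_email: str
    message: str


def handle(request: CopyRequest, response: Response,
           author_email: Optional[str] = None) -> str:
    """Describe the next step for a given response to a request."""
    if response is Response.APPROVE:
        return f"send {request.item_id} to {request.requester_email}"
    if response is Response.DENY:
        return f"tell {request.requester_email} the request was declined"
    if response is Response.ASK_FOR_INFO:
        return f"ask {request.requester_email} for more information"
    # Only FORWARD_TO_AUTHOR remains: it needs somewhere to forward to.
    if author_email is None:
        raise ValueError("forwarding needs the author's email address")
    return f"forward request for {request.item_id} to {author_email}"


req = CopyRequest("item-123", "A. Reader", "reader@example.org",
                  "Could I read this before publication?")
print(handle(req, Response.APPROVE))  # send item-123 to reader@example.org
```

Keeping the responses in an enum makes it explicit that forwarding is the only option that needs extra information (the author’s address), which is exactly where the manual effort described in the next section arises.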

How it really works

In practice, the process is slightly more complicated. For most of the content in the repository, the person who submitted an item will be a member of repository staff, rather than the item’s author. This means that for the most part, emails generated by the ‘Request a copy’ form were initially sent to members of the Office of Scholarly Communication team. In some cases, these requests were sent to people who have left the University, and we have had to query the system to retrieve these emails. As an interim measure, we have now directed all emails to support@repository.cam.ac.uk. These still need manual processing.

Theses

For theses where we have not received permission from the author to make them available, we forward requests to the University Library’s Digital Content Unit, who have traditionally provided digitised copies of theses at a charge of £65. We have found, however, that once information about this charge is communicated to the requester, very few (approximately 1%) actually complete the process of ordering a thesis copy.

We have been working with the Digital Content Unit on a trial in which thesis copies were offered at £30, then £15. However, even at these lower prices, uptake remained low: it rose to 10%, but because the sample was small, this equated to only two and three requests at the two price points, so the increase may not be statistically significant. This suggests that the objection was to being charged at all, rather than to the particular amount. Work is ongoing to offer thesis copies to requesters as cheaply as possible while allowing the Digital Content Unit to cover its costs.

Articles

If the request is for an article, we first need to check whether the article has actually been published and is already available Open Access. Although we endeavour to keep all our repository records up to date, unless we are informed that an article has been published, repository staff need to check each article for which publication is pending. This is a time-consuming manual process, and when we have a large backlog, sometimes it can take a while before an article is updated following publication.

If we find that the article has indeed been published and can be made Open Access, we amend the record, make the article available and email the requester to let them know they can now download the file directly from the repository.

On the other hand, if the article is still not published, or if it is under embargo, we need to forward the request to the corresponding author(s). Sometimes their name(s) and email address(es) will be included in the article itself, and sometimes we have a record of who submitted the article via the Open Access upload form. However, if the article does not make clear who the corresponding author is or does not give their contact details, and it was submitted by an administrator rather than one of the authors, we need to search the University’s Lookup service for the email addresses of any Cambridge authors, and search the internet for those of any non-Cambridge authors, before we can forward the request.
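The article-handling steps above amount to a short decision procedure. The following Python sketch is a hypothetical illustration of that order of checks; the `ArticleRecord` fields and the `find_contacts` and `triage` helpers are assumptions, and the manual steps (checking publication status, searching Lookup or the web) are represented as plain data rather than real service calls. The post does not say what happens if no address can be found at all, so the sketch simply flags that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical snapshot of what staff know about a requested article."""
    published: bool
    can_be_open_access: bool                # e.g. published and not under embargo
    email_in_article: Optional[str] = None  # corresponding author named in the text
    uploader_email: Optional[str] = None    # only set if an author used the upload form
    lookup_emails: List[str] = field(default_factory=list)  # Cambridge authors via Lookup
    web_emails: List[str] = field(default_factory=list)     # non-Cambridge authors found online


def find_contacts(rec: ArticleRecord) -> List[str]:
    """Try each source of author contact details in the order described."""
    if rec.email_in_article:
        return [rec.email_in_article]
    if rec.uploader_email:
        return [rec.uploader_email]
    # Fall back to the slow manual searches.
    return rec.lookup_emails + rec.web_emails


def triage(rec: ArticleRecord) -> str:
    """Decide what to do with a 'Request a copy' for an article."""
    if rec.published and rec.can_be_open_access:
        # Amend the record, release the file and tell the requester.
        return "make open access and notify requester"
    contacts = find_contacts(rec)
    if not contacts:
        return "no author contact found"
    return "forward to " + ", ".join(contacts)


print(triage(ArticleRecord(published=False, can_be_open_access=False,
                           email_in_article="author@example.ac.uk")))
# forward to author@example.ac.uk
```

Ordering the contact sources from cheapest to most expensive mirrors why processing time varies so much: a request resolved by the first branch is quick, while one that reaches the Lookup and web searches is what pushes handling towards 30 minutes.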

As a result, it can take repository staff up to 30 minutes to process an individual request. This is quicker if the article has been requested previously and the author’s contact details are already stored, but can take longer when we need to search. Sometimes, there is also repeat correspondence if the author has any queries, which adds to the total time in processing each request.

Amending our processes

Since introducing ‘Request a copy’, we have started collecting the email addresses of corresponding authors when an article is submitted, and we have commissioned a repository development company to ensure that ‘Request a copy’ emails can be sent directly to those authors for whom we have an email address – a feature that we are hoping to implement in the next few weeks.
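The planned change boils down to a routing rule: send the email straight to the corresponding author when an address was collected at submission, and otherwise fall back to the shared support mailbox. The minimal Python sketch below assumes a hypothetical `route_request` function and a hypothetical set of addresses known to be dead (for instance, from bounced emails), which stands in for the problem of authors moving institution discussed next. It is not the code being built by the development company, which may work quite differently.

```python
from typing import Collection, Optional

SUPPORT_MAILBOX = "support@repository.cam.ac.uk"


def route_request(author_email: Optional[str],
                  bounced: Collection[str] = ()) -> str:
    """Choose who receives a 'Request a copy' email for an item (hypothetical)."""
    # Send directly to the corresponding author when we hold a working address.
    if author_email and author_email not in bounced:
        return author_email
    # Otherwise (no address collected, or the author has since moved on),
    # fall back to repository staff, who process the request manually.
    return SUPPORT_MAILBOX


print(route_request("author@example.ac.uk"))  # author@example.ac.uk
print(route_request(None))                    # support@repository.cam.ac.uk
```

Defaulting to the support mailbox keeps the current manual workflow as a safety net, so no request is lost when an author’s details are missing or out of date.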

However, if the author moves institution, their university email address will no longer be valid, and any requests for their work will again need to come via repository staff. One way to solve this would be to ask for an external (non-university) email address for the corresponding author at the point where they upload the article to the repository. However, this would introduce an extra step to an already onerous process and may act as a further barrier to authors submitting articles in the first place.

Generally, ‘Request a copy’ is a great idea and provides many benefits to the research community and beyond. But the implementation of this service has been challenging. The amount of time taken by each request has meant that some staff members have been redeployed from their usual jobs to facilitate these requests, which also has an impact on the backlog of articles in the repository that need to be checked in case they have since been published. If an article is published but still in the backlog (and therefore not publicly available in the repository), unnecessary requests for it could result in a reputational issue for the Office of Scholarly Communication and the University.

We will continue to look at our processes over the coming academic year, to see how we can improve our current workflows, and identify and resolve any issues, as well as determining where best to focus any further development work. In the related blog post on ‘Request a copy’, I’ll be talking about usage statistics for the service so far, some more unexpected use cases we have encountered, and feedback from our users that will help us to shape the service into the future.

Published 7 October 2016
Written by Sarah Middle
Creative Commons License