What do you want, and why do you want it? An update on Request a Copy

 As part of Open Access Week 2018, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Mélodie Garnier provides some new insights into our Request a Copy service.

4,416. This is the number of requests for copies of material in our repository that we’ve received over the past 12 months. Daunting, isn’t it? And definitely on the rise, with a 33% increase from the previous year. Two and a half years after its implementation in June 2016, our Request a Copy service is now more popular than ever. Our institutional repository Apollo hosts thousands of freely available research outputs, but also many that are under embargo. People from all over the world and from all walks of life are keen to access them. But what exactly do requesters want? And why do they want it?

What do people want?

Our repository hosts a whole range of research outputs, but theses and journal articles are by far the most popular. Interestingly, the relative proportion of requested theses vs requested articles has shifted this year. From October 2016 to October 2017, requests for journal articles made up 56% of the total number of requests, and requests for theses made up 39%. Since last October, requests for journal articles have accounted for 38% of the total while theses have accounted for 59%.

Looking at the raw figures, the number of requests for journal articles has actually gone up (from 1,647 to 1,689), though only slightly. But the number of thesis requests has more than doubled, going from 1,145 to 2,586. This is partly explained by the University of Cambridge’s requirement for PhD students to upload their theses from 1st October 2017, leading to 1,279 new thesis uploads. On top of these, we have added around 1,300 historical British Library theses and around 200 scanned historical theses from the Digital Content Unit. So between 2,500 and 3,000 theses have been added to Apollo this year alone (more on this tomorrow for #ThesisThursday).
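The shares and growth figures quoted above can be recomputed from the raw counts. The short Python sketch below does this, on the assumption that the 4,416 total and the per-type counts cover the same 12-month window; the variable and function names are illustrative only.

```python
# Request counts quoted in the post; "previous" is October 2016 to October 2017.
total_this_year = 4416
articles = {"previous": 1647, "current": 1689}
theses = {"previous": 1145, "current": 2586}

def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole

def growth(previous, current):
    """Return the percentage change from previous to current."""
    return 100 * (current - previous) / previous

article_share = share(articles["current"], total_this_year)
thesis_share = share(theses["current"], total_this_year)
article_growth = growth(articles["previous"], articles["current"])
thesis_growth = growth(theses["previous"], theses["current"])

print(f"Articles: {article_share:.0f}% of requests, change {article_growth:+.1f}%")
print(f"Theses:   {thesis_share:.0f}% of requests, change {thesis_growth:+.0f}%")
```

The rounded shares come out at 38% and 59%, matching the post, and thesis requests grow by well over 100%, which is what "more than doubled" means in percentage terms.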

Most wanted

Most items requested this year were only requested once, but 28 items were requested 10 times or more. Of the 20 most requested items, four are journal articles and 16 are PhD theses. Here’s our top 5:

Aside from the gold medal winner, all the other works were published this year and have only been available in Apollo for a few months. So it is striking to see how popular some of them have become in quite a short period of time. A case in point is the zoology article, which was deposited in Apollo only last month and first published shortly afterwards.

Word of mouth

Though it is sometimes unclear why particular outputs suddenly attract a lot of requests, Altmetric Attention scores can be telling – see the one below for the zoology article I’ve just mentioned:

Another interesting example (not included in the top 5) is a PhD thesis deposited in Apollo at the end of August. Visits to the Apollo record went from 18 in September to an astounding 911 in October (and counting), accompanied by a surge of requests. What happened in between? The author publicised her thesis on a Facebook society page, pointing to the repository record link for access.

We only became aware of this because requesters explicitly referred to that page, but it’s possible that similar things happen a lot of the time. So aside from traditional media outlets, the influence of social media on the number of requests received can be quite dramatic, and probably greater than we could ever capture.

Tell us about yourself

When requesting a copy of an embargoed article or thesis, people are prompted to leave a message alongside contact details. This is so they can introduce themselves and explain why they are interested in accessing the work, mainly so that authors can make informed decisions on whether to accept or reject requests. Quite often these messages have little to no useful information, but some can be informative in a number of ways.

Through them we can get a glimpse of the range of people accessing the repository: their geographical provenance, background and professional occupation. We can also get a sense of the range of interests that people have (which may appear very specialised, if not a little obscure). And crucially they tell us what people want to do with the research, whether to use it as a reference, apply it in their professional sphere or simply read it for pleasure.

Why do people request work?

Broadly speaking, people request work in Apollo for the following purposes: reference/citation, personal interest/leisure, replication of results for research purposes, and the need to inform professional practice. But those broad categories can include several sub-categories; for example, personal interest can stem from hearing about the research in the media or from knowing the author.

Getting the full detailed account of why people request work from our repository would require going through messages individually, and perhaps some degree of subjective judgement. Since launching the Request a Copy service we’ve had over 8,000 requests – so even if uninformative messages were excluded, the analysis could be fairly time-consuming. But certainly worth exploring, so watch this space.

Just a snippet…

What better way to advocate for Open Access than to show concrete examples of how research can impact on individual lives? Our Open Access team sees evidence of this every day through Request a Copy messages. So until we can offer a full-blown analysis of the output, let’s conclude this blog post with a selection of favourites:

  • “Our daughter is being investigated for Beckwith Wiedemann Syndrome. We would like as much information as possible about this area”
  • “I’m a pediatric radiation oncologist and this paper is a “practice changer” one!”
  • “My task is to convince policy makers in Sri Lanka to switch to circular economy. I am looking for all possible information to do this”
  • “I work in FE/HE and have a number of students experiencing/ or diagnosed with psychosis, I am very interested in intervention research and programmes for psychosis that can be implemented within our college environment”
  • “I would like a copy of this material for inspiring my high school students of physics”
  • “I hope to learn more about the potential risks of my decision to donate a kidney”

Although there is a definite cost to running Request a Copy in terms of staff time, it is clear how popular and valuable a service it has become. As its popularity increases so does the need for process efficiency, however. This is currently a big priority for us and something we’ll have to keep working on, but we think the benefits for researchers and the wider community are worth it.

Published 24 October 2018
Written by Dr Mélodie Garnier
Creative Commons License

Text and data mining services: an update

Text and Data Mining (TDM) is the process of digitally querying large collections of machine-readable material, extracting specific information and, by analysis, discovering new information about a topic.

In February 2017, a group of University of Cambridge staff met to discuss “Text and Data Mining Services: What can Cambridge libraries offer?” It was agreed that a future library Text and Data Mining (TDM) support service could include:

  • Access to data from our own collections
  • Advice on legal issues, what publishers allow, what data sets and tools are available
  • Registers on data provided for mining and TDM projects
  • Fostering agreements with publishers.

This blog post reports on some of the activities, events and initiatives involving libraries at the University of Cambridge that have taken place, or have been in progress, since this meeting (also summarised in these slides). Raising awareness, educating, and teasing out the issues around the low uptake of this research process have been the main drivers for these activities.

March 2017: RLUK 2017 Conference Workshop

The Office of Scholarly Communication (OSC) and Jisc ran a workshop at the Research Libraries UK 2017 conference to discuss Research Libraries and TDM. Issues raised included licensing, copyright, data management, perceived lack of demand, where to go for advice within an institution or publisher, policy and procedural development for handling TDM-related requests (and scaling this up across an institution), the risk of lock-out from publishers’ content, and the time it can take for a TDM contract to be finalised between an institution and publisher. The group concluded that it is important to build mechanisms into TDM-specific licensing agreements between institutions and publishers that set out the behaviours expected. For example, if a publisher’s website detects suspicious activity, it would be better to investigate first rather than automatically block the originating institution from accessing content (although this may depend on the systems in place). If lock-out does happen and the activity is legal, participants suggested that institutions should explore compensation for the time that access is lost, if significant.

July 2017: University of Cambridge Text and Data Mining LibGuide

Developed by the eResources Team, this LibGuide explains Text and Data Mining (TDM): what it is, what the legal issues are, what you can do and what you should not try to do. It also provides a list of online journals under license for TDM at the University of Cambridge and a list of digital archives for text mining that can be supplied to University researchers on a disc copy. Any questions our researchers have about a TDM project that are not answered by the LibGuide can be submitted to the eResources Team via an enquiry form.

July 2017: TDM Symposium

The OSC hosted this symposium to provide as much information as possible to the attendees regarding TDM.  Internal and external speakers, experienced in the field, spoke about what TDM is and what the issues are; research projects in which TDM was used; TDM tools; how a particular publisher supports TDM; and how librarians can support TDM.

At the end of the day a whole-group discussion drew out the issues around why more TDM is not happening in the UK. It was agreed that there was a need for more visibility on what TDM looks like (e.g. through hands-on sessions) and for increased communication between stakeholders, i.e. publishers, librarians and researchers.

November 2017: Stakeholder communication and the TDM Test Kitchen

This pilot project involves a publisher, librarians and researchers. It is providing practical insight into the issues arising for each of the stakeholders: e.g. researchers providing training on TDM methods and analysis tools, library support managing content accessibility and funding for this, and content licensing and agreements for the publisher. We’ll take a more in-depth look at this pilot in an upcoming blog post on TDM, so watch this space.

January 2018: Cambridge University Library Deputy Director visits Yale

The Yale University Library Digital Humanities Laboratory provides physical space, resources and a community within the Library for Yale researchers who are working with digital methods for humanities research and teaching. In January this year Dr Danny Kingsley visited the facility to discuss approaches to providing TDM services, to help with planning at Cambridge. The Yale DH Lab staff help out with projects in a variety of ways, one example being to help researchers get to grips with digital tools and methods. Researchers wanting to carry out TDM on particular collections can visit the lab to do so: offline discs containing published material for mining can be used in situ. In 2018, the libraries at Cambridge have begun building up a collection of offline discs of specific collections for the same purpose.

June 2018: Text and Data Mining online course

The OSC collaborated with the EU OpenMinTeD project on this Foster online course: Introduction to Text and Data Mining. The course helps learners understand the key concepts around TDM and explores how Research Support staff can help with TDM. It also includes practical activities that allow even those without technical skills to try out some mining concepts for themselves. By following these activities, you can find out a bit more about sentence segmentation, tokenization, stemming and other processing techniques.
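To give a flavour of what these three techniques do, here is a deliberately naive Python sketch. It is not taken from the course: the function names, the regular expressions and the toy suffix list are all assumptions made for illustration, and real projects would normally rely on an established text-processing library instead.

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: lowercase the sentence and keep runs of letters."""
    return re.findall(r"[a-z]+", sentence.lower())

def stem(token):
    """Toy suffix-stripping stemmer (an illustration, not the Porter algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

sample = "Researchers mined the archives. Mining reveals patterns!"
for sentence in split_sentences(sample):
    tokens = tokenize(sentence)
    print(tokens, "->", [stem(t) for t in tokens])
```

In this sketch "mined" and "mining" both reduce to the same stem, "min", which is the point of stemming: different surface forms of a word are counted together. The crude rules also produce stems such as "archiv", which is typical of suffix stripping and one reason production stemmers are more elaborate.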

October 2018: Gale Digital Scholar Lab

The University of Cambridge has trial access to this platform until the end of December: it provides TDM tools as a front end to digital archives from Gale Cengage. You can find out more about this trial in this ejournals@cambridge blog.

In summary…

Following the initial meeting to discuss research support services for TDM, there have been efforts and achievements to raise awareness of TDM and the possibilities it can bring to the research process as well as to explore the issues around the low usage of TDM in the research community at large.  This is an on-going task, with the goal of increased researcher engagement with TDM.

Published 23 October 2018
Written by Dr Debbie Hansen

Cambridge Open Access spend 2013-2018

Since 2013, the Open Access Team has been helping Cambridge researchers, funded by Research Councils UK (RCUK) and the consortium of biomedical funders which make up the Charity Open Access Fund (COAF), to meet their Open Access obligations. Both RCUK (now part of UKRI) and COAF have Open Access policies which have a preference for ‘gold’, i.e. the published work should be Open Access immediately at the time of publication. Implementing these policies has come at a significant cost. In this time, Cambridge has been awarded just over £10 million from RCUK and COAF to implement their Open Access policies, and the Open Access Team has diligently used this funding to maximum effect.

Figure 1. Comparison of combined RCUK/COAF grant spend and available funds, April 2013 – March 2018.

Initially, expenditure was slow, which allowed the Open Access Team to maintain a healthy balance that could guarantee funding for almost any paper which met a few basic requirements. However, since January 2016 expenditure has gradually been catching up with the available funds, which has made funding decisions more difficult (specifically for Open Access deals tied to multi-year publisher subscriptions). In the first three months of 2018 average monthly expenditure on the RCUK block grant alone exceeded £160,000. We are quickly reaching the point where expenditure will outstrip the available grants.

One technical change which has particularly affected our management of the block grants was RCUK’s decision last year to move away from a direct cash award (which could be rolled over year to year) to a more tightly managed research grant. In the past, carrying over underspend has given us some flexibility in the management of the RCUK funds, whereas the more restrictive style of research grant will mean that any underspend will need to be returned at the end of the grant period, while any overspend cannot be deferred into the next grant period. As we are now dealing with a fixed budget, the Open Access Team will need to ensure that expenditure is kept within the limits of the grant. This is difficult when we have no control over where or when our researchers publish.

Funding from COAF (which is also managed as though it is a research grant) has generally matched our total annual spend quite closely, but the strict grant management rules have caused some problems, especially in the transition period between one grant and another. However, unlike RCUK, the Wellcome Trust will provide supplementary funding in addition to the main COAF award if it is exhausted, and the other COAF partners have similar procedures in place to manage Open Access payments beyond the end of the grant.

Where does it all go?

Most of our expenditure (91%) goes on article processing charges (APCs), as perhaps one might expect, but the block grants are also used to support the staff of the Open Access Team (3%), helpdesk and repository systems (2%), page and colour charges (2%), and publisher memberships (1%) (where this results in a reduced APC). The majority of APCs we’ve paid go towards hybrid journals, which represent approximately 80% of total APC spend.

So let’s take a look at which publishers have received the most funds. We’ve tried to match as much of our raw financial information as we can to specific papers, although some of our data is incomplete and some payments can’t easily be linked back to a specific article, particularly for 2013-2015 when our processes were still developing. Nonetheless, the average APC paid over the last 5 years was £2,291 (inc. 20% VAT), and as can be seen from Table 1, average APCs have been rising year on year at a rate of 7% p.a., significantly higher than inflation. Price increases at this rate are not sustainable in the long term: by 2022 we could be paying on average £3,000 per article.

Table 1. Average APC by publication year of article (where known).

Year of publication    Average APC paid (£)
2013                   1,794
2014                   1,935
2015                   2,044
2017                   2,187
2018                   2,336
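The 2022 projection mentioned in the text can be reproduced with a simple compound-growth calculation. The Python sketch below assumes the 2018 average from Table 1 as the starting point and the stated 7% annual rate held constant; the function name is illustrative.

```python
# Figures quoted in the post: the 2018 average APC and the stated growth rate.
apc_2018 = 2336        # pounds, from Table 1
annual_growth = 0.07   # "7% p.a." as stated in the text

def project(value, rate, years):
    """Compound `value` forward by `rate` for `years` years."""
    return value * (1 + rate) ** years

for year in range(2019, 2023):
    print(year, round(project(apc_2018, annual_growth, year - 2018)))
```

Four years of 7% growth multiplies the 2018 figure by roughly 1.31, which takes the average a little over £3,000 by 2022.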

Elsevier has been by far the largest recipient of block grant funds, receiving 29.4% of all APC expenditure from the RCUK and COAF awards (over £2.5 million), though only accounting for 25.5% of articles. Over the same period SpringerNature also received in excess of £1 million (which as we’ll see below has mostly been spent on two titles). With such a substantial set of data we can now begin to explore the relative value that each publisher offers. Take for example Taylor & Francis (£107,778 for 120 articles) compared to Wolters Kluwer (£119,551 for 35 articles). Both publishers operate mostly hybrid OA journals and yet the relative value is significantly different. What is so fundamentally different between publishers that such extreme examples as this should exist?

Table 2. Top 20 publishers by combined total RCUK/COAF APC spend 2013-2018.

Publisher                           APC spend (£)  % of spend  APCs paid  % of APCs  Avg. APC (£)
Elsevier                                2,559,736       29.4%        971      25.5%         2,636
SpringerNature                          1,050,774       12.1%        402      10.6%         2,614
Wiley                                     808,847        9.3%        279       7.3%         2,899
American Chemical Society                 411,027        4.7%        251       6.6%         1,638
Oxford University Press                   379,647        4.4%        169       4.4%         2,246
PLOS                                      267,940        3.1%        168       4.4%         1,595
BioMed Central                            245,006        2.8%        153       4.0%         1,601
Institute of Physics                      189,434        2.2%         98       2.6%         1,933
Royal Society of Chemistry                156,018        1.8%        106       2.8%         1,472
BMJ Publishing                            144,001        1.7%         68       1.8%         2,118
Company of Biologists                     140,609        1.6%         50       1.3%         2,812
Wolters Kluwer                            119,551        1.4%         35       0.9%         3,416
Taylor & Francis                          107,778        1.2%        120       3.2%           898
Frontiers                                 103,011        1.2%         61       1.6%         1,689
Cambridge University Press                 77,139        0.9%         38       1.0%         2,030
Royal Society                              73,890        0.8%         52       1.4%         1,421
Society for Neuroscience                   69,943        0.8%         26       0.7%         2,690
American Society for Microbiology          63,056        0.7%         36       0.9%         1,752
American Heart Association                 53,696        0.6%         14       0.4%         3,835
Optical Society of America                 39,463        0.5%         17       0.4%         2,321
All other articles                      1,654,228       19.0%        690      18.1%         2,397
Grand Total                             8,714,794      100.0%      3,804     100.0%         2,291
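The average APC column in Table 2 is simply total spend divided by the number of APCs paid. As a minimal Python sketch (the data structure and function name are illustrative, and only three rows of the table are copied in), the comparison between Taylor & Francis and Wolters Kluwer works out as follows:

```python
# (total APC spend in pounds, number of APCs paid), copied from Table 2.
publishers = {
    "Taylor & Francis": (107_778, 120),
    "Wolters Kluwer": (119_551, 35),
    "Elsevier": (2_559_736, 971),
}

def average_apc(spend, count):
    """Average APC in pounds, rounded to the nearest pound."""
    return round(spend / count)

averages = {name: average_apc(*row) for name, row in publishers.items()}
for name, avg in averages.items():
    print(f"{name}: £{avg:,}")

# How many times larger is the Wolters Kluwer average than Taylor & Francis?
ratio = averages["Wolters Kluwer"] / averages["Taylor & Francis"]
print(f"Wolters Kluwer averages {ratio:.1f}x the Taylor & Francis APC")
```

On these figures the average Wolters Kluwer APC is close to four times the Taylor & Francis average, even though the total spend with each publisher is similar.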

Next, journal level metrics. The most popular journal that we pay APCs for is Nature Communications, followed closely by Scientific Reports. Both of these are SpringerNature titles, and indeed these two titles make up the bulk of our total APC spend with SpringerNature. Yet these two journals represent significantly different approaches to Open Access. Nature Communications, along with Cell and Cell Reports, are some of the most expensive routes to making research publications Open Access, whereas Scientific Reports and PLOS One sit at the lower end of the spectrum. It is interesting that we haven’t seen a particularly popular Open Access journal fill the niche between Nature Communications and Scientific Reports.

Figure 2. APC number and total spend by journal. In the last five years, nearly £450,000 has been spent on articles published in Nature Communications.


Managing the future

While the OA block grants have kept pace with overall expenditure so far, continuing monthly expenditure of £160,000 would risk overspending on the RCUK grant for 2018/19. To counter this possible outcome the University has agreed a set of funding guidelines to manage the RCUK (from now on known as Research Councils) and COAF awards. For Research Councils’ funded papers the new guidelines place an emphasis on fully Open Access journals and hybrid journals where the publisher is taking a sustainable approach to managing the transition to Open Access. We’ve spent a lot of money over the last five years, yet it’s not clear that the influx of cash from RCUK and COAF has had any meaningful impact on the overall publishing landscape. Many publishers continue to reap huge windfalls via hybrid APCs, yet they are not serious about their commitment to Open Access.

In the future, we’ll be demanding better deals from publishers before we support payments to hybrid journals so that we can effect a faster transition to a fully Open Access world.

Published 22 October 2018
Written by Dr Arthur Smith