All posts by Arthur Smith

Open access: fringe or mainstream?

When I was just settling in to the world of open access and scholarly communication, I wrote about the need for open access to stop being a fringe activity and enter the mainstream of researcher behaviour:

“Open access needs to stop being a ‘fringe’ activity and become part of the mainstream. It shouldn’t be an afterthought to the publication process. Whether the solution to academic inaction is better systems or, as I believe, greater engagement and reward, I feel that the scholarly communications and repository community can look forward to many interesting developments over the coming months and years.”

While much has changed in the five years since I (somewhat naïvely) wrote those concluding thoughts, there are still significant barriers to the complete opening of scholarly discourse. However, should open access be an afterthought for researchers? I’ve changed my mind. Open access should be something researchers don’t even need to think about, and I think that future is already here, though I fear it will ultimately sideline institutional repositories.

According to the 2020 Leiden Ranking, the median rate at which UK institutions make their research outputs open access is over 80%, far higher than in any other nation (Figure 1). Indeed, the UK is the only country that has ‘levelled up’ over the last five years, while institutions in the rest of the world have plodded along, making slow but steady progress.

Figure 1. The median institutional open access percentage for each country according to the Leiden Ranking. Note, these figures are medians of all institutions within a country. This does not mean that 80% of the UK’s publications are open access, but that the median rate of open access at UK institutions is 80%.
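The distinction in the caption matters because a median of institutional rates can differ sharply from the share of a country’s total output that is open access. The following minimal Python sketch uses three invented institutions to show the gap. The names and counts are hypothetical and are not Leiden Ranking data.

```python
from statistics import median

# Hypothetical institutions: (open access outputs, total outputs).
# These figures are invented for illustration; they are not Leiden Ranking data.
institutions = {
    "Institution A": (900, 1000),   # 90% open access
    "Institution B": (160, 200),    # 80% open access
    "Institution C": (3000, 5000),  # 60% open access
}

# Median of the per-institution open access rates (the statistic plotted in Figure 1).
rates = [oa / total for oa, total in institutions.values()]
median_rate = median(rates)

# Share of all outputs across the whole country that are open access.
total_oa = sum(oa for oa, _ in institutions.values())
total_outputs = sum(total for _, total in institutions.values())
pooled_rate = total_oa / total_outputs

print(f"Median institutional rate: {median_rate:.0%}")
print(f"Share of all outputs: {pooled_rate:.0%}")
```

Here the median is 80%, yet only about 65% of all outputs are open access, because the largest institution has the lowest rate.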

The main driver of this increase in UK open access content is green open access (Figure 2), due in large part to the REF 2021 open access policy (announced in 2014 and effective from 2016). This is a dramatic demonstration of the influence that policy can have on researcher behaviour: it has made open access a mainstream activity in the UK.

Figure 2. The median institutional green open access percentage for each country according to the Leiden Ranking.

Cambridge has seen trends similar to those in the rest of the UK across all forms of open access (Figure 3), with rising use of green open access and steadily increasing adoption of gold and hybrid. Yet despite all the money poured into gold and (more controversially) hybrid open access, the net effect of all this other activity is a measly three percentage points of additional open access content (82% vs 79%). This raises the question: was it worth it? If open access can be so successfully achieved through green routes, what is the inherent benefit of gold/hybrid open access?

Figure 3. Open access trends in Cambridge according to the Leiden Ranking. In the 2020 ranking, 79% was delivered through green open access. This means that despite all the work to facilitate other forms of open access, this activity contributed only an additional three percentage points to the total (82%).

Of course, Plan S has now emerged as the most significant attempt to coordinate a clear and coherent international strategy for open access. While it is not without its detractors, I am nonetheless supportive of cOAlition S’s overall aims. However, as the UK scholarly communication community has experienced, policy implementation is messy and can lead to unintended consequences. While Plan S provides options for complying through green open access routes, the discussions that institutions and publishers (traditional and fully open access alike) have engaged in are almost entirely focussed on gold open access through transformative deals. This is not because we, as institutions, want to spend more on publishing, but rather because it is the pragmatic way to create open access content at the source and provide authors with easy and palatable routes to open access. It is also a recognition that flipping journals requires give and take from institutions and publishers alike.

We are now very close to reaching a point where open access can be an afterthought for researchers, particularly in the UK. In large part, it will be done for them through direct agreements between institutions and publishers. Cambridge already has open access publishing arrangements with over 5,000 journals, and this figure will continue to grow as we sign more transformative agreements. However, this will ultimately be to the detriment of green open access. Instead of being the only open access source for a journal article, institutional repositories will become secondary storehouses of content that is already gold open access. The heyday of institutional repositories, if one ever existed, is now over.

For me, that is a sad thought. We have poured enormous resource and effort into maintaining Apollo, but we must recognise the burden that green open access places on researchers. They have better things to do. I expect that the next five years will see a dramatic increase in gold and hybrid open access content produced in the UK. Green open access won’t go away, but we will have entered a time where open access is no longer fringe, nor indeed mainstream, but rather de facto for all research.

Published 23 October 2020

Written by Dr Arthur Smith

The content of this blog is licensed under CC BY 4.0.

Clearing the final hurdle – automating embargo setting

One of the biggest issues facing the Open Access Team has been keeping up with the constant stream of accepted manuscripts that need to be processed. In many cases we receive notification of an accepted manuscript well before formal publication. This has presented a significant challenge over the last five years because, although we know there is a publication forthcoming (or at least we trust that there is), we have no idea when an article may actually be published.

This means that we have many thousands of publication records in Apollo which have ‘placeholder’ embargoes because we simply did not know the publication date at the point of archiving and therefore could not set an accurate embargo. After archiving, many of the records in Apollo may have been supplemented with a publication date thanks to metadata supplied via Symplectic Elements, but we still need to set an accurate embargo.

In other cases we might be waiting for an article to be published gold open access so that we can update Apollo with the published version of record.

While we are now very adept at archiving manuscripts in Apollo (thanks in large part to Fast Track and Orpheus) it remains a challenge to properly and accurately update Apollo records with either correct embargoes for accepted manuscripts, or the open access version of record. It is a futile task to be constantly checking whether a manuscript has been published. While the Open Access Team keeps a list of every publication that requires updating, this is a thankless job that should be highly automatable.

To that end, we have recently leveraged Orpheus to do a lot of the heavy lifting for us. By interrogating every journal article in Apollo and comparing its metadata against Orpheus, we can now quickly determine which items can be updated and take the necessary next steps: changing embargoes where appropriate or identifying opportunities to archive the published version of record.

To do this we created a DSpace curation task to check every “Article” type in Apollo that had at least one file that was currently under embargo. We then compared the publication metadata against the information held in Orpheus to determine what steps needed to be taken. In total we found 9,164 items in need of some attention. The results are displayed below in a Tableau Public visual and summarised in Table 1.

Of these items, 3,864 had a published open access version archived alongside the embargoed manuscript, so we skipped any further updating of these records. This is actually a very good sign, and indicates that the Open Access Team has been going back to records and supplementing them with the open access version of record.

Amongst the remaining items, 2,794 were successfully matched against Orpheus and had their embargoes verified: 1,862 records were updated with shorter embargoes and 412 had longer embargoes applied, leaving 520 items which were unchanged because they already had the correct embargo period.

The final 2,506 items were primarily records with no publication date (1,132 items), publications that could potentially be supplemented by the open access version of record (537 items), and publications with no embargo information in Orpheus (434 items). The remaining 403 items had other outcomes.

Table 1. Summary of outcomes after comparing Apollo records against Orpheus.

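Date archived in Apollo | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | Total
The item has an open VoR version | 7 | 105 | 1,200 | 1,019 | 1,300 | 226 | 7 | 3,864
Accepted version – embargo updated | – | 2 | 145 | 76 | 132 | 2,305 | 134 | 2,794
No publication date available | – | – | 10 | 159 | 32 | 714 | 217 | 1,132
Orpheus VoR embargo: 0 | 1 | 4 | 51 | 18 | 5 | 451 | 7 | 537
No AAM embargo information available | 3 | 6 | 64 | 39 | 33 | 264 | 25 | 434
Other outcome | 8 | 37 | 114 | 47 | 23 | 162 | 12 | 403
Total | 19 | 154 | 1,584 | 1,358 | 1,525 | 4,122 | 402 | 9,164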
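Each record ends up in one row of Table 1, so the assignment can be sketched as a single classification function. This minimal Python sketch is not the actual DSpace curation task. The `ApolloItem` and `OrpheusPolicy` classes, their fields, the outcome strings and the order of the checks are all assumptions made for illustration. Orpheus is represented here simply as a pair of embargo lengths in months.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ApolloItem:
    """Hypothetical stand-in for an Apollo journal article record."""
    has_open_vor_file: bool           # open version of record already archived
    publication_date: Optional[date]  # None if not yet known
    current_embargo_end: date         # embargo currently set on the manuscript


@dataclass
class OrpheusPolicy:
    """Hypothetical stand-in for the journal policy held in Orpheus."""
    vor_embargo_months: Optional[int]  # 0 means the VoR is open immediately
    aam_embargo_months: Optional[int]  # embargo on the accepted manuscript


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def classify(item: ApolloItem, policy: Optional[OrpheusPolicy]) -> str:
    """Return the outcome for one record, loosely mirroring Table 1's rows."""
    if item.has_open_vor_file:
        return "open VoR archived: skip"
    if item.publication_date is None:
        return "no publication date"
    if policy is None:
        return "other"  # e.g. the journal could not be matched in Orpheus
    if policy.vor_embargo_months == 0:
        return "VoR could be archived"
    if policy.aam_embargo_months is None:
        return "no AAM embargo information"
    correct_end = add_months(item.publication_date, policy.aam_embargo_months)
    if correct_end < item.current_embargo_end:
        return "embargo shortened"
    if correct_end > item.current_embargo_end:
        return "embargo lengthened"
    return "embargo unchanged"
```

Consider a manuscript published on 15 March 2019 under a 12-month embargo that still carries a placeholder embargo ending in 2022. This sketch would classify it as "embargo shortened", with a corrected end date of 15 March 2020.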

We plan to run this curation task on a regular basis and periodically check the outcomes. Any items that continually fail to update will be processed manually by the Open Access Team, but our intention and desire is to move away from manual processing wherever possible.

Published 3 April 2020

Written by Dr Arthur Smith

This blog post is licensed under CC BY.

A Fast-Track Route to Open Access

In the last two years, since the REF 2021 open access policy came into force, the Open Access Team has received an ever-increasing number of manuscript submissions for archiving in Apollo, Cambridge’s institutional open access repository.

We have been thinking long and hard about ways to cope with the workload, by scrutinising existing practices and streamlining workflows, because we want to provide the best possible service to our researchers, commensurate with the University’s world leading research.

This blog introduces what is perhaps the greatest overhaul of our workflows since the service began: a new ‘Fast Track’ deposit system.

Work it harder

Before the start of the REF OA policy (2014-2016), the Open Access Team would process and manually curate every manuscript submission we received. Authors could expect an initial response within 1-2 working days, after which (usually within a month) we would archive their manuscript in Apollo.

A simplified workflow for a typical manuscript was:

  1. Manuscript uploaded by submitter in Symplectic Elements.
  2. Item created in Apollo (DSpace) workflow.
  3. Helpdesk ticket created (Zendesk).
  4. Open Access Team reviews manuscript, advises submitter and makes a decision.
  5. Open Access Team archives the manuscript in Apollo and informs submitter.

Both the decision (4) and archive (5) steps take time. For each manuscript we would need to decide whether the files we received could be archived, what funder open access policies were at play and the open access options available from the publisher. We could then advise authors about their open access choices.

To archive a manuscript the process was broadly the following:

  1. Review the helpdesk ticket (Zendesk) for the open access decision.
  2. Enter as many publication details as possible in Symplectic Elements.
  3. Retrieve the submission from the Apollo (DSpace) deposit workflow.
  4. Add licence and metadata to the record.
  5. Review the submission and approve for archiving.
  6. Move the item to the relevant departmental collection and apply an appropriate embargo (if required).
  7. Finally, update the helpdesk ticket and send the original submitter a link to their Apollo record.

Each manuscript took on average 18 minutes to archive, which, besides being manually tedious and prone to error, was extremely time-consuming. Add to this the time required to make the initial decision and each manuscript submission could easily take 30 minutes for the Open Access Team to fully process from start to finish, especially if an open access fee had to be paid.

Fast-forward two years: with new manuscript submissions now peaking at over 1,300 per month, simply processing manuscripts for the REF would require more than four full-time staff members. Whilst these manual processes were viable for a handful of submissions a day, they became unwieldy at scale.
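The staffing estimate follows from simple arithmetic. The sketch below uses roughly 30 minutes per submission, the figure given above. It also assumes about 150 working hours per full-time member of staff per month (37.5 hours a week over four weeks); that figure is not stated in the text.

```python
SUBMISSIONS_PER_MONTH = 1300  # peak monthly submission rate
MINUTES_PER_SUBMISSION = 30   # initial decision plus archiving
HOURS_PER_FTE_MONTH = 150     # assumption: 37.5 h/week x 4 weeks

staff_hours = SUBMISSIONS_PER_MONTH * MINUTES_PER_SUBMISSION / 60
fte_needed = staff_hours / HOURS_PER_FTE_MONTH

print(f"{staff_hours:.0f} staff hours per month, about {fte_needed:.1f} FTE")
```

That comes to roughly 650 hours a month, or a little over four full-time posts under these assumptions.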

Make it better

Our first attempt at speeding up our open access system began in August 2017. To start we made a number of operational changes to reduce the time spent processing manuscript submissions:

  • We would rely entirely on the metadata present in Symplectic Elements to populate the Apollo records (i.e. we would no longer curate records manually).
  • The Open Access Team would no longer update the helpdesk records; instead, internal record keeping would be automated as much as possible.

Unfortunately, the number of steps in the Apollo workflow was still roughly the same as the previous process, but with one key difference: a new field to record what we call the ‘Fast Track’ decision. There were seven Fast Track options:

  • Submitted
  • Proof
  • Published (not open access)
  • Published (open access)
  • Accepted (published)
  • Accepted (not published)
  • Other

The first six options represent the vast bulk of all manuscripts received by the Open Access Team, and the ‘Other’ option acts as a catch-all for anything else. Simply by knowing what sort of manuscript has been uploaded, much of the decision and archiving process can be automated. However, the agent still needed to retrieve the item from the Apollo workflow, check the version of the file and publication status of the paper, add some metadata fields, approve the item, and move it to an appropriate collection.

Figure 1. The Apollo workflow page of a typical manuscript submission, with the addition of the new ‘Fast Track’ field.

The choice of Fast Track decision leads to four possible outcomes which would ‘trigger’ actions in our Zendesk helpdesk:

  • Submitted, proof, published (not open access)
    • Email submitter, ask for accepted manuscript
  • Published (open access)
    • Archive in Apollo (no embargo) ⇒ Email submitter Apollo link
  • Accepted (published), accepted (not published)
    • Archive in Apollo (embargoed) ⇒ Email submitter Apollo link
  • Other
    • Refer to Open Access Team
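The mapping from Fast Track decision to helpdesk trigger is essentially a lookup. The following minimal Python sketch shows that mapping. The `Decision` enum and the action strings are illustrative names, not the identifiers used in Apollo or Zendesk.

```python
from enum import Enum
from typing import List


class Decision(Enum):
    SUBMITTED = "Submitted"
    PROOF = "Proof"
    PUBLISHED_NOT_OA = "Published (not open access)"
    PUBLISHED_OA = "Published (open access)"
    ACCEPTED_PUBLISHED = "Accepted (published)"
    ACCEPTED_NOT_PUBLISHED = "Accepted (not published)"
    OTHER = "Other"


def actions_for(decision: Decision) -> List[str]:
    """Return the helpdesk actions triggered by a Fast Track decision."""
    if decision in {Decision.SUBMITTED, Decision.PROOF, Decision.PUBLISHED_NOT_OA}:
        return ["email submitter asking for the accepted manuscript"]
    if decision is Decision.PUBLISHED_OA:
        return ["archive in Apollo with no embargo", "email submitter the Apollo link"]
    if decision in {Decision.ACCEPTED_PUBLISHED, Decision.ACCEPTED_NOT_PUBLISHED}:
        return ["archive in Apollo under embargo", "email submitter the Apollo link"]
    return ["refer to the Open Access Team"]


# The seven decisions collapse into four distinct outcomes.
outcomes = {tuple(actions_for(d)) for d in Decision}
print(len(outcomes))
```

Keeping the mapping in one function means that adding or changing a decision type touches a single place.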

Although much faster, the process was still manually tedious: a single item could require up to 33 agent actions (29 of them mouse clicks) and 14 page loads, which is hardly user friendly. Nevertheless, the time to archive had fallen from 18 to 9 minutes, a 50% reduction compared with the previous fully manual system.

Do it faster

So what if all the steps involved in processing a manuscript submission could be reduced to the absolute minimum, and be actionable within a single webpage? After a short development sprint, the Open Access Team launched the ‘Fast Track Deposits’ interface last September. A snapshot of the user interface is shown below.

Figure 2. The Fast Track interface. Choosing one of the options in blue is enough to fully archive a manuscript, or process it for further action by the submitter or the Open Access Team.

At the top of the page, the agent can see a ‘publication summary’ including the item title, the journal title, and publisher DOI if available. Both the item title and publisher DOI are hyperlinked, so that the agent can Google-search the item or land on the publisher’s webpage with a single mouse click.

The agent must first inspect the file and check that it is a suitable version (i.e. either the accepted version or the open access published version). If wrongly labelled, they must relabel the file via a dropdown menu, and add/delete files as appropriate. The agent then ‘describes’ the manuscript (i.e. decides whether it is the accepted, published, submitted or proof version) and submits their decision. The decision determines the trigger behaviour in the automatically populated helpdesk ticket. The agent is then free to move on to the next item.

If the decision is ‘accepted’ or ‘published open access’, the item is deposited and the submitter is automatically notified via email. For submitted, proof, and non-OA published versions, the author receives an automatic email asking for the accepted manuscript. Items are archived in the repository under a generic collection, and any forthcoming publication details are added to the record via external source information in Elements.

To see just how efficient Fast Track is we’ve prepared a short demonstration video which captures some of the key features:

Video 1. Real-time demonstration of the Fast Track system.

Makes us stronger

Agents therefore need only make one decision: identify the file version. But the real ingenuity of the Fast Track system is that embargoes can be set automatically by:

  1. Taking into account the decision made by the agent (e.g. no embargo if published open access);
  2. Detecting publication status and publication dates from Elements; and
  3. Retrieving journals’ embargo policies via Orpheus (you can learn more about Orpheus in our previous blog post).

In some cases, usually because we don’t know the publication date, we can’t determine the embargo length of an accepted manuscript. In such cases we apply a 36-month embargo from the date of the Fast Track decision. We know that this embargo won’t always be correct; however, we routinely check manuscripts in Apollo and update embargoes accordingly.
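The three embargo rules, together with the 36-month fallback, can be sketched as one function. This minimal Python illustration rests on some assumptions. The decision values and the function name are hypothetical. The sketch also simplifies by assuming that Orpheus returns a single accepted-manuscript embargo length in months. It is not the Fast Track code.

```python
import calendar
from datetime import date
from typing import Optional

FALLBACK_MONTHS = 36  # applied when the embargo cannot otherwise be determined


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def embargo_end(decision: str,
                decision_date: date,
                publication_date: Optional[date],
                embargo_months: Optional[int]) -> Optional[date]:
    """Return the date a deposited file opens, or None for no embargo.

    decision: hypothetical values such as "published_oa" or "accepted".
    embargo_months: the journal's accepted-manuscript embargo (from Orpheus).
    """
    # Rule 1: an open access published version needs no embargo.
    if decision == "published_oa":
        return None
    # Rules 2 and 3: known publication date plus the journal's embargo policy.
    if publication_date is not None and embargo_months is not None:
        return add_months(publication_date, embargo_months)
    # Fallback: a placeholder embargo counted from the Fast Track decision.
    return add_months(decision_date, FALLBACK_MONTHS)
```

Take an accepted manuscript processed on 23 April 2019 with no known publication date. This sketch returns 23 April 2022 for it. Once Elements supplies a publication date, rerunning the function with the Orpheus embargo gives the corrected date.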

Figure 3. Simplified overview of the Fast Track process. The key decision is to determine the type of manuscript that has been submitted. Everything else is handled automatically.

Since Fast Track launched, the average time to process a manuscript has been 1-2 minutes. More than 8,000 items have been processed since the phase two Fast Track interface was launched. Including items processed under the phase one effort, the number rises to just over 14,000. And since a picture is worth a thousand words, Figure 4 below shows the effect of the new interface, launched in September, on our backlog of unprocessed submissions.

Figure 4. Historical change in the number of unprocessed open access manuscript submissions. The total number of outstanding manuscript submissions peaked at nearly 2,400 in September 2018. Immediately after launching the Fast Track website the backlog dropped dramatically and was completely eliminated by March 2019.

We will continue to develop Fast Track to further streamline our processing of manuscripts. We have already started to partner with librarians and administrators across the University to leverage the collective knowledge about open access which now exists within the University’s professional academic services.

Get in contact: If you are running a DSpace repository and would like to implement Fast Track to work alongside your existing workflows email us at support@repository.cam.ac.uk

Published 23 April 2019
Written by Dr Mélodie Garnier and Dr Arthur Smith
Creative Commons License

Blood: in short supply?

Two years ago (almost to the day) we called out Blood for the misleading open access options they offered to Research Council and Charity Open Access Fund (COAF) authors. Unfortunately, little has changed since then:

Neither of these routes is sufficient to comply with either Research Councils’ or COAF’s open access policies which require that the accepted text be made available in PMC within 6 months of publication, or that the published paper is available immediately under a CC BY licence.

At the time, we called on Blood to change their offerings, or we would advise Research Council- and COAF-funded authors to publish elsewhere. And that’s exactly what’s happened:

Figure 1. All articles published in Blood since 2007 which acknowledge MRC, Wellcome, CRUK or BHF funding. Data obtained from Web of Science.

Over the last two years we’ve seen a dramatic decline in the number of papers being published in Blood by Medical Research Council (MRC), Wellcome Trust, Cancer Research UK (CRUK) and British Heart Foundation (BHF) researchers. The number of papers published in Blood that acknowledge these funders is now at its lowest point in over a decade.

It’s important to remember that the 23 papers published in Blood in 2017 are all non-compliant with the open access policies of Research Councils and COAF, and if these papers acknowledge Wellcome Trust funding then those researchers may also be at risk of losing 10% of their total grant. If you are funded by Research Councils or one of the COAF members, please consider publishing elsewhere. SHERPA/FACT confirms our assessment:

Sign the open letter

We’re still collecting signatures for our open letter to the editor of Blood in the hope that they’ll reconsider their open access options. Please join us by adding your name.

Cambridge Open Access spend 2013-2018

Since 2013, the Open Access Team has been helping Cambridge researchers, funded by Research Councils UK (RCUK) and the consortium of biomedical funders which make up the Charity Open Access Fund (COAF), to meet their Open Access obligations. Both RCUK (now part of UKRI) and COAF have Open Access policies which have a preference for ‘gold’, i.e. the published work should be Open Access immediately at the time of publication. Implementing these policies has come at a significant cost. In this time, Cambridge has been awarded just over £10 million from RCUK and COAF to implement their Open Access policies, and the Open Access Team has diligently used this funding to maximum effect.

Figure 1. Comparison of combined RCUK/COAF grant spend and available funds, April 2013 – March 2018.

Initially, expenditure was slow, which allowed the Open Access Team to maintain a healthy balance that could guarantee funding for almost any paper which met a few basic requirements. However, since January 2016 expenditure has gradually been catching up with the available funds, which has made funding decisions more difficult (specifically Open Access deals tied to multi-year publisher subscriptions). In the first three months of 2018, average monthly expenditure on the RCUK block grant alone exceeded £160,000. We are quickly reaching the point where expenditure will outstrip the available grants.

One technical change which has particularly affected our management of the block grants was RCUK’s decision last year to move away from a direct cash award (which could be rolled over year to year) to a more tightly managed research grant. In the past, carrying over underspend has given us some flexibility in the management of the RCUK funds, whereas the more restrictive style of research grant will mean that any underspend will need to be returned at the end of the grant period, while any overspend cannot be deferred into the next grant period. As we are now dealing with a fixed budget, the Open Access Team will need to ensure that expenditure is kept within the limits of the grant. This is difficult when we have no control over where or when our researchers publish.

Funding from COAF (which is also managed as though it is a research grant) has generally matched our total annual spend quite closely, but the strict grant management rules have caused some problems, especially in the transition period between one grant and another. However, unlike RCUK, the Wellcome Trust will provide supplementary funding in addition to the main COAF award if it is exhausted, and the other COAF partners have similar procedures in place to manage Open Access payments beyond the end of the grant.

Where does it all go?

Most of our expenditure (91%) goes on article processing charges (APCs), as perhaps one might expect, but the block grants are also used to support the staff of the Open Access Team (3%), helpdesk and repository systems (2%), page and colour charges (2%), and publisher memberships (1%) (where this results in a reduced APC). The majority of APCs we’ve paid go towards hybrid journals, which represent approximately 80% of total APC spend.

So let’s take a look at which publishers have received the most funds. We’ve tried to match as much of our raw financial information as possible to specific papers, although some of our data is either incomplete or we can’t easily link a payment back to a specific article, particularly for 2013-2015 when our processes were still developing. Nonetheless, the average APC paid over the last 5 years was £2,291 (inc. 20% VAT), but as can be seen from Table 1, average APCs have been rising year on year at a rate of 7% p.a., significantly higher than inflation. Price increases at this rate are not sustainable in the long term: by 2022 we could be paying on average £3,000 per article.

Table 1. Average APC by publication year of article (where known).

Year of publication Average APC paid (£)
2013  £1,794
2014  £1,935
2015  £2,044
2017  £2,187
2018  £2,336
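The £3,000 figure is a compound-growth projection. The short Python sketch below reproduces it. It assumes that the 7% annual rise quoted above continues unchanged from the 2018 average in Table 1.

```python
BASE_YEAR = 2018
BASE_APC = 2336       # average APC paid in 2018, GBP inc. VAT (Table 1)
ANNUAL_GROWTH = 0.07  # rate of increase quoted in the text, assumed constant


def projected_apc(year: int) -> float:
    """Projected average APC, compounding ANNUAL_GROWTH from BASE_YEAR."""
    return BASE_APC * (1 + ANNUAL_GROWTH) ** (year - BASE_YEAR)


for year in range(2019, 2023):
    print(year, f"£{projected_apc(year):,.0f}")
```

Four years of 7% growth multiplies the 2018 figure by about 1.31, giving roughly £3,060 by 2022.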

Elsevier has been by far the largest recipient of block grant funds, receiving 29.4% of all APC expenditure from the RCUK and COAF awards (over £2.5 million), though accounting for only 25.5% of articles. Over the same period, SpringerNature also received in excess of £1 million (which, as we’ll see below, has mostly been spent on two titles). With such a substantial set of data we can now begin to explore the relative value that each publisher offers. Take for example Taylor & Francis (£107,778 for 120 articles) compared to Wolters Kluwer (£119,551 for 35 articles). Both publishers operate mostly hybrid OA journals, and yet the relative value is significantly different. What is so fundamentally different between publishers that such extreme examples as this should exist?

Table 2. Top 20 publishers by combined total RCUK/COAF APC spend 2013-2018.

Value of APCs paid Number of APCs paid Avg. APC paid
Publisher £ % N % £
Elsevier £2,559,736 29.4% 971 25.5% £2,636
SpringerNature £1,050,774 12.1% 402 10.6% £2,614
Wiley £808,847 9.3% 279 7.3% £2,899
American Chemical Society £411,027 4.7% 251 6.6% £1,638
Oxford University Press £379,647 4.4% 169 4.4% £2,246
PLOS £267,940 3.1% 168 4.4% £1,595
BioMed Central £245,006 2.8% 153 4.0% £1,601
Institute of Physics £189,434 2.2% 98 2.6% £1,933
Royal Society of Chemistry £156,018 1.8% 106 2.8% £1,472
BMJ Publishing £144,001 1.7% 68 1.8% £2,118
Company of Biologists £140,609 1.6% 50 1.3% £2,812
Wolters Kluwer £119,551 1.4% 35 0.9% £3,416
Taylor & Francis £107,778 1.2% 120 3.2% £898
Frontiers £103,011 1.2% 61 1.6% £1,689
Cambridge University Press £77,139 0.9% 38 1.0% £2,030
Royal Society £73,890 0.8% 52 1.4% £1,421
Society for Neuroscience £69,943 0.8% 26 0.7% £2,690
American Society for Microbiology £63,056 0.7% 36 0.9% £1,752
American Heart Association £53,696 0.6% 14 0.4% £3,835
Optical Society of America £39,463 0.5% 17 0.4% £2,321
All other articles £1,654,228 19.0% 690 18.1% £2,397
Grand Total £8,714,794 100.0% 3,804 100.0% £2,291
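The average APC and share-of-spend columns in Table 2 are simple ratios of the spend and count columns. The following Python sketch checks three of the rows discussed above. The helper name `summarise` is illustrative.

```python
from typing import Tuple

GRAND_TOTAL_SPEND = 8_714_794  # GBP, all RCUK/COAF APCs 2013-2018 (Table 2)

# (publisher, total APC spend in GBP, number of APCs), from Table 2
rows = [
    ("Elsevier", 2_559_736, 971),
    ("Taylor & Francis", 107_778, 120),
    ("Wolters Kluwer", 119_551, 35),
]


def summarise(spend: int, count: int) -> Tuple[float, float]:
    """Return (average APC, share of total APC spend)."""
    return spend / count, spend / GRAND_TOTAL_SPEND


for name, spend, count in rows:
    avg, share = summarise(spend, count)
    print(f"{name}: average £{avg:,.0f}, {share:.1%} of spend")
```

Wolters Kluwer’s average (about £3,416) is nearly four times that of Taylor & Francis (about £898), which is the disparity described above.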

Next, journal level metrics. The most popular journal that we pay APCs for is Nature Communications, followed closely by Scientific Reports. Both of these are SpringerNature titles, and indeed these two titles make up the bulk of our total APC spend with SpringerNature. Yet these two journals represent significantly different approaches to Open Access. Nature Communications, along with Cell and Cell Reports, are some of the most expensive routes to making research publications Open Access, whereas Scientific Reports and PLOS One sit at the lower end of the spectrum. It is interesting that we haven’t seen a particularly popular Open Access journal fill the niche between Nature Communications and Scientific Reports.

Figure 2. APC number and total spend by journal. In the last five years, nearly £450,000 has been spent on articles published in Nature Communications.


Managing the future

While the OA block grants have kept pace with overall expenditure so far, continuing monthly expenditure of £160,000 would risk overspending on the RCUK grant for 2018/19. To counter this possible outcome the University has agreed a set of funding guidelines to manage the RCUK (from now on known as Research Councils) and COAF awards. For Research Councils’ funded papers the new guidelines place an emphasis on fully Open Access journals and hybrid journals where the publisher is taking a sustainable approach to managing the transition to Open Access. We’ve spent a lot of money over the last five years, yet it’s not clear that the influx of cash from RCUK and COAF has had any meaningful impact on the overall publishing landscape. Many publishers continue to reap huge windfalls via hybrid APCs, yet they are not serious about their commitment to Open Access.

In the future, we’ll be demanding better deals from publishers before we support payments to hybrid journals so that we can effect a faster transition to a fully Open Access world.

Published 22 October 2018
Written by Dr Arthur Smith
Creative Commons License

arXiv and REF – together at last?

New draft REF2021 guidance was released for consultation on Monday morning. Buried halfway through this daunting 139-page document was an update to the REF Open Access policy.

This revised policy comes on the back of Research England’s report Monitoring sector progress towards compliance with funder open access policies which was released in June, and on which we have already commented.

From an Open Access perspective, additional flexibility for preprint servers has been added to the policy:

The funding bodies recognise that many researchers derive value from sharing early versions of papers using a pre-print service. Institutions may submit pre-prints as eligible outputs to REF 2021 (see Annex K). Only outputs which have been ‘accepted for publication’ (such as a journal article or conference contribution with an ISSN) are within scope of the REF 2021 open access policy. To take into account that the policy intent for ‘open access’ is met where a pre-print version is the same as the author accepted manuscript, we have introduced additional flexibility into the open access requirement: if the ‘accepted for publication’ text, or near final version, is available on the pre-print service, and the output upload date of the pre-print is prior to the date of output publication, this will be considered as compliant with the open access criteria (deposit, discovery, and access).

That’s a significant adjustment to previous advice and will be of considerable relief to many researchers who routinely publish their research in this way. Indeed, we have lobbied behind the scenes on this policy issue for more than three years.

But what does this actually mean and what should institutions and authors take from this?

Repositories, preprint servers – what’s the difference?

Firstly, this policy legitimises preprint servers (like arXiv, bioRxiv, SocArXiv and many more) and allows authors to use these systems without needing to worry about technical requirements.

This is in stark contrast to the way institutional and subject repositories are treated by the policy. These repositories must meet all the requirements of the REF Open Access policy to be considered compliant. That is fine for most institutions, because meeting the policy requirements is vital, but subject repositories are usually left in the lurch:

Individuals depositing their outputs in a subject repository are advised to ensure that their chosen repository meets the requirements set out at paragraphs 224 to 241 in this policy. REF 2021 guidance will not certify the repositories which fulfil policy requirements.

We’re still not sure if Europe PMC is compliant, for example.

Don’t just sit there!

However, just because preprint servers are okay, doesn’t mean that authors using preprint servers should assume they don’t need to do anything. There are two significant caveats to take note of:

  1. the manuscript deposited in the preprint server must be the “‘accepted for publication’ text”; and
  2. the manuscript must be uploaded prior to first publication.
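The two caveats reduce to a simple check. The sketch below is a minimal illustration, not an official REF compliance test: the `PreprintRecord` structure and `meets_preprint_flexibility` function are hypothetical, and it assumes the manuscript version has already been confirmed, which is the hard part.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PreprintRecord:
    """Hypothetical record for one preprint deposit (field names are illustrative)."""
    version: str            # "submitted", "accepted" or "published"
    upload_date: date       # when this version was posted to the preprint server
    publication_date: date  # date of first publication of the output


def meets_preprint_flexibility(record: PreprintRecord) -> bool:
    """Check both caveats: accepted text, uploaded before first publication."""
    # Assumption: a "near final version" is recorded as "accepted" upstream.
    is_accepted_text = record.version == "accepted"
    # The policy requires the upload date to be prior to (strictly before) publication.
    uploaded_before_publication = record.upload_date < record.publication_date
    return is_accepted_text and uploaded_before_publication


early = PreprintRecord("accepted", date(2018, 7, 1), date(2018, 7, 8))
late = PreprintRecord("accepted", date(2018, 7, 10), date(2018, 7, 8))
print(meets_preprint_flexibility(early))  # True
print(meets_preprint_flexibility(late))   # False
```

Only the date comparison lends itself to automation; the version label still has to come from checking with authors.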

Determining the deposit time is usually straightforward, so institutions will be able to monitor this aspect of the policy with some level of automation (especially for arXiv, which is harvested by a range of publication systems).

However, the key challenge will be determining the manuscript version. As we’ve described before, this often calls for the skills of a manuscript detective, so some level of checking with authors will still need to take place.

We are working internally at Cambridge on the workflow we will use to capture these outputs, and once this is determined we will tell our researchers what they do or don’t need to do. Until we indicate otherwise, we still encourage all of our researchers to upload manuscripts when they are accepted for publication.

Regardless, if there is one key recommendation we would make to all users of preprint repositories, it is this: annotate or label the records to clearly indicate the manuscript version (e.g. submitted, accepted, published).

It will help us, and you, in the long run.

Published 25 July 2018
Written by Dr Arthur Smith
Creative Commons License

Cambridge’s RCUK/COAF Open Access spend January 2017 – March 2018

It’s been reporting season for institutions in receipt of RCUK Open Access block grant awards, so we’ve been busy preparing data for both RCUK (now UKRI) and Jisc about how Cambridge has spent its funding allocation over the past 15 months (January 2017 – March 2018). In this blog post I’ll focus mainly on the Jisc Open Access article processing charge (APC) report as it includes both RCUK and COAF expenditure, which we’ve made available in Apollo (the RCUK report is available there too). We’ve had to make a few tweaks to the data to perform the analysis that follows, but that shouldn’t substantially affect the figures. Unless stated otherwise, all charges reported include VAT at 20%.

Headlines

Let’s start with a few headline numbers (Table 1). In the reporting period January 2017 – March 2018 the Open Access Team paid Open Access APCs totalling more than £2.8 million. By far the largest beneficiary of this funding was Elsevier, which received over £870,000 for RCUK and COAF funded research articles (that’s 31% of all our APC spend). In fact, Elsevier dominates the figures to such an extent that for this blog post I’ve separated out Cell Press titles to provide a little more insight.

Table 1. Headline figures between January 2017 and March 2018 for the RCUK and COAF Open Access block grants (https://doi.org/10.17863/CAM.24288).

Item | Value | Notes
Total spend | £2,989,609.13 | –
Open Access | £2,847,135.05 | –
Additional publication costs (mainly page and colour fees) | £111,631.68 | –
Publisher memberships/deals | £30,842.40 | –
Articles | 1547 | SCOAP3 papers unknown
‘Other’ Springer Compact articles | 221 | –
Mean APC (all publishers) | £1,840 | SCOAP3 papers unknown
Mean APC (excluding ‘other’ Springer Compact articles) | £2,147 | SCOAP3 papers unknown
Mean APC ± σ (invoiced APCs only) | £2,254 ± £1,007 | Excludes SCOAP3, Springer Compact, Wiley prepayment, OUP prepayment
Median APC (invoiced APCs only) | £2,042 | Excludes SCOAP3, Springer Compact, Wiley prepayment, OUP prepayment

That £2.8 million paid for at least 1547 articles. I say ‘at least’ because (i) we haven’t recorded papers funded through the SCOAP3 partnership, for which we paid just shy of £25,000; (ii) choosing a precise reporting date is difficult, especially for prepayment deals where invoicing is disconnected from the publishing process; and (iii) we are reporting from specific University cost centres, but for operational reasons some payments may have been taken from other sources, making it difficult to reconcile everything in a neat report.

But assuming these problems are negligible, the mean APC was £1,840 (similar to previous years).

However, there is the complication of the Springer Compact, which Cambridge funds through a combination of the RCUK and COAF block grants. If we only consider RCUK/COAF funded papers processed as part of the Springer Compact, then the average APC is £1,036, significantly less than Springer’s APC list price of €2,200 +VAT (so it’s a good deal from an RCUK/COAF perspective). However, the majority of Springer Compact papers do not acknowledge RCUK or COAF, and under normal circumstances these papers would not be eligible for Open Access funding. Excluding these 221 ‘other’ Springer Compact papers from the calculations increases the overall mean APC to £2,147. This demonstrates, once again, how progressive the Springer Compact continues to be. We wrote last year about the value of the deal to us. The overall distribution of APCs paid to all publishers is shown in Figure 1.
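The three averages quoted above can be reproduced from Table 1 and Table 2. This is our reading of how they were derived, not a statement of the team’s exact method: we assume the 221 ‘other’ Springer Compact papers carry no cost in the Open Access total, so excluding them only shrinks the denominator.

```python
# Figures from Table 1 (£, including VAT).
OA_SPEND = 2_847_135.05   # total Open Access APC spend
ARTICLES = 1547           # articles paid for (SCOAP3 papers not recorded)
OTHER_COMPACT = 221       # Springer Compact papers not acknowledging RCUK/COAF

# Springer Compact papers that do acknowledge RCUK/COAF (Table 2).
COMPACT_SPEND, COMPACT_ARTICLES = 76_700, 74

mean_all = OA_SPEND / ARTICLES
# Assumption: the 'other' Compact papers add articles but no spend.
mean_excl_other = OA_SPEND / (ARTICLES - OTHER_COMPACT)
compact_mean = COMPACT_SPEND / COMPACT_ARTICLES

print(f"Mean APC, all articles: £{mean_all:,.0f}")                     # £1,840
print(f"Mean APC, excluding 'other' Compact: £{mean_excl_other:,.0f}")  # £2,147
print(f"Springer Compact (RCUK/COAF) mean: £{compact_mean:,.0f}")       # £1,036
```

That all three figures match the post supports the zero-cost reading, but it remains an inference from the published totals.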

Figure 1. Distribution of all APCs paid to all publishers (including prepayments to OUP and Wiley). Springer Compact and Wiley credit articles are also shown for completeness.

Level playing field?

Figure 2 and Table 2 give an in-depth breakdown of the APCs paid to publishers to which at least 10 APCs were paid. There are several interesting features to the data. Firstly, the sheer number and spread of APCs paid to Elsevier is immense. While many other publishers have clear pricing bands, Elsevier’s pricing spans a continuum between £500 and £5,000. Elsevier’s mean APC is well above the all-publisher mean, though still within one standard deviation of it. The same cannot be said of Cell Press, which has a mean APC of £4,084 and is the only large publisher more than one standard deviation above the all-publisher mean invoice value. The bulk of its APCs are clustered just below £5,000.
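The “one standard deviation” comparison can be checked against the mean APCs in Table 2. A minimal sketch, assuming the comparison uses the invoiced-only mean and σ from Table 1, so the Springer Compact and the Wiley and OUP prepayments are left out. On these figures Cell Press is the only publisher more than one standard deviation above the mean, while Taylor & Francis and MDPI fall more than one standard deviation below it.

```python
MEAN_APC, SIGMA = 2254, 1007  # £, all-publisher invoiced APCs (Table 1)

# Mean APC (£) for publishers invoiced individually (Table 2); the Springer
# Compact and the Wiley/OUP prepayments are excluded, as in Table 1.
publisher_means = {
    "Elsevier": 2607,
    "NPG": 2434,
    "ACS": 1745,
    "Cell Press (Elsevier)": 4084,
    "RSC": 1532,
    "BMC": 1668,
    "PLOS": 1725,
    "IOP": 1956,
    "Frontiers": 1879,
    "Taylor & Francis": 619,
    "BMJ": 2054,
    "CUP": 1952,
    "OUP": 2313,
    "Company of Biologists": 2595,
    "Royal Society": 1440,
    "American Society for Microbiology": 2202,
    "MDPI": 1028,
}


def sigmas_from_mean(value: float) -> float:
    """Signed distance of a value from the all-publisher mean, in standard deviations."""
    return (value - MEAN_APC) / SIGMA


above = [p for p, m in publisher_means.items() if sigmas_from_mean(m) > 1]
below = [p for p, m in publisher_means.items() if sigmas_from_mean(m) < -1]
print("More than 1 sigma above the mean:", above)
print("More than 1 sigma below the mean:", below)
```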

Nature Publishing Group’s (NPG) mean APC is somewhat distorted because the majority of APCs are for either Scientific Reports (£1,332) or Nature Communications (£3,780). These journals are also the two most popular with Cambridge authors, at 65 and 50 papers respectively, roundly beating the third-placed Journal of the American Chemical Society, which had 24 papers.

Price banding of APCs paid in Pounds Sterling can be seen among a number of other publishers, notably the Royal Society of Chemistry, BioMed Central and BMJ. It is also apparent for some publishers that charge in US Dollars, such as PLOS and the American Chemical Society (ACS), although currency fluctuations mean these APCs have a spread of Sterling values. A cluster of ACS invoices around £500 falls into two categories: (i) CC BY fees and (ii) invoices which had additional discounts applied by ACS (some authors get credits with ACS).

Figure 2. Individual and mean APCs paid to publishers. The mean APC value represents the total paid for these schemes per article processed. The all-publisher mean invoice with one standard deviation is shown for comparison. Standard deviations are not given for Springer Compact, Wiley (prepayment) or OUP (prepayment) because individual invoices are not processed in these cases. APC values for these deals are based either on the mean (Springer Compact) or on the nominal APC we would have paid had we been invoiced directly.

Table 2. Total APC, membership and other publication fees paid to publishers.

Publisher | Open Access spend (£) | Articles | Mean APC (£) | σ (£) | Publisher memberships/deals (£) | Additional publication costs (£) | Articles with additional costs | Mean publication costs (£)
Elsevier | 638,833 | 245 | 2,607 | 689 | – | 1,535 | 3 | 512
Springer Compact (other) | – | 221 | – | – | – | – | – | –
Wiley (prepayment) | 288,000 | 151 | 1,907 | – | – | 4,509 | 3 | 1,503
NPG | 316,398 | 130 | 2,434 | 1,182 | – | 6,535 | 4 | 1,634
ACS | 141,377 | 81 | 1,745 | 358 | – | 2,694 | 23* | 117
Springer Compact (RCUK/COAF) | 76,700 | 74 | 1,036 | – | – | – | – | –
Cell Press (Elsevier) | 232,809 | 57 | 4,084 | 1,024 | – | 21,684 | 11 | 1,971
OUP (prepayment) | 102,000 | 56 | 1,821 | – | – | 960 | 1 | 960
RSC | 76,620 | 50 | 1,532 | 449 | – | – | – | –
BMC | 81,708 | 49 | 1,668 | 267 | 10,524 | – | – | –
PLOS | 68,989 | 40 | 1,725 | 432 | – | – | – | –
IOP | 74,334 | 38 | 1,956 | 354 | – | 1,998 | 2 | 999
Frontiers | 56,359 | 30 | 1,879 | 524 | – | – | – | –
Taylor & Francis | 16,090 | 26 | 619 | 326 | – | 1,152 | 2 | 576
BMJ | 51,358 | 25 | 2,054 | 511 | – | 900 | 3 | 300
CUP | 40,991 | 21 | 1,952 | 465 | – | – | – | –
OUP | 43,950 | 19 | 2,313 | 1,171 | 2,918 | 6,909 | 5 | 1,382
Company of Biologists | 41,524 | 16 | 2,595 | 782 | – | – | – | –
Royal Society | 21,600 | 15 | 1,440 | 180 | 15,000 | – | – | –
American Society for Microbiology | 30,827 | 14 | 2,202 | 807 | – | 3,141 | 5 | 628
MDPI | 13,370 | 13 | 1,028 | 360 | – | – | – | –

*These charges are ACS membership fees, which we pay on behalf of authors because the ACS offers substantial APC discounts to its members.

Page and colour charges

Although it pales in comparison to APC expenditure, the more than £111,000 we spent supporting additional publication costs (mostly page and colour fees) is still a significant sum, given that some other UK institutions receive less than £10,000 p.a. from RCUK. Nearly 20% of this spend went to Cell Press titles, with an average article costing £1,971. One has to wonder why publishers continue to charge these sorts of publication fees. Fees of this nature are outdated and out of touch, and it is hard to see them as anything but a cynical attempt at revenue raising.

It is especially galling, though, when page and colour fees are levied on top of already high APCs. The combined cost to publish a single article in Neuron was £7,633.19. Table 3 lists the articles for which we paid over £5,000 in combined APCs and page and colour charges; I’d encourage you to read them, if only so that we get our money’s worth. Together these nine papers represent 1.9% of our total spend, yet only 0.7% of RCUK/COAF funded articles. Cell Press is particularly guilty here, accounting for six of the nine ultra-expensive papers. Indeed, because we don’t routinely pay page and colour charges, it seems highly likely that many such fees will have been paid without our knowledge. We might reasonably assume, therefore, that there are many more ultra-expensive papers that have gone unnoticed in this analysis.

Table 3. Ultra-expensive papers which cost more than £5000 to publish.

DOI | Publisher | Journal | APC (£) | P&C (£) | Total (£)
10.1016/j.neuron.2017.07.016 | Cell Press (Elsevier) | Neuron | 4808.26 | 2824.93 | 7633.19
10.1016/j.molcel.2018.01.034 | Cell Press (Elsevier) | Molecular Cell | 4840.19 | 2488.67 | 7328.86
10.3945/ajcn.116.150094 | Oxford University Press (OUP) | American Journal of Clinical Nutrition | 4626.52 | 2109.70 | 6736.22
10.1016/j.stem.2018.01.020 | Cell Press (Elsevier) | Cell Stem Cell | 4855.96 | 1807.18 | 6663.14
10.1016/j.devcel.2017.04.004 | Cell Press (Elsevier) | Developmental Cell | 4552.94 | 2003.28 | 6556.22
10.1016/j.cub.2017.08.004 | Cell Press (Elsevier) | Current Biology | 4875.32 | 1362.43 | 6237.75
10.1016/j.cub.2017.01.050 | Cell Press (Elsevier) | Current Biology | 4585.80 | 1285.94 | 5871.74
10.1038/ncomms16001 | Nature Publishing Group | Nature Communications | 5542.17* | – | 5542.17
10.1175/BAMS-D-14-00290.1 | American Meteorological Society | Bulletin of the American Meteorological Society | 5539.43 | – | 5539.43

*Normally we’d be charged in Pounds Sterling for Nature Communications articles; however, this invoice came from an international co-author who was charged in US Dollars at an unfavourable exchange rate. At the time the usual charge for Nature Communications was £3,150 +VAT. You can see just how much of an outlier this paper is in Figure 2.
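The shares quoted for Table 3 can be checked from its rows. One assumption is ours: the article denominator of 1,326 (the 1,547 articles less the 221 ‘other’ Springer Compact papers) reproduces the 0.7% quoted, whereas dividing by all 1,547 articles would give 0.6%.

```python
# Rows from Table 3: (journal, APC £, page & colour £); 0.0 = no P&C charge.
papers = [
    ("Neuron", 4808.26, 2824.93),
    ("Molecular Cell", 4840.19, 2488.67),
    ("American Journal of Clinical Nutrition", 4626.52, 2109.70),
    ("Cell Stem Cell", 4855.96, 1807.18),
    ("Developmental Cell", 4552.94, 2003.28),
    ("Current Biology", 4875.32, 1362.43),
    ("Current Biology", 4585.80, 1285.94),
    ("Nature Communications", 5542.17, 0.0),
    ("Bulletin of the American Meteorological Society", 5539.43, 0.0),
]
TOTAL_SPEND = 2_989_609.13     # £, Table 1
FUNDED_ARTICLES = 1547 - 221   # assumed denominator (excludes 'other' Compact papers)

# An article is "ultra-expensive" if APC plus page and colour fees exceed £5,000.
ultra = [apc + pc for _, apc, pc in papers if apc + pc > 5000]
share_spend = sum(ultra) / TOTAL_SPEND * 100
share_articles = len(ultra) / FUNDED_ARTICLES * 100

print(f"{len(ultra)} papers costing £{sum(ultra):,.2f} in total")
print(f"{share_spend:.1f}% of spend, {share_articles:.1f}% of articles")  # 1.9%, 0.7%
```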

The long view

Looking back over the past five years of RCUK expenditure (Table 4), it is clear that after a slow start annual expenditure increased rapidly, and it now exceeds the annual allocation provided by RCUK. If no controls are placed on expenditure, we might expect to overspend in 2018/19 by more than £400,000. Given the finite block grant, that is something we urgently need to mitigate.

Table 4. Cambridge’s historical RCUK block grant spend over the past five years, with a projection for 2018/19 if no controls are placed on expenditure (https://doi.org/10.17863/CAM.23725).

OA block grant summary information | OA grant brought forward (£) | OA grant received (£) | OA grant available (£) | OA grant spent (£) | OA grant carried forward (£)
Actual Year 1 spend (April 2013 – March 2014) | 0 | 1,151,812 | 1,151,812 | 471,147 | 680,665
Actual Year 2 spend (April 2014 – March 2015) | 680,665 | 1,355,073 | 2,035,738 | 1,139,480 | 896,258
Actual Year 3 spend (April 2015 – March 2016) | 896,258 | 1,546,388 | 2,442,646 | 1,358,415 | 1,084,232
Actual Year 4 spend (April 2016 – March 2017) | 1,084,232 | 1,269,319 | 2,353,550 | 1,935,379 | 418,172
Actual Year 5 spend (April 2017 – March 2018) | 418,172 | 1,350,225 | 1,768,397 | 1,767,821 | 576
Estimated spend in Year 6 (April 2018 – March 2019) | 576 | 1,362,905 | 1,363,481 | 1,800,000 | -436,519
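The carry-forward logic behind Table 4 is simple running arithmetic: each year’s available grant is the balance brought forward plus the new allocation, and whatever is not spent carries into the next year. A minimal sketch using the table’s figures; because the table rounds to whole pounds, chaining the rounded values lands within £1 of the published carried-forward figures.

```python
# (year, OA grant received £, OA grant spent £) from Table 4;
# the 2018/19 spend is the projection, not an actual figure.
YEARS = [
    ("2013/14", 1_151_812, 471_147),
    ("2014/15", 1_355_073, 1_139_480),
    ("2015/16", 1_546_388, 1_358_415),
    ("2016/17", 1_269_319, 1_935_379),
    ("2017/18", 1_350_225, 1_767_821),
    ("2018/19 (est.)", 1_362_905, 1_800_000),
]


def carry_forward(rows, opening=0):
    """Return (year, available, carried forward) for each year in turn."""
    results = []
    balance = opening
    for year, received, spent in rows:
        available = balance + received  # brought forward + new allocation
        balance = available - spent     # a negative balance is an overspend
        results.append((year, available, balance))
    return results


for year, available, balance in carry_forward(YEARS):
    print(f"{year}: available £{available:,}, carried forward £{balance:,}")
```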

For many years Cambridge has operated a ‘15% rule’: because roughly 15% of all publications are in fully OA journals, if block grant funding were to dip to this level the Open Access Team would stop paying hybrid APCs, so that authors publishing in fully OA journals would not be left to foot the bill. However, flipping between policies based on the variability of block grant funding causes considerable confusion amongst authors, so a consistent policy implemented with plenty of forewarning would be preferable. Our peers at Oxford and Manchester have already announced policies that restrict the payment of hybrid APCs, and we are considering similar models to rein in our spending. Watch this space.

Published 18 June 2018
Written by Dr Arthur Smith
Creative Commons License

Manuscript detectives – submitted, accepted or published?

In the blog post “It’s hard getting a date (of publication)”, Maria Angelaki discussed how a seemingly straightforward task may turn into a complicated and time-consuming affair for our Open Access Team. As it turns out, it isn’t the only one. The process of identifying the version of a manuscript (whether it is the submitted, accepted or published version) can also require observation and deduction skills on par with Sherlock Holmes’.

Unfortunately, it is something we need to do all the time. We need to make sure that the manuscript we’re processing isn’t the submitted version, as only published or accepted versions are deposited in Apollo. And we need to differentiate between published and accepted manuscripts, as many publishers (including the biggest players, Elsevier, Taylor & Francis, Springer Nature and Wiley) only allow self-archiving of accepted manuscripts in institutional repositories, unless the published version has been made Open Access with a Creative Commons licence.

So it’s kind of important to get that right… 
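The deposit rule described above can be written as a small decision function. A minimal sketch, assuming the version and licence have already been established (which, as the rest of this post shows, is the hard part); `can_deposit` is a hypothetical name, and the sketch ignores embargoes, publisher-specific exceptions and the rarer case of a publisher’s own licence.

```python
from typing import Optional


def can_deposit(version: str, cc_licence: Optional[str] = None) -> bool:
    """Decide whether a manuscript version may go into the repository.

    version: "submitted", "accepted" or "published"
    cc_licence: the Creative Commons licence on the published version
                (e.g. "CC BY"), or None if it was not made Open Access.
    """
    if version == "submitted":
        return False                   # only accepted or published versions are deposited
    if version == "accepted":
        return True                    # accepted manuscripts may be self-archived
    if version == "published":
        return cc_licence is not None  # published PDFs only if OA under a CC licence
    raise ValueError(f"unknown version: {version!r}")


print(can_deposit("published"))           # False
print(can_deposit("published", "CC BY"))  # True
```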

Explaining manuscript versions

Manuscripts (of journal articles, conference papers, book chapters, etc.) come in various shapes and sizes throughout the publication lifecycle. At the outset a manuscript is prepared and submitted for publication in a journal. It then normally goes through one or more rounds of peer-review leading to more or less substantial revisions of the original text, until the editor is satisfied with the revised manuscript and formally accepts it for publication. Following this, the accepted manuscript goes through proofreading, formatting, typesetting and copy-editing by the publisher. The final published version (also called the version of record) is the outcome of this. The whole process is illustrated below.

Identifying published versions

So the published version of a manuscript is the version… that is published? Yes and no, as sometimes manuscripts are published online in their accepted version. What we usually mean by published version is the final version of the manuscript which includes the publisher’s copy-editing, typesetting and copyright statement. It also typically shows citation details such as the DOI, volume and page numbers, and downloadable files will almost invariably be in a PDF format. Below are two snapshots of published articles, with citation details and copyright information zoomed in. On the left is an article from the journal Applied Linguistics published by Oxford University Press and on the right an article from the journal Cell Discovery published by Springer Nature.

 

Published versions are usually obvious to the eye and the easiest to recognise. In a way the published version of a manuscript is a bit like love: you may mistake other things for it but when you find it you just know. In order to decide if we can deposit it in our institutional repository, we need to find out whether the final version was made Open Access with a Creative Commons (CC) licence (or in rarer cases with the publisher’s own licence). This isn’t always straightforward, as we will now see.

Published Open Access with a CC licence?

When an article has been published Open Access with a CC licence, a statement usually appears at the bottom of the article on the journal website. However, as we want to deposit a PDF file in the repository, we are concerned with the Open Access statement within the PDF document itself. Quite a few articles are said to be Open Access/CC BY on their HTML version but not on the PDF. This is problematic, as it means we can’t assume that we can go ahead with the deposit based on the webpage; we need to systematically search the PDF for the Open Access statement. We also need to make sure that the CC licence is clearly mentioned, as it’s sometimes omitted even though it was chosen at the time of paying Open Access charges.

The Open Access statement will appear at various places on the file depending on the publisher and journal, though usually either at the very end of the article or in the footer of the first page as in the following examples from Elsevier (left) and Springer Nature (right).

 

A common practice among the Open Access team is to search the file for various terms including “creative”, “cc”, “open access”, “license”, “common” and quite often a combination of these. But even this isn’t foolproof, as the search may return no results despite the search terms appearing within the document. The most common publishers tend to put Open Access statements in consistent places, but others might put them in unusual places such as in a footnote in the middle of a paper. That means we may have to scroll through a whole 30- or 40-page document to find them, which is quite a time-consuming process.
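Part of this search can be scripted. A minimal sketch, assuming the text has already been extracted from the PDF by some other tool (extraction is outside its scope); `find_licence_terms` is a hypothetical helper, not part of any existing workflow. It uses the team’s search terms, matches each at the start of a word, and collapses whitespace first, since a phrase split across two lines is one plausible way a naive search comes back empty.

```python
import re

# The team's search terms; each is matched at the start of a word, so "cc"
# does not hit "accepted" or "access", while "license" still finds "licenses".
SEARCH_TERMS = ["creative", "cc", "open access", "license", "common"]


def find_licence_terms(pdf_text: str) -> dict:
    """Count word-initial, case-insensitive matches of each term in extracted PDF text."""
    # Collapse line breaks and runs of spaces: "open\naccess" would otherwise not match.
    flat = re.sub(r"\s+", " ", pdf_text).lower()
    return {
        term: len(re.findall(r"\b" + re.escape(term), flat))
        for term in SEARCH_TERMS
    }


sample = ("This is an open\naccess article under the CC BY license "
          "(http://creativecommons.org/licenses/by/4.0/).")
counts = find_licence_terms(sample)
print(counts)
# {'creative': 1, 'cc': 1, 'open access': 1, 'license': 2, 'common': 0}
```

Note that “common” finds nothing in the sample because it is buried inside “creativecommons” in the URL: matching at word starts trades this kind of miss for fewer false hits on “cc”, which is why a hit on any one term is worth following up by eye.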

Identifying accepted versions

The accepted manuscript is the version that has gone through peer-review. The content should be the same as the final published version, but it shouldn’t include any copy-editing, typesetting or copyright marking from the publisher. The file can be either a PDF or a Word document. The most easily recognisable accepted versions are files that are essentially just plain text, without any layout features, as shown below. The majority of accepted manuscripts look like this.

However, sometimes accepted manuscripts may at first glance appear to be published versions. This is because authors may be required to use publisher templates at the submission stage of their paper. But whilst looking like published versions, accepted manuscripts will not show the journal/publisher logo, citation details or copyright statement (or they might show incomplete details, e.g. a copyright statement such as © 20xx *publisher name*). Compare the published version (left) and accepted manuscript (right) of the same paper below.

 

As we can see, the accepted manuscript is formatted like the published version, but doesn’t show the journal and publisher logo, the page numbers, issue/volume numbers, DOI or the copyright statement.

So when trying to establish whether a given file is the published or accepted version, looking out for the above is a fairly foolproof method.
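Two of these cues, a DOI and a completed copyright line, can be checked in extracted text. A minimal heuristic sketch: the patterns and the `looks_published` name are our own assumptions, logos cannot be detected from text at all, and some accepted manuscripts do carry a DOI, so a result should prompt a look rather than replace one.

```python
import re

# A DOI such as "10.1000/xyz.123" (citation details usually only on the published version).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")
# A completed copyright line such as "© 2017 Publisher"; a template placeholder
# like "© 20xx" does not match because it lacks a four-digit year.
COPYRIGHT_PATTERN = re.compile(r"(?:©|\(c\)|copyright)\s*(?:19|20)\d{2}\b", re.IGNORECASE)


def looks_published(first_page_text: str) -> bool:
    """Heuristic: True if the text shows both a DOI and a completed copyright line."""
    has_doi = bool(DOI_PATTERN.search(first_page_text))
    has_copyright = bool(COPYRIGHT_PATTERN.search(first_page_text))
    return has_doi and has_copyright


# Made-up first-page snippets for illustration.
published_text = "Journal of Examples 12 (2017) 34-56. doi:10.1000/jex.2017.001 © 2017 Example Publisher"
accepted_text = "© 20xx Example Publisher. Manuscript accepted for publication."
print(looks_published(published_text))  # True
print(looks_published(accepted_text))   # False
```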

Identifying submitted versions

This is where things get rather tricky. Because the difference between an accepted and submitted manuscript lies in the actual content of the paper, it is often impossible to tell them apart based on visual clues. There are usually two ways to find out:

  • Getting confirmation from the author
  • Going through a process of finding and comparing the submission date and acceptance date of the paper (if available), mostly relevant in the case of arXiv files

Getting confirmation from the author of the manuscript is obviously the preferable and time-saving option. Unfortunately, many researchers mislabel their files when uploading them to the system, describing their accepted or published version as submitted (the fact that they do so when submitting the paper to us may partly explain this). So rather than relying on file descriptions, it is better to have an actual statement from the author that the file is the submitted version. In an ideal world, of course, this would never happen, as everyone would know that only accepted and published versions should be sent to us.

A common form of submitted manuscript we receive is the arXiv file: a file that has been deposited in arXiv, an online repository of pre-prints that is widely used by scientists, especially mathematicians and physicists. An example is shown below.

Clicking on the arXiv reference on the left-hand side of the document (circled) leads to the arXiv record page as shown below.

The ‘comments’ and ‘submission history’ sections may give clues as to whether the file is the submitted or accepted manuscript. In the above example the comments indicate that the manuscript was accepted for publication by the MNRAS journal (Monthly Notices of the Royal Astronomical Society). So this arXiv file is probably the accepted manuscript.

The submission history lists the date(s) on which the file (and any subsequent versions of it) were deposited in arXiv. By comparing these dates with the formal acceptance date of the manuscript, which can be found on the journal website (if published), we can infer whether the arXiv file is the submitted or accepted version. If the manuscript hasn’t been published and there is no way of comparing dates, then in the absence of any other information we assume that the arXiv file is the submitted version.
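The date comparison can be sketched as follows. This is a minimal illustration under our own assumptions: each version posted on or after the acceptance date is labelled only “probably accepted” (it could still be an unrevised upload), and with no acceptance date every version defaults to submitted, as described above. The function name and the dates are made up.

```python
from datetime import date
from typing import Dict, List, Optional


def classify_arxiv_versions(version_dates: List[date],
                            acceptance_date: Optional[date]) -> Dict[str, str]:
    """Label each arXiv version (v1, v2, ...) by comparing its date with acceptance.

    Without an acceptance date every version is assumed to be the submitted one.
    """
    labels = {}
    for number, posted in enumerate(version_dates, start=1):
        if acceptance_date is None or posted < acceptance_date:
            labels[f"v{number}"] = "submitted"
        else:
            # Posted on or after acceptance: probably the accepted manuscript,
            # but still worth checking the 'comments' field or asking the author.
            labels[f"v{number}"] = "probably accepted"
    return labels


history = [date(2017, 3, 1), date(2017, 6, 20)]  # made-up submission history
print(classify_arxiv_versions(history, date(2017, 6, 15)))
# {'v1': 'submitted', 'v2': 'probably accepted'}
print(classify_arxiv_versions(history, None))
# {'v1': 'submitted', 'v2': 'submitted'}
```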

Conclusion

Distinguishing between different manuscript versions is by no means straightforward. The fact that even our experienced Open Access Team may still encounter cases where they are unsure which version they are looking at shows how confusing it can be. The process of comparing dates can be time-consuming itself, as not all publishers show acceptance dates for papers (ring a bell?).

Depositing a published (non-OA) version instead of an accepted manuscript may infringe publisher copyright. Depositing a submitted version instead of an accepted manuscript may mean that research that hasn’t been vetted and scrutinised becomes publicly available through our repository and is possibly mistaken for peer-reviewed work. When processing a manuscript we need to be sure which version we are dealing with, and ideally we shouldn’t need to go out of our way to find out.

Published 27 March 2018
Written by Dr Melodie Garnier
Creative Commons License

How open is Cambridge? 2017 edition

Welcome to Open Access Week 2017. The Office of Scholarly Communication at Cambridge is celebrating with a series of blog posts, announcements and events. In today’s blog post we revisit the question about the openness of Cambridge. 

For Open Access Week last year I looked at how open Cambridge was using the extremely useful Lantern tool, developed by Cottage Labs, which is the basis of the Wellcome Trust’s compliance tool. If you haven’t used it before, Lantern takes a list of DOIs, PMIDs or PMCIDs and runs these through a variety of sources to try to determine the Open Access status of each publication. I found that, for publications in 2015, 51.8% of all of Cambridge’s research publications were available in at least one ‘Open Access’ source. How did Cambridge’s 2016 publications fare? Read on to find out.

Using the same method as last year, I first obtained a list of DOIs from Web of Science (n=9416) and Scopus (n=9124) for articles, proceedings papers and reviews published in 2016. Combining and deduplicating these lists returned 10,674 unique DOIs (~29 publications/day). I also refreshed the 2015 publication data using the latest Web of Science and Scopus information, which returned 10,090 unique DOIs. Year-on-year, this represents a 5.8% increase in the total number of publications attributable to Cambridge – more than inflation!
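Combining two DOI lists is a set union once the DOIs are normalised. A minimal sketch: the post does not say how the lists were deduplicated, so the normalisation here (lower-casing, since DOIs are case-insensitive, and stripping common resolver prefixes) is our assumption, and the DOIs in the example are made up. The last lines check the quoted year-on-year growth.

```python
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip a common resolver prefix."""
    doi = doi.strip().lower()  # DOIs are case-insensitive
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def combine_doi_lists(*sources):
    """Merge several DOI lists into one sorted list of unique, normalised DOIs."""
    unique = set()
    for source in sources:
        unique.update(normalise_doi(d) for d in source if d.strip())
    return sorted(unique)


wos = ["10.1000/ABC.1", "10.1000/xyz.2"]                   # made-up examples
scopus = ["https://doi.org/10.1000/abc.1", "10.1000/def.3"]
print(combine_doi_lists(wos, scopus))
# ['10.1000/abc.1', '10.1000/def.3', '10.1000/xyz.2']

growth = (10_674 / 10_090 - 1) * 100
print(f"Year-on-year growth: {growth:.1f}%")  # 5.8%
```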

The deduplicated DOI lists for 2015 and 2016 (20,764 DOIs in total) were fed into Lantern and analysed in combination with information from Web of Science and the University’s institutional repository Apollo.

Figure 1. Distribution of papers, published in 2015 and 2016 which have a DOI, according to the Open Access sources they can be found in. 57.5% of 2016’s articles appear in at least one Open Access source, an increase of 4 percentage points over 2015. One third of all papers published in 2016 are available in Apollo.

Very pleasingly, the percentage of publications available in at least one Open Access source increased to 57.5% in 2016, compared to only 53.4% for 2015 publications. Given that the total number of publications also increased during this period, this result is doubly exciting. In raw numbers, this means that while 5384 publications were Open Access in 2015, an impressive 6135 publications were made Open Access in 2016.
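The headline rates can be checked from the raw counts, which also shows the difference between the change in percentage points and the relative increase in the Open Access share.

```python
oa_2015, total_2015 = 5384, 10_090   # Open Access and total DOIs, 2015
oa_2016, total_2016 = 6135, 10_674   # Open Access and total DOIs, 2016

share_2015 = oa_2015 / total_2015 * 100   # % in at least one OA source
share_2016 = oa_2016 / total_2016 * 100

pp_change = share_2016 - share_2015              # change in percentage points
relative_change = pp_change / share_2015 * 100   # relative increase in the share

print(f"2015: {share_2015:.1f}%, 2016: {share_2016:.1f}%")  # 53.4%, 57.5%
print(f"Change: {pp_change:.1f} percentage points "
      f"({relative_change:.1f}% relative increase)")       # 4.1 pp, 7.7%
```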

Most of this increase can be attributed to the much larger share of publications that appear in Apollo, which is now the largest source of Open Access material for the University of Cambridge. An additional 822 publications were deposited in Apollo in 2016 compared to 2015, which is a 30% increase in one year alone.

You can now find more of the University’s research outputs in Apollo than in any other Open Access source. And because we operate an extremely popular Request a Copy service, potentially all of the publications held in Apollo, even those that are restricted and under embargo, are available to anyone in the world. You just need to ask.

Published 23 October 2017
Written by Dr Arthur Smith
Creative Commons License

Open Access policy, procedure & process at Cambridge

First up, HEFCE’s Open Access policy:

At the outset, let’s be clear: the HEFCE Open Access policy applies to all researchers working at all UK HEIs. If an HEI wants to submit a journal article for consideration in REF 2021 the article must appear in an Open Access repository (although there is a long list of exceptions). Keen observers will note that in the above flowchart HEFCE’s policy is enforced based on deposit within three months of acceptance. This requirement has caused significant consternation amongst researchers and administrators alike; however, during the first two years of the policy (i.e. until 31 March 2018) publications deposited within three months of publication will still be eligible for the REF. At Cambridge, we have been recording manuscript deposits that meet this criterion as exceptions to the policy[1].
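The two deposit windows can be expressed as a date check. A minimal sketch under stated assumptions: “three months” is treated as three calendar months, the 31 March 2018 cut-off is assumed (hypothetically) to apply to the deposit date, and the policy’s long list of other exceptions is ignored. The transitional case is labelled as an exception rather than as plain compliance, mirroring how Cambridge records these deposits; `hefce_status` and `add_months` are hypothetical helpers.

```python
import calendar
from datetime import date

# End of the policy's first two years; assumed here to apply to the deposit date.
TRANSITION_END = date(2018, 3, 31)


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def hefce_status(accepted: date, published: date, deposited: date) -> str:
    """Classify a deposit against the three-month windows described above."""
    if deposited <= add_months(accepted, 3):
        return "compliant"
    if deposited <= TRANSITION_END and deposited <= add_months(published, 3):
        # Within three months of publication during the transitional period.
        return "exception (transitional)"
    return "non-compliant"


print(hefce_status(date(2017, 1, 10), date(2017, 6, 1), date(2017, 3, 1)))   # compliant
print(hefce_status(date(2017, 1, 10), date(2017, 6, 1), date(2017, 7, 15)))  # exception (transitional)
print(hefce_status(date(2018, 1, 5), date(2018, 3, 1), date(2018, 5, 1)))    # non-compliant
```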

Next up, the RCUK Open Access policy. This policy is straightforward to implement, the only complication being payment of APCs, which is contingent on sufficient block grant funding. Otherwise, the choice for authors is usually quite obvious: does the journal have a compliant embargo? No? Then pay for immediate open access.

One extra feature of the RCUK Open Access policy not captured here is the Europe PMC deposit requirement for MRC and BBSRC funded papers. Helpfully, the policy document makes no mention of this requirement; rather, this feature of the policy appears in the accompanying FAQs. I’m no expert, but this seems like the wrong way to write policies.

Finally, we have the COAF policy, possibly the single most complicated OA policy to enforce anywhere in the world. The most challenging part of the COAF policy is the Europe PMC deposit requirement. It is often difficult to know whether a journal will indeed deposit the paper in Europe PMC, and if, for whatever reason, the publisher doesn’t immediately deposit the paper, it can take months of back-and-forth with editors, journal managers and publishing assistants to complete the deposit. This is an extremely burdensome process, though the blame should be laid squarely at the publishers’ door. How hard is it to update a PMC record? Does it really take two months to update the Creative Commons licence?

This leads us to one of the more unusual parts of the COAF policy: publications are considered journals if they are indexed in Medline. That means we will occasionally receive book chapters that need to meet the journal OA policy. Most publishers are unwilling to make such publications OA in line with COAF’s journal requirements so they are usually non-compliant.

What happens if you should be foolish enough to try to combine these policies into one process? Well, as you might expect, you get something very complicated:

This flowchart, despite its length, still doesn’t capture every possible policy outcome and is missing several nuances related to the payment of APCs, but nonetheless, it gives an idea of the enormous complexity that underlies the decision making process behind every article deposited in Apollo and in other repositories across the UK.

[1] Within the University’s CRIS, Symplectic Elements, only one date range is possible so we have chosen to monitor compliance from the acceptance date. Publications deposited within the ‘transitional’ three months from publication window receive an ‘Other’ exception within Elements that contains a short note to this effect.

Published 18 September 2017
Written by Dr Arthur Smith
Creative Commons License