All posts by Arthur Smith

arXiv and REF – together at last?

New draft REF2021 guidance was released for consultation on Monday morning. Buried halfway through this daunting 139-page document was an update to the REF Open Access policy.

This revised policy comes on the back of Research England’s report, Monitoring sector progress towards compliance with funder open access policies, which was released in June and on which we have already commented.

From an Open Access perspective, additional flexibility for preprint servers has been added to the policy:

The funding bodies recognise that many researchers derive value from sharing early versions of papers using a pre-print service. Institutions may submit pre-prints as eligible outputs to REF 2021 (see Annex K). Only outputs which have been ‘accepted for publication’ (such as a journal article or conference contribution with an ISSN) are within scope of the REF 2021 open access policy. To take into account that the policy intent for ‘open access’ is met where a pre-print version is the same as the author accepted manuscript, we have introduced additional flexibility into the open access requirement: if the ‘accepted for publication’ text, or near final version, is available on the pre-print service, and the output upload date of the pre-print is prior to the date of output publication, this will be considered as compliant with the open access criteria (deposit, discovery, and access).

That’s a significant adjustment to previous advice and will be of considerable relief to many researchers who routinely publish their research in this way. Indeed, we have lobbied behind the scenes on this policy issue for more than three years.

But what does this actually mean and what should institutions and authors take from this?

Repositories, preprint servers – what’s the difference?

Firstly, this policy legitimises preprint servers (like arXiv, bioRxiv, SocArXiv and many more) and allows authors to use these systems without needing to worry about technical requirements.

This is in stark contrast to the way institutional and subject repositories are treated by the policy. These repositories must meet all the requirements of the REF Open Access policy to be considered compliant. That is fine for most institutions, since meeting the policy requirements is vital for them anyway, but subject repositories are usually left in the lurch:

Individuals depositing their outputs in a subject repository are advised to ensure that their chosen repository meets the requirements set out at paragraphs 224 to 241 in this policy. REF 2021 guidance will not certify the repositories which fulfil policy requirements.

We’re still not sure if Europe PMC is compliant, for example.

Don’t just sit there!

However, the fact that preprint servers are acceptable doesn’t mean that authors using them can assume they don’t need to do anything. There are two significant caveats to note:

  1. the manuscript deposited on the preprint server must be the ‘accepted for publication’ text; and
  2. the manuscript must be uploaded prior to first publication.

Determining the deposit date is usually straightforward, so institutions will be able to monitor this aspect of the policy with some degree of automation (especially for arXiv, which is harvested by a range of publication systems).
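The date half of the check lends itself to automation; the version half does not. As a rough sketch only, assuming the upload and publication dates have already been harvested and that someone has recorded whether the preprint is the accepted text, the two caveats could be combined like this in Python. The `PreprintRecord` type, its field names and the `preprint_route_status` function are hypothetical, not part of any REF or arXiv system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PreprintRecord:
    """Hypothetical record for one output (field names are illustrative only)."""
    upload_date: date                 # date this version was posted to the preprint server
    publication_date: Optional[date]  # date of first publication, if known yet
    is_accepted_text: Optional[bool]  # True/False once confirmed; None if the version is unknown


def preprint_route_status(rec: PreprintRecord) -> str:
    """Apply the two caveats: accepted text, uploaded before first publication."""
    # Caveat 2: the upload must be strictly prior to the date of publication
    if rec.publication_date is not None and rec.upload_date >= rec.publication_date:
        return "not compliant: uploaded on or after publication"
    # Caveat 1: the version still has to be established, often by asking the author
    if rec.is_accepted_text is None:
        return "check version with author"
    if not rec.is_accepted_text:
        return "not compliant: not the accepted text"
    if rec.publication_date is None:
        return "on track: awaiting publication"
    return "compliant"
```

Only the date test can run unattended; the `is_accepted_text` flag is exactly the part that still needs a manuscript detective.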

However, the key challenge will be determining the manuscript version. As we have described before, this is work for manuscript detectives, so some checking with authors will still need to take place.

We are working internally at Cambridge on the workflow we will use to capture these outputs, and once this is settled we will tell our researchers what they do, or do not, need to do. Until we indicate otherwise, we still encourage all of our researchers to upload their manuscripts when they are accepted for publication.


If there is one key recommendation we would make to all users of preprint repositories – annotate or label the records to clearly indicate the manuscript version (e.g. submitted, accepted, published).

It will help us, and you, in the long run.

Published 25 July 2018
Written by Dr Arthur Smith
Creative Commons License

Cambridge’s RCUK/COAF Open Access spend January 2017 – March 2018

It’s been reporting season for institutions in receipt of RCUK Open Access block grant awards, so we’ve been busy preparing data for both RCUK (now UKRI) and Jisc about how Cambridge has spent its funding allocation over the past 15 months (January 2017 – March 2018). In this blog post I’ll focus mainly on the Jisc Open Access article processing charge (APC) report as it includes both RCUK and COAF expenditure, which we’ve made available in Apollo (the RCUK report is available there too). We’ve had to make a few tweaks to the data to perform the analysis that follows, but that shouldn’t substantially affect the figures. Unless stated otherwise, all charges reported include VAT at 20%.


Let’s start with a few headline numbers (Table 1). In the reporting period January 2017 – March 2018 the Open Access Team paid Open Access APCs totalling more than £2.8 million. By far the largest beneficiary of this funding was Elsevier, which received over £870,000 for RCUK and COAF funded research articles (that’s 31% of all our APC spend). In fact, Elsevier dominates the figures to such an extent that for this blog post I’ve split Cell Press titles to provide a little more insight.
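The 31% figure can be checked against Table 2, where Elsevier’s spend is split between Elsevier and Cell Press titles. A quick Python check, using the rounded Table 2 values and the Open Access total from Table 1:

```python
# Open Access (APC) spend in £, rounded to whole pounds, from Table 2
elsevier_oa = 638_833
cell_press_oa = 232_809
# Total Open Access spend in £ from Table 1
total_oa = 2_847_135.05

elsevier_total = elsevier_oa + cell_press_oa
share = elsevier_total / total_oa
print(f"£{elsevier_total:,} is {share:.1%} of all APC spend")  # £871,642 is 30.6% of all APC spend
```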

Table 1. Headline figures between January 2017 and March 2018 for the RCUK and COAF Open Access block grants.

Item | Value | Notes
Total spend | £2,989,609.13 | –
Open Access | £2,847,135.05 | –
Additional publication costs (mainly page and colour fees) | £111,631.68 | –
Publisher memberships/deals | £30,842.40 | –
Articles | 1547 | SCOAP3 papers unknown
‘Other’ Springer Compact articles | 221 | –
Mean APC (all publishers) | £1,840 | SCOAP3 papers unknown
Mean APC (excluding ‘other’ Springer Compact articles) | £2,147 | SCOAP3 papers unknown
Mean APC ± σ (invoiced APCs only) | £2,254 ± £1,007 | Excludes SCOAP3, Springer Compact, Wiley prepayment, OUP prepayment
Median APC (invoiced APCs only) | £2,042 | Excludes SCOAP3, Springer Compact, Wiley prepayment, OUP prepayment

That £2.8 million paid for at least 1547 articles. I say ‘at least’ because (i) we haven’t recorded papers funded through the SCOAP3 partnership, for which we paid just under £25,000; (ii) choosing a precise reporting date is difficult, especially for prepayment deals where invoicing is disconnected from the publishing process; and (iii) we report from specific University cost centres, but for operational reasons some payments may have been taken from other sources, which makes it difficult to reconcile everything in a neat report.

Assuming these problems are negligible, the mean APC was £1,840 (similar to previous years).

However, there is the complication of the Springer Compact which Cambridge funds through a combination of the RCUK and COAF block grants. If we only consider RCUK/COAF funded papers processed as part of the Springer Compact then the average APC is £1,036, significantly less than Springer’s APC list price of €2,200 +VAT (so it’s a good deal from an RCUK/COAF perspective). However, a majority of Springer Compact papers do not acknowledge RCUK or COAF, and under normal circumstances these papers would not be eligible for Open Access funding. Excluding these 221 ‘other’ Springer Compact papers from the calculations increases the overall mean APC to £2,147. This demonstrates, once again, how progressive the Springer Compact continues to be. We wrote last year about the value to us of the deal. The overall distribution of APCs paid to all publishers is shown in Figure 1.
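The three means quoted above can be reproduced from Table 1 and the Springer Compact (RCUK/COAF) row of Table 2. In this Python sketch the 221 ‘other’ Compact papers are assumed to count towards the article total but to carry no APC spend of their own; that is our inference, based on the fact that it reproduces the £2,147 figure.

```python
total_oa_spend = 2_847_135.05                 # Table 1, Open Access spend (£)
articles = 1547                               # Table 1, excludes SCOAP3 papers
other_compact = 221                           # Compact papers without RCUK/COAF acknowledgement
compact_spend, compact_articles = 76_700, 74  # Table 2, Springer Compact (RCUK/COAF)

mean_all = total_oa_spend / articles
# Assumption: the 'other' papers add to the article count but not to the spend,
# so removing them from the denominator raises the mean
mean_excl_other = total_oa_spend / (articles - other_compact)
mean_compact = compact_spend / compact_articles

print(f"£{mean_all:,.0f}  £{mean_excl_other:,.0f}  £{mean_compact:,.0f}")  # £1,840  £2,147  £1,036
```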

Figure 1. Distribution of all APCs paid to all publishers (including prepayments to OUP and Wiley). Springer Compact and Wiley credit articles are also shown for completeness.

Level playing field?

Figure 2 and Table 2 give an in-depth breakdown of the APCs paid to publishers for which at least 10 APCs were paid. There are several interesting features in the data. Firstly, the sheer number and spread of APCs paid to Elsevier are immense. While many other publishers have clear pricing bands, Elsevier’s pricing exists on a continuum between £500 and £5,000. Elsevier’s mean APC is well above the all-publisher mean, though still within one standard deviation of it. The same cannot be said of Cell Press, which has a mean APC of £4,084 and is the only large publisher whose mean lies more than one standard deviation above the all-publisher mean invoice value. The bulk of its APCs are clustered just below £5,000.
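Which publishers fall outside the one-standard-deviation band can be checked mechanically. The Python sketch below uses the invoiced-only mean and σ from Table 1 and the per-publisher means from Table 2, leaving out the prepayment and Springer Compact rows because they are not individually invoiced. Like Figure 2, it compares publisher means against the spread of individual invoices.

```python
# All-publisher mean and standard deviation of invoiced APCs (£), Table 1
ALL_MEAN, ALL_SIGMA = 2_254, 1_007

# Mean APC (£) per publisher for individually invoiced APCs, Table 2
publisher_means = {
    "Elsevier": 2_607,
    "NPG": 2_434,
    "ACS": 1_745,
    "Cell Press (Elsevier)": 4_084,
    "RSC": 1_532,
    "BMC": 1_668,
    "PLOS": 1_725,
    "IOP": 1_956,
    "Frontiers": 1_879,
    "Taylor & Francis": 619,
    "BMJ": 2_054,
    "CUP": 1_952,
    "OUP": 2_313,
    "Company of Biologists": 2_595,
    "Royal Society": 1_440,
    "American Society for Microbiology": 2_202,
    "MDPI": 1_028,
}


def outside_one_sigma(means, centre=ALL_MEAN, sigma=ALL_SIGMA):
    """Return publishers whose mean APC is more than one σ above, and more than one σ below, the centre."""
    above = [name for name, m in means.items() if m > centre + sigma]
    below = [name for name, m in means.items() if m < centre - sigma]
    return above, below


above, below = outside_one_sigma(publisher_means)
print("above:", above)  # above: ['Cell Press (Elsevier)']
print("below:", below)  # below: ['Taylor & Francis', 'MDPI']
```

Only Cell Press lies more than one σ above the mean; at the cheaper end, Taylor & Francis and MDPI lie more than one σ below it.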

Nature Publishing Group’s (NPG) mean APC is somewhat distorted because the majority of its APCs are for either Scientific Reports (£1,332) or Nature Communications (£3,780). These journals are also the two most popular with Cambridge authors, at 65 and 50 papers respectively, roundly beating the third-placed Journal of the American Chemical Society, which had 24 papers.

Price banding of APCs paid in Pounds Sterling can be seen in a number of other publishers, notably the Royal Society of Chemistry, BioMed Central and BMJ. It is also apparent in some publishers that charge in US Dollars, such as PLOS and the American Chemical Society (ACS), although currency fluctuations mean these APCs have a spread of Sterling values. A cluster of ACS invoices around £500 falls into two categories: (i) CC BY fees and (ii) invoices which had additional discounts applied by ACS (some authors get credits with ACS).

Figure 2. Individual and mean APCs paid to publishers. The mean APC value represents the total paid for these schemes per article processed. The all-publisher mean invoice with one standard deviation is shown for comparison. Standard deviations are not given for Springer Compact, Wiley (prepayment) or OUP (prepayment) because individual invoices are not processed in these cases. APC values for these deals are based either on the mean (Springer Compact) or on the nominal APC value that would have applied had we been invoiced directly.

Table 2. Total APC, membership and other publication fees paid to publishers.

Publisher | Open Access spend (£) | Articles | Mean APC (£) | σ (£) | Publisher memberships/deals (£) | Additional publication costs (£) | Articles (additional costs) | Mean publication costs (£)
Elsevier | 638,833 | 245 | 2,607 | 689 | – | 1,535 | 3 | 512
Springer Compact (other) | – | 221 | – | – | – | – | – | –
Wiley (prepayment) | 288,000 | 151 | 1,907 | – | – | 4,509 | 3 | 1,503
NPG | 316,398 | 130 | 2,434 | 1,182 | – | 6,535 | 4 | 1,634
ACS | 141,377 | 81 | 1,745 | 358 | – | 2,694 | 23* | 117
Springer Compact (RCUK/COAF) | 76,700 | 74 | 1,036 | – | – | – | – | –
Cell Press (Elsevier) | 232,809 | 57 | 4,084 | 1,024 | – | 21,684 | 11 | 1,971
OUP (prepayment) | 102,000 | 56 | 1,821 | – | – | 960 | 1 | 960
RSC | 76,620 | 50 | 1,532 | 449 | – | – | – | –
BMC | 81,708 | 49 | 1,668 | 267 | 10,524 | – | – | –
PLOS | 68,989 | 40 | 1,725 | 432 | – | – | – | –
IOP | 74,334 | 38 | 1,956 | 354 | – | 1,998 | 2 | 999
Frontiers | 56,359 | 30 | 1,879 | 524 | – | – | – | –
Taylor & Francis | 16,090 | 26 | 619 | 326 | – | 1,152 | 2 | 576
BMJ | 51,358 | 25 | 2,054 | 511 | – | 900 | 3 | 300
CUP | 40,991 | 21 | 1,952 | 465 | – | – | – | –
OUP | 43,950 | 19 | 2,313 | 1,171 | 2,918 | 6,909 | 5 | 1,382
Company of Biologists | 41,524 | 16 | 2,595 | 782 | – | – | – | –
Royal Society | 21,600 | 15 | 1,440 | 180 | 15,000 | – | – | –
American Society for Microbiology | 30,827 | 14 | 2,202 | 807 | – | 3,141 | 5 | 628
MDPI | 13,370 | 13 | 1,028 | 360 | – | – | – | –

*These charges are ACS membership fees, which we pay on behalf of authors because the ACS offers substantial APC discounts to its members.

Page and colour charges

Although it pales in comparison to APC expenditure, the £111,000 we spent supporting additional publication costs (mostly page and colour fees) is still a significant sum, given that other UK institutions receive less than £10,000 p.a. from RCUK. Nearly 20% of this spend went to Cell Press titles, at an average of £1,971 per article. One has to wonder why publishers continue to charge these sorts of publication fees. Fees of this nature are outdated and out of touch, and it is hard to see them as anything but a cynical attempt at revenue raising.
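Both Cell Press figures follow directly from Tables 1 and 2; a short Python check:

```python
cell_press_extra = 21_684   # Cell Press additional publication costs (£), Table 2
cell_press_articles = 11    # articles those costs relate to, Table 2
all_extra = 111_631.68      # all additional publication costs (£), Table 1

share = cell_press_extra / all_extra
mean_per_article = cell_press_extra / cell_press_articles
print(f"{share:.1%} of page/colour spend, £{mean_per_article:,.0f} per article")
# 19.4% of page/colour spend, £1,971 per article
```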

It is especially galling, though, when page and colour fees are levied on top of already high APCs. The combined cost to publish a single article in Neuron was £7,633.19. Table 3 lists the articles for which our combined APC and page and colour charges came to more than £5,000 – I’d encourage you to read them, if for no other reason than to make sure we get our money’s worth. Together these nine papers represent 1.9% of our total spend, yet only 0.7% of RCUK/COAF funded articles. Cell Press is particularly guilty here, accounting for the bulk of these ultra-expensive papers. Moreover, because we don’t routinely pay page and colour charges, it seems highly likely that many such fees will have been paid without our knowledge, so there are probably many more ultra-expensive papers that have gone unnoticed in this analysis.

Table 3. Ultra-expensive papers which cost more than £5000 to publish.

DOI | Publisher | Journal | APC (£) | P&C (£) | Total (£)
10.1016/j.neuron.2017.07.016 | Cell Press (Elsevier) | Neuron | 4808.26 | 2824.93 | 7633.19
10.1016/j.molcel.2018.01.034 | Cell Press (Elsevier) | Molecular Cell | 4840.19 | 2488.67 | 7328.86
10.3945/ajcn.116.150094 | Oxford University Press (OUP) | American Journal of Clinical Nutrition | 4626.52 | 2109.70 | 6736.22
10.1016/j.stem.2018.01.020 | Cell Press (Elsevier) | Cell Stem Cell | 4855.96 | 1807.18 | 6663.14
10.1016/j.devcel.2017.04.004 | Cell Press (Elsevier) | Developmental Cell | 4552.94 | 2003.28 | 6556.22
10.1016/j.cub.2017.08.004 | Cell Press (Elsevier) | Current Biology | 4875.32 | 1362.43 | 6237.75
10.1016/j.cub.2017.01.050 | Cell Press (Elsevier) | Current Biology | 4585.80 | 1285.94 | 5871.74
10.1038/ncomms16001 | Nature Publishing Group | Nature Communications | 5542.17* | – | 5542.17
10.1175/BAMS-D-14-00290.1 | American Meteorological Society | Bulletin of the American Meteorological Society | 5539.43 | – | 5539.43

*Normally we’d be charged in Pounds Sterling for Nature Communications articles; however, this invoice was received from an international co-author who was charged in US Dollars at an unfavourable exchange rate. At the time, the usual charge for a Nature Communications article was £3,150 +VAT. You can see just how much of an outlier this paper is in Figure 2.
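The two percentages can be recomputed from Table 3. One assumption is needed: the 0.7% figure is reproduced when the denominator is the 1,326 articles left after excluding the 221 ‘other’ Springer Compact papers (against all 1,547 articles it would be about 0.6%). A short Python check:

```python
# (APC, page & colour) in £ for the nine papers in Table 3
papers = [
    (4808.26, 2824.93),
    (4840.19, 2488.67),
    (4626.52, 2109.70),
    (4855.96, 1807.18),
    (4552.94, 2003.28),
    (4875.32, 1362.43),
    (4585.80, 1285.94),
    (5542.17, 0.0),  # no page and colour charge
    (5539.43, 0.0),  # no page and colour charge
]
total_spend = 2_989_609.13     # Table 1, total spend (£)
funded_articles = 1547 - 221   # assumption: excludes 'other' Springer Compact articles

combined = sum(apc + pc for apc, pc in papers)
print(f"£{combined:,.2f}")                                        # £58,108.72
print(f"{combined / total_spend:.1%} of total spend")             # 1.9% of total spend
print(f"{len(papers) / funded_articles:.1%} of funded articles")  # 0.7% of funded articles
```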

The long view

If we look back on the past five years of RCUK expenditure (Table 4), it is clear that after a slow start annual expenditure rapidly increased, and it now exceeds the annual allocation provided by RCUK. If no controls are placed on expenditure, we might expect to overspend in 2018/19 by more than £400,000. Given the finite block grant, that is something we urgently need to mitigate.

Table 4. Cambridge’s historical RCUK block grant spend over the past five years, with a projection for 2018/19 if no controls are placed on expenditure.

OA block grant summary information | OA grant brought forward (£) | OA grant received (£) | OA grant available (£) | OA grant spent (£) | OA grant carried forward (£)
Actual Year 1 spend (April 2013 – March 2014) | 0 | 1,151,812 | 1,151,812 | 471,147 | 680,665
Actual Year 2 spend (April 2014 – March 2015) | 680,665 | 1,355,073 | 2,035,738 | 1,139,480 | 896,258
Actual Year 3 spend (April 2015 – March 2016) | 896,258 | 1,546,388 | 2,442,646 | 1,358,415 | 1,084,232
Actual Year 4 spend (April 2016 – March 2017) | 1,084,232 | 1,269,319 | 2,353,550 | 1,935,379 | 418,172
Actual Year 5 spend (April 2017 – March 2018) | 418,172 | 1,350,225 | 1,768,397 | 1,767,821 | 576
Estimated spend in Year 6 (April 2018 – March 2019) | 576 | 1,362,905 | 1,363,481 | 1,800,000 | -436,519
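Each row of Table 4 follows a simple carry-forward rule: the grant available is the amount brought forward plus the amount received, and the amount carried forward is what is available minus what is spent. The Python sketch below reproduces the last two rows from the Year 5 brought-forward figure. The table is rounded to whole pounds, so chaining all the way from Year 1 can drift by £1.

```python
def roll_forward(brought_forward, received, spent):
    """Return (available, carried_forward) for one block-grant year."""
    available = brought_forward + received
    return available, available - spent


# Year 5 (actual) and Year 6 (estimated spend of £1.8m), figures from Table 4
year5_available, year5_cf = roll_forward(418_172, 1_350_225, 1_767_821)
year6_available, year6_cf = roll_forward(year5_cf, 1_362_905, 1_800_000)
print(year5_cf, year6_available, year6_cf)  # 576 1363481 -436519
```

A negative carried-forward value is the projected overspend if nothing changes.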

For many years Cambridge has operated a ‘15% rule’: because roughly 15% of all publications are in fully OA journals, if block grant funding were to dip to this level, the Open Access Team would stop paying hybrid APCs so that authors publishing in fully OA journals would not be left to foot the bill. However, flipping between policies as block grant funding varies causes considerable confusion amongst authors, so a consistent policy implemented with plenty of forewarning would be preferable. Our peers at Oxford and Manchester have already announced policies that restrict the payment of hybrid APCs, and we are considering similar models to rein in our spending. Watch this space.

Published 18 June 2018
Written by Dr Arthur Smith

Manuscript detectives – submitted, accepted or published?

In the blog post “It’s hard getting a date (of publication)”, Maria Angelaki discussed how a seemingly straightforward task may turn into a complicated and time-consuming affair for our Open Access Team. As it turns out, it isn’t the only one. Identifying the version of a manuscript (whether it is the submitted, accepted or published version) can also require powers of observation and deduction on a par with those of Sherlock Holmes.

Unfortunately, it is something we need to do all the time. We need to make sure that the manuscript we’re processing isn’t the submitted version, as only published or accepted versions are deposited in Apollo. And we need to differentiate between published and accepted manuscripts, as many publishers – including the biggest players Elsevier, Taylor & Francis, Springer Nature and Wiley – only allow self-archiving of accepted manuscripts in institutional repositories, unless the published version has been made Open Access with a Creative Commons licence.

So it’s kind of important to get that right… 
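The rules just described amount to a small decision table. The Python sketch below is a simplified illustration, assuming a publisher that, like those named above, only allows the accepted manuscript to be self-archived unless the published version is Open Access under a CC (or the publisher’s own) licence. Real policies vary by publisher and journal, and the `Version` enum and `can_deposit` function are hypothetical names.

```python
from enum import Enum


class Version(Enum):
    SUBMITTED = "submitted"
    ACCEPTED = "accepted"
    PUBLISHED = "published"


def can_deposit(version: Version, published_oa_with_licence: bool = False) -> bool:
    """Simplified deposit rule for a publisher that allows only accepted-manuscript self-archiving."""
    if version is Version.SUBMITTED:
        return False  # only accepted or published versions go into the repository
    if version is Version.ACCEPTED:
        return True
    # Published version: only if it was made Open Access with a CC (or publisher) licence
    return published_oa_with_licence
```

The hard part, of course, is not this logic but deciding which `Version` a file actually is.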

Explaining manuscript versions

Manuscripts (of journal articles, conference papers, book chapters, etc.) come in various shapes and sizes throughout the publication lifecycle. At the outset a manuscript is prepared and submitted for publication in a journal. It then normally goes through one or more rounds of peer review, leading to more or less substantial revisions of the original text, until the editor is satisfied with the revised manuscript and formally accepts it for publication. Following this, the accepted manuscript goes through proofreading, formatting, typesetting and copy-editing by the publisher. The final published version (also called the version of record) is the outcome of this. The whole process is illustrated below.

Identifying published versions

So the published version of a manuscript is the version… that is published? Yes and no, as manuscripts are sometimes published online in their accepted version. What we usually mean by published version is the final version of the manuscript, which includes the publisher’s copy-editing, typesetting and copyright statement. It also typically shows citation details such as the DOI, volume and page numbers, and downloadable files will almost invariably be in PDF format. Below are two snapshots of published articles, with citation details and copyright information zoomed in. On the left is an article from the journal Applied Linguistics published by Oxford University Press and on the right an article from the journal Cell Discovery published by Springer Nature.


Published versions are usually obvious to the eye and the easiest to recognise. In a way the published version of a manuscript is a bit like love: you may mistake other things for it but when you find it you just know. In order to decide if we can deposit it in our institutional repository, we need to find out whether the final version was made Open Access with a Creative Commons (CC) licence (or in rarer cases with the publisher’s own licence). This isn’t always straightforward, as we will now see.

Published Open Access with a CC licence?

When an article has been published Open Access with a CC licence, a statement usually appears at the bottom of the article on the journal website. However, as we want to deposit a PDF file in the repository, we are concerned with the Open Access statement within the PDF document itself. Quite a few articles are labelled Open Access/CC BY in their HTML version but not in the PDF. This is problematic, as it means we can’t assume from the webpage alone that we can go ahead with the deposit – we need to systematically search the PDF for the Open Access statement. We also need to make sure that the CC licence is clearly mentioned, as it is sometimes omitted even though it was chosen when the Open Access charges were paid.

The Open Access statement will appear in various places in the file depending on the publisher and journal, though usually either at the very end of the article or in the footer of the first page, as in the following examples from Elsevier (left) and Springer Nature (right).


A common practice among the Open Access team is to search the file for various terms, including “creative”, “cc”, “open access”, “license”, “common” and quite often a combination of these. But even this isn’t foolproof, as the search may return no results even though the search terms appear in the document. The most common publishers tend to put Open Access statements in consistent places, but others might put them in unusual places, such as in a footnote in the middle of a paper. That means we may have to scroll through a whole 30- or 40-page document to find them – quite a time-consuming process.
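Part of this search can be scripted. The Python sketch below assumes the text of each page has already been extracted from the PDF (for example with a library such as pypdf). It normalises ligatures, line-break hyphenation and whitespace, which is one common reason a search returns nothing even though the words are visibly on the page, and then reports which pages match each pattern. The patterns are our own loose rendering of the team’s search terms, and `find_oa_statements` is a hypothetical helper: treat it as a triage aid, since it cannot tell whether a licence statement actually applies to the article.

```python
import re

# Word boundaries stop "cc" matching inside words such as "accepted";
# the optional "by" also catches "CC BY", "CC-BY" and "CCBY"
PATTERNS = {
    "creative commons": re.compile(r"creative\s+commons", re.IGNORECASE),
    "cc licence": re.compile(r"\bcc(?:[\s-]?by)?\b", re.IGNORECASE),
    "open access": re.compile(r"open\s+access", re.IGNORECASE),
    "licence": re.compile(r"licen[cs]e", re.IGNORECASE),
}


def normalise(text: str) -> str:
    """Undo common PDF text-extraction artefacts before searching."""
    text = text.replace("\ufb01", "fi").replace("\ufb02", "fl")  # fi/fl ligatures
    text = re.sub(r"-\s*\n\s*", "", text)  # re-join words hyphenated across lines
    return re.sub(r"\s+", " ", text)       # collapse line breaks and runs of spaces


def find_oa_statements(pages: list[str]) -> dict[str, list[int]]:
    """Map each pattern name to the 1-based page numbers on which it occurs."""
    hits: dict[str, list[int]] = {name: [] for name in PATTERNS}
    for number, page in enumerate(pages, start=1):
        clean = normalise(page)
        for name, pattern in PATTERNS.items():
            if pattern.search(clean):
                hits[name].append(number)
    return hits
```

Re-joining hyphenated line breaks will occasionally merge a genuinely hyphenated term, so a page with no hits still deserves a glance at the usual places (first-page footer, end of article).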

Identifying accepted versions

The accepted manuscript is the version that has gone through peer-review. The content should be the same as the final published version, but it shouldn’t include any copy-editing, typesetting or copyright marking from the publisher. The file can be either a PDF or a Word document. The most easily recognisable accepted versions are files that are essentially just plain text, without any layout features, as shown below. The majority of accepted manuscripts look like this.

However, accepted manuscripts may sometimes appear at first glance to be published versions. This is because authors may be required to use publisher templates at the submission stage. But although they look like published versions, accepted manuscripts will not show the journal/publisher logo, citation details or copyright statement (or they might show incomplete details, e.g. a copyright statement such as © 20xx *publisher name*). Compare the published version (left) and accepted manuscript (right) of the same paper below.


As we can see the accepted manuscript is formatted like the published version, but doesn’t show the journal and publisher logo, the page numbers, issue/volume numbers, DOI or the copyright statement.

So when trying to establish whether a given file is the published or accepted version, looking out for the above is a fairly foolproof method.
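The checklist above can be written down as a rule of thumb. The Python sketch below is illustrative only: the `FileFeatures` fields are hypothetical flags that a person (or a separate extraction step) would fill in, and the rule cannot separate accepted from submitted manuscripts, which differ only in content.

```python
from dataclasses import dataclass


@dataclass
class FileFeatures:
    """Hypothetical observations recorded while inspecting a manuscript file."""
    publisher_logo: bool = False       # journal/publisher logo present
    citation_details: bool = False     # DOI, volume/issue and page numbers present
    copyright_statement: bool = False  # complete, i.e. not a "© 20xx" placeholder


def classify(f: FileFeatures) -> str:
    """Published versions show all of these markers; accepted manuscripts show none."""
    markers = [f.publisher_logo, f.citation_details, f.copyright_statement]
    if all(markers):
        return "published"
    if not any(markers):
        return "accepted or submitted"
    return "unclear: check further"
```

A mixed result, such as a template-formatted file with a placeholder copyright line, is exactly the case that needs a closer look.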

Identifying submitted versions

This is where things get rather tricky. Because the difference between an accepted and submitted manuscript lies in the actual content of the paper, it is often impossible to tell them apart based on visual clues. There are usually two ways to find out:

  • Getting confirmation from the author
  • Finding and comparing the submission and acceptance dates of the paper (if available), which is mostly relevant in the case of arXiv files

Getting confirmation from the author of the manuscript is obviously the preferable, time-saving option. Unfortunately, many researchers mislabel their files when uploading them to the system, describing their accepted or published version as submitted (the fact that they are submitting the paper to us may partly explain this). So rather than relying on file descriptions, it is better to have an actual statement from the author that the file is the submitted version. In an ideal world, of course, this would never happen, as everyone would know that only accepted and published versions should be sent to us.

A common incarnation of submitted manuscripts we receive is arXiv files. These are files that have been deposited in arXiv, an online repository of pre-prints that is widely used by scientists, especially mathematicians and physicists. An example is shown below.

Clicking on the arXiv reference on the left-hand side of the document (circled) leads to the arXiv record page as shown below.

The ‘comments’ and ‘submission history’ sections may give clues as to whether the file is the submitted or accepted manuscript. In the above example the comments indicate that the manuscript was accepted for publication by the MNRAS journal (Monthly Notices of the Royal Astronomical Society). So this arXiv file is probably the accepted manuscript.

The submission history lists the date(s) on which the file (and any subsequent versions of it) was deposited in arXiv. By comparing these dates with the formal acceptance date of the manuscript, which can be found on the journal website (if published), we can infer whether the arXiv file is the submitted or accepted version. If the manuscript hasn’t been published and there is no way of comparing dates, then in the absence of any other information we assume that the arXiv file is the submitted version.
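The reasoning in the last two paragraphs can be sketched as a small function. This is a hedged Python illustration rather than the team’s actual procedure. It assumes the date of the arXiv version we were sent and the journal’s acceptance date have already been looked up by hand, treats a version posted on or after the acceptance date as probably the accepted manuscript, and falls back to ‘submitted’ when no acceptance date is available unless the arXiv comments mention acceptance. The function name `infer_arxiv_version` is hypothetical.

```python
from datetime import date
from typing import Optional


def infer_arxiv_version(file_version_date: date,
                        acceptance_date: Optional[date],
                        comments: str = "") -> str:
    """Guess whether an arXiv file is the submitted or accepted manuscript."""
    if acceptance_date is None:
        # No date to compare against: a comment is a clue, otherwise assume submitted
        return "probably accepted" if "accepted" in comments.lower() else "submitted"
    # Posted on or after formal acceptance, so probably the accepted text
    if file_version_date >= acceptance_date:
        return "probably accepted"
    return "submitted"
```

When dates are available they take precedence over the comments field, since a comment may have been added to a later version than the file we were sent.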


Distinguishing between different manuscript versions is by no means straightforward. The fact that even our experienced Open Access Team may still encounter cases where they are unsure which version they are looking at shows how confusing it can be. Comparing dates can itself be time-consuming, as not all publishers show acceptance dates for papers (ring a bell?).

Depositing a published (not OA) version instead of an accepted manuscript may infringe publisher copyright. Depositing a submitted version instead of an accepted manuscript may mean that research that hasn’t been vetted and scrutinised becomes publicly available through our repository and is possibly mistaken for peer-reviewed work. When processing a manuscript we need to be sure which version we are dealing with, and ideally we shouldn’t need to go out of our way to find out.

Published 27 March 2018
Written by Dr Melodie Garnier